Page 1: XXL-Test: NVIDIA GeForce GTX Titan in 3-Way-SLI
The Titans are the oldest race of gods of the Theogony in Greek mythology. They have often been described as giants in human form who ruled over the legendary Golden Age. NVIDIA chose "Titan" as the name for its latest desktop graphics card. The GeForce GTX Titan is supposed to be the ne plus ultra of current GPU development, and NVIDIA states that, in terms of performance and pricing, no compromises have been made. In Greek mythology, the Titans lost the battle against the Olympians and were banished to the underworld. We will see in this article whether the GeForce GTX Titan shares the fate of its namesake. Today, however, we can only present the technical details and pictures. You will have to be patient for another 48 hours for the final performance assessment.
We repeat: we need to split the review into two parts, because today we can only present the technical data and pictures. Based on this information, we can of course discuss at least the architectural details. But all measurements and benchmarks, as well as the presentation of the new features, will have to wait until NVIDIA's non-disclosure agreement expires on Thursday.
Around mid-May last year at GTC 2012, NVIDIA mentioned for the first time that the "Kepler" architecture would be split into two branches: desktop (GK10x) and GPU computing (GK11x). Rumors of a second, even larger chip abounded already at the time, but the first specific data was released by NVIDIA CEO Jen-Hsun Huang, who stated 7.1 billion transistors, 13 or 14 SMX clusters, a 384-bit memory interface and a double-precision performance three times higher than "Fermi". At the time, NVIDIA spoke positively about the possibility that the GK110 might see the light of day within a few months as part of the GeForce program. But it was clear from the start that it would still take some time, because until then not even a Tesla product had been announced.
This changed in November of last year: the Tesla K20 and K20X were officially presented. Based on their technical specifications, you could now imagine what a GeForce card might look like. The only confusion that remained concerned the actual performance in games, since this is difficult to extrapolate from GPU computing benchmarks alone. But at least it was already clear at that time that a corresponding GeForce card would feature a maximum of 2688 CUDA cores, 6144 MB of video memory and a 384-bit memory interface. Due to the different applications of a Tesla and a GeForce card, it was still impossible to draw any conclusions about clock speeds.
Today, finally, the secret will be fully unveiled, so let's head straight to the review of the GeForce GTX Titan and its GK110 GPU.
Architectural Key Data
Let's have a look at the GK110 GPU:
With the GK110, NVIDIA is currently manufacturing one of the largest and most complex chips. It packs 7.1 billion transistors, and even on a 28 nm manufacturing process, the die measures 521 mm². In terms of size, only the IBM zEC12 at 597 mm² (32 nm) and Intel's Itanium "Poulson" at 544 mm² (32 nm) are comparable to NVIDIA's chip. The GT200 of the GeForce 200 series, at 576 mm², was no small GPU either, although it was still built on a 55 nm process.
The GK110 is unbeaten among GPUs when it comes to transistor count; even the most modern processors usually top out at around 5 billion (the 62-core Intel Xeon Phi). But of course size and transistor count say little about performance, so let's look at the composition of this complex GPU.
Just as with any GPU based on the "Kepler" architecture, the GK110 uses SMX clusters, each home to 192 CUDA cores. The GPU of the GeForce GTX Titan has 14 of these SMX clusters, for a total of 2688 CUDA cores. The block diagram of the GK110 shows the PCI Express 3.0 host interface and the six memory controllers (384 bits in total) on the outer edge. The SMX clusters are grouped into five so-called Graphics Processing Clusters (GPCs). Note that the block diagram shows the full expansion of the GK110 with 15 SMX clusters; the GPU on the GeForce GTX Titan, as on the Tesla K20X, uses only 14 of them.
For the two Tesla models based on the GK110, NVIDIA achieved an enormous gain in double-precision performance; the only moderate increase in single-precision performance compared to the GK104 is explained by a shift in the FP32-to-FP64 ratio. With the GK110, NVIDIA focused on the professional market and on HPC (High Performance Computing) for the first time. The first stage of "Kepler", the GK104 GPU on the GeForce GTX 680, was by contrast designed for use as a GeForce GPU, where single-precision rendering performance plays the crucial role; there, the ratio of double-precision to single-precision throughput was reduced from 1/2 to 1/24. In addition, on the GK104 only the graphics memory is ECC protected, not the caches. Now, NVIDIA is aiming to establish the GK110, with its special focus on the professional market, as a GeForce product as well.
In order to achieve higher double-precision performance, NVIDIA implemented 64 dedicated double-precision cores per SMX cluster; in the GK104, there were only eight per cluster. Together with the higher number of clusters, this makes for a huge boost in double-precision throughput. NVIDIA also remains true to its scalar architecture and the "Superscalar Dispatch Method", which we first saw in the GF104, and which makes it harder to keep all execution units fully utilized. Utilization therefore depends on Thread Level Parallelism (TLP) and Instruction Level Parallelism (ILP), and thus on how well the code can be optimized.
By default, the double-precision CUDA cores run at only 1/8 of the clock speed. In the NVIDIA Control Panel's 3D settings, there is a menu item called "CUDA - Double Precision". When it is activated, the double-precision CUDA cores run at a higher clock speed, though still not at the full 3D clock. This is only interesting for users who require the full double-precision performance for scientific applications. Gamers will hardly make use of this option, since the lower clock speed reduces rendering performance.
Each SMX cluster also features a 64 kB L1 cache and a 48 kB read-only data cache. Compared to the GK104, the L1 cache was left unchanged; the 48 kB read-only data cache is new to the GK110's clusters. Each SMX cluster still has 16 texture units, so the full GK110 has a maximum of 240. Double-precision performance also benefits from a change to the registers: the number of registers per SMX cluster remained identical to the GK104's (65,536), but on the GK110, each thread can access up to 255 registers, where the GK104 allowed only 63.
What this means for raw processing power is best shown by a comparison: the GeForce GTX 680 offers a single-precision performance of 3.09 TFLOPs, while its double-precision performance is a mere 128 GFLOPs. The GeForce GTX Titan reaches a single-precision performance of 4.5 TFLOPs, and its double-precision performance amounts to 1.5 TFLOPs.
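These peak figures follow directly from the shader count, the clock speed, and the FP32-to-FP64 ratio. A quick sanity check (the GTX 680's 1006 MHz base clock is taken from NVIDIA's published specifications, not from this article):

```python
# Theoretical peak throughput = cores x 2 FLOPs per cycle (fused multiply-add) x clock.

def peak_tflops(cores, clock_mhz):
    """Peak throughput in TFLOPs for a given core count and clock."""
    return cores * 2 * clock_mhz * 1e6 / 1e12

gtx680_sp = peak_tflops(1536, 1006)   # ~3.09 TFLOPs single precision
gtx680_dp = gtx680_sp / 24            # 1/24 FP64 ratio -> ~0.13 TFLOPs (128 GFLOPs)

titan_sp = peak_tflops(2688, 837)     # ~4.50 TFLOPs single precision
titan_dp = peak_tflops(64 * 14, 837)  # 896 FP64 cores (64 per SMX x 14) -> ~1.50 TFLOPs

print(f"GTX 680:   {gtx680_sp:.2f} TFLOPs SP, {gtx680_dp:.2f} TFLOPs DP")
print(f"GTX Titan: {titan_sp:.2f} TFLOPs SP, {titan_dp:.2f} TFLOPs DP")
```

Note that the Titan's 1.5 TFLOPs figure assumes the dedicated FP64 cores run at the base clock, i.e. with the "CUDA - Double Precision" option enabled.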
| | NVIDIA GeForce GTX Titan |
|---|---|
| Manufacturing Process | 28 nm |
| GPU Clock (Base Clock) | 837 MHz |
| GPU Clock (Boost Clock) | 876 MHz |
| RAM Clock | 1502 MHz |
| RAM Size | 6144 MB |
| RAM Interface | 384 bit |
| RAM Transfer Rate | 288.4 GB/s |
| Shader Units | 2688 (1D) |
| Pixel Fill Rate | 40.2 GPixel/s |
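The pixel fill rate in the table follows from the ROP count and the base clock. The ROP count is not listed above; we assume the GK110's known configuration of 8 ROPs per 64-bit memory controller, i.e. 48 in total:

```python
# Pixel fill rate = ROPs x clock. The GK110 pairs 8 ROPs with each of its
# six 64-bit memory controllers, giving 48 ROPs.

def fill_rate_gpixel_s(rops, clock_mhz):
    """Theoretical pixel fill rate in GPixel/s."""
    return rops * clock_mhz * 1e6 / 1e9

print(f"{fill_rate_gpixel_s(48, 837):.1f} GPixel/s")  # matches the 40.2 GPixel/s above
```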
The GK110 GPU on the GeForce GTX Titan runs all 2688 CUDA cores at a base clock of 837 MHz. GPU Boost, well known from the GK104 GPUs, is also implemented on the GeForce GTX Titan: the GPU reaches at least 876 MHz, which according to NVIDIA is a very conservative figure. Often, 1000 MHz and more will be achieved.
The RAM operates at 1502 MHz and, thanks to the 384-bit interface, achieves a memory bandwidth of 288.4 GB/s. The card has access to a total of 6144 MB of GDDR5 memory. With this, NVIDIA removes a supposed bottleneck of its cards compared to AMD's, which are usually equipped with more memory than NVIDIA's models.
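The bandwidth figure can be verified from the memory clock and bus width alone, keeping in mind that GDDR5 transfers four bits per pin per clock:

```python
# GDDR5 is quad-pumped: 1502 MHz x 4 = 6008 MT/s effective data rate.

def bandwidth_gb_s(mem_clock_mhz, bus_width_bits):
    """Peak memory bandwidth: effective transfer rate x bus width in bytes."""
    effective_mt_s = mem_clock_mhz * 4            # GDDR5: 4 transfers per clock
    return effective_mt_s * 1e6 * bus_width_bits / 8 / 1e9

print(f"GTX Titan: {bandwidth_gb_s(1502, 384):.1f} GB/s")  # 288.4 GB/s
```

The same formula with a 256-bit bus explains why the GTX 680, at an identical 1502 MHz memory clock, is limited to 192.3 GB/s.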