Nvidia's New Kepler Architecture


New architectures don't come around quite as frequently for video cards as they do for processors, but they can have almost as big an impact. That's certainly what Nvidia is trying to prove with its new Kepler architecture. Nvidia promises it will deliver impressive gains in terms of power efficiency and performance for desktops and laptops alike, and you should expect to see it appear in both stay-at-home and out-and-about computers soon. We've already reviewed the first desktop discrete card based on the Kepler architecture, the GeForce GTX 680, so if you're interested in where Nvidia is planning on taking PC graphics processing, we've got you covered.

Still, there's a ton of information out there about Kepler itself (its genesis, its features, its capabilities) to sift through. So we've prepared this brief rundown of six important things to know about what Kepler is, how it works, and what it will mean for PC graphics in the upcoming year or two.

1. "Kepler" is "Fermi" evolved. 
Nvidia heralded its "Fermi" architecture, released in 2010 on its GTX 480 video card, as a major advance in parallel processing. It was based on a collection of four Graphics Processing Clusters (or GPCs), each of which contained a raster engine and four Streaming Multiprocessor (or SM) units. Each SM, in turn, contained 32 CUDA processing cores, 16 texture units, and a polymorph engine. The GTX 680's GPCs use a similar design, but with a couple of key differences. Each SM is now a "next-generation Streaming Multiprocessor," which Nvidia abbreviates as SMX. Each SMX contains 192 CUDA cores, for a total of 1,536 cores in the entire Kepler GPU, which suggests potential for considerably greater performance. And the polymorph engines have been redesigned to deliver twice the performance of those used in Fermi, for what Nvidia calls "a significant improvement in tessellation workloads." Because all those CUDA cores run at a lower clock speed than Fermi's did, the GPU as a whole uses less power even as it delivers more performance. (We've verified this in our own testing, by the way.) This could prove to be especially good news for laptop owners, as those power savings can easily translate to longer battery life.
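
To put the core counts above side by side, here is a small back-of-the-envelope sketch in Python. The Fermi figures (four GPCs, four SMs each, 32 cores per SM) and the Kepler figures (192 cores per SMX, 1,536 cores total) come from the description above. The number of SMX units is derived by division, and the per-GPC split assumes the GTX 680 keeps four GPCs, which the text implies ("a similar design") but does not state outright.

```python
# Fermi layout as described above: 4 GPCs x 4 SMs x 32 CUDA cores.
FERMI_GPCS = 4
FERMI_SMS_PER_GPC = 4
FERMI_CORES_PER_SM = 32

# Kepler (GTX 680) figures as described above.
KEPLER_CORES_PER_SMX = 192
KEPLER_TOTAL_CORES = 1536

# Assumption (not stated in the article): the GTX 680 also has four GPCs.
ASSUMED_KEPLER_GPCS = 4

fermi_total_cores = FERMI_GPCS * FERMI_SMS_PER_GPC * FERMI_CORES_PER_SM

# Derived: how many SMX units are needed to reach the quoted core total.
kepler_smx_count = KEPLER_TOTAL_CORES // KEPLER_CORES_PER_SMX
kepler_smx_per_gpc = kepler_smx_count // ASSUMED_KEPLER_GPCS

core_ratio = KEPLER_TOTAL_CORES / fermi_total_cores

print(f"Fermi cores (full design as described): {fermi_total_cores}")
print(f"Kepler SMX units (derived): {kepler_smx_count}")
print(f"Kepler SMX per GPC (assuming 4 GPCs): {kepler_smx_per_gpc}")
print(f"Kepler-to-Fermi core ratio: {core_ratio:.1f}x")
```

Core count alone does not translate directly into performance, since Kepler's cores run at a lower clock speed, so the ratio is best read as a rough indicator rather than a speedup figure.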

2. Memory has been rethought. 
There are changes to the memory system as well as the processing structure. An L2 cache of 512KB is shared across the GPU to provide extra buffer space for the chip's various units; its cache hit bandwidth and atomic operation throughput have both been increased to provide additional support for all those more powerful CUDA cores. A Kepler GPU also contains four 64-bit memory controllers, operating at an overall data rate of 6,008MHz, a significant improvement over the 3,696MHz of the GTX 480, which was loaded with six 64-bit controllers. Nvidia boasts about achieving the 6,008MHz speed by way of a new I/O system based on improvements in circuit and physical design, link training, and signal integrity. On the GTX 680, which is equipped with 2GB of GDDR5 memory, this all adds up to a total memory bandwidth of 192.26GBps, also more than the 177.4GBps seen on the GTX 480.
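
The bandwidth figures above follow from the bus width and the data rate. The short Python sketch below reproduces that arithmetic. The function name is made up for illustration, and it assumes decimal units (1GBps = 1,000MBps), which is what makes the quoted numbers come out.

```python
def memory_bandwidth_gbps(controllers, bits_per_controller, data_rate_mhz):
    """Peak memory bandwidth in GBps (decimal), from bus width and data rate."""
    bus_width_bits = controllers * bits_per_controller
    bytes_per_transfer = bus_width_bits / 8
    # data_rate_mhz is millions of transfers per second, so this is MBps;
    # divide by 1,000 to get GBps.
    return bytes_per_transfer * data_rate_mhz / 1000


# GTX 680: four 64-bit controllers at 6,008MHz.
gtx_680 = memory_bandwidth_gbps(4, 64, 6008)
# GTX 480: six 64-bit controllers at 3,696MHz.
gtx_480 = memory_bandwidth_gbps(6, 64, 3696)

print(f"GTX 680: {gtx_680:.2f}GBps")
print(f"GTX 480: {gtx_480:.1f}GBps")
```

Note that the GTX 680 gets its higher bandwidth despite a narrower bus (256 bits versus 384), because the data rate rose by a larger factor than the bus width shrank.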

3. It's speedier than it looks.
Nvidia has introduced a new technology for Kepler-based hardware called GPU Boost. Similar to Intel's Turbo Boost and AMD's Turbo Core, GPU Boost makes the video card's clock speed a very fluid thing. For example, the GTX 680 has a base clock running at 1,006MHz. But if the card is operating below its TDP, meaning it's using less power than it's capable of using (because it's running a not-too-demanding 3D game, for example), it can dynamically increase its clock speed until the gap is filled. It's tough to say at this point how much of a performance improvement you can expect in any given title, though we have an idea of the range. The average boosted clock speed is 1,058MHz, but Kepler GPUs are capable of going even higher than that. This is before overclocking is figured in, by the way; that remains an option for gaining even more speed. (GPU Boost is currently only slated to be available in desktop products; laptop users are out of luck, at least for now.)
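
The idea of raising the clock until the power headroom is used up can be sketched as a toy model in Python. This is not Nvidia's algorithm: the linear power-versus-clock relationship, the 13MHz step, the 195-watt TDP, the 185-watt draw at base clock, and the 1,110MHz ceiling are all assumptions chosen purely for illustration. Only the 1,006MHz base clock comes from the text above.

```python
def boost_clock(base_mhz, power_at_base_w, tdp_w, step_mhz=13, max_mhz=1110):
    """Toy model: step the clock up while estimated power stays within TDP.

    Assumes power scales linearly with clock speed, which is a
    simplification made for this sketch only.
    """
    clock = base_mhz
    while clock + step_mhz <= max_mhz:
        candidate = clock + step_mhz
        estimated_power = power_at_base_w * candidate / base_mhz
        if estimated_power > tdp_w:
            break  # the next step would exceed the power budget
        clock = candidate
    return clock


# Hypothetical workload drawing 185W at the base clock, with a 195W budget.
light_load = boost_clock(base_mhz=1006, power_at_base_w=185, tdp_w=195)
# Hypothetical workload already at the budget: no headroom, so no boost.
heavy_load = boost_clock(base_mhz=1006, power_at_base_w=195, tdp_w=195)

print(f"Light load settles at {light_load}MHz")
print(f"Heavy load stays at {heavy_load}MHz")
```

With these made-up inputs the light workload settles at 1,058MHz and the heavy one stays at the 1,006MHz base. The point of the sketch is only that the same card can run at different speeds depending on how much power headroom a given game leaves.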

4. Video has been re-envisioned.
Nvidia has implemented a new display engine on its Kepler GPUs that enables some useful features. Whereas previous Nvidia cards were limited to powering two monitors, you can now drive four at a time with a single card like the GTX 680, which is nice if you want to have three displays for a 3D Vision Surround setup and one for actual work. On Kepler GPUs you'll also now find a hardware-based H.264 video encoder called NVENC. On previous Nvidia cards encoding was handled by the CUDA cores, and their use increased power consumption; a hardware solution consumes much less power and, according to Nvidia, encodes video almost four times faster. Nvidia claims that NVENC can encode 1080p videos eight times faster than real time, and can encode at resolutions of up to 4,096 by 4,096. NVENC supports H.264 Base, Main, and High Profile Level 4.1 (the same as the Blu-ray standard), and the H.264 extension Multiview Video Coding (MVC) for stereoscopic video for use with Blu-ray 3D.
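
To make the "eight times faster than real time" claim concrete, the Python sketch below converts a video's running time into an estimated encode time. The function name is hypothetical, and the estimate takes Nvidia's 8x figure at face value for 1080p content; real encode times would vary with settings and source material.

```python
def encode_minutes(video_minutes, speedup_vs_real_time=8.0):
    """Estimated encode time for a video of the given length.

    speedup_vs_real_time=8.0 reflects Nvidia's claim for 1080p video;
    treat it as a best-case figure rather than a measurement.
    """
    if speedup_vs_real_time <= 0:
        raise ValueError("speedup_vs_real_time must be positive")
    return video_minutes / speedup_vs_real_time


# A two-hour 1080p movie at the claimed 8x rate.
movie = encode_minutes(120)
print(f"Two-hour movie: about {movie:.0f} minutes to encode")
```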

5. Anti-aliasing goes pro.
Kepler-based GPUs push anti-aliasing beyond the styles common today (Multisample Anti-Aliasing, or MSAA, is the big one, though Nvidia also uses Coverage Sample Anti-Aliasing and AMD uses Morphological Anti-Aliasing) with a new flavor called Temporal Anti-Aliasing (or TXAA for short). According to Nvidia, TXAA is a new "film-style" technique that blends MSAA with Fast Approximate Anti-Aliasing (FXAA), which analyzes all the pixels on the screen and smooths the ones it detects as forming an artificial edge. Because no game yet supports TXAA, we weren't able to test this, but Nvidia says it delivers the kind of image quality you'd get from 8x MSAA with the performance hit of only 2x MSAA.
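
The "analyze every pixel, smooth the ones on an edge" idea can be illustrated with a toy post-process pass in Python. This is emphatically not Nvidia's FXAA or TXAA code. It is a minimal sketch that works on a grayscale image stored as a list of rows, flags a pixel as an edge when the contrast with its four direct neighbors exceeds an arbitrary threshold, and blends flagged pixels with those neighbors. The function name and the threshold value are assumptions.

```python
def smooth_edges(image, threshold=0.25):
    """Toy edge-detect-and-blend pass over a grayscale image (values 0.0-1.0)."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]  # copy so reads see the original pixels
    for y in range(height):
        for x in range(width):
            # Gather the up/down/left/right neighbors that fall inside the image.
            neighbors = [
                image[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < height and 0 <= nx < width
            ]
            local = neighbors + [image[y][x]]
            contrast = max(local) - min(local)
            if contrast > threshold:
                # Edge pixel: replace it with the average of itself and neighbors.
                result[y][x] = sum(local) / len(local)
    return result


# A hard vertical edge between a black region and a white region.
hard_edge = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
softened = smooth_edges(hard_edge)
for row in softened:
    print(row)
```

Pixels away from the edge are left untouched, while the two columns on either side of it are pulled toward gray. Because a pass like this runs once over the finished frame rather than rendering extra samples per pixel, post-process approaches tend to be cheaper than high levels of MSAA.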

6. Less focus on compute.
All of Nvidia's changes have resulted in what is, overall, the fastest and the most electricity-bill-friendly single-GPU gaming video card we've yet seen. But this title hasn't come without one sacrifice: compute. Fermi GPUs were sold, at least partially, on their ability to perform mathematical calculations à la CPUs, and displayed impressive facility doing just that, but Nvidia stripped some of those abilities away in order to improve power efficiency. Using LuxMark 2.0, an application designed for testing OpenCL compute performance, we compared last generation's GeForce GTX 580 (based on an updated Fermi-style GPU) with the GTX 680, and the earlier card came out ahead in every test. AMD's new cards, like the Radeon HD 7970, did even better. If you want a card that's every bit as good for work as play, Kepler-based GPUs may not be the way to go. But the GTX 680 is the runaway champ for playing 3D games on your PC.