Thursday, September 8, 2011

ATI Radeon HD 6990 | 4GB | "Faster than the world’s fastest graphics card. We should know, we built that too"



An Islands Euphemism & Family Tree

AMD/ATI's naming scheme hasn't exactly been clear lately. Besides a recent change in the Radeon HD 6000 numbering scheme, you'll have heard codenames flying around as well. On top of that, AMD quietly retired the ATI brand; the products are now simply branded AMD.

Let's first sink our teeth into that and explain what is going on.

A bit of confusion out there on the street is that people call the card released today "Antilles", which in fact is two "Cayman" graphics processors slapped onto one graphics card. As you might remember, in the past ATI (now AMD) assigned code numbers to the GPUs used on its graphics cards, such as RV770. While we are quite confident the design team still uses that numbering scheme, AMD marketing wanted to give it a little more TLC; as such, each GPU family has a codename, and each GPU derived from that family has a codename as well.

The previous-generation GPUs were named after evergreen trees, Evergreen being the family codename; the products derived from that range were known as Cypress, Juniper, Redwood and Cedar.

For the Radeon HD 6000 series generation, the products are codenamed after islands in the Caribbean, grouped under the family name Northern Islands (depending on how you look at geographical location, of course).



Subdivide the Northern Islands family and you get smaller segments, with each GPU range in the new Radeon HD 6000 series carrying its own codename. Ready? Here they are: Barts, Cayman, Blackcomb, Antilles and Whistler.

When the Radeon HD 6850/6870 were released in October 2010, the GPU powering them carried the codename Barts. Barts is named after the island of Saint Barthélemy and serves as the performance/upper-midrange GPU series. That still leaves Cayman, Blackcomb, Antilles and Whistler.

Today's high-end products are based on a GPU called "Cayman" (after the Cayman Islands), the high-end GPU of the AMD Radeon HD 6900 series. That's a change as well, as previously the 5800 series was the high-end series.

And while we're still on the island rollercoaster, today we test "Antilles", named after the Antilles islands of course. The Antilles are a group of islands; get it? A group, as in multiple. Accordingly, Antilles is a dual-GPU graphics card that uses two Cayman GPUs, is positioned in the Radeon HD 6900 series and carries the consumer name Radeon HD 6990.

So in a nutshell: Antilles = two Cayman GPUs = Radeon HD 6990. Let's move on to the actual product, but first a word or two on the Cayman GPU.

An Architecture Change

The 6800 "Barts" series cards released last year were the 6850 and 6870. They received only a small architectural optimization over the previous-generation architecture, Cypress. With the Cayman/Antilles products things have changed more substantially: the most fundamental part of the GPU, the shader processor setup, underwent a significant change, and we are still debating whether or not it was a good one.

AMD moved from a VLIW5 (also known as VEC5) to a VLIW4 SIMD shader processor setup. We won't discuss the VLIW4 thread processor setup in much detail, but basically AMD went from a VLIW5 configuration, which used four simple units and one complex t-unit (transcendental unit) to build a stream processing unit, to a VLIW4 configuration that uses four stream units with equal capabilities, two of which are assigned special functions.

AMD claims this change brings 10% more performance over the previous thread processor setup, along with better scheduling and register management. We think it was mostly a design change to save transistors, which can then be reused to add more shader processors to the die.
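To make the VLIW5 versus VLIW4 argument a bit more tangible, here's a toy model of bundle packing. It assumes a compiler that can only co-issue operations that don't depend on each other and greedily fills each bundle up to its width; it ignores the t-unit's special role entirely. The shader fragment and the helper names (`pack_bundles`, `slot_utilization`) are hypothetical, and this is in no way AMD's actual shader compiler, just a sketch of why a fifth slot that rarely gets filled costs transistors without adding throughput.

```python
from typing import Dict, List, Set


def pack_bundles(deps: Dict[str, Set[str]], width: int) -> List[List[str]]:
    """Greedily pack ops into VLIW bundles of at most `width` slots.

    An op may only issue once every op it depends on was issued in an
    earlier bundle. Ops are considered in program (dict insertion) order.
    """
    issued: Set[str] = set()
    remaining = list(deps)
    bundles: List[List[str]] = []
    while remaining:
        ready = [op for op in remaining if deps[op] <= issued]
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        bundle = ready[:width]  # fill as many slots as we can this cycle
        bundles.append(bundle)
        issued.update(bundle)
        remaining = [op for op in remaining if op not in bundle]
    return bundles


def slot_utilization(bundles: List[List[str]], width: int) -> float:
    """Fraction of issue slots that actually carry an op."""
    used = sum(len(b) for b in bundles)
    return used / (len(bundles) * width)


# Hypothetical shader fragment: four independent per-channel ops (a-d),
# followed by a dependent reduction chain.
fragment = {
    "a": set(), "b": set(), "c": set(), "d": set(),
    "e": {"a", "b"}, "f": {"c", "d"},
    "g": {"e", "f"}, "h": {"g"},
}

for width in (5, 4):
    bundles = pack_bundles(fragment, width)
    print(width, len(bundles), f"{slot_utilization(bundles, width):.0%}")
```

For this fragment both widths need four bundles (cycles), but the 4-wide design leaves fewer slots idle (50% versus 40% utilization). Shaders with more independent work per cycle would of course favour the wider bundle, which is exactly the trade-off AMD weighed.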

Beyond this rather significant change, there are more changes to be found on the graphics card. The render back ends (ROPs) were upgraded with a redesigned Z/stencil and ROP unit architecture consisting of 128 Z/stencil ROPs and 32 color ROPs, which are up to two times faster in 16-bit integer operations and two to four times faster in 32-bit floating point operations; that will help you in AA performance. There's also much faster GDDR5 memory, and we spot a series of improved compute features that will help performance in that segment.

One other detail you might find interesting: when you look at the block diagram, you'll notice that the GPU pretty much looks like a dual-core processor. AMD calls this dual graphics engines. Anyway, have a peek at the block diagrams if you're interested.





Alright, some more generic information to grasp. Each Cayman GPU is based on a 40nm fabrication process and harbors a blistering 2.64 billion transistors. The GPU can have up to 24 shader clusters (SIMD engines), each holding 64 shader processors. Do the math and you'll quickly learn that the most high-end GPU counts 1536 shader processors. A bit of an unusual number, and we can't help but wonder whether there are more of them hidden inside that die.

The Cayman chip has up to 96 texture units and can produce 2.7 TFLOPS of single precision performance.

Memory-wise AMD of course stuck with its well-proven GDDR5 setup, and yes, it is still based on a 256-bit memory bus.
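The figures above hang together through some simple arithmetic. The sketch below assumes an 880 MHz engine clock (the R6990's unlocked BIOS clock, at which the 2.7 TFLOPS figure works out), counts one multiply-add per shader per clock as two floating point operations, and plugs in the R6990's 5000 MHz effective memory data rate from the specifications; a standalone card may run its memory at a different speed. The function name is our own.

```python
def cayman_throughput(engine_mhz: int = 880,
                      simd_engines: int = 24,
                      sp_per_simd: int = 64,
                      texture_units: int = 96,
                      color_rops: int = 32,
                      bus_bits: int = 256,
                      mem_effective_mhz: int = 5000) -> dict:
    """Theoretical per-GPU peak figures for a Cayman configuration."""
    shaders = simd_engines * sp_per_simd
    return {
        "shaders": shaders,
        # one multiply-add per shader per clock counts as 2 FLOPs
        "gflops": shaders * 2 * engine_mhz / 1000,
        # one texel per texture unit per clock
        "gtexels_per_s": texture_units * engine_mhz / 1000,
        # one pixel per color ROP per clock
        "gpixels_per_s": color_rops * engine_mhz / 1000,
        # bytes per transfer (bus width / 8) times effective data rate
        "mem_gb_per_s": bus_bits / 8 * mem_effective_mhz / 1000,
    }


print(cayman_throughput())
```

That works out to 1536 shaders, roughly 2,703 GFLOPS (the 2.7 TFLOPS quoted above), about 84.5 GTexel/s, 28.2 GPixel/s and 160 GB/s of memory bandwidth per GPU. These are theoretical peaks; real games never keep every unit busy.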

So what about the R6990?

With this knowledge in mind, we can now look at the specifications of the R6990 in a little more depth.
The Specifications

Since it is based on two Cayman GPUs, the product is easy to describe: take a Radeon HD 6970 and multiply everything by two. That is roughly the R6990 in a nutshell, feature- and specification-wise, except that it all sits on one card.

Now, before I show you the final specifications of the Radeon HD 6990, you need to understand that the card has two modes: normal and unlocked.

The AMD Radeon HD 6990 graphics card features dual-BIOS capabilities. This feature is controlled by an “Unlocking Switch” sitting right next to the CrossFireX connector on the card. The switch toggles between the factory-supported Performance BIOS with a 375W TDP (BIOS1) and a more extreme Performance BIOS (BIOS2) that unlocks higher clock speeds and a TDP of up to 450W.

To enable this higher-performance BIOS, end users have to remove a label covering the dual-BIOS switch and set the switch to the desired position outlined below:
  • Position 1 — 450W Extreme Performance BIOS (BIOS2). 
  • Position 2 — (default shipping position) — 375W factory-supported Performance BIOS (BIOS1). 

Let's place that in a table:



The Radeon HD 6990 comes armed with an astonishing 3072 shader processors, i.e. 48 SIMD-based shader clusters, split across two graphics engines per GPU. The engine (shader) clock is set at 830 or 880 MHz depending on the selected BIOS. The card comes paired with 4 GB of memory (2 GB per GPU) clocked at an effective 5000 MHz. In default mode the card draws around 350W in games, and in unlocked mode it can consume 415W with a hefty game. There's still room for overclocking in unlocked mode, though; you can take the card up to roughly 450W.
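Since the R6990 is essentially two Caymans with a BIOS switch, its board-level numbers can be derived from the per-GPU ones. A small sketch, assuming the switch-position mapping, clocks and board power ratings from the dual-BIOS description above (375W/830 MHz for BIOS1, 450W/880 MHz for BIOS2); the class and function names are hypothetical and the FLOPS figure is a paper number that presumes both GPUs are fully busy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiosProfile:
    name: str
    engine_mhz: int
    board_power_w: int


# Switch position -> BIOS profile, as described in the dual-BIOS section.
SWITCH_POSITIONS = {
    1: BiosProfile("BIOS2, Extreme Performance", 880, 450),
    2: BiosProfile("BIOS1, Performance (default)", 830, 375),
}

GPUS = 2
SHADERS_PER_GPU = 1536
MEMORY_PER_GPU_GB = 2
BUS_BITS_PER_GPU = 256
MEM_EFFECTIVE_MHZ = 5000


def r6990_specs(position: int) -> dict:
    """Board-level theoretical peaks for a given unlocking-switch position."""
    bios = SWITCH_POSITIONS[position]
    shaders = GPUS * SHADERS_PER_GPU
    return {
        "bios": bios.name,
        "shaders": shaders,
        # assumes both GPUs are fully busy, i.e. perfect CrossFire scaling
        "peak_gflops": shaders * 2 * bios.engine_mhz / 1000,
        "memory_gb": GPUS * MEMORY_PER_GPU_GB,
        # aggregate figure: each GPU only reaches its own 2 GB at half this rate
        "aggregate_gb_per_s": GPUS * BUS_BITS_PER_GPU / 8 * MEM_EFFECTIVE_MHZ / 1000,
        "board_power_w": bios.board_power_w,
    }


for pos in (2, 1):
    print(pos, r6990_specs(pos))
```

Position 2 works out to roughly 5.1 TFLOPS and position 1 to roughly 5.4 TFLOPS, with 320 GB/s of aggregate memory bandwidth either way. Keep in mind the 4 GB is not one pool: in CrossFire's usual alternate-frame rendering mode each GPU works from its own 2 GB.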

Now that we have all this knowledge, let's compare this product with some others. Here's a quick comparative overview of the specifications of a selection of other performance parts. You'll notice that the differences are just HUGE. I've inserted the 'default' performance mode, not the unlocked one:


Impressive stuff, huh? The card is of course an up-to-date DX11-class product with a couple of new features. Feature-wise, the graphics card is very similar to the last-generation products and is merely an advanced, updated model. However, features like one DVI and four mini DisplayPort 1.2 monitor connectors are present for a full Eyefinity experience on up to five monitors with just one R6990.

Now, before you get concerned about the four DP connectors: the card will ship with three adapters to enable Eyefinity gaming out of the box for all users. The product will ship with:
1x mini DisplayPort to passive single-link DVI adapter
1x mini DisplayPort to active single-link DVI adapter
1x mini DisplayPort to passive HDMI adapter

The adapter configuration enables 3x1 gaming out of the box with DVI panels, and with additional display adapters or DisplayPort displays you will be able to drive up to 5 displays in portrait Eyefinity (5x1 portrait mode) for a rather grand gaming experience.

Power Management - AMD PowerTune

AMD PowerTune is a new technology that takes a new approach to balancing maximum performance against TDP. In short, AMD can now cap the maximum power (TDP) a card is allowed to draw.

AMD regulates the TDP with the help of active monitoring. Basically you can lower the TDP, leave it at normal, or increase the thermal headroom of the graphics card through the Catalyst drivers. The new feature allows the GPU to be designed with higher engine clock speeds, which can be applied across the broad set of applications that leave thermal headroom.

So by default it will try to keep your power consumption and TDP at a predefined baseline, say 200 watts. But you also get a margin to raise that limit by, say, +20%, or vice versa: when you want to limit your power draw, you can lower it by up to -20% (or anything in between).
  • AMD PowerTune can enable higher performance that is optimized to the thermal limits of the GPU by dynamically adjusting the engine clock during runtime based on an internally calculated GPU power assessment. 
  • AMD PowerTune technology also deals with applications that would otherwise exceed the GPU’s TDP like OCCT, Furmark or 3DMark's perlin noise tests. It does so by dynamically managing the engine clock speeds based on calculations which determine the proximity of the GPU to its TDP limit. 
  • AMD PowerTune allows for the GPU to run within its TDP budget at higher nominal clock speeds than otherwise possible. 
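AMD doesn't spell out PowerTune's internal algorithm, so what follows is only a conceptual sketch of the behaviour described above: a baseline power limit, a user offset clamped to ±20%, and a controller that steps the engine clock down when the estimated power exceeds the limit and back up toward the nominal clock when there is headroom. The linear power model (0.25 W per MHz at full load), the 10 MHz step and all function names are assumptions for illustration; the 200 W baseline is the example from the text.

```python
def power_limit(baseline_w: float, offset_pct: float) -> float:
    """Apply a PowerTune-style user offset, clamped to -20%..+20%."""
    offset_pct = max(-20.0, min(20.0, offset_pct))
    return baseline_w * (1 + offset_pct / 100)


def estimate_power(clock_mhz: int, activity: float,
                   watts_per_mhz: float = 0.25) -> float:
    """Hypothetical power model: scales with clock and workload activity (0..1)."""
    return clock_mhz * activity * watts_per_mhz


def next_clock(clock_mhz: int, nominal_mhz: int, activity: float,
               limit_w: float, step_mhz: int = 10, min_mhz: int = 250) -> int:
    """One control step: throttle when over budget, recover when there is headroom."""
    if estimate_power(clock_mhz, activity) > limit_w:
        return max(min_mhz, clock_mhz - step_mhz)
    if (clock_mhz < nominal_mhz
            and estimate_power(clock_mhz + step_mhz, activity) <= limit_w):
        return min(nominal_mhz, clock_mhz + step_mhz)
    return clock_mhz


def settle(nominal_mhz: int, activity: float, limit_w: float,
           steps: int = 100) -> int:
    """Run the controller for a while and return the resulting engine clock."""
    clock = nominal_mhz
    for _ in range(steps):
        clock = next_clock(clock, nominal_mhz, activity, limit_w)
    return clock


for offset in (-20, 0, 20):
    limit = power_limit(200, offset)
    print(f"{offset:+d}% -> {limit:.0f} W limit, "
          f"heavy load settles at {settle(880, 1.0, limit)} MHz, "
          f"light load at {settle(880, 0.5, limit)} MHz")
```

With the 200 W example, -20% gives a 160 W cap and +20% a 240 W cap. In this toy model a heavy load at the default cap settles at 800 MHz, while a light load never touches the limit and keeps the full 880 MHz, which mirrors the idea that only power-virus style workloads get reined in.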

 

So a monitoring function on the graphics card can downclock your card when needed. The flip side of the coin is that the limit can be adjusted directly by the user in the AMD Catalyst Control Center for tweaking and overclocking: you can enforce more aggressive power containment (and therefore more aggressively limit power and heat), or, as an enthusiast, relax the factory thermal constraints on your AMD Radeon HD 6900 series graphics card and squeeze out every last bit of performance, at the cost of a much higher TDP of course.

Mind you, if you use ATI Overdrive to tweak the R6990, you'll need to adjust and apply the settings for each GPU independently.

What's That New Switch?

If you look at the photo below, you'll notice a tiny micro-switch next to the CrossFire connector. The R6990 has one flashable BIOS and one (non-flashable) default BIOS; with the switch you select BIOS 1 or 2.

AMD most likely introduced the feature initially to reduce the RMA rate. They know very well that the enthusiast community often re-flashes cards, not always successfully, after which those cards enter a very expensive RMA procedure at AMD's cost.

With the R6990, AMD decided to expand that function a little, giving you control over clock frequencies and TDP. So the dual-BIOS feature has now become an “Unlocking Switch”. The switch toggles between the factory-supported Performance BIOS with a 375W TDP (830 MHz, BIOS1) and an Extreme Performance BIOS (880 MHz, BIOS2), which unlocks higher clock speeds and a TDP of up to 450W. It also applies a slightly higher voltage.
Position 1 — 450W Extreme Performance BIOS (BIOS2).
Position 2 — (default shipping position) — 375W factory-supported Performance BIOS (BIOS1).

It's definitely not a bad thing to have on any graphics card really and we certainly appreciate the implementation.



Universal Video Decoder 3.0

UVD, short for Universal Video Decoder, is the name of the video processor embedded in the GPU of the graphics card. With proper third-party software like WinDVD, PowerDVD or the free Media Player Classic, you can enable UVD support, which provides hardware acceleration for media content such as the MPEG2, H.264 and VC-1 high-definition video formats used by Blu-ray.

In short, this feature allows the video processor in the GPU to apply hardware acceleration and video processing functions while keeping power consumption and CPU utilization low on your movies and videos.

That means low CPU utilization while delivering maximum image quality. Over the years this engine has advanced; it's not massively different from the older UVD engines, but we do see some new tweaks. Dual-stream decoding was already introduced in UVD2. For example, if you play back a Blu-ray movie and simultaneously want to see a director's commentary (as video), you can watch both the movie and, in a smaller window, the additional content (picture-in-picture). Obviously this requires Blu-ray 2.0 compatibility, and the additional content has to be an actual feature of the movie. But it's definitely fun to see.

UVD 3.0 allows for
  • Hardware acceleration decode of two 1080P HD streams 
  • Compatible with Windows Aero mode - playback of HD videos while Aero remains enabled 
  • Video gamma - independent gamma control from Windows desktop. 
  • Brighter whites - Blue Stretch processing increases the blue value of white colors for bright videos 
  • Dynamic Video Range - Controls levels of black and white during playback 

Dynamic Contrast Enhancement improves the contrast ratio of videos in real time, on the fly. It's a bit of a tricky thing to do, as there are certain situations where you do not want your contrast increased.

Another feature is Dynamic Color Enhancement. It's pretty much a color tone enhancement feature and will apply a slight color correction where it's needed. We'll show you that in a bit, as I quite like this feature; it makes certain aspects of a movie a little more vivid. New in UVD3 is entropy decoding and bitstream processing for MPEG-2 and MPEG-4 (DivX/Xvid) movies, and there is of course hardware support for Blu-ray 3D's multi-view codec (MVC). Have a peek at the above block diagram demonstrating that.

To play back high-definition content you'll still need software like WinDVD or PowerDVD, an HD source (Blu-ray player) and an HDCP-capable monitor or television.

For those interested in MKV/x264 GPU-based content acceleration, playback and image quality enhancements, please read this guide we have written. We spotted a lovely little free application to manage this.

HQV 2

HQV 2.0 is a collection of challenging video clips, each of which presents a difficult video playback scenario and then asks the reviewer to evaluate how successfully the video processor under test copes with, or corrects, a particular sort of video artifact that might appear in the scene.

Developed by IDT to showcase the features of their consumer and commercial HQV video processing hardware (typically integrated into consumer and commercial HDTV displays, Blu-ray Players, A/V Receivers, projectors and video processing boxes), HQV2 is also useful in evaluating video playback quality on various other video playback devices, including desktop and notebook computer systems.


The table below displays HQV2 test results based on Windows 7 64-bit. The images were evaluated on a 48" Philips HDTV connected via HDMI at 1920x1080p resolution. CyberLink PowerDVD 10 was used for all testing.

Bitstreaming audio
Directly tied to the UVD3 engine is, obviously, sound as well. AMD's Radeon 3000, 4000, 5000 and now 6000 series cards can pass lossless sound directly through the HDMI connector (with the help of the adapter). This has been upgraded: it's now possible to have 7.1-channel lossless sound at 192kHz/24-bit. The HDMI audio output follows the HDMI 1.4a standard and supports Dolby TrueHD and DTS-HD audio. Obviously there is also support for standard PCM, AC-3 and DTS. HDMI 1.4a allows bitrates of up to 65 Mbps and supports 3D TV.
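To put the 7.1-channel, 192kHz/24-bit figure in perspective: the raw PCM bitrate is simply channels × sample rate × bit depth. The sketch below does that arithmetic for uncompressed PCM and ignores HDMI packet overhead; Dolby TrueHD and DTS-HD Master Audio are losslessly compressed, so their bitstreams are typically smaller than this worst case. The helper name is our own.

```python
def pcm_bitrate_mbps(channels: int, sample_rate_hz: int, bits_per_sample: int) -> float:
    """Raw (uncompressed) PCM bitrate in megabits per second."""
    return channels * sample_rate_hz * bits_per_sample / 1_000_000


print(pcm_bitrate_mbps(8, 192_000, 24))  # 7.1 channels at 192kHz/24-bit: 36.864
print(pcm_bitrate_mbps(2, 48_000, 16))   # plain stereo, for comparison: 1.536
```

So even fully uncompressed 7.1 at 192kHz/24-bit needs under 37 Mbps, comfortably below the 65 Mbps figure quoted above.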

So with an AMD Radeon HD 6800 or later series video card, all you need to do is install the card in your motherboard and connect it to your receiver with an HDMI cable, as the card removes the need for a separate sound card.

Requirements
  • Playback software, such as CyberLink's PowerDVD 9 or newer 
  • An AV receiver that supports Dolby TrueHD / DTS-HD Master Audio from a Blu-ray source (HDMI v1.3 compliant) 
  • Two HDMI cables (male to male connectors, rated at 225MHz or higher) 
  • Appropriate speaker cables for your surround sound speaker system

Monitor connectivity - Eyefinity

You guys will notice that the new 6990 series cards have a plethora of monitor connectors. Quite a bit has changed. The reference design cards carry four mini DisplayPort (v1.2) connectors and one dual-link DVI connector.

DisplayPort is now up to snuff at revision 1.2, which allows for a lot of signal bandwidth. Of course you can configure Eyefinity as you please, with multiple monitors over multiple DP connectors.
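A quick back-of-the-envelope check of what that DisplayPort 1.2 bandwidth buys you. The link figures assumed here are the DP 1.2 HBR2 rate of 5.4 Gbit/s per lane over four lanes with 8b/10b encoding (so 80% of the raw rate carries data). The pixel-rate side deliberately ignores blanking intervals, so real requirements are somewhat higher than these estimates; the helper names are hypothetical.

```python
def dp12_payload_gbps(lanes: int = 4, lane_gbps: float = 5.4) -> float:
    """Usable DisplayPort 1.2 (HBR2) bandwidth after 8b/10b encoding."""
    return lanes * lane_gbps * 8 / 10


def stream_gbps(width: int, height: int, refresh_hz: int,
                bits_per_pixel: int = 24) -> float:
    """Active-pixel data rate only; blanking intervals are ignored."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


link = dp12_payload_gbps()
for w, h in ((2560, 1600), (1920, 1080)):
    need = stream_gbps(w, h, 60)
    print(f"{w}x{h}@60: {need:.2f} of {link:.2f} Gbit/s, "
          f"{int(link // need)} stream(s) fit")
```

In raw terms a single DP 1.2 link has room for more than one high-resolution stream, which is what makes daisy-chaining displays over DisplayPort 1.2 attractive; with blanking overhead included, treat these counts as optimistic upper bounds.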

For broader compatibility, AMD encourages all its partners to include several adapters, and the card will ship with three adapters to enable Eyefinity gaming out of the box for all users. The product will ship with:

  • 1x mini DisplayPort to passive single-link DVI adapter
  • 1x mini DisplayPort to active single-link DVI adapter
  • 1x mini DisplayPort to passive HDMI adapter


The adapter configuration will enable 3x1 gaming out of the box with DVI panels but with additional display adapters or DisplayPort displays you will be able to drive up to 5 displays in portrait Eyefinity (5x1 Portrait mode) for the rather grand gaming experience.

You'll get a variety of options for multi-monitor setups in Eyefinity mode. The R6990 can drive up to 5 monitors per card.

The Radeon HD 6990 series graphics cards can drive one to five monitors per graphics card, depending on the connectivity limitations we just mentioned.

We've tested Eyefinity live in action, and it works really nicely. You can combine monitors and get your groove on at up to, say, 7680x3200 pixels spread over several monitors, with the monitors acting as a single display.

So some examples of what you can do here:

  • Single-monitor setup at 2560x1600
  • Dual-monitor setup at 2560x1600 per monitor
  • Three-monitor setup at 2560x1600 per monitor
  • Six-monitor setup at 1920x1080 per monitor
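The combined desktop resolution of an Eyefinity group is just panel size times grid size, with portrait mode swapping each panel's width and height. A minimal sketch, assuming identical panels and no bezel compensation; the function name is hypothetical, and it does not check how many displays a single card can actually drive. For instance, a 3x2 grid of 2560x1600 panels is one layout that adds up to the 7680x3200 mentioned earlier, though that is six displays, one more than a single R6990 drives.

```python
from typing import Tuple


def eyefinity_surface(cols: int, rows: int, panel_w: int, panel_h: int,
                      portrait: bool = False) -> Tuple[int, int]:
    """Combined desktop resolution of a cols x rows grid of identical panels.

    Bezel compensation is ignored; portrait mode rotates each panel.
    """
    if portrait:
        panel_w, panel_h = panel_h, panel_w
    return cols * panel_w, rows * panel_h


print(eyefinity_surface(3, 1, 2560, 1600))                 # (7680, 1600)
print(eyefinity_surface(5, 1, 1920, 1080, portrait=True))  # (5400, 1920)
print(eyefinity_surface(3, 2, 2560, 1600))                 # (7680, 3200)
```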


Eyefinity is a really nice feature, and sure, we also understand that 99% of you guys will never use more than two monitors. That other 1% definitely matches the Guru3D audience. Personally I like to game on three screens; it's really immersive. If you are bold enough to go for a multi-monitor setup, three screens are ideal for flight sims, racing games, role-playing games, real-time strategy (huge maps!), first-person shooters and, sure, even multimedia apps.

We have two reviews available on Eyefinity:
Radeon HD 5870 Eyefinity (three monitors) review - click here.
Radeon HD 5870 Eyefinity6 (six monitors) review - click here.

Eyefinity is modular, allowing you to rearrange the number of discrete images created, as well as their shape, to your liking.


Video acceleration post-processed by your GPU

x264-encoded video is often synonymous with Matroska (MKV), a much-admired media container that frequently carries x264-encoded H.264 content, especially 1920x1080p movies. As a result, you'd need a very beefy PC with a powerful processor to play back such movies error-free, without dropped frames and nasty stutters, as PowerDVD or other PureVideo HD supporting software will not support them by itself.

A movie in any popular format (XviD/DivX/MPEG2/MPEG4/H.264/MKV/VC1/AVC) can be played with this little piece of software without the need to install codecs and filters, and where it can, it will enable DXVA for playback. DXVA is short for DirectX Video Acceleration, and as you can tell from those words alone, it tries wherever it can to accelerate content on the GPU, offloading the CPU. Which is what we are after.
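Conceptually, "DXVA where it can" boils down to a capability check with a software fallback, decided on the codec inside the file (MKV itself is only a container). The sketch below is a hypothetical illustration, not Media Player Classic's code or the DXVA API: the set of hardware-decodable formats is an assumption loosely based on the UVD formats listed earlier, and the alias table simply maps common names onto the same codec (AVC is H.264; XviD and DivX are MPEG-4 Part 2 implementations).

```python
# Assumed set of hardware-decodable codecs (loosely based on the UVD list above)
HW_DECODABLE = {"mpeg2", "h264", "vc1", "mpeg4part2"}

# Common names that map onto the same underlying codec
ALIASES = {"avc": "h264", "xvid": "mpeg4part2", "divx": "mpeg4part2"}


def normalize(codec: str) -> str:
    """Lower-case the name and drop punctuation, then resolve aliases."""
    key = codec.lower().replace("-", "").replace(".", "").replace(" ", "")
    return ALIASES.get(key, key)


def choose_decoder(codec: str) -> str:
    """Prefer GPU (DXVA-style) decoding, fall back to the CPU otherwise."""
    if normalize(codec) in HW_DECODABLE:
        return "GPU (DXVA)"
    return "CPU (software)"


for codec in ("H.264", "AVC", "VC-1", "XviD", "Theora"):
    print(codec, "->", choose_decoder(codec))
```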

There's more to this software though:
  • Something much missed in NVIDIA's PureVideo and ATI's UVD is a very simple yet massively important function: pixel (image) sharpening. 
If you watch a movie on a regular monitor, PureVideo playback is brilliant. But if you display the movie on a larger HDTV, you'll quickly wish you could enable little extras like sharpening. I remember the GeForce 7 series supporting this natively from within the ForceWare drivers. After the GeForce 8 series was released, that feature was stripped away, and to date it has to be the most missed HTPC feature ever.

Media Player Classic has yet another advantage: not only does it try to enable DXVA through the video processor where possible, it can also use the shader processors of your graphics card to post-process content. Many shaders (small pieces of pixel shader code) can be executed on the GPU to enhance image quality. MPC has this feature built in; you can select several shaders, like image sharpening and de-interlacing, and combine them, thus running multiple enhancement shaders simultaneously. Fantastic features for high-quality content playback.
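For the curious, here's roughly what a simple sharpening shader computes for every pixel, written as plain Python over a small 8-bit grayscale image rather than as actual MPC pixel shader code. It uses a common 3x3 "plus" sharpen kernel (centre weight 5, the four direct neighbours -1), which is our own assumption; MPC's bundled sharpen shaders may use different kernels or strengths. Neighbours outside the image are clamped to the nearest edge pixel, and results are clamped to 0-255.

```python
from typing import List

Image = List[List[int]]


def sharpen(img: Image) -> Image:
    """Apply a 3x3 'plus' sharpen kernel (centre 5, neighbours -1) to 8-bit grayscale.

    Out-of-range neighbours are clamped to the nearest edge pixel, and results
    are clamped to 0..255, much like a pixel shader sampling with clamp addressing.
    """
    h, w = len(img), len(img[0])

    def px(y: int, x: int) -> int:
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            v = (5 * px(y, x)
                 - px(y - 1, x) - px(y + 1, x)
                 - px(y, x - 1) - px(y, x + 1))
            row.append(min(255, max(0, v)))
        out.append(row)
    return out


print(sharpen([[100, 100, 150, 150]]))  # [[100, 50, 200, 150]]
```

The soft step from 100 to 150 becomes 50/200 on either side of the edge: that is the extra "bite" sharpening adds (and, pushed too far, the halo artifacts it causes). On the GPU the same arithmetic runs once per pixel, in parallel across the shader processors.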

Here you see MPC HC edition accelerating an x264 version of a movie at 1080p. Mind you, the one spike in CPU cycles is me starting up the actual capture software.

The Radeon 6000 series fully accelerates (DXVA) this movie without any issues. Complex image sharpening is handled by the shader processors, and we have the PC 0-255 color profile activated via the shaders as well to get nicer black levels. Even if we expand the window to a resolution of 2560x1600, the CPU load remains low and the graphics card handles that resolution fine.
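The "PC 0-255" profile boils down to expanding limited-range (studio) video, where black sits at 16 and white at 235, to the full 0-255 range a PC display expects; without it, video blacks show up as dark grey. A minimal per-channel sketch, assuming 8-bit studio-range RGB input; the function name is ours and the actual shader MPC uses may differ in detail.

```python
def expand_to_pc_range(value: int) -> int:
    """Map limited-range 8-bit video (black 16, white 235) to full 0..255."""
    out = round((value - 16) * 255 / 219)
    return max(0, min(255, out))  # below-black and above-white get clipped


print([expand_to_pc_range(v) for v in (0, 16, 126, 235, 255)])  # [0, 0, 128, 255, 255]
```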


AMD Accelerated Parallel Processing (APP) Technology

In the current day and age there is more to graphics cards than just playing games. More and more non-gaming features can be, and are being, offloaded to the GPU. ATI first introduced ATI Stream, which has now been renamed AMD Accelerated Parallel Processing (APP). This is a software layer that allows software developers to 'speak' with the GPU and have it process data using your graphics card. That really is the simplest, most basic description I can give.

AMD currently follows, and strongly believes in, open standards such as OpenCL or, for the easiest path to adding compute capabilities, Microsoft's DirectX 11 DirectCompute. OpenCL is what AMD believes in most, as it allows any developer to write code that scales well on both CPUs and GPUs.

To make things a little clearer for the end user: AMD APP is (and will be) used in software like CyberLink MediaShow and PowerDirector, ArcSoft MediaConverter 4, SimHD (upscaling, H.264 encoding), TotalMedia Theatre (HW-accelerated MPEG4/MVC), Roxio Creator 2010, Adobe Photoshop CS4 and so on, where the GPU assists the software in certain functions, offloading the processor.

And of course, that also includes... folding.