• A closer look at the anisotropic filtering of the Radeon HD 5000 series

    [Image: Radeon HD 5000 Anisotropic Filtering examined, 16x AF]

    When AMD launched the HD 5000 series, also known by its collective codename Evergreen, the company proudly directed attention to the improved anisotropic filtering, and more specifically to two things:
    Texture attribute interpolation was moved from the TMUs into the shader core, and the angle dependency had almost vanished. The appropriate screenshot from a little application known as AF-Tester was shown and repeated through most of the reviews in the internet world. Even though AMD's recent offerings exhibit a more pronounced tendency towards shimmering in fine-grained texture detail, some reviews and reputable magazines declared them the ruling kings of image quality - something the fanboys from the red camp really liked and re-trumpeted around the forums of the internet.

    Now, there's a small caveat, even if you choose to ignore the texture shimmering issue: Sometimes, and it isn't clear when and why this happens, the Evergreen chips show an abrupt mipmap transition where there should be a smooth blending from one mip level into the other. Recent Geforce cards do blend smoothly in these cases, and so does AMD's own HD 4000 series, which was supposedly inferior to the 5k series.

    In order to illustrate the point, here's a small 720p clip taken from Half-Life 2 with my trusty HD 5870, then another one with my HD 4870. (There are some more videos of this scene on my Youtube channel, also with Geforce cards and optionally with supersampling anti-aliasing applied.)



    The texture shimmering in this case is clearly due to bad art design on Valve's part, but please direct your attention towards the end of the tunnel - you might see a harsh transition in the level of texture detail in one of the clips, whereas it appears much smoother in the other one. That's the issue I want to shed light on today.

    A little (oversimplified) background on texture filtering
    Texture filtering removes those blocky artefacts older readers might remember from the glorious days of DOS and the first 3D games. This is achieved by averaging four square-aligned points inside the texture and is usually handled by a fixed-function block within the graphics processor's Texture Mapping (or Management or Memory, if you prefer) Unit, shortened to TMU. Modern TMUs are capable of fetching the necessary values and averaging them almost instantaneously, achieving a throughput of one filtered texel per clock. In order to save on transistors and to improve the locality of data, they are organized in Quads or even Octs, putting out four or even eight textured pixels at a time. To further reduce the demands on the memory subsystem, which in the end has to deliver all texture values to the TMUs, caches have been implemented, where already touched values reside to be reused for neighboring pixels. This cache, called the texture cache, is one of the areas where AMD had to cut back in the Cypress chip compared to its predecessor RV770. Each Quad-TMU retains only half as much cache as its counterpart in RV770, which had 16 KiB available.
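
    The averaging of four square-aligned points described above can be sketched in a few lines. This is a minimal software illustration, not how any TMU is actually wired; the function name `bilinear_sample`, the texel-unit coordinates and the clamp-to-edge behaviour are assumptions made for the example.

```python
import math


def bilinear_sample(texture, u, v):
    """Sample a 2D texture (list of rows of floats) at position (u, v).

    Assumptions for this sketch: coordinates are in texel units with
    texel centres at integer positions, and lookups outside the
    texture are clamped to the nearest edge texel.
    """
    height = len(texture)
    width = len(texture[0])

    # Top-left texel of the 2x2 footprint and the fractional offset inside it.
    x0 = int(math.floor(u))
    y0 = int(math.floor(v))
    fx = u - x0
    fy = v - y0

    def fetch(x, y):
        # Clamp-to-edge addressing.
        x = min(max(x, 0), width - 1)
        y = min(max(y, 0), height - 1)
        return texture[y][x]

    # Blend horizontally along the top and bottom rows, then vertically.
    top = fetch(x0, y0) * (1 - fx) + fetch(x0 + 1, y0) * fx
    bottom = fetch(x0, y0 + 1) * (1 - fx) + fetch(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy


checker = [[0.0, 1.0],
           [1.0, 0.0]]
centre = bilinear_sample(checker, 0.5, 0.5)  # halfway between all four texels
```

    Sampling exactly between the four texels of a tiny checkerboard weights them equally, while sampling exactly on a texel centre returns that texel unchanged.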

    Mip-mapping is a technique where textures are halved in resolution on each axis in several steps. If you start with a pretty detailed base texture at 1024x1024, the first mip level is scaled to 512x512, the second to 256x256 and so on. Each time the resolution is reduced, the level of detail also suffers significantly. This is commonly used in order to eliminate texture shimmering in the depths of a given scene, where multiple values in texture space contend for fewer pixels in screen space, and to improve the hit rate of the texture caches, giving better performance. The borders between the mip-map levels are at a fixed distance from the user's viewpoint. If the user is virtually moving along a sidewalk, this harsh transition of texture detail, also known as banding, keeps moving at the same distance in front of the viewer - much like the bow wave of a moving boat.
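
    To make the halving concrete, here is a small sketch that builds a mip chain with a plain 2x2 box filter and picks a level from the number of texels that fall on one screen pixel. Both the box filter and the simple floor-of-log2 selection rule are assumptions for illustration; real hardware and drivers use their own downsampling and level-of-detail calculations.

```python
import math


def downsample(level):
    """Halve a square texture on each axis by averaging 2x2 blocks."""
    size = len(level) // 2
    return [
        [
            (level[2 * y][2 * x] + level[2 * y][2 * x + 1]
             + level[2 * y + 1][2 * x] + level[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(size)
        ]
        for y in range(size)
    ]


def build_mip_chain(base):
    """Return [base, half, quarter, ...] down to a 1x1 level.

    Assumes a square texture whose edge length is a power of two.
    """
    chain = [base]
    while len(chain[-1]) > 1:
        chain.append(downsample(chain[-1]))
    return chain


def mip_level(texels_per_pixel, num_levels):
    """Pick one mip level for a given texel-to-pixel ratio along one axis.

    Simplified rule (an assumption): level = floor(log2(ratio)),
    clamped to the levels that exist.
    """
    lod = math.log2(max(texels_per_pixel, 1.0))
    return min(int(lod), num_levels - 1)


# A 4x4 checkerboard with one-texel squares.
base = [[float((x + y) % 2) for x in range(4)] for y in range(4)]
chain = build_mip_chain(base)
```

    Note that a checkerboard with one-texel squares averages out to a uniform 0.5 grey as early as the first mip level, which is why fine checkerboards are such a sensitive test pattern for mip selection.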

    Trilinear filtering is then used to smooth out those transitions by taking the values from two neighboring mip levels and blending them together. This is quite expensive, both for the memory subsystem and for most modern texture units - for each pixel that is to be filtered trilinearly they need another clock cycle, basically doubling the amount of time spent on filtering compared to basic bilinear filtering. Hardware vendors, both Nvidia and AMD, have a long-standing history of trying to reduce the amount of work done in this area and thus improving the framerate.
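
    The blending step itself is simple, as the following sketch shows. It assumes a hypothetical callback `sample_level(i)` that returns an already bilinearly filtered value from mip level `i`, and a fractional level of detail `lod` whose integer part selects the finer level and whose fraction is the blend weight.

```python
import math


def trilinear(sample_level, lod, num_levels):
    """Blend the two mip levels that bracket a fractional level of detail.

    sample_level(i) is a hypothetical callback returning the bilinear
    sample from mip level i; lod is clamped to the available levels.
    """
    lod = min(max(lod, 0.0), num_levels - 1)
    lower = int(math.floor(lod))
    upper = min(lower + 1, num_levels - 1)
    weight = lod - lower  # 0.0 = only the finer level, towards 1.0 = coarser level

    # Two bilinear samples per pixel instead of one: this is the extra cost.
    return (1.0 - weight) * sample_level(lower) + weight * sample_level(upper)


# Stand-in values for what each mip level would return at some position.
levels = [1.0, 0.5, 0.25]
halfway = trilinear(lambda i: levels[i], 0.5, len(levels))
```

    Whenever the weight is neither 0 nor 1, both levels have to be sampled, which is where the second clock cycle mentioned above comes from.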

    But in 3D graphics it's the same as anywhere else: There is no free lunch.

    So, cutting corners on texture filtering is a matter of leaving out texture operations without the user being able to notice. Nvidia failed gloriously at this from the Geforce FX era through Geforce 7, where texture operations also stalled the execution units inside their chips. AMD, meanwhile, offered optional near-perfect filtering quality on the competing X1000 series of cards, after having silently introduced texturing economizations of its own with RV360; once these were discovered, they were folded into an optimization package called Catalyst AI. With DirectX 10, the situation has more or less reversed: Nvidia delivers the better default quality, and near-optimal quality in most cases if you can be bothered to switch the driver into the optional High Quality mode. AMD, on the other hand, strove for more performance with its smaller chips (compared to Nvidia's) and optimized texture filtering more and more, to the point where even the mighty HD 5870, sporting no less than 80 TMUs running at 850 MHz, delivers a quality that is often not as good as it could be. Disabling Catalyst AI does nothing to lessen this load of optimizations on my HD 5870, but surprisingly it does have an effect on a borrowed HD 5770. This decision by AMD is not something I can claim to understand even remotely.

    Enter the mighty AF-Tester and a not-so mighty HD 5770

    First, I'd like to compare a few composited images, where the Catalyst AI feature is left at its default state in the left half of each picture and disabled in the right half.

    [Images: Radeon HD 5000 Anisotropic Filtering examined, 1x AF, 2x AF, 4x AF, 8x AF and 16x AF]

    To save a little bandwidth, and since the AF-Tester's view into a 200-faced tunnel is highly repetitive across its four quadrants, I decided to share only the upper halves with you.

    While there's nothing really breathtaking to behold in the first two shots (with 1x AF and 2x AF respectively), from 4x AF (the middle picture) on, the so-called bi-band, the fully colored strip of red, green, blue, purple and so on, becomes a little larger with Catalyst AI enabled. This means that for every additional fully colored pixel the chip saved itself some filtering work by eliminating the need for a second cycle of trilinear filtering. Jumping to the last picture, we see that the red colored area in the left half of the picture does not reach as far into the distance as in the right half - the chip switches to the first mip level earlier, so it only has to look up the texels in a (four times) smaller texture, which improves its cache hit rate.
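
    One way to picture a wider bi-band is a blend window that covers only part of the range between two mip levels. The following is a purely hypothetical model, not AMD's actual Catalyst AI logic: `band_width` is an invented parameter, where 1.0 means full trilinear filtering and smaller positive values mean that a single mip level is used outside a window centred halfway between the two levels.

```python
def blend_weight(lod_fraction, band_width=1.0):
    """Weight of the coarser mip level for a fractional LOD in [0, 1).

    Hypothetical model: blending only happens inside a window of
    band_width (must be > 0) centred on 0.5; outside of it the weight
    snaps to 0 or 1 and one mip level suffices.
    """
    start = 0.5 - band_width / 2.0
    weight = (lod_fraction - start) / band_width
    return min(max(weight, 0.0), 1.0)


def needs_second_mip(lod_fraction, band_width=1.0):
    """True if both mip levels must be sampled for this pixel."""
    weight = blend_weight(lod_fraction, band_width)
    return 0.0 < weight < 1.0


def two_fetch_share(band_width, steps=1000):
    """Share of evenly spaced LOD fractions that need both mip levels."""
    hits = sum(
        needs_second_mip((i + 0.5) / steps, band_width) for i in range(steps)
    )
    return hits / steps


full = two_fetch_share(1.0)      # full trilinear
narrowed = two_fetch_share(0.5)  # blend window half as wide
```

    In this model, halving the window halves the share of pixels that pay for a second mip-level lookup, at the price of a steeper and therefore more visible transition between levels.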

    But that's not the issue here; the point was rather to show that Catalyst AI is working on the HD 5770 and that disabling it has at least some effect. For the following shots, AI has already been turned off (which unfortunately changes nothing about the issue at hand). I'll leave you alone with a huge array of pictures. They all show the same tunnel as above, but this time without coloring of the mipmaps. Please bear with me - I'll try to explain a little more further down.

    From left to right: 2x, 4x, 8x and 16x AF with the default texture-setting of 8 - nothing spectacular to behold, please move on.
    [Images: Radeon HD 5000 Anisotropic Filtering examined, 2x AF, 4x AF, 8x AF and 16x AF]

    From left to right: 2x, 4x, 8x and 16x AF with the texture-setting moved to 7, resulting in a twice as detailed checkerboard - you might already see something... or is it your imagination only?


    From left to right: 2x, 4x, 8x and 16x AF with the texture-setting moved to 6, resulting in a twice as detailed checkerboard - still, no problems, or are there?


    From left to right: 2x, 4x, 8x and 16x AF with the texture-setting moved to 5.



    From left to right: 2x, 4x, 8x and 16x AF with the texture-setting moved to 4.



    From left to right: 2x, 4x, 8x and 16x AF with the texture-setting moved to 3 - now you really should start to see something if you look very hard (right side, middle of the picture)


    From left to right: 2x, 4x, 8x and 16x AF with the texture-setting moved to 2. Now the texture is large enough to wreak havoc with the 5770's aniso filtering: Instead of fine-grained detail, you only see a grey disc at the center with 8x and 16x AF enabled.



    From left to right: 2x, 4x, 8x and 16x AF with the texture-setting moved to 1. „Discworld” has now spread to 4x AF as well. Possible conclusion: The larger the texture, the lower the AF levels that are affected.


    From left to right: 1x, 2x, 4x, 8x and 16x AF with the texture-setting moved to 0 - the most detailed checkerboard available in this test.


    The most detailed texture in the test results in an almost uniform grey disc even with only 2x AF applied. Only if you switch off anisotropic filtering completely, as in the leftmost picture, do you still get fine detail from the texture. The same thing happens if I use an HD 5870 and/or older drivers (10.7 for example). It does not, however, occur when an HD 4870 renders the scene (or a Geforce GTX 280 for that matter).

    The problem with the HD 5000's anisotropic filtering does not only manifest itself under those very theoretical circumstances, but also, as my videos show, in a very real gaming environment and, as some video-enhanced postings in the German 3DCenter forums illustrate, in other titles.

    There has been speculation among my friends over at the Beyond3D.com forums (started by the alienbabeltech.com review) as to whether this might be a cheat, a driver bug or a hardware limitation such as an overflowing texture cache. From what I've seen, and from how the problem spreads to lower degrees of anisotropy as the texture gets larger, I find it most likely that the explanation blaming the texture cache in some way is the correct interpretation of what we are seeing here - the grey disc of lost detail.

    I have filed this as a bug through AMD's Catalyst Driver Feedback programme, and so should you, if you (like me) want things like this fixed before final products ship.