• Anisotropic Texture Filter: An extreme case of optimization

    Filtered textures were essential to the breakthrough of consumer-level 3D graphics acceleration a decade and a half ago. With bilinear filtering applied (in addition to mip-mapping), Voodoo Graphics delivered smooth images of legendary 3D games such as Quake, Tomb Raider or Diablo. But even though they say „things change”, bilinear filtering is still the preferred method current hardware employs as a baseline. Trilinear filtering, which started to become popular around 1998, adds a second cycle to smooth out the transitions between one mip-map level and the next, while anisotropic filtering, which rose to fame around 2001, uses filter kernels with different dimensions for each axis – hence the name: an- meaning „not” and isotropic meaning „equal in all directions”.
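    To make the distinction concrete, here is a minimal sketch in Python (my own illustration, not any hardware's actual datapath) of how bilinear filtering blends four texels, and how trilinear filtering adds one more lerp between the bilinear results of two adjacent mip levels:

```python
def lerp(a, b, t):
    """Linear interpolation between a and b by fraction t."""
    return a + (b - a) * t

def bilinear(tex, u, v):
    """Blend the 2x2 texel neighbourhood around texel-space coords (u, v).
    tex is a 2D list of floats; edge texels are clamped."""
    h, w = len(tex), len(tex[0])
    x0, y0 = int(u), int(v)
    fx, fy = u - x0, v - y0
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    top = lerp(tex[y0][x0], tex[y0][x1], fx)
    bot = lerp(tex[y1][x0], tex[y1][x1], fx)
    return lerp(top, bot, fy)

def trilinear(mip0, mip1, u, v, frac):
    """Lerp between bilinear samples of two adjacent mip levels;
    frac in [0, 1] is the fractional distance between the levels."""
    s0 = bilinear(mip0, u, v)
    s1 = bilinear(mip1, u / 2.0, v / 2.0)  # coords halve per mip level
    return lerp(s0, s1, frac)
```

    Anisotropic filtering then goes one step further and takes several such (tri)linear samples along the axis where the texture footprint is stretched – which is exactly why it is so much more expensive.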

    Plus ça change, plus c'est la même chose
    One thing was true back in 1995 as in all the years since, up to this very day (and probably will be for a very long time): better-quality images are not free. Texture filtering consumes a lot of resources, namely math and bandwidth. The math is easy – so easy, actually, that very specific circuits have been created to perform the actual filtering, working double and triple (and more) shifts until the desired quality level is reached. They need to be fed with enough data, though, and here the second resource texture filtering relies on comes into play: bandwidth. Some of the texture units' hunger for data can be served by providing a good amount of texture cache, which avoids re-fetching the same data multiple times.
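    To illustrate why a texture cache helps, here is a toy sketch in Python (assumptions mine, far simpler than real hardware): adjacent pixels re-use most of each other's bilinear texel neighbourhoods, so a cache turns those repeated fetches into hits instead of memory traffic.

```python
class TextureCache:
    """Toy fully-associative texture cache that counts memory fetches."""
    def __init__(self):
        self.cache = {}
        self.memory_fetches = 0

    def fetch(self, tex, x, y):
        if (x, y) not in self.cache:     # miss: costs bandwidth
            self.cache[(x, y)] = tex[y][x]
            self.memory_fetches += 1
        return self.cache[(x, y)]        # hit: no memory traffic

# Two neighbouring pixels need the 2x2 texel quads (0,0)-(1,1) and
# (1,0)-(2,1); they share two texels, so 8 fetches hit memory only 6 times.
tex = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
cache = TextureCache()
for x, y in [(0, 0), (1, 0), (0, 1), (1, 1),    # pixel A's quad
             (1, 0), (2, 0), (1, 1), (2, 1)]:   # pixel B's quad
    cache.fetch(tex, x, y)
```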

    There have been countless attempts at improving texture filter performance over the years, some – or should I say many – of which actually involve a reduction of the delivered image quality. In the past, both ATI and Nvidia were head over heels in the race over who could save the most cycles spent on texture filtering. There has been, and still is, endless debate around texture filters, as well as a whole host of articles criticizing cheats and hidden optimizations, like here at my site, here, here, here, here, here, here, here, here, here, here, and here (do you also wonder why almost all of those links point to German websites? Somewhere, I have heard the term „Filter Nazis”). I think it's moot to start pointing fingers at either Nvidia or AMD née ATI – no one is innocent.

    Performance Optimizations – NOT
    In this article, I don't want to discuss image quality, but rather focus on the more or less substantial fps gains enabled through the so-called optimizations of texture filtering. You might have noticed that I didn't use the word „performance”, because that involves both work and time. If I reduce the work by 50 percent and also save 50 percent of the time, I did not improve performance – I only lowered the workload.
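    In other words (the numbers are my own, purely illustrative): performance is work per unit of time, so cutting both in half changes nothing but the workload.

```python
work, seconds = 100.0, 2.0               # arbitrary units of filtering work
perf_before = work / seconds             # 50 units per second
perf_after = (work * 0.5) / (seconds * 0.5)
assert perf_after == perf_before         # same performance, half the workload
```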

    There are many ways to reduce the workload: you can narrow the width of the mip-map transitions that are filtered trilinearly (sometimes also called trylinear or brilinear, though I prefer reduced trilinear), you can save samples in your anisotropic filter, you can save on the time needed to compute which filter kernels to select – and the list goes on. Depending on the task at hand, there is a chance that image quality will be degraded, regardless of what Nvidia and AMD might say.
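    As a rough sketch of the reduced-trilinear idea (the function and its band parameter are my own illustration, not any driver's actual logic): the blend between mip levels is confined to a narrow band around the transition, and outside that band the hardware can fall back to a single, cheaper bilinear sample.

```python
def reduced_trilinear_weight(frac, band=0.2):
    """Map the mip fraction [0, 1] to a blend weight, blending only inside
    a band of the given width centred on the transition point 0.5."""
    lo, hi = 0.5 - band / 2.0, 0.5 + band / 2.0
    if frac <= lo:
        return 0.0          # pure bilinear from the finer mip level
    if frac >= hi:
        return 1.0          # pure bilinear from the coarser mip level
    return (frac - lo) / (hi - lo)   # blend only inside the narrow band
```

    With band=1.0 this degrades to full trilinear; the narrower the band, the more pixels get away with one bilinear sample instead of two.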

    In order to measure how much texture filtering work is skipped, I decided to use PowerVR's Villagemark in its Direct3D version 1.22. I ran the benchmark in realtime mode ( -Benchmark=2), meaning a full two minutes of slowly gliding through the village with high overdraw, high texture detail and three texture layers. Since it is actually CPU-limited at lower resolutions, I've decided to only give the numbers for the highest possible resolution of 2048x1536 (2560x1600 doesn't work). Here, the fastest cards went down from 4500-ish fps to about a quarter of that even without anisotropic filtering applied, indicating that the CPU limitation is not very prominent. The benchmarks were run under Windows XP SP3, as the Catalyst driver seemed to cause the application to quit with an error under Windows 7 – the Geforce drivers worked there, but for better comparability I ran the Geforces under Windows XP as well.

    Further details about Villagemark itself can be found in the Retro-Forum. To illustrate the test procedure, I have captured a 720p video of the full benchmark run of Villagemark D3D v1.22, which you can see directly below.

    How much work is skipped?

    Now for the real meat of this article: how much work is left out by the texture filters of recent Radeon and Geforce graphics cards? Fortunately, with Catalyst 10.10, AMD decided to decouple the texture filter economizations from the other improvements and bug fixes in place – all together called Catalyst AI and brought to life after it was discovered that the X800 „economized” its texture filtering behind everyone's back.

    Earlier, I mentioned the lack of effect that disabling Catalyst AI had on my HD 5870. According to AMD, this was intended behaviour: with earlier Catalyst drivers (10.9 and back), and contrary to the HD 5770 for example, there supposedly weren't any texturing optimizations in place. With Catalyst 10.10, the default level of Catalyst AI's texture economization is called quality, and it does indeed result in seriously less work being done in the texture filter. With 10.10 set to high quality, the fps and fillrate levels are back at 10.9-with-AI-default. So, as far as I can tell, AMD has told the truth – and frankly, why wouldn't they?

    In the table below you'll find the fillrates in Mpix/sec. achieved in Villagemark with and without the economizations in place.

    Texture Filtering vs Villagemark – effect of so-called optimizations on fillrate

    For reference, I also included older measurements with the HD 5770 and HD 5670, both of which show very similar behaviour with Catalyst 10.8, albeit to a lesser degree compared to the current top-of-the-line hardware, HD 5870 and HD 6870. Depending on the chip, up to 28 percent more fps, and thus fillrate, can be achieved when setting the AI texture slider to quality instead of high quality – leaving some image quality on the table in the process. But that is a subject for other articles. What I find interesting is the sudden jump in optimization when going from 2x to 4x AF on AMD hardware, whereas the higher levels seem to be optimized somewhat more linearly.

    Also included are Geforce cards with quite recent drivers, showing „only” a 10–11 percent increase in fillrate when using the driver default instead of the highest possible quality setting, and also no large jumps at higher levels of anisotropic filtering. Instead, the increase happens at 2x AF already, with almost full impact. The lesser degree of fillrate improvement cannot, however, be explained by recent Geforce models being unable to reach high fillrates compared to the Radeons. The bottleneck that is raw pixel throughput in Fermi-based Geforce models seems not to play a role, as the highest-performing Geforce, the GTX 480, compares to the highest-performing Radeon, the HD 5870, just as their respective texturing rates would suggest, even without anisotropic filtering applied.