• AMD Catalyst Driver 11.7 & AMD APP SDK 2.5 bring double precision for Cypress, AVX for CPUs

    Catalyst 11.7 WHQL and an 11.8 Preview driver are ready for download, and AMD has moved its APP SDK for accelerating OpenCL applications to version 2.5, now finally supporting double precision (a.k.a. doubles, FP64, whatchamacallit) via cl_khr_fp64. The caveat: it is only enabled for Cypress-based GPUs, not for Cayman and not for older chips capable of 64-bit calculations. According to AMD, APP SDK 2.5 now also leverages AVX (Advanced Vector Extensions) on CPUs supporting it, namely Intel's Sandy Bridge and AMD's upcoming Bulldozer. The first CAP release for 11.7 also includes optimizations for the alpha version of Battlefield 3. Additionally, AMD has posted a preview driver for Catalyst 11.8, which promises speed improvements in DirectX 11 mode for Crysis 2 and Fear 3, 30 percent faster MLAA and HD3D support for DP monitors.

    The highlight of this Catalyst 11.7 WHQL release is certainly that all remaining issues with mouse cursor lag have been resolved, as AMD states on its release notes website (attached as an archive to this article). The release also resolves the system hangs with certain displays connected via DP and HDMI - something a hotfix driver had already done in the meantime. On the downside, AMD seems to have problems getting OpenGL to run reliably when rotating images in Photoshop CS5. Also, Crossfire seems to cause random corruption in Crysis 2 when run in DX9 mode - but who would do that anyway?

    Other changes mainly fix video playback issues or functions associated with AMD's Steady Video feature, on which I've already reported:

    • Bluray playback using PowerDVD 10 under High Performance mode no longer randomly displays a blank screen.
    • Bluray playback using PowerDVD 10 under High Performance mode no longer randomly hangs.
    • Some Divx format files no longer display video corruption using WinDVD.
    • AVIVO Video Quality settings are now correctly applied to Flash Video Content.
    • AMD SteadyVideo is now applied to Home Video clips when using WinDVD 10.
    • Random screen corruption is no longer displayed when changing desktop themes.
    • Video playback now works correctly when Hardware acceleration is enabled with VLC Player version 1.1.9.
    • Crossfire now functions correctly when playing Hamilton’s Great Adventure.
    • Video corruption is no longer observed when using the Forward slider on 720P Divx format video clips using Windows Media Player and Windows Media Center.
    • PowerDVD now correctly handles 3D Bluray content.
    • Image stabilization now functions correctly when playing Divx content using Windows Media Player.
    • Wraparound corruption is no longer displayed intermittently after exiting a 3D application.

    A little more interesting this time is AMD's APP SDK and the associated OpenCL runtime driver. It is already integrated into Catalyst 11.7 WHQL, so you can put it to good use right away. What's new this time is support for Cypress's double precision capabilities in OpenCL via cl_khr_fp64. Since the lack of it kept me from running the OpenCL benchmark from HPC-Tech.org for a while, here's the result of the double precision benchmark on my HD 5870:

    [Figure: HPC-Tech.org's OpenCL-Benchmark - Double Ops]

    As you can see, even though my testing machine does not provide ideal conditions, the raw throughput numbers for my Radeon HD 5870 speak for themselves. Just a couple of days ago, I lamented the speed loss of Nvidia's brand-new 280.19 driver in OpenCL 1.1 - but the raw throughput was not affected, so I uploaded those results too (note that my GTX 480 came factory overclocked to 750e-1500s-924m). It is no match DP-wise for the 5870, though, because Nvidia chose to cut the DP rate from 1/2 to 1/8 for reasons of product differentiation from the more expensive Tesla and Quadro lines. What I cannot understand is why AMD adds support for the HD 5800 series only, leaving customers with double-precision enabled Cayman GPUs out. Those GPUs were primed to be used as compute monsters after all and should do really well with compute loads.
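    To put the DP rates into perspective, here is a rough back-of-the-envelope sketch of theoretical peak throughput. It assumes the usual "ALUs x clock x 2 FLOPs per multiply-add" rule of thumb, the reference clock of 850 MHz for the HD 5870, and that the "1500s" quoted above is the GTX 480's shader clock in MHz. The function name and the table are made up for illustration; these are paper numbers, not benchmark results.

```python
# Theoretical peak throughput: ALUs x clock x 2 FLOPs (one multiply-add per clock).
# All specs below are assumptions for illustration, not measured values.

def peak_gflops(alus: int, clock_mhz: float, dp_ratio: float = 1.0) -> float:
    """Peak GFLOPS for `alus` lanes at `clock_mhz`, scaled by the DP:SP ratio."""
    return alus * clock_mhz * 2 * dp_ratio / 1000.0

cards = {
    # name: (ALUs, shader clock in MHz, DP rate relative to SP)
    "Radeon HD 5870": (1600, 850, 1 / 5),         # reference clock assumed
    "GeForce GTX 480 (OC)": (480, 1500, 1 / 8),   # shader clock as quoted in the text
}

for name, (alus, mhz, ratio) in cards.items():
    sp = peak_gflops(alus, mhz)
    dp = peak_gflops(alus, mhz, ratio)
    print(f"{name}: {sp:.0f} GFLOPS SP, {dp:.0f} GFLOPS DP")
```

    Under these assumptions the Radeon ends up with roughly three times the theoretical DP rate of the overclocked GeForce, which is in line with the "no match" verdict above. Real kernels will of course land well below either peak.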

    Anyway, here are the Catalyst driver suite components' version numbers for July 2011:
    • Catalyst Version 11.7
    • Open GL ICD 6.14.10.10907
    • Direct3D Driver 7.14.10.0847
    • 2D Driver 8.01.01.1171
    • Catalyst Control Center 2011.0707.2346.40825
    • Packaging 8.872-110707b-122569C-ATI
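
    As a small aside, the Catalyst Control Center version above looks like an encoded build timestamp. The sketch below assumes the layout is "YYYY.MMDD.HHMM.build"; that is my reading of the string (it matches the 110707 in the packaging version), not something AMD documents here, and the function name is hypothetical.

```python
from datetime import datetime

def ccc_build_time(version: str) -> datetime:
    """Interpret 'YYYY.MMDD.HHMM.build' as a build timestamp (assumed layout)."""
    year, monthday, hourminute, _build = version.split(".")
    return datetime.strptime(year + monthday + hourminute, "%Y%m%d%H%M")

# Catalyst Control Center version string from the list above
print(ccc_build_time("2011.0707.2346.40825"))
```

    If the assumption holds, this build dates from the evening of July 7th, 2011.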


    The whole list of improvements is posted over at AMD's release notes website (attached in zipped format in case the URL moves). As always, you can go over to AMD's official download site or use the links below - they might or might not work, though:

    The 11.7 CAP1 is an add-on which normally only updates Crossfire profiles, either to make new games work with the expensive setups of AMD's premium audience or to improve behaviour in games already released. For 11.7 CAP1, AMD advertises image quality improvements for Hydrophobia Prophecy and Rise of Flight. In addition, AMD's Andrew Dodd, who is in charge of the Catalyst Software Team and known as CatalystCreator on Twitter, recently posted there that this package also includes Crossfire optimizations for Battlefield 3, which is currently in closed alpha testing and for which performance numbers have already leaked.

    CatalystCreator on Twitter: BF3 Alpha already optimized for in 11.7 CAP1


    As it seems to be becoming a tradition with AMD to release previews of future driver developments, they've made available an early version of Catalyst 11.8. The corresponding knowledgebase article lists some pretty cool speed improvements for DirectX 11 games and for MLAA, AMD's post-processing filter (not only) for games that refuse to work with standard anti-aliasing. Crysis 2 in DX11 mode is supposed to gain up to 10 percent in performance across the whole range of Radeon HD 5000 and 6000 cards; all those cards are also to gain up to 8 percent in Fear 3 run in DirectX 11 mode with application anti-aliasing enabled. If you happen to like MLAA, gains of up to 30 percent (in some cases even more) await! Since at least Crysis 2 also uses some kind of post-processing AA, AMD was probably able to improve the kernel launch and execution time for the corresponding compute shaders or something like that - normally, you won't see such broad speed improvements across two whole families of cards. You can download Catalyst 11.8 Preview right here.


    *Even though it's officially only for Windows 7 64 Bit, the file size at least suggests that it might be a complete package, including support for Windows 7 32 Bit and the respective Vista versions as well - just like preview drivers in the past.


    Background and useful links regarding AMD Catalyst drivers


    1. AMD App SDK 2.5

    If you are into open standards as AMD is, you should try installing the AMD APP SDK 2.5, formerly known as Ati Stream SDK. OpenCL 1.1 conformance is assured by AMD for basically all DirectX 11 conformant Radeons, meaning from the HD 5400 series up to the HD 6900 series, with the exception of the HD 6700 series, which includes some renamed Juniper-based SKUs as well as the Barts salvage part HD 6790. Also - and quite curiously - the HD 6990, which with its two-GPU power of over 5 TFLOPS would be quite a nice computing device, is not supported as of the list posted on August 3rd 2011, but the HD 5970 is finally supported in dual-GPU mode. Beta support for OpenCL 1.0 is available for Radeon HD 4890, 4870 X2, 4870, 4850 X2, 4850, 4830, 4770, 4670, 4650, 4550, 4350. The general support in the APP SDK 2.5 holds true as well for the corresponding FirePro and Radeon Mobility variants. Since version 2.3, AMD has been supporting their Fusion APUs of the C- and E-series in OpenCL as well as the Radeon HD 6900 based on the Cayman chip. Starting with APP SDK 2.5, double precision calculations are finally supported via the cl_khr_fp64 extension - as of August 3rd 2011 for Cypress-based GPUs only.
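
    Whether a given device actually exposes cl_khr_fp64 can be read from its extension string, which OpenCL reports as a space-separated list of names (CL_DEVICE_EXTENSIONS). A minimal sketch of that check follows; the helper name and both sample strings are made up for illustration and are not taken from any real driver output.

```python
def supports_fp64(extensions: str) -> bool:
    """True if the space-separated extension string lists cl_khr_fp64."""
    # Compare whole tokens rather than substrings to avoid partial matches.
    return "cl_khr_fp64" in extensions.split()

# Hypothetical extension strings, for illustration only
with_fp64 = "cl_khr_global_int32_base_atomics cl_khr_fp64 cl_khr_byte_addressable_store"
without_fp64 = "cl_khr_global_int32_base_atomics cl_khr_byte_addressable_store"

print(supports_fp64(with_fp64))
print(supports_fp64(without_fp64))
```

    In a real program the string would come from the OpenCL runtime, and an OpenCL 1.1 kernel that uses doubles additionally has to enable the extension with `#pragma OPENCL EXTENSION cl_khr_fp64 : enable`.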


    The installed applications and tools include the APP Profiler in version 2.3 and the Kernel Analyzer, which has just been updated to version 1.9. The release notes can be found as a PDF on AMD's site; a blog entry from AMD highlights some more points of this APP SDK release 2.5:

    • Kernel launch times have been further reduced.
    • The LLVM compiler version used for OpenCL kernels has been upgraded.
      • Includes support for use of SSE3 and SSE4.
      • Added support for partial use of FMA4 and XOP instructions.

    • It is no longer necessary to use the -fno-alias compiler command line option.
    • PCIe transfer overhead has been reduced under Linux.
    • Transfers between CPUs and GPUs are improved for buffers declared with either the CL_MEM_USE_HOST_PTR or the CL_MEM_ALLOC_HOST_PTR flag.
    • For APUs, zero copy buffers created as CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_ONLY offer improved GPU read performance.
    • The runtime supports multi-GPU, including simultaneous use of the GPU on both an APU and a discrete GPU on systems running under Windows.
    • OpenCL built-in functions leverage AVX on capable CPUs
    • Support for PowerExpress 4.0.
    • Support for atomic counters for discrete GPUs.
    • Support for headless GPU operation.
    • OpenCL can be used by a Windows service.
    • UVD3 / MPEG-2 support.
    • The clFFT library supports radix 3 and radix 5, including support for mixed radix 2/3/5.
    • The BLAS library supports the D/S SYRK, D/S SYR2K, D/S GEMV, D/S SYMV functions.
    • The FP64 extension is supported for the ATI Radeon™ HD 5900 and 5800 series, as well as the AMD FirePro™ V8800 and V8700 series.
    • gDEBugger 6.0 extension is available for Visual Studio.
    • Starting with Catalyst 11.8, improved runtime features appear regularly in the monthly Catalyst releases for Windows.
    • Kernel Analyzer 1.9 supports Catalyst releases 11.4 to 11.7.
    • APP Profiler provides
      • Improved API trace.
      • Improved timeline visualization
      • Support for analyzing OpenCL Application trace.
      • Thread ID and sequence number now are included in the profile output.
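
    Regarding the clFFT item in the list above: mixed radix 2/3/5 means the transform length may contain only the prime factors 2, 3 and 5. A small sketch of how one could test a length for that, and find the next suitable length for zero-padding, is shown here. Both function names are my own and are not part of the library; whether padding is acceptable depends on the application.

```python
def is_radix_235(n: int) -> bool:
    """True if n >= 1 and n has no prime factors other than 2, 3 and 5."""
    if n < 1:
        return False
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1

def next_radix_235(n: int) -> int:
    """Smallest length >= n that factors into 2, 3 and 5 only."""
    while not is_radix_235(n):
        n += 1
    return n

for length in (1000, 1001, 360, 14):
    print(length, is_radix_235(length), next_radix_235(length))
```

    A length of 1000 (2^3 x 5^3) qualifies as it is, whereas 1001 (7 x 11 x 13) would have to be padded.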

    Please note that as of this driver release, AMD has yet to expose its 64 KiB of on-chip global memory (the Global Data Share) for inter-SIMD communication through a separate OpenCL extension in order to unleash the full potential of their GPUs. GPU computing programmers will have to take this limitation into account for their work. As you may have noticed, separate versions for Windows XP are no longer offered, and the footnote on AMD's developer website that indicated XP support still being included in the 32 and 64 Bit installers has disappeared. That seems to complete the discontinuation of Windows XP support in the APP SDK. Please note that the requirements for the Linux versions have changed as well: for Ubuntu and Red Hat you need versions 11.04 and 6.x respectively.
    Overview of OpenCL Capabilities with GPU Caps Viewer


    2. Archived Drivers
    For those of you plagued by bugs who might need to roll back to earlier Catalyst versions, AMD has support pages up and running with archived Catalyst drivers from version 8.10 (yes, October 2008 that is) until the most recently released ones.

    3. Radeon with AGP interface - Hotfix awaits
    Also, there's the special page for the now unofficial AGP versions of the drivers. God knows why they're not worthy of a WHQL sign any more (my guess is it's cost-related…).

    4. 10.2 Legacy Driver for Radeon 9500/9700 series to X1950 series
    For all you brave souls hanging onto your trusty Radeon 9x00, X300, X550, X600, X700, X8x0, X1K, X2100 and Radeon Xpress integrated graphics, there's little hope. Sometimes, when the stars are aligned just right, a new driver for the officially no longer supported graphics chipsets spawns at the following sites:

    Overview about supported standards and functionality
    And now for the "GPU-Tech.org added value" I promised to deliver. Here are the supported standards and tech for most of the recent desktop Radeon cards - something that's not very well documented on the web, let alone collected in one single place. Over the course of the next few days, I will try and rework this into a more reader-friendly form - bear with me if you see broken formatting in between.


    Radeon HD 6990/6970/6950 (Cayman based):
    • DirectX 11 (Compute Shader 5.0, 4.1, 4.0) and lower,
    • OpenGL 4.1
    • OpenGL ES 2.0
    • OpenCL 1.1 (OpenCL Codename Cormorant)
    • AMD Accelerated Parallel Processing, short APP (formerly known as Ati Stream), HD 6990 not yet officially supported
    • Double Precision at 1/4th speed
    • FMA at full speed
    • Triangle-setup at double speed

    Radeon HD 6870/6850/6790 (Barts based):
    • DirectX 11 (Compute Shader 5.0, 4.1, 4.0) and lower,
    • OpenGL 4.1
    • OpenGL ES 2.0
    • OpenCL 1.1 (OpenCL-codename Buzzard)
    • AMD Accelerated Parallel Processing, short APP (formerly known as Ati Stream)
    • No Double Precision Support
    • FMA at 4/5th speed
    • Triangle-setup at full speed
    Radeon HD 6770/6750 (Juniper based, renamed for OEM usage):
    • DirectX 11 (Compute Shader 5.0, 4.1, 4.0) and lower,
    • OpenGL 4.1
    • OpenGL ES 2.0
    • OpenCL 1.1
    • AMD Accelerated Parallel Processing, short APP (formerly known as Ati Stream)
    • No Double Precision Support
    • FMA at 4/5th speed
    • Triangle-setup at full speed
    Radeon HD 6670/6650 (Turks based):
    • DirectX 11 (Compute Shader 5.0, 4.1, 4.0) and lower,
    • OpenGL 4.1
    • OpenGL ES 2.0
    • OpenCL 1.1
    • AMD Accelerated Parallel Processing, short APP (formerly known as Ati Stream)
    • No Double Precision Support
    • FMA at 4/5th speed
    • Triangle-setup at full speed
    Radeon HD 6450 (Caicos based):
    • DirectX 11 (Compute Shader 5.0, 4.1, 4.0) and lower,
    • OpenGL 4.1
    • OpenGL ES 2.0
    • OpenCL 1.1
    • AMD Accelerated Parallel Processing, short APP (formerly known as Ati Stream)
    • No Double Precision Support
    • FMA at 4/5th speed
    • Triangle-setup at 1/4th rate
    Radeon HD 5970 (Cypress based):
    • DirectX 11 (Compute Shader 5.0, 4.1, 4.0) and lower,
    • OpenGL 4.1
    • OpenGL ES 2.0
    • OpenCL 1.1 (single-gpu mode only)
    • AMD Accelerated Parallel Processing, short APP (formerly known as Ati Stream)
    • Double Precision at 1/5th speed
    • FMA at 4/5th speed
    • Triangle-setup at full speed
    Radeon HD 5870/5850/5830 (Cypress based):
    • DirectX 11 (Compute Shader 5.0, 4.1, 4.0) and lower,
    • OpenGL 4.1
    • OpenGL ES 2.0
    • OpenCL 1.1
    • AMD Accelerated Parallel Processing, short APP (formerly known as Ati Stream)
    • Double Precision at 1/5th speed
    • FMA at 4/5th speed
    • Triangle-setup at full speed
    Radeon HD 5770/5750 (Juniper based):
    • DirectX 11 (Compute Shader 5.0, 4.1, 4.0) and lower,
    • OpenGL 4.1
    • OpenGL ES 2.0
    • OpenCL 1.1
    • AMD Accelerated Parallel Processing, short APP (formerly known as Ati Stream)
    • No Double Precision Support
    • FMA at 4/5th speed
    • Triangle-setup at full speed
    Radeon HD 5670/5650/5570 (Redwood based):
    • DirectX 11 (Compute Shader 5.0, 4.1, 4.0) and lower,
    • OpenGL 4.1
    • OpenGL ES 2.0
    • OpenCL 1.1
    • AMD Accelerated Parallel Processing, short APP (formerly known as Ati Stream)
    • No Double Precision Support
    • FMA at 4/5th speed
    • Triangle-setup at full speed
    Radeon HD 5450 (Cedar based):
    • DirectX 11 (Compute Shader 5.0, 4.1, 4.0) and lower,
    • OpenGL 4.1
    • OpenGL ES 2.0
    • OpenCL 1.1
    • AMD Accelerated Parallel Processing, short APP (formerly known as Ati Stream)
    • No Double Precision Support
    • FMA at 4/5th speed
    • Triangle-setup at half rate
    Radeon HD 4890/4870/4850/4830/4730 (RV770/790 based):
    • DirectX 10.1 (Compute Shader 4.1, 4.0) and lower,
    • OpenGL 3.2
    • OpenGL ES 2.0
    • OpenCL 1.0
    • AMD Accelerated Parallel Processing, short APP (formerly known as Ati Stream)
    • Double Precision at 1/5th speed
    • Triangle-setup at full speed
    Radeon HD 4770 (RV740 based):
    • DirectX 10.1 (Compute Shader 4.1, 4.0) and lower,
    • OpenGL 3.2
    • OpenGL ES 2.0
    • OpenCL 1.0
    • AMD Accelerated Parallel Processing, short APP (formerly known as Ati Stream)
    • Double Precision at 1/5th speed
    • Triangle-setup at full speed
    Radeon HD 4670/4650 (RV730 based):
    • DirectX 10.1 (no Compute Shader though) and lower,
    • OpenGL 3.2
    • OpenGL ES 2.0
    • OpenCL 1.0
    • AMD Accelerated Parallel Processing, short APP (formerly known as Ati Stream)
    • No Double Precision Support
    • Triangle-setup at full speed
    Radeon HD 4550/4350 (RV710 based):
    • DirectX 10.1 (no Compute Shader though) and lower,
    • OpenGL 3.2
    • OpenGL ES 2.0
    • OpenCL 1.0
    • AMD Accelerated Parallel Processing, short APP (formerly known as Ati Stream)
    • No Double Precision Support
    Radeon HD 3870/3850 (RV670 based):
    • DirectX 10.1 (no Compute Shader though) and lower,
    • OpenGL 3.2
    • OpenGL ES 2.0
    • no OpenCL
    • AMD Accelerated Parallel Processing, short APP (formerly known as Ati Stream)
    • Double Precision (at 1/5th speed)
    Radeon HD 3650 (RV635 based):
    • DirectX 10.1 (no Compute Shader though) and lower,
    • OpenGL 3.2
    • OpenGL ES 2.0
    • no OpenCL
    • AMD Accelerated Parallel Processing, short APP (formerly known as Ati Stream)
    • No Double Precision Support
    Radeon HD 3470/3450 (RV620 based):
    • DirectX 10.1 (no Compute Shader though) and lower,
    • OpenGL 3.2
    • OpenGL ES 2.0
    • no OpenCL
    • AMD Accelerated Parallel Processing, short APP (formerly known as Ati Stream)
    • No Double Precision Support
    Radeon HD 2900 XT/Pro/GT/OEM (R600 based):
    • DirectX 10 (no Compute Shader though) and lower,
    • OpenGL 3.2
    • OpenGL ES 2.0
    • no OpenCL
    • AMD Accelerated Parallel Processing, short APP (formerly known as Ati Stream)
    • No Double Precision Support
    Radeon HD 2600 XT/Pro (RV630 based):
    • DirectX 10 (no Compute Shader though) and lower,
    • OpenGL 3.2
    • OpenGL ES 2.0
    • no OpenCL
    • AMD Accelerated Parallel Processing, short APP (formerly known as Ati Stream)
    • No Double Precision Support
    Radeon HD 2400 XT/Pro (RV610 based):
    • DirectX 10 (no Compute Shader though) and lower,
    • OpenGL 3.2
    • OpenGL ES 2.0
    • no OpenCL
    • AMD Accelerated Parallel Processing, short APP (formerly known as Ati Stream)
    • No Double Precision Support
    Comments (2)
    1. Unregistered:
      Just an fyi the catalyst 11.8 preview drivers come with a newer OpenCL driver:
      CL Driver version: 1.4.1523
      CL device version: ... 709.2
    2. Carsten:
      Thanks - I'll be having a look at it tomorrow!