ATI stakes claims on physics, GPGPU ground
by Scott Wasson - 11:09 AM, October 11, 2005
One of the more surprising aspects of ATI's Radeon X1000 series launch is something we didn't get a chance to talk about in our initial review of the graphics cards: ATI's eagerness to talk about using its GPUs for non-graphics applications.
ATI practically kicked off its press event for the Radeon X1000 series with a physics demo running on a Radeon graphics card. Rich Heye, VP and GM of ATI's Desktop Business Unit, showed off a simulation of rolling ocean waves comparing physics performance on a CPU versus a GPU. The CPU-based version of the demo was slow and choppy, while the Radeon churned through the simulation well enough to make the waves flow, er, fluidly. The GPU, he proclaimed, is very good for physics work, and he threw out some impressive FLOPS numbers to accentuate the point. A Pentium 4 at 3GHz, he said, peaks out at 12 GFLOPS and has 5.96GB/s of memory bandwidth. By contrast, a Radeon X1800 XT can reach 83 GFLOPS and has 42GB/s of memory bandwidth.
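Those peak numbers are easy enough to sanity-check, at least on the CPU side. Here's a quick back-of-the-envelope sketch in Python; the cycle-level assumptions (four single-precision SSE FLOPs per clock, dual-channel DDR400 memory) are ours, not ATI's stated methodology.

```python
# Rough reconstruction of the CPU-side peak figures quoted above.
# Assumption: a 3GHz Pentium 4 can retire up to 4 single-precision
# SSE FLOPs per cycle, and sits on dual-channel DDR400 memory.

clock_hz = 3.0e9
flops_per_cycle = 4                       # 2 adds + 2 muls per clock via SSE
peak_gflops = clock_hz * flops_per_cycle / 1e9
print(peak_gflops)                        # 12.0 GFLOPS

# Dual-channel DDR400: 2 channels x 8 bytes x 400 MT/s = 6.4 GB/s decimal,
# or about 5.96 GB/s when expressed in binary (2^30-byte) gigabytes.
bandwidth_bytes = 2 * 8 * 400e6
print(round(bandwidth_bytes / 2**30, 2))  # ~5.96
```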
This demonstration was more important as a statement of position and direction for ATI than anything else. NVIDIA has been rather enigmatic about any plans for physics processing, and has seemed to step out of AGEIA's way for the most part, welcoming the PhysX effort as another means of improving PC gaming. ATI is clearly ready and willing to take on AGEIA's PhysX card by using its GPUs for physics simulation, and the company believes that the more general programmability of its Shader Model 3.0 implementation in the Radeon X1000 series could make it a viable competitor there. There was talk of a CrossFire dual-GPU config in a split setup, with one GPU handling graphics and the other handling physics for certain games, and somebody present even suggested the possibility of a third GPU on a PCI card dedicated to physics processing. The folks from ATI seemed to like this suggestion.
We haven't heard many specifics yet about how ATI plans to expose the physics acceleration capabilities of its GPUs to developers. One obvious path forward would seem to be cooperation with physics middleware vendors like Havok, but AGEIA has already made extensive inroads into next-gen games development by giving away licenses for its PhysX API. If ATI is serious about making this push, it may have to slog through an API war by pushing developers to use something other than the PhysX API. We shall see.
Beyond just physics, ATI is happy to see its GPUs being used for all sorts of general-purpose computation. To that end, it invited Mike Houston from Stanford to come talk about his work on GPGPU, or general-purpose computation on graphics processing units. Houston gave a nice overview of how graphics lingo translates into general-purpose computation, where textures are used as general storage and pixel shaders operate on data other than pixels. There's a PDF of his talk available online, so you can read it for yourself. Houston particularly liked the Radeon X1800's new features, including 32-bit floating-point precision, long shader programs with flow control, and faster data uploads and downloads over PCI Express. He also praised the X1800's large number of available registers and its support for 512MB of memory, which he said is sorely needed in GPGPU applications.
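To make that translation concrete, here's a minimal CPU-side sketch of the idea in Python: a texture is just a 2D array of floats, and a pixel shader is a function invoked independently for every element of the output. The names and structure here are purely illustrative, not any real GPGPU API.

```python
import numpy as np

def run_kernel(shader, *textures):
    """Apply 'shader' once per output texel, the way a pixel shader
    covers every pixel of a full-screen quad."""
    h, w = textures[0].shape
    out = np.empty((h, w), dtype=np.float32)
    for y in range(h):
        for x in range(w):
            # each invocation sees only its own coordinates plus texture fetches
            out[y, x] = shader(x, y, textures)
    return out

# "textures" holding arbitrary data rather than colors
a = np.random.rand(64, 64).astype(np.float32)
b = np.random.rand(64, 64).astype(np.float32)

# the "shader": a per-element multiply-add, i.e. general-purpose math on texels
result = run_kernel(lambda x, y, tex: tex[0][y, x] * tex[1][y, x] + 1.0, a, b)
```

On a real GPU, every one of those invocations runs in parallel across the pixel pipelines, which is where the throughput advantage comes from.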
Houston gave several examples of applications where GPUs can outshine CPUs for certain types of data-parallel processing. One of the most exciting was an implementation of the GROMACS library that's used for protein folding, as in Folding@Home. Although the GROMACS implementation wasn't yet optimized for the Radeon X1000 series and still used Pixel Shader 2.0b, the Radeon X1800 XT was already performing between 2.5 and 3.5 times faster than a Pentium 4 3GHz processor. The GeForce 7800 GTX didn't fare so well, achieving only about half the speed of the P4 3GHz. Houston offered a laundry list of things that GPGPU developers need from graphics chip makers, including better information about how to program the GPU and more direct access to the hardware.
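For a sense of why a molecular dynamics code like GROMACS is such a good fit, consider that its nonbonded kernels evaluate the same short arithmetic sequence over an enormous number of independent atom pairs. The toy, vectorized Lennard-Jones energy sum below (Python, and in no way actual GROMACS code) shows the data-parallel shape of the problem.

```python
import numpy as np

def lj_energy(positions, epsilon=1.0, sigma=1.0):
    # all-pairs displacement vectors (N x N x 3), evaluated in one data-parallel sweep
    diff = positions[:, None, :] - positions[None, :, :]
    r2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(r2, np.inf)          # skip self-interactions
    inv_r6 = (sigma ** 2 / r2) ** 3
    # 4*eps*((sigma/r)^12 - (sigma/r)^6), halved because each pair is counted twice
    return 0.5 * np.sum(4.0 * epsilon * (inv_r6 ** 2 - inv_r6))

atoms = np.random.rand(256, 3) * 10.0     # 256 particles in a 10x10x10 box
print(lj_energy(atoms))
```

Every pair interaction is independent of every other, so the work maps cleanly onto thousands of pixel-shader invocations.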
To that end, ATI pledged to open the X1000 platform to developers by documenting the GPU architecture for use in data-parallel processing and providing a thin layer of abstraction so GPGPU developers can get more direct access to the hardware. ATI also said it planned to "seed developers with tools and prototype applications" for X1000 development.
28 comments. Showing page 1 of 1.
#28. Posted at 03:29 PM on Oct 13th 2005
Amun
Here you go guys, a 1800xt for $109 bucks!!!
*www.pricegrabber.com/search_getprod.php/masterid=11173272/
#27. Posted at 08:53 AM on Oct 12th 2005
dlang
with all this talk about an open API to use the card for physics processing I hope they also open the API for regular graphics use as well (i.e. for making Linux drivers)
#12. Posted at 02:46 PM on Oct 11th 2005
ArturNOW
But you need 2 graphics cards to do physics. One renders scenes and the other one handles physics, right? So all in all it's cheaper to buy PhysX, and this card needs less power...
#25. Posted at 12:41 AM on Oct 12th 2005, Edited at 12:42 AM on Oct 12th 2005
Tupuli
The whole GPGPU thing is a waste. Even fairly trivial things like conjugate gradient (a common step in solving PDEs like Navier-Stokes, or elasticity) aren't terribly fast on the GPU despite its amazing peak FLOP performance.
By the time GPUs have the generality to do these sorts of numerical algorithms we'll have 8 core CPUs.
Notice that they are comparing top-of-the-line cards with careful optimization (i.e. painstaking implementation) versus a 3GHz Pentium. Why not compare against a dual-core chip that can accelerate a broad class of problems? Even the Xbox will have 3 general-purpose processors.
The only reason that the GPU has such amazing performance is its dedication to a specific task. It will never have a reason to have good branch or cache performance, things that are very important for good performance in numerics.
Unless the GPU can outpace the CPU by a factor of 10 or more, no one will bother implementing these algorithms on the GPU. Far more likely is a low-power, general, multi-core processor.
I see lots of hype (the GPU is a supercomputer!), but little substance.
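For readers who haven't run into it, the conjugate gradient iteration mentioned in the comment above looks roughly like the minimal sketch below (dense matrix, untuned, purely illustrative). Each step is dominated by one matrix-vector product plus a handful of dot products, which is why memory behavior rather than peak FLOPS tends to set its speed.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p                        # the bandwidth-bound step
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# small symmetric positive-definite test system
n = 200
M = np.random.rand(n, n)
A = M @ M.T + n * np.eye(n)
b = np.random.rand(n)
x = conjugate_gradient(A, b)
print(np.linalg.norm(A @ x - b))          # residual norm; should be tiny
```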