Cilus
laborare est orare
Graphics card:
I guess the choice of this component is actually the trickiest, and it generates more discussion than any of the other components. Better FPS or smoothness, compute performance or raw gaming performance, single-GPU or multi-GPU setup: all of these have been debated, and in this article I will try to tie them together.
FPS or Frames Per Second: This is the most common measure of performance, and most review sites actually stress this parameter alone. Higher FPS is better in most cases, but there are certain catches.
Gaming latency is very important while playing games, as it determines how responsive the GPU is between the player's input and the effect appearing on screen. Sometimes, even when the full frame rate is maintained, the game lags if a very heavy amount of post-processing is applied. Nvidia SLI setups do seem to have an advantage here, as per the HardOCP review of the GTX 660 Ti that Vicky posted. However, at 1080p this latency is almost negligible among current-generation graphics cards of the same league.
Discarded Frames: Most of today's monitors, including Anand's BenQ G2420HD, are capped at 60 Hz, so it is impossible to see more than 60 FPS in real time. Why, then, do we opt for cards that can sometimes deliver over 100 FPS? There is an effect in gaming, sometimes called plummeting, that reduces smoothness despite a 60 FPS average, and here the extra FPS come to the rescue. By using the discarded frames (the display can show only 60 frames per second while, say, 90 frames are available from the graphics card) and a simple algorithm, the GPU can enhance the output frame quality to provide a smoother gameplay experience as well as very good motion-blur effects. AMD has an edge here, as several reviews observe that the gameplay experience becomes smoother as FPS increases in games. I'm sure they will add new algorithms in upcoming drivers to take advantage of this and reduce gaming latency.
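None of us knows the driver internals, so as a rough illustration only, here is a small Python sketch of how frames that would otherwise be discarded could be blended into the displayed output; the frame representation, weights, and interpolation scheme are all my own assumptions, not any vendor's algorithm.

```python
# Sketch: a 60 Hz display fed from a 90 FPS render stream.
# Frames are modeled as flat lists of pixel intensities.

def blend(frame_a, frame_b, weight=0.5):
    """Weighted average of two rendered frames (a simple motion-blur blend)."""
    return [a * (1 - weight) + b * weight for a, b in zip(frame_a, frame_b)]

def display_stream(rendered_frames, render_fps=90, display_hz=60):
    """Produce display_hz output frames from render_fps rendered frames,
    blending adjacent rendered frames instead of discarding the extras."""
    shown = []
    for i in range(display_hz):
        t = i * render_fps / display_hz      # refresh position in render time
        lo = int(t)
        hi = min(lo + 1, len(rendered_frames) - 1)
        shown.append(blend(rendered_frames[lo], rendered_frames[hi], t - lo))
    return shown

frames = [[float(n)] for n in range(90)]     # 90 rendered "frames", 1 pixel each
out = display_stream(frames)                 # 60 displayed, interpolated frames
```

The point of the sketch is only that the 30 "extra" frames carry real information, so the blended output tracks motion more smoothly than simply dropping every third frame would.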
My vote here goes to AMD for that reason: at the same price point, AMD cards deliver a few extra frames while gaming.
Compute Performance
The GCN architecture is far more capable than both the Fermi and Kepler architectures when it comes to compute performance. Applications based on Microsoft's DirectCompute and the open OpenCL API run far better on the GCN design, thanks to its vector-processor-style layout, in which each compute unit of the GPU can handle a thread independently; each has its own local cache and read/write interface to boost performance. On the other hand, each SP in Kepler is not as powerful as in the Fermi architecture: although Kepler has a larger number of shader processors, a single shader cannot handle a thread alone, and needs a cluster of shaders to do so.
Gaming Performance.
Gaming performance depends on many factors: memory bandwidth, number of shader processors, number of ROPs and their performance, and texture-handling capabilities. The conventional gaming approach basically stresses vertex shader and pixel shader performance, since a GPU can perform a single operation over millions of pixels in parallel. Done conventionally, gaming effects like Ambient Occlusion, Depth of Field, HDR Bloom, and Motion Blur can all be implemented using pixel and texture operations. But these processes are very heavy, and older-generation GPUs got through them on raw parallel processing power alone. Now that games are growing ever more demanding on GPU rendering performance, turning all of those effects on and executing them the conventional way can take a heavy toll on the GPU, resulting in poor performance. So smarter, more efficient techniques are required to implement advanced effects in a game.
Are Gaming Performance and Compute Performance two different fields?
In most reviews, compute performance and gaming performance are compared as two faces of a coin, which leads to the belief that the two parameters are separate and that a GPU can excel in one field while doing poorly in the other. But the emergence of APIs like DirectCompute, OpenCL, and CUDA, which can directly access GPU resources such as the stream processor clusters, video RAM, and shared memory much as on a conventional CPU, enables developers to build advanced models such as vector-processor-style processing. As a result, we can have threads, data structures like arrays, structures, and classes, and object-oriented models in GPU programming, just as we do on the CPU, with one exception: the GPU can process many threads at once in parallel, unlike the sequential execution methodology of the CPU.
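To make the contrast concrete, here is a toy Python sketch (my own illustration, not any vendor's API) of the same per-element kernel run CPU-style, one element after another, and GPU-style, as a map of independent "threads"; on real hardware the mapped calls would execute simultaneously across stream processors.

```python
def kernel(x):
    """A per-element 'kernel': the same operation applied to every data item."""
    return x * x + 1

data = [1, 2, 3, 4]

# CPU-style: one thread walks the data sequentially.
sequential = []
for x in data:
    sequential.append(kernel(x))

# GPU-style: every element is an independent thread. map() expresses that
# there is no ordering dependency, so all calls could run at once.
parallel = list(map(kernel, data))

assert sequential == parallel   # same result, different execution model
```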
I will discuss some of the advanced graphics-enhancement techniques: how they have been implemented conventionally, and how GPGPU compute performance can improve them.
Ambient Occlusion or AO:
Definition: Ambient occlusion is a method to approximate how brightly light should shine on any specific part of a surface, based on the light and its environment. It is used to add realism.
Here, how a surface is illuminated is not calculated from a single point light source; instead, it is calculated by studying how the environment and surroundings of that surface interact with light. So a spot surrounded by other objects will be darker even though the light source is the same for all of them.
This technique is a global approach and needs to be applied over the whole image rather than to any specific object.
Z-Buffer or Depth Buffer: In computer graphics, z-buffering, also known as depth buffering, is the management of image depth coordinates in three-dimensional (3-D) graphics, usually done in hardware, sometimes in software. It is one solution to the visibility problem, which is the problem of deciding which elements of a rendered scene are visible, and which are hidden.
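As a minimal illustration of the depth-buffer idea (a toy example of my own, not real driver code), here is how a z-buffer resolves visibility when several fragments land on the same pixel:

```python
# Toy z-buffer: smaller depth means closer to the camera, and closer wins.
WIDTH, HEIGHT = 4, 3
FAR = float("inf")

depth = [[FAR] * WIDTH for _ in range(HEIGHT)]   # the z-buffer
color = [[None] * WIDTH for _ in range(HEIGHT)]  # the frame buffer

def draw_fragment(x, y, z, c):
    """Keep the fragment only if it is nearer than what is already stored."""
    if z < depth[y][x]:
        depth[y][x] = z
        color[y][x] = c

draw_fragment(1, 1, 5.0, "red")    # far object drawn first
draw_fragment(1, 1, 2.0, "blue")   # nearer object overwrites it
draw_fragment(1, 1, 9.0, "green")  # farther object is rejected
```

Whatever the draw order, the pixel ends up showing the nearest surface, which is exactly the visibility problem the definition above describes.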
Conventional Pixel Shader Approach:
The algorithm is implemented as a pixel shader, analyzing the scene depth buffer which is stored in a texture. For every pixel on the screen, the pixel shader samples the depth values around the current pixel and tries to compute the amount of occlusion from each of the sampled points. In its simplest implementation, the occlusion factor depends only on the depth difference between sampled point and current point.
In outline:
- For each pixel present in the image:
- Check the surrounding neighborhood to see whether it forms a concave region (fit a cone: is it concave or convex?); this determines whether the pixel's surroundings can occlude it.
- Results improve if the normal of the point is included in the check.
*i.imgur.com/uFk9g.jpg?1
Given a point P on the surface with normal N, here roughly two-thirds of the hemisphere above P is occluded by other geometry in the scene, while one-third is unoccluded. The average direction of incoming light is denoted by B, and it is somewhat to the right of the normal direction N. Loosely speaking, the average color of incident light at P could be found by averaging the incident light from the cone of unoccluded directions around the B vector.
Now, there is no smart method available to decide how many surrounding pixels must be taken into account to illuminate point P realistically. A brute-force algorithm would need the GPU to perform on the order of 200 texture reads per pixel, which is not possible in real time on current-generation hardware. So the following approximation is applied:
1. A random sample of pixels is taken from the surroundings of the point to be illuminated.
2. For each pixel in the sample, its depth value is read from the texture buffer of the graphics card.
3. Using the algorithm above, the GPU computes an approximate illumination level for the point.
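The three steps above can be sketched in a few lines of Python; the kernel shape, sample count, bias, and occlusion rule here are simplified assumptions of mine, not a real shader.

```python
import random

def ssao_factor(depth_buf, x, y, samples=16, radius=2, bias=0.05):
    """Approximate occlusion of pixel (x, y) from random neighboring depths.

    Each sampled neighbor that sits closer to the camera than the current
    pixel (depth difference beyond a small bias) counts as an occluder.
    """
    h, w = len(depth_buf), len(depth_buf[0])
    center = depth_buf[y][x]
    occluded = 0
    for _ in range(samples):
        sx = min(max(x + random.randint(-radius, radius), 0), w - 1)
        sy = min(max(y + random.randint(-radius, radius), 0), h - 1)
        if center - depth_buf[sy][sx] > bias:   # neighbor is in front of us
            occluded += 1
    return 1.0 - occluded / samples             # 1.0 = fully lit

# A pixel at the bottom of a pit is shadowed by its rim; a flat wall is not.
pit = [[1.0, 1.0, 1.0],
       [1.0, 5.0, 1.0],
       [1.0, 1.0, 1.0]]
flat = [[1.0] * 3 for _ in range(3)]

random.seed(0)
ao_pit = ssao_factor(pit, 1, 1)    # darkened: rim pixels occlude the center
ao_flat = ssao_factor(flat, 1, 1)  # no depth differences: stays fully lit
```

Note that every call to `depth_buf[sy][sx]` stands for one Z-buffer texture read; that is the cost the disadvantages below add up.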
Disadvantages:
1. Huge I/O cost. For each pixel, the GPU must access the Z-buffer or texture unit to get a depth value. At 1920x1080 resolution and 60 FPS, taking 100 neighboring-pixel samples per pixel, the total number of Z-buffer reads is 1920x1080 (pixels per frame) x 100 (sample size per pixel) x 60 (frames per second) = 12,441,600,000, or roughly 1.24 x 10^10 per second, which is huge even for the parallel processing power of a GPU. That number is obviously a rough estimate and can be reduced with various techniques, but it gives an idea of how the cost can affect the gameplay experience.
2. Because the sample of neighboring pixels is taken randomly, the output may not look realistic for the compute power it costs, and it sometimes creates unnecessary shadowing.
3. For parallel processing we must rely on the texture buffer or Z-buffer, which normally holds around 12 texels per SP cluster, while failing to take advantage of the local registers and cache memory of each stream processor.
4. Poor resource sharing. As you have probably realized, two neighboring pixels P1 and P2, situated very close together, will have almost the same surrounding pixels in common. If we could keep the sample values taken for P1 in GPU registers and first check whether they are also neighbors of P2, we could skip the entire sampling pass for P2. But unlike data on a CPU, these values cannot simply be kept in registers, because the sampling picks a random set of pixels and no information about them is retained.
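The I/O figure in point 1 is easy to verify; a few lines confirm the order of magnitude (the 100-sample kernel is the same assumption as in the text):

```python
pixels_per_frame = 1920 * 1080   # resolution: pixels in one frame
samples = 100                    # neighbor depth samples per pixel
fps = 60                         # frames rendered per second

reads_per_second = pixels_per_frame * samples * fps
# 2,073,600 * 100 * 60 = 12,441,600,000, i.e. ~1.24 x 10^10 Z-buffer reads/s
```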
GPU Computing based Approach:
First, we will discuss another algorithm for AO, one which uses ray tracing.
*i.imgur.com/Jytok.png
Local ambient occlusion in image-space:
(a) Rays emerging from a point P, many of which are not occluded (marked red).
(b) Rays constrained within a distance r_far, with distant occluders neglected (blue arrows).
(c) Pixels as viewed by a camera; neighboring pixels are obtained (marked in green).
(d) We de-project these pixels back to world-space and approximate them using spherical occluders (green circles). These are the final approximated occluders around the point that are sought in image-space. Note that the spheres are pushed back a little along the opposite direction of the normal (-n̂) so as to prevent incorrect occlusion on flat surfaces.
*i.imgur.com/iEIzb.gif
In this example we have two sampling points, A and B. At position A only a few rays hit the sphere, so the sphere's influence is small; at position B many rays hit the sphere and the influence is large, resulting in a darker color there.
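The A/B example can be sketched directly: shoot random rays from a point, count how many hit the sphere occluder, and use the hit fraction as the darkening term. This is my own 2D simplification (a circle instead of a sphere, a crude ray march) purely to show the relationship.

```python
import math, random

def occlusion_from_sphere(point, center, radius, rays=500, max_dist=10.0):
    """Fraction of random rays from `point` that hit a circular occluder.

    More hits -> more occlusion -> darker ambient term (position B);
    fewer hits -> brighter (position A).
    """
    hits = 0
    for _ in range(rays):
        theta = random.uniform(0.0, 2.0 * math.pi)   # random 2D direction
        dx, dy = math.cos(theta), math.sin(theta)
        # March the ray outward and test against the occluder.
        for step in range(1, int(max_dist * 10)):
            t = step * 0.1
            px, py = point[0] + dx * t, point[1] + dy * t
            if math.hypot(px - center[0], py - center[1]) <= radius:
                hits += 1
                break
    return hits / rays

random.seed(1)
near = occlusion_from_sphere((0.0, 0.0), (2.0, 0.0), 1.0)  # like B: close, darker
far = occlusion_from_sphere((8.0, 0.0), (2.0, 0.0), 1.0)   # like A: far, brighter
# ambient brightness = 1 - occlusion, so `near` shades darker than `far`
```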
So the algorithm for a compute shader will look something like this:
1. For every pixel in the image, perform ray tracing and identify the neighboring pixel samples required.
2. Check whether the depth values of the selected pixels are already in groupshared registers.
3. If yes, pick the values from there.
4. If no, read the Z-buffer and place the fetched values in the groupshared registers.
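Steps 1-4 can be sketched in Python, with a dictionary standing in for the groupshared memory of one thread group; the cache granularity and read accounting are my own simplification, not real HLSL.

```python
def fetch_depths_for_group(group_pixels, neighbors_of, z_buffer_read):
    """Resolve depth samples for all pixels in a thread group, caching
    Z-buffer reads in 'groupshared' storage so overlapping neighborhoods
    are fetched from the texture unit only once."""
    groupshared = {}     # (x, y) -> depth, shared within the thread group
    zbuffer_reads = 0
    results = {}
    for p in group_pixels:                 # one "thread" per pixel
        depths = []
        for n in neighbors_of(p):          # samples chosen in step 1
            if n not in groupshared:       # step 4: miss, go out to Z-buffer
                groupshared[n] = z_buffer_read(n)
                zbuffer_reads += 1
            depths.append(groupshared[n])  # step 3: hit, served from cache
        results[p] = depths
    return results, zbuffer_reads

# Two adjacent pixels share most of their 3x3 neighborhoods:
def neighbors_of(p):
    x, y = p
    return [(x + dx, y + dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def z_buffer_read(n):
    return 1.0                             # flat depth; the value is irrelevant

_, n_reads = fetch_depths_for_group([(5, 5), (6, 5)], neighbors_of, z_buffer_read)
# 18 naive reads collapse to 12 unique fetches thanks to the shared cache
```

Even in this two-pixel toy, a third of the texture reads disappear; across a full thread group with heavily overlapping kernels, the saving is what the advantages below describe.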
Advantages:
• Using the groupshared memory avoids an incredible amount of over-sampling.
• It can be filled using the Gather instruction, which further reduces the number of TEX operations.
• Each SP of the GPU can perform the operation for a different pixel in parallel, and the data kept in shared memory or in groupshared registers can be accessed by the other SPs.
• Each SP has its own cache memory, which can hold frequently read data; this helps an SP minimize its I/O operations when it moves on to the next pixel after finishing the AO calculation for the current one.
I hope this explains how GPU compute performance actually benefits gaming performance when implementing Ambient Occlusion. Crysis was the first game to implement AO, but it did so with pixel shaders, and we all know how heavy that game was on hardware. Most of the latest games, like BattleForge, Battlefield 3, and StarCraft II, use compute-based AO logic, and they run far more smoothly.
So I guess you see my point: a GPU with better compute power is definitely going to have an advantage in current and future games because of the factors discussed above. That's why AMD cards, with their better compute performance, do better in games like DiRT 3, BattleForge, and Civilization V.
In the next iteration, I'll discuss Depth of Field.
Sayonara till the DOF.....HAHAHAHAHA