AMD Bulldozer News and Discussion

max_snyper

Maximum Effort!!!!!!
Beating Nvidia is easier than beating Intel at their own game.
Nvidia is not a bad company, just a company that does things a little bit the wrong way!
BTW, the 28nm shrink is a half node for AMD, right?
 

mrcool63

Journeyman
OK guys, single-threaded apps are bad on BD because of two things:
1. The cores are thinner. Sandy's are much thicker.
2. The 8-core actually behaves more like a 4-core with one additional FPU per core. When a program assumes it is an 8-core, it distributes the calculations among all of them, which saturates the front end. So in the next BD they plan to increase L2 and L3.

If you want to test BD in a single-threaded app, disable the 4 additional FPU cores and then test. You will see the IPC is about 10-20 percent higher, because the front end copes better.

This is another argument in favour of BD

SB-E will be horribly overpriced. The model in the 2600K's price bracket is a locked proccy, not unlocked like the 2600K. Also its socket will be LGA 2011 and not 1155, and LGA 2011 boards at release will cost a minimum of 15k. Ivy Bridge will be on 1155, but it is expected to release around May next year, so it's a pretty long wait if you want an Ivy.

So unless you are prepared to shell out a minimum of 30k+ for just the mobo and proccy, waiting is pointless.

Now do you understand why BD is competitive? The 2600K will be the best mid-range chip until Ivy arrives next year.
 

vickybat

I am the night...I am...
Would you elaborate on what you mean by a thick core and a thin core? I couldn't make head or tail of it. And what has the FPU got to do with single-threaded performance?

Bulldozer's FPU is a separate execution unit that handles only the floating-point instructions from a thread. Even a Sandy Bridge core has one. If the scheduler finds floating-point instructions, they are assigned to the FPU.
As simple as that.

Now, disabling all the FPUs will result in even slower performance, because the chip would just ignore all floating-point operations. The statement you made seems ridiculous unless you throw some more light on it.

Now coming to Sandy Bridge-E: these parts differ from mainstream Sandy Bridge when it comes to overclocking. The non-extreme parts have a locked multiplier, but they don't have a locked BCLK, so the base clock can be raised to overclock the CPU. On 1155 Sandy Bridge, the BCLK is locked. So even the sub-$300 Sandy Bridge-E parts will make sense for enthusiasts on a budget.

And I still don't understand how you can call BD competitive. Care to explain that?
 

max_snyper

Maximum Effort!!!!!!
@mrcool63
Why don't you understand that SB-E is for enthusiasts who have deep pockets to buy that stuff?
This is what you want, who is being compared with whom:
1. AMD BD <> Intel previous-gen i-series as well as new-gen i-series (well, some of them).
2. AMD Piledriver <> Intel Ivy Bridge.

SB-E is more of a server-grade system with quad-channel memory, new QPI and whatnot.
It's not for common people, unlike the i-series 2xxx, AMD Ph-II and BD.
Get it now?
 

mrcool63

Journeyman
Dude, I referred to SB-E more for the timeline, and to answer the people who said to wait for SB-E and Ivy before upgrading.

As you said, SB-E is an enthusiast's proccy, so it will be premium priced. The next proccy in the 2600K range will be Ivy. Follow me?

Now, Ivy will be released by May here, so till next year the 2600K is king. BD's performance is comparable to Sandy, so the next BD, releasing by Jan or Feb, will be almost equal to the 2600K. So till Ivy releases, AMD will have a proccy to compete.

Where this is all leading is that BD is quite a good bet.

@vicky: dude, the architecture of BD is tricky to understand. Here is something that will make my argument a little clearer.

From SemiAccurate:

That brings us to the oddest part of the architecture, the FP unit. It is shared between the two Int units, and is seen as a coprocessor, not an integrated pipeline like almost every other modern CPU. This means that any FP instruction will be fired off to the shared FP scheduler, there is only one, and when the instruction is completed, the FP unit signals the ‘core’ that it is done.
Remember those added resources that were mentioned earlier? Currently, the ‘Stars’ cores have a 128-bit FP unit. With Bulldozer, there is one FP unit that can crunch two 128-bit numbers per clock. The shared scheduler means there is a single central arbiter that can make sure things are ‘fair’ to both cores, but if one core doesn’t use an FP instruction that clock, the other core can use twice the resources it is usually allowed to.

So even though the FP scheduler is shared, there are two 128-bit FMAC units per module, as opposed to one per core in a conventional core topology. When you disable one core in a module, the other core has two 128-bit FMACs to itself instead of the usual one. Hence my statement: 'one core plus an additional FPU'.

Now do you understand?
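
To make the sharing rule in the SemiAccurate quote concrete, here is a minimal Python sketch. It is a toy model only, not AMD's actual arbiter: the function name `fmac_pipes_per_core` and the simple all-or-half split are assumptions made for illustration.

```python
# Toy model of one Bulldozer module's shared FP unit, per the quote above.
# Assumption: two 128-bit FMAC pipes per module behind one shared scheduler.
FMAC_PIPES_PER_MODULE = 2


def fmac_pipes_per_core(core0_wants_fp, core1_wants_fp):
    """Return (pipes for core 0, pipes for core 1) for a single clock."""
    if core0_wants_fp and core1_wants_fp:
        # Both cores issue FP work: the arbiter splits the pipes fairly.
        return (1, 1)
    if core0_wants_fp:
        # Core 1 is idle on FP (or disabled): core 0 may use both pipes.
        return (FMAC_PIPES_PER_MODULE, 0)
    if core1_wants_fp:
        return (0, FMAC_PIPES_PER_MODULE)
    return (0, 0)


if __name__ == "__main__":
    print("both cores busy:", fmac_pipes_per_core(True, True))
    print("one core only:  ", fmac_pipes_per_core(True, False))
```

The second case is the "one core plus an additional FPU" situation: with its sibling idle, a single core sees twice the FP width it would normally be allowed.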
 

Skud

Super Moderator
Staff member
Dude, let BD release in India first. Depending on its price, we may or may not recommend it.
 
Cilus

laborare est orare
I have already explained the problem the OS has in handling BD modules. It is not because of a saturated front end but because the OS does not understand the module-based design with two cores inside each module. The OS sees it as 8 separate cores. Here is the link to my previous post: *www.thinkdigit.com/forum/cpu-motherboards/135848-amd-bulldozer-news-discussion-27.html#post1508789
 

vickybat

I am the night...I am...
dude i referred to sb-e more for a timeline... and to tell some people who said wait for sb-e and ivy to upgrade..

as u said sb-e is enthusiasts proccy.. so will be premium priced.. the next proccy in the 2600k range will be ivy... follow me...

No, the next CPU in the 2600K lineup is going to be the 2700K, and it's going to launch next month. Not all Sandy Bridge-E CPUs are premium priced. The entry-level ones will also beat the 2600K. For example, the i7 3820 is going to cost $294 and seems affordable IMO.

now ivy will be released by may here.. so till next year 2600k is king.. BD's performance is comparable to sandy.. so the next BD will be almost equal to 2600k releasing by jan or feb. so till ivy releases amd will have a proccy to compete with ivy..

where this is all leading to is that BD is quite a good bet

It's leading nowhere. You cannot predict a processor's performance from thin-air speculation. BD's performance is not comparable to Sandy, and its mainstream CPUs are a complete failure. The next Bulldozer not only has to compete with the 2600K and 2700K but also the 3820. Things look really bad from AMD's perspective now. Piledriver has to be a complete evolution in order to turn things around.

@vicky-- dude the architecture for BD is tricky to understand.. here is something that will make my argument a little clear

from semiaccurate..



So even though the FP scheduler is shared there are two 128bit FMAC cores as opposed to one for every core in the conventional core topology.. When you disable one core in a module the other core will have two 128 bit FMAC's instead of one as is the convention.. Hence my statement ' One core plus an additional fpu'

Now do you understand??

First and foremost, you cannot compare different architectures and predict performance. In Bulldozer's case, it has two 128-bit FMAC execution units. I wouldn't call that a core but a part of one. If you look at Sandy Bridge's floating-point execution unit per core, it looks something like the following:

*i53.tinypic.com/2cwrbjn.jpg


Now you can see Sandy Bridge's execution cluster. Floating-point multiplication, addition and boolean operations are performed by different execution units in a single cluster. This is only one part of a core, whereas the other part has the integer unit. Each Sandy Bridge core has its own front end, whereas each Bulldozer module has a single front end shared by its two cores.

Besides, AMD approaches AVX differently from Intel. Bulldozer has two 128-bit SSE (FMAC) units, which are combined for 256-bit AVX operations. Sandy Bridge, however, allows two 256-bit AVX operations in a single clock cycle and thus has twice the FP throughput.
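
The "twice the FP throughput" figure can be checked with a few lines of arithmetic. This Python sketch uses only the numbers claimed in the paragraph above (they are not measurements), and the constant names are made up for illustration. Note that it compares one Bulldozer module with one Sandy Bridge core and ignores fused multiply-add, which would change a FLOP count.

```python
# Peak FP datapath width per clock, taking the post's figures at face value.
BD_FMAC_UNITS_PER_MODULE = 2   # two FMAC units per Bulldozer module
BD_FMAC_WIDTH_BITS = 128       # each unit is 128 bits wide
SB_AVX_OPS_PER_CLOCK = 2       # per Sandy Bridge core, as claimed in the post
AVX_WIDTH_BITS = 256

# Bulldozer: the two 128-bit units gang together for one 256-bit AVX op.
bd_bits_per_clock = BD_FMAC_UNITS_PER_MODULE * BD_FMAC_WIDTH_BITS
bd_avx_ops_per_clock = bd_bits_per_clock // AVX_WIDTH_BITS

# Sandy Bridge: two 256-bit AVX ops in the same clock.
sb_bits_per_clock = SB_AVX_OPS_PER_CLOCK * AVX_WIDTH_BITS

ratio = sb_bits_per_clock / bd_bits_per_clock

if __name__ == "__main__":
    print("Bulldozer module:", bd_bits_per_clock, "bits/clock,",
          bd_avx_ops_per_clock, "AVX op(s)/clock")
    print("Sandy Bridge core:", sb_bits_per_clock, "bits/clock")
    print("SB / BD ratio:", ratio)
```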

So Bulldozer has other flaws beyond simply the architecture. Although the modular architecture is good, AMD didn't implement it properly, and so it has flaws.
 

mrcool63

Journeyman
No , the next cpu in the 2600k line up is going to be 2700k and its going to launch next month. Not all sandybridge -E cpu's are premium priced. The entry level ones will also beat 2600k. For eg. i7 3820 is going to cost $294 and seems affordable imo.

Once again, read my earlier post. An i7 3820 along with an LGA 2011 mobo will be beyond normal people. The i7 will have a locked multiplier, so it's not as easy to overclock as Sandy, though it can be done, no doubt. The kicker will be the X79 chipset boards for 2011. They are heftily expensive right now, going above 15k for the normal ones, and PCIe 3 support is still doubtful. If you still doubt the X79 prices, check them on Google.

Its leading nowhere. You cannot predict a processor's performance out of thin air speculations.BD's performance is not comparable to sandy and its mainstream cpu's are a complete failure. Next bulldozer not only has to compete with 2600 and 2700k but also 3820. Things really look bad in amd's perspective now. Expect pile driver to a complete evolution in order to turn things away.

The 2700K is a 2600K with a higher clock. Everybody knows it was proposed as a counter to BD! The 3820 is what I am talking about above. Are you saying that a person who feels BD at 12k is expensive will invest in a 3820 at 17k plus an X79 mobo at 16k or more and call it value for money? And how are you predicting the 3820's performance?

First and foremost, you cannot compare different architectures. In case of bulldozer, it has two 128bit FMAC execution units. I won't call it a core but a part of it. If you see sandybridge's floating point execution unit per core, it will look something like the following:

Yeah dude, spot on there. Those are all the reasons why BD is bad in single-threaded apps :) My point was that instead of overloading and potentially bottlenecking the front end and the FPU by using two cores with 8 integer pipelines, running one core with 4 integer pipelines on the existing FPU will reduce the bottleneck and give up to 20 percent more performance.

Now do you follow my drift? :)
 

vickybat

I am the night...I am...
Once again read my earlier post.. i7 3820 along with a lga2011 mobo will be beyond normal people.. i7 will be with a locked multiplier.. not as easy to overclock as the sandy but can be done no doubt.. the kicker will be the x79 chipset boards for the 2011.. they are now hefttily expensive going above 15k for the normal ones.. also pcie3 support is still doubtful...if you still doubt the x79 prices check it in google...

I never said X79 would be cheap, and why should it be? Bulldozer is not competition for X79 parts. Sandy-E is in a league of its own.

2700 is a 2600k with a higher clock.. everybody knows that it was proposed to be released to counter BD!!! 3820 is what i am talking about above.. You are saying that a person who feels BD at 12k is expensive will invest in a 3820 at 17k with a x79 mobo at 16k or more and call it value for money?? how are you predicting 3820's performance?

Hmmm, you can say it was released to counter BD, but after seeing Bulldozer's performance it's not even required. The 2700K will demolish the 8150, because you have no idea how well Sandy Bridge chips respond to even the slightest clock increment. Look at the performance difference between the i5 2500 and 2400 and you'll know.

A person who's dead set on getting X79 parts, let's say an i7 3820 + X79 mobo, won't even look at the 1155-based 2xxx CPUs. Forget Bulldozer. It's not even in contention.

People will rather invest in 1155 than in the current Bulldozer at every price point. Heck, the previous-gen AMD X4 and X6 CPUs make more sense.


yeah dude spot on there... those are all the reasons why BD is bad in single threaded apps:) My say was that instead of overloading and potentially bottlenecking the front end and the FPU by using two cores with 8 integer pipelines, running one core with 4 integer pipelines with the existing FPU will decrease the bottleneck and give upto 20 percent more performance gains.

Now do you follow my drift:)

Nope, if you remove the additional integer core from a module, performance will drop even further. It could no longer be called a module.

The problem is the scheduling logic in current operating systems, which cannot distinguish modules from cores. Cilus was trying to convey the same thing.
Windows 8 will try to address this, as said by its lead designer "ARUN KISHAN".

Dependent instructions from separate threads need to be assigned to a single Bulldozer module, behind its shared front end, to fully utilize the architecture, and Windows 7 is unable to do that now.
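
As a rough illustration of what "module-aware" placement could mean, here is a small Python sketch. It is a hypothetical policy, not how the Windows scheduler actually works: `place_threads`, the thread names and the "group of related threads" input are all assumptions made up for this example.

```python
# Hypothetical module-aware placement: threads that depend on each other
# are packed onto the two cores of the same module; unrelated groups get
# separate modules.
CORES_PER_MODULE = 2


def place_threads(thread_groups, num_modules):
    """Map each thread name to a (module index, core index) pair.

    thread_groups: list of lists; threads in one inner list share data.
    Groups larger than one module spill over into the next free module.
    """
    placement = {}
    module = 0
    for group in thread_groups:
        for start in range(0, len(group), CORES_PER_MODULE):
            if module >= num_modules:
                raise ValueError("not enough modules for all thread groups")
            chunk = group[start:start + CORES_PER_MODULE]
            for core, thread in enumerate(chunk):
                placement[thread] = (module, core)
            module += 1
    return placement


if __name__ == "__main__":
    # Two dependent threads (A1, A2) and one independent thread (B1)
    # on a 4-module chip such as an "8-core" Bulldozer.
    print(place_threads([["A1", "A2"], ["B1"]], num_modules=4))
```

A scheduler that only sees 8 identical cores has no reason to keep A1 and A2 together or to give B1 a module of its own, which is the gap being described here.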
 
Cilus

laborare est orare
Vicky is right here. The OS limitation is also stopping Bulldozer from using its Turbo Core feature to maximize performance in lightly threaded environments.
For example, consider a thread T1 which is quite big and needs 5 CPU time slices, or 5 iterations, to complete. (After each time slice the thread is switched out to main memory and later assigned to a CPU again. This is known as context switching and is handled by the OS.) Now say the Bulldozer chip has two modules, M1 and M2, and 4 cores: C1 and C2 belong to M1, and C3 and C4 belong to M2.

Now to enable Turbo Core, T1 needs to be assigned to a single module in every iteration, say M1, so that module M2 can be disabled completely and the clock speed of M1 increased to speed up execution.

Since Windows 7 is simply not aware of Bulldozer modules and sees a dual-module chip as 4 independent cores, namely C1, C2, C3 and C4, it may assign thread T1 to any of those cores. Say that for the first two iterations, by luck, T1 is assigned to C1 and C2, which belong to the same module, so Turbo Core is enabled for these two iterations. But in the 3rd, 4th and 5th iterations, say Win 7 assigns it to either C3 or C4. Turbo Core is then disabled, because both modules are now working on a single thread, just not concurrently.
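
The five-iteration example can be written out as a short Python sketch. It follows the simplified rule used in this post (Turbo Core holds only while the thread stays in the module it started on) and is not a model of the real power-management logic; `CORE_TO_MODULE` and `turbo_by_slice` are names invented for illustration.

```python
# Core-to-module layout from the example: C1, C2 in M1 and C3, C4 in M2.
CORE_TO_MODULE = {"C1": "M1", "C2": "M1", "C3": "M2", "C4": "M2"}


def turbo_by_slice(core_per_slice):
    """For each time slice, report whether Turbo Core could stay enabled.

    Simplifying assumption from the post: turbo holds only while the
    thread runs in the module it started on, so the other module can
    remain powered down.
    """
    home_module = CORE_TO_MODULE[core_per_slice[0]]
    return [CORE_TO_MODULE[core] == home_module for core in core_per_slice]


if __name__ == "__main__":
    # Module-unaware placement, as in the Windows 7 example above.
    unaware = ["C1", "C2", "C3", "C4", "C3"]
    # A module-aware placement keeps T1 inside M1 for all five slices.
    aware = ["C1", "C2", "C1", "C1", "C2"]
    print("unaware:", turbo_by_slice(unaware))
    print("aware:  ", turbo_by_slice(aware))
```

With the unaware schedule, only the first two slices keep turbo, matching the example; with the aware schedule, all five do.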
 

sukesh1090

Adam young
^^But there is still no sign of power consumption fixes from AMD. We may get a performance boost in Win 8. This OS problem could be the reason why BD shows inconsistent performance: in one test it goes above all the other processors, but in another it bites the dust.
 

vickybat

I am the night...I am...
^^so vicky how much performance increase we will see in win 8 with BD?

I would say a 10-20% increase in lightly to heavily threaded apps. But Sandy Bridge and all its successors are also going to benefit from Windows 8.

For this reason, I would tell people to give Bulldozer a try. Besides, you also have the newer instruction sets at your disposal.
A normal user won't see any performance deficit in real-life scenarios. I don't think the FX 4100 and 6100 are power hungry. They are 95 watt CPUs, right?

@Cilus: yup, this explanation is perfect, buddy. Context switching is the right term, and like you said, Turbo Core is limited because the modules are being kept busy unnecessarily and cannot be disabled to bump the clock speed. That's one of the reasons for Bulldozer's current dismal performance.

But I see hope in Windows 8.
 

Skud

Super Moderator
Staff member
The problem is that Piledriver, which is expected to release in Q1 2012, is supposedly the last upgrade AM3+ will receive, as AMD is apparently moving to a newer socket. If that is true, then even Piledriver would be without Win 8 support for a good 7-8 months. So committing to a platform that will get CPU upgrades for less than a year, with no optimal OS support for a similar period, doesn't seem wise at this moment. By the time Win 8 comes out, we will most probably be talking about some other processors.

Hopefully, BD will do well and sell well in the server market.
 

vickybat

I am the night...I am...
^^Yup, you are right, mate. AMD's next socket is FM2, and it will unify CPUs and APUs on a common platform. I guess Piledriver will feature Komodo cores (5 modules) and will get the FM2 treatment along with Trinity.

Windows 8 is not that far off. But Intel will keep making life harder for AMD by launching more and more CPUs. Believe it or not, they are even eyeing Trinity with their Ivy Bridge GPU. They say it's a big leap from Sandy Bridge's GPU. I must say IGP performance will never be the same again, and the future truly is Fusion. :smile:
 