AMD Bulldozer News and Discussion

Krow

Crowman
Bulldozer sounds mythical to be honest. I expect it to be a notch below the i7. And AMD will improve it with revisions.
 

tkin

Back to school!!
On single-core performance BD won't be able to touch Intel, but in multicore it may give healthy competition to the 2500K and lower. The 2600K will remain on top thanks to Hyper-Threading (it gives a monster boost).
 

vickybat

I am the night...I am...
I reckon Bulldozer's single-threaded performance will be higher than that of the LGA 1155 Sandy Bridge CPUs, owing to the shared front end, which dispatches dependent instructions to two execution units at once.

A thread with a lot of dependent instructions will favor Bulldozer's modular architecture and its shared front end.

AMD is using ILP here, not only TLP: besides independent instructions, dependent instructions can also be executed within a single module thanks to its shared resources. This is very different from the Hyper-Threading that Intel employs, which doesn't have a shared front end like Bulldozer's. A conventional dual-core CPU has separate fetch and decode units for each core, whereas a dual-core Bulldozer is a single module with shared fetch and decode. So it can assign instructions from multiple threads to its two execution units, whereas in a normal dual core the front end of one hyper-threaded core takes up instructions from a single thread. If there are dependencies on a second thread, it has to wait until the second thread's instructions have executed. Remember they have a single execution unit per core (counting the integer and floating-point units as one).

Had a long chat with Cilus, who has prepared a terrific article on threading that explains SMT (simultaneous multithreading) and Hyper-Threading in detail. He explained the whole thing to me and will be sharing the content with the entire TDF universe.

After reading it, a lot of doubts about threading should be cleared up, including the single- and multi-threaded performance of a conventional CPU.
 

sukesh1090

Adam young
@vickybat,
So what do you say, will BD beat the i7 2600K or not?
Waiting for the article. A good way to pick up some knowledge that I don't have much of.
 

Krow

Crowman
TBH waiting is for noobs. Get what is best now, i.e., Intel. :p Will think about Bulldozer when it releases. That said, I hope AMD gives Intel good competition here.
 
Cilus

laborare est orare
I'm currently putting the finishing touches on my article, where I've explained everything about multithreading, from a single-threaded single module to a hyper-threaded module, and about the operating system, with block diagrams and examples. But it is huge, so you guys will need a little patience to read it.

But still some words about HT and Bulldozer cores:-

Actually, one Bulldozer module will beat one hyper-threaded Intel i7 core, but if you count one module as two cores, then each of those cores does not perform as well as an Intel Sandy Bridge hyper-threaded core.

There is a misconception that Hyper-Threading is TLP, not ILP. The usual picture of Hyper-Threading, holding two threads in the CPU's thread registers and switching to one when the other is busy or waiting for resources, is wrong. That technique is called superthreading, not Hyper-Threading. Hyper-Threading improves thread-level parallelism by improving ILP. In HT, if the CPU front end (fetch, decode, out-of-order logic) can issue 4 instructions per cycle and the current thread can supply only two, an HT-enabled processor can issue another two instructions from the 2nd thread. So a total of 4 instructions are executed in that cycle, two from Thread 1 and two from Thread 2, and some instructions from both threads are executed simultaneously. In contrast, superthreading (which most of us wrongly think of as Hyper-Threading) can issue instructions from only one of the two threads in any given cycle, not from both.

But this is the best-case scenario, where the two threads are completely independent of each other. HT performance drops if there are dependencies between the threads, since the dependent thread has to wait for the independent one to complete, and both threads' instructions can't be processed at once because there is only one CPU execution unit.
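
To put numbers on the issue-slot difference described above, here is a minimal sketch in Python. It is a toy model, not a description of any real scheduler: the function name `issue_slots_used` is made up for illustration, and the rule that superthreading picks the thread with the most ready instructions is an assumption.

```python
def issue_slots_used(ready_per_thread, width, smt):
    """Count how many issue slots get filled in one cycle (toy model).

    ready_per_thread: issuable instructions each thread has this cycle.
    smt=True  -> slots may be filled from all threads in the same cycle
                 (Hyper-Threading as described in the post).
    smt=False -> only one thread may issue per cycle (superthreading);
                 assumption: the thread with the most ready work is chosen.
    """
    if smt:
        return min(width, sum(ready_per_thread))
    return min(width, max(ready_per_thread))

# The post's case: a 4-wide front end, each thread can supply 2 instructions.
ready = [2, 2]
print("Hyper-Threading :", issue_slots_used(ready, width=4, smt=True))
print("Superthreading  :", issue_slots_used(ready, width=4, smt=False))
```

With two ready instructions per thread, the SMT case fills all 4 slots while the one-thread-per-cycle case fills only 2, which is the gap the paragraph above describes.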

Consider the following example:
Thread 1 or T1: Instructions I1, I2, I3, I4 and I5 are present
Thread 2 or T2: Instructions I6, I7, I8, I9 and I10 are present.
Thread 3 or T3: It is in the thread queue, waiting to be picked up once either T1 or T2 completes. It has instructions I11, I12, I13, I14 and I15. All are independent.

Dependencies: I1 and I2 are independent, I4 depends on I6, I5 depends on both I4 and I10.
I6 is independent, I7 depends on I2, I8 depends on I5. I9 and I10 are independent.

Case 1: One hyper-threaded i7 core with two logical units

In an HT processor that can issue 4 instructions per cycle, I1, I2, I6 and I9 can be issued in the 1st cycle as they are independent. In the 2nd cycle only I4 (I6 completed in the 1st cycle), I7 (I2 is completed) and I10 can be executed, as all the remaining ones are dependent. So one issue slot is wasted.
In cycle 3, only I5 can be issued (I4 and I10 are done), as I8 requires I5 to be finished. So three issue slots are wasted. Only in cycle 4 can I8 be issued. So in total 4 cycles are required. At the start of the 5th cycle T3 will be fetched. Finishing T3 takes two more cycles as it has five instructions. So the total number of cycles to finish the three threads is 7.
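
The T1 + T2 part of Case 1 can be checked with a small simulation. This is only a sketch under stated assumptions: the `schedule` function and its greedy pick order are hypothetical, every instruction takes one cycle, a result is usable only from the next cycle, and I3 (which the example never mentions) is treated as independent.

```python
# Dependencies as listed in the post. I3 is never mentioned there,
# so treating it as independent is an assumption of this sketch.
DEPS = {
    "I1": [], "I2": [], "I3": [], "I4": ["I6"], "I5": ["I4", "I10"],
    "I6": [], "I7": ["I2"], "I8": ["I5"], "I9": [], "I10": [],
}

def schedule(deps, width):
    """Greedy issue model (hypothetical): each cycle, issue up to `width`
    instructions whose dependencies completed in an earlier cycle."""
    done, pending, cycles = set(), list(deps), []
    while pending:
        ready = [i for i in pending if all(d in done for d in deps[i])]
        issued = ready[:width]
        if not issued:
            raise ValueError("circular dependency")
        cycles.append(issued)
        done.update(issued)
        pending = [i for i in pending if i not in issued]
    return cycles

ht_cycles = schedule(DEPS, width=4)
for n, issued in enumerate(ht_cycles, start=1):
    print(f"Cycle {n}: {', '.join(issued)}")
print("Cycles for T1 + T2:", len(ht_cycles))
```

The greedy order picks slightly different instructions per cycle than the walk-through (it takes I3 in cycle 1 and I9 in cycle 2), but it also needs 4 cycles for T1 and T2. Under this one-result-per-cycle model the chain I6 -> I4 -> I5 -> I8 sets a floor of 4 cycles at any issue width, so `schedule(DEPS, width=8)` gives 4 cycles as well.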

Case 2: One Bulldozer module with two cores and shared Frontend
Now consider a Bulldozer module, where two separate execution units are available and the two threads inside the module are shared among all of the module's execution units, which is two here. Say it has Core1 and Core2, which can share all the data, threads and instructions present inside the module.
So in the 1st cycle Core1 will have I1, I2, I6 and I9, and Core2 will have I10.
In the 2nd cycle, Core1 will have I4 and I5, and Core2 will have I7 and I8 loaded but waiting for resources.
Now let's divide the 2nd cycle into two time frames. Let the 2nd cycle's time span = t1 (time to execute the 1st instruction) + t2 (time to execute the 2nd instruction). So both t1 and t2 are less than clock cycle 2's total period.
At the end of t1, Core1 will finish I4 (I6 is done in cycle 1) and Core2 will finish I7 (I2 is done in cycle 1). At the end of t2, I5 will be executed, as I4 and I10 are done, and I8 is still in Core2's execution unit.
So now Core1 is completely free and Core2 has 1 instruction left. At the beginning of cycle 3, T3 is fetched; its I11, I12, I13 and I14 will be assigned to Core1, and I15 will be assigned to Core2 as it still has 3 empty slots.
At the end of the 3rd clock cycle T1, T2 and T3 are all completed.

So the advantage over an HT core is 4 cycles.
 
Joker

Guest
tkin said:
On single-core performance BD won't be able to touch Intel, but in multicore it may give healthy competition to the 2500K and lower. The 2600K will remain on top thanks to Hyper-Threading (it gives a monster boost).

No, that's wrong. In synthetic benchmarks it might give a "monster boost", but otherwise Hyper-Threading is only a 20-25% boost. 2 Intel cores will be 70-80% faster than 1 hyper-threaded Intel core.
 

max_snyper

Maximum Effort!!!!!!
Nice explanation Cilus, good job.
So accordingly, compared to the i7 2600K, an 8-core Bulldozer will theoretically be some 25~30% faster in multithreaded applications. Mainly it will help in rendering, animation and games, since nowadays games are sloppily ported from consoles.
 
Cilus

laborare est orare
There is a chance that it may also perform well in games that are not heavily multithreaded, thanks to its dual-core module design. Here, even if only a single thread is available to a module, all of its instructions can be issued to both cores of the module simultaneously to speed up execution.
But these things are hard to speculate about until we can see it perform and see its final architecture design.
 

max_snyper

Maximum Effort!!!!!!
So long story short...better dayz ahead for AMD
I hope it doesn't turn out to be a disaster like the Phenom I lineup
Fingers crossed...!
 

topgear

Super Moderator
Staff member
^^ But they will be priced much higher, and AMD will win the VFM battle IMO if BD can outperform current-gen SBs - but to know for sure we still need to wait 15 long days ;-)

@Cilus - nice explanation on HT and Bulldozer cores ;-)

BTW, 6 BD CPUs are going to hit the market for sure:

FX-8150, FX-8120 (95W and 125W TDP), FX-8100, FX-6100 and FX-4100

Check this out ;-)
 