AMD Bulldozer News and Discussion

Extreme Gamer

僕はガンダム!
Vendor
Re: Hardware price list/spec sheet

@Cilus: I didn't mean HT per se. I meant that it won't arrange the threads properly.
 
OP
Cilus

laborare est orare
Re: Hardware price list/spec sheet

@Cilus: I didn't mean HT per se. I meant that it won't arrange the threads properly.

If you didn't mean that, then why did you write that Bulldozer cores are treated as HT cores? You should have said what you are saying now.
If a processor is not HT enabled, its cores cannot be treated as HT cores, and sadly Bulldozer is not an HT-enabled processor at all.
 

vaibhav23

In the zone
If you want your name in the Guinness Book, then buy a Bulldozer.:grin:
AMD Bulldozer Speed Record Broken Again at 8.58GHz
 

Extreme Gamer

僕はガンダム!
Vendor
Re: Hardware price list/spec sheet

Because in Windows 7 the result is similar. If you have an HT-enabled Core i7, it will not put two threads per core, but will use HT as the last resort for active threads. Same case here: it will send the threads through different modules instead of keeping the threads of one app together.

Windows 7 doesn't recognize 1 module as two fully operational cores.

The story changes in Windows 8.
 

vickybat

I am the night...I am...
Re: Hardware price list/spec sheet

Because in Windows 7 the result is similar. If you have an HT-enabled Core i7, it will not put two threads per core, but will use HT as the last resort for active threads. Same case here: it will send the threads through different modules instead of keeping the threads of one app together.

Windows 7 doesn't recognize 1 module as two fully operational cores.

The story changes in Windows 8.

Can you explain the bold part? The way I see it, Windows 7 sees it as a full 8 operational cores. That's where the flaw lies.
 

vickybat

I am the night...I am...
Re: Hardware price list/spec sheet

Windows 7 sees 8 logical cores, not physical cores.
Since it does not have the info on how to handle a BD module, it tries to operate it like a hyperthreaded core, placing only 1 thread per module. If more threads arrive, it still puts one thread on each module rather than two, and only once every module has 1 thread does it assign a second thread to a module.
So if one app needs more than one thread, and all the modules already have 1 thread each, the threads will be spread across different modules, which results in poor resource management.

The bold part is completely haphazard. Why the heck would an app need more threads? Do you know what a thread is and what it contains?
 
OP
Cilus

laborare est orare
Re: Hardware price list/spec sheet

Windows 7 sees 8 logical cores, not physical cores.
Since it does not have the info on how to handle a BD module, it tries to operate it like a hyperthreaded core, placing only 1 thread per module. If more threads arrive, it still puts one thread on each module rather than two, and only once every module has 1 thread does it assign a second thread to a module.
So if one app needs more than one thread, and all the modules already have 1 thread each, the threads will be spread across different modules, which results in poor resource management.

What the hell... Windows 7 sees all the Bulldozer cores as separate physical cores, not 8 logical cores, and AMD is marketing Bulldozer as an 8-core processor, not as 8 logical cores. Windows 7 is not even aware that there is such a thing as a module in a Bulldozer CPU.
Buddy, use a little common sense: AMD cannot market Bulldozer as an 8-core processor if it is treated as 8 logical cores. You need 8 physical cores to sell a product billed as an 8-core CPU.

What Vickybat is saying is perfectly right: the problem lies in assigning the available threads to any of the 8 available cores, irrespective of which module they belong to. Buddy, read a little before posting wrong information.
Microsoft has also confirmed that issue, and it is completely the opposite of what you've said (that Win 7 cannot recognise 1 module as two separate functional units). In fact, Win 7 sees two separate physical cores instead of a module.
Now the thing is, Bulldozer's two cores inside a module are not exactly independent: they share the FPU and the fetch/decode hardware (known as the front end of the CPU), and only the integer units are independent. So placing two interrelated threads inside one module gives better resource sharing and a higher degree of ILP, which Win 7 simply can't do because it is not aware of modules and places those two interrelated threads on any two of the 8 available cores. It also keeps Turbo boost from reaching its full potential: if a thread needs multiple CPU time slices to finish executing, Win 7 assigns it to any core rather than to the cores inside one module. So multiple modules end up being used for only 1 thread, which leaves all the modules used ineffectively and prevents the CPU from cutting power to the other modules and turbo boosting only one module to its highest level for faster processing.

If a module could be treated as two logical cores belonging to one physical module, then the assignment would be far better. So what you are describing as the problem is the same logic Windows 8 is gonna use to resolve the issue.
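To put a number on the placement point, here is a minimal Python sketch. It is only a toy model and not how any Windows scheduler is actually implemented: the 4-module, 2-cores-per-module layout comes from this discussion, while the helper names (`module_of`, `modules_used`, `place_module_aware`) and the "fill one module first" policy are assumptions made up for illustration.

```python
from itertools import combinations

NUM_MODULES = 4        # 8-core Bulldozer layout as discussed in this thread
CORES_PER_MODULE = 2   # two integer cores per module
NUM_CORES = NUM_MODULES * CORES_PER_MODULE


def module_of(core: int) -> int:
    """Return the module index (0-3) that holds a given core (0-7)."""
    return core // CORES_PER_MODULE


def modules_used(cores) -> int:
    """Count how many distinct modules a placement keeps busy."""
    return len({module_of(c) for c in cores})


def place_module_aware(num_threads: int) -> list:
    """Hypothetical policy: fill both cores of a module before the next one."""
    return list(range(num_threads))


# A module-unaware OS may pick any two of the 8 cores for two related threads.
all_pairs = list(combinations(range(NUM_CORES), 2))
same_module = [pair for pair in all_pairs if modules_used(pair) == 1]

print(len(all_pairs), len(same_module))      # 28 4
print(modules_used(place_module_aware(2)))   # 1
```

Under these assumptions only 4 of the 28 possible core pairs keep two related threads inside one module, so a placement that ignores modules will usually wake up two modules where one would do.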

A quote from Anandtech:

AMD also shared with us that Windows 7 isn't really all that optimized for Bulldozer. Given AMD's unique multi-core module architecture, the OS scheduler needs to know when to place threads on a single module (with shared caches) vs. on separate modules with dedicated caches
Read HERE and HERE
 

max_snyper

Maximum Effort!!!!!!
Re: Hardware price list/spec sheet

^^ Even if updates are provided to Win 7 so that it recognizes the 8 separate physical cores as a 4-module architecture, that could also do the trick for thread handling...
as a quad-core CPU!
I still think that if processes are handled by Win 7 as if on a quad-core CPU, performance could increase.
 

Extreme Gamer

僕はガンダム!
Vendor
Re: Hardware price list/spec sheet

@Vickybat: A thread is a sequence of instructions in a program/sub-system that may or may not run in parallel to another thread.

Ever heard of multithreading? ;)

Let me clarify:

You have three applications running. Each is running in a separate module. Now you launch a program that can multi-thread (i.e. run more than 1 thread in parallel) and can use two cores.
Obviously, its two threads will be spread across two modules, because Windows 7 won't recognize the 2 physical cores in a module and will treat one module like a hyperthreaded core.

However, if Windows 7 knew how to handle a BD module (which it should in the coming months, because Microsoft will patch Windows 7), both threads of the application/program would have been kept in the same module and more efficient use of memory would take place.

@Cilus:
Windows 7 sees all the Bulldozer cores as separate Physical cores

Then according to you, Windows 7 also sees a Core i7 as having 8/12 physical cores.

Windows 7 is not even aware that there is something like module in Bulldozer CPU.

Exactly, it does not have the information to handle a BD module.

Thread assignment in Windows 7 for BD takes place in the same manner as it would on a Core i7, causing inefficient cache utilization in a module.
 

vickybat

I am the night...I am...
Re: Hardware price list/spec sheet

@Vickybat: A thread is a sequence of instructions in a program/sub-system that may or may not run in parallel to another thread.

Ever heard of multithreading? ;)

Oh no, I'm hearing these terms for the first time from you.:lol: Sorry, but that bold comment really made me laugh.

I suggest you read the basics first before coming here and posting irrelevant info. Do you know the difference between SMT and super-threading? It's a very common misconception and I'd definitely guess you too are a victim.

Every running program is termed a process. Since executing a single large process takes a hell of a lot of CPU time, it's divided into multiple threads having different instances or states.

Threads contain instructions which are stateless. But instructions can be dependent, and in the real world this has to happen. Cilus's explanation is perfectly correct (I really owe him for clearing up a lot of my misconceptions).

Let me explain a bit:

You see, Bulldozer has a total of 8 physical integer execution units but four floating-point units. Usually, the integer units are regarded as the main portion of a CPU core because a CPU leans more on integer calculations than on floating-point operations.

As Cilus pointed out, each Bulldozer module has a shared front end consisting (roughly) of branch prediction, fetch, decode and dispatch units. These are responsible for assigning threads, and in turn instructions, to the execution units, whether integer or float. Let's focus on integer operations from now on and ignore float operations.

Suppose there are two threads T1 and T2.

T1 has 4 instructions: A, B, C & D.

T2 has 3 instructions: E, F & G.

Let's say these threads belong to the same process and thus have dependent instructions in them.

Say D depends on G.

Now consider a 4-module Bulldozer CPU with MOD1, MOD2, MOD3 and MOD4.

Now the OS, along with the CPU's own prediction logic, assigns thread T1 to MOD1 and T2 to MOD3.

Let's say the 1st integer core starts executing the instructions one by one from A to D. When it comes to D, it can't execute it because D is dependent on instruction G of T2, which has been assigned to MOD3 and has not yet been processed. It has to wait until MOD3 finishes executing E, F & G and then sends the result of G to MOD1 so that instruction D can be processed.

But if the OS is aware of the modular architecture, its logic will allow T1 & T2 to be assigned to MOD1. T1 will be handled by integer core 1 and T2 by integer core 2 of MOD1. Thus all dependent instructions can exchange info within the single module, resulting in faster execution. Here instruction-level parallelism is employed along with TLP, as they are dependent on each other.

Turbo boost will also work, because all the other modules are off and a single module, i.e. MOD1, can achieve the highest boost frequency, whereas in the first case it wouldn't achieve that much boost because MOD3 was active, so only two modules are cut instead of three as in the optimal case.

Windows 8 will address this.
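The T1/T2 walk-through can be turned into a tiny timing model. This is a rough Python sketch under loudly stated assumptions: every instruction takes one cycle, D needs the result of G, and the two latency values (0 cycles inside a module, 2 cycles across modules) are invented numbers for illustration, not measured Bulldozer figures.

```python
T1 = ["A", "B", "C", "D"]    # thread T1 from the example
T2 = ["E", "F", "G"]         # thread T2 from the example
CYCLES_PER_INSTRUCTION = 1   # assumption: every instruction takes one cycle

SAME_MODULE_LATENCY = 0      # hypothetical: G's result is visible at once
CROSS_MODULE_LATENCY = 2     # hypothetical: extra cycles to reach another module


def cycle_when_d_finishes(transfer_latency: int) -> int:
    """Return the cycle at which D completes, given how long G's result travels."""
    g_done = len(T2) * CYCLES_PER_INSTRUCTION         # E, F, G run back to back
    c_done = T1.index("D") * CYCLES_PER_INSTRUCTION   # A, B, C run before D
    # D can start only when C is done and G's result has arrived.
    d_start = max(c_done, g_done + transfer_latency)
    return d_start + CYCLES_PER_INSTRUCTION


print(cycle_when_d_finishes(SAME_MODULE_LATENCY))    # 4
print(cycle_when_d_finishes(CROSS_MODULE_LATENCY))   # 6
```

With these made-up latencies, keeping both threads in MOD1 lets D finish at cycle 4, while splitting them across MOD1 and MOD3 pushes it to cycle 6. The exact numbers mean nothing; only the direction of the difference matters.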


@Cilus:


Then according to you, Windows 7 also sees a Core i7 as having 8/12 physical cores.

This part is absolutely absurd. I have no idea why in the blue hell you're saying this.

You're welcome ;-)

Bought a Corsair GS600 @ 4.1k From MD Computers on first day of this month ;-)

Congrats man.:smile:
 

Extreme Gamer

僕はガンダム!
Vendor
Re: Hardware price list/spec sheet

The multithreading question was a joke and I'm glad you laughed :D
I sure as hell hope that you knew it was a joke.

Superthreading is pretty similar to what happens when a hyperthreaded core is assigned two threads, unrelated or otherwise.
Both threads won't undergo simultaneous execution. Exactly the opposite in simultaneous multithreading.

I know BD has only 4 FP units and 8 integer and logic units, but what I was trying to say is that the end result of thread assignment is similar to hyperthreading.

Computers have been my ******* since 1999 lol. Studying hard to get into a good college for pure CS with video game development.
 
OP
Cilus

laborare est orare
Re: Hardware price list/spec sheet

Then according to you, Windows 7 also sees a Core i7 as having 8/12 physical cores.

There is a book called Computer Architecture: A Quantitative Approach by Hennessy and Patterson. I suggest you start reading it.

In an HT processor, a single core can handle instructions from two different threads simultaneously. Ever heard of the term pipelining? If yes, then let me tell you that the single core uses pipelining to execute multiple instructions in parallel by overlapping their execution time. It doesn't have separate execution units to execute multiple instructions simultaneously. Any HT-enabled processor, like the i7 2600 or i7 920, uses the same methodology.

In a Bulldozer module there are two separate integer execution units, and each can process instructions in parallel irrespective of what the other one is doing. So it is treated as two separate cores, as two separate execution units are available. Here the advantage is that the two separate execution cores have a single fetch/dispatch unit and they can start processing even instructions from a single thread in parallel, a feature not possible in an HT-enabled processor due to the presence of a single execution unit. So if Windows 8 identifies a module as a unit and assigns a single thread to it properly, its integer instructions can be shared by the two integer cores and floating-point instructions can be put in the pipeline for the FPU to enhance ILP.

Currently all OSes, starting from Windows XP or 98, can identify an HT-enabled CPU and differentiate between logical cores and physical cores, so they can manage thread assignment accordingly. So a 4-core HT-enabled CPU is treated as 8 logical cores and 4 physical cores. In Bulldozer there is nothing called an HT-enabled unit, and a module can be considered roughly equivalent to it. But again I am saying it: Windows 7 does not know about the existence of modules; it sees 8 separate physical cores and assigns work to the cores in a random way. Just check the links I've posted in my previous post.
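The logical-versus-physical distinction can be pictured as a simple lookup table. The Python sketch below is only an illustration: the two topology dictionaries are hand-written assumptions that mirror the claims in this post, not data read from Windows or from a real CPU, and `summarize` is a made-up helper name.

```python
import os


def summarize(topology: dict) -> tuple:
    """Given {logical CPU number: physical core id}, return (logical, physical)."""
    logical = len(topology)
    physical = len(set(topology.values()))
    return logical, physical


# Assumed view of a 4-core HT-enabled CPU: two logical CPUs per physical core.
ht_quad_core = {cpu: cpu // 2 for cpu in range(8)}

# Assumed view of Bulldozer as described above for Windows 7:
# every core is reported as its own physical core, with no module grouping.
bulldozer_as_seen_by_win7 = {cpu: cpu for cpu in range(8)}

print(summarize(ht_quad_core))               # (8, 4)
print(summarize(bulldozer_as_seen_by_win7))  # (8, 8)

# On whatever machine runs this, os.cpu_count() reports logical CPUs only.
print(os.cpu_count())
```

A scheduler that has the first kind of table can avoid doubling up on one physical core; with the second kind of table it has no grouping information to work with at all.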

Now, even after explaining everything, I have a hunch that you're gonna post some baseless comment again without even having a look at what I've posted or the links I've shared. So I suggest you share your opinion with Arun Kishan.
P.S. Read the spoiler before asking him.
Arun Kishan is the person who owns the process management and threading subsystem in Windows, and he is the one who actually pointed out these issues with Windows 7 thread management on Bulldozer.
 

vickybat

I am the night...I am...
Re: Hardware price list/spec sheet

@ extremegamer

I suggest you do what Cilus said and start reading a bit on these topics to gain some valuable knowledge. Posting irrelevant and misleading info is termed spamming and nothing more.
It's bad for all the people getting involved with the material, and in the context of this forum, the material is nothing but posts.

You need to go deep into understanding and not just learn some terms and their definitions. Also go through the links which other forum members provide before starting baseless arguments. It's bad for the entire community (tdf universe).
 

d6bmg

BMG ftw!!
Re: Hardware price list/spec sheet

OT but now on topic: Best updated book for computer hardware: Computer Architecture, Fifth Edition: A Quantitative Approach Link

Anyone who is interested can buy that.
 

Extreme Gamer

僕はガンダム!
Vendor
Re: Hardware price list/spec sheet

@ extremegamer

I suggest you do what Cilus said and start reading a bit on these topics to gain some valuable knowledge. Posting irrelevant and misleading info is termed spamming and nothing more.
It's bad for all the people getting involved with the material, and in the context of this forum, the material is nothing but posts.

You need to go deep into understanding and not just learn some terms and their definitions. Also go through the links which other forum members provide before starting baseless arguments. It's bad for the entire community (tdf universe).

I totally abhor definitions, and always try to understand the fundamental concepts. Without knowing about my daily life, I do not think that you should assume that I learn terms/definitions.

Obviously the Bulldozer arch is not HT and the steps involved in thread assignment vary. But unlike the Core i series arch where the L3 cache is shared by all cores, in BD each module has its own Cache. Windows 7 will not see a module yet, but it does see that two cores have a shared cache. So, while assigning threads to idle cores, it unwittingly spreads the threads across different modules, where the END RESULT, I repeat, END RESULT of thread assignment is the same as on an HT Core i series processor, i.e. threads are assigned to alternate cores and not adjacent ones.

This is what I have been trying to say all along.

When I said that Windows recognizes 8 logical cores, I meant that it was Windows's fault and not AMD's. However, I take that part back because, after further reading up on the arch of BD, I found my mistake there.
 
OP
Cilus

laborare est orare
Re: Hardware price list/spec sheet

But unlike the Core i series arch where the L3 cache is shared by all cores, in BD each module has its own Cache.

Bulldozer does have a shared L3 cache of 8 MB, which is shared by all the modules or 8 cores, just like the L3 cache of Sandy Bridge. I think you're trying to point to the 2 MB L2 cache of a module, which is shared by both (integer) cores of that module.

But thread assignment is not done by checking the cache type in use, not even on HT processors... there is no such thing as multiple cores being treated as HT cores because they share a single cache. Cache management logic is present inside the CPU die and it cannot be directly controlled by the OS. Every current-gen processor has dedicated cache as well as shared cache, and they are handled internally by hardware logic present inside the CPU, not by the OS. Cache management inside the CPU is a black box to the OS.

Buddy, why don't you check the links we've provided? Check what Microsoft has said about the thread handling of Windows 7 on Bulldozer modules. Here all the cores are treated as 8 separate cores, not as interrelated cores, and the thread assignment is random.

If you wanna stick to your understanding, then provide us some links where experts have analysed this and arrived at your explanation.
 

vickybat

I am the night...I am...
Re: Hardware price list/spec sheet

I totally abhor definitions, and always try to understand the fundamental concepts. Without knowing about my daily life, I do not think that you should assume that I learn terms/ definitions.

It isn't me saying so but your posts seem to speak for themselves.

Obviously the Bulldozer arch is not HT and the steps involved in thread assignment vary. But unlike the Core i series arch where the L3 cache is shared by all cores, in BD each module has its own Cache. Windows 7 will not see a module yet, but it does see that two cores have a shared cache. So, while assigning threads to idle cores, it unwittingly spreads the threads across different modules, where the END RESULT, I repeat, END RESULT of thread assignment is the same as on an HT Core i series processor, i.e. threads are assigned to alternate cores and not adjacent ones.

This is what I have been trying to say all along.

If possible, take a long break and analyze your own comment. Try to relate each and every sentence that you wrote. Are they falling into place? Not to me.

From your own comment: "But unlike the Core i series arch where the L3 cache is shared by all cores, in BD each module has its own Cache."
What the heck does this line mean?

We all know that both Sandy Bridge and Bulldozer have dedicated L1 (data and instruction) cache, L2 cache per core (per module for Bulldozer) and a large shared L3 cache.

So how the heck is it related to hyperthreading or multithreading at all??

Like Cilus said, cache management is not decided by the OS but by internal hardware logic which is not known. You have completely misunderstood the very basic concepts and that's why you post gibberish every now and then.

The most important job of a cache is to significantly reduce CPU read & write times: the CPU accesses cache memory internal to it rather than main memory, because going to main memory adds latency and costs CPU cycles. Thread assignment to execution units has got nothing to do with any type of cache at all. Cache memory is simply mapped to main memory either directly or in a set-associative or fully associative manner. The most used memory locations are copied over to cache lines so that the CPU can quickly fetch data, process it and write it back.
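For anyone wondering what "mapped directly" looks like in practice, here is a small Python sketch of the textbook direct-mapped scheme. The line size and line count are assumed round numbers picked for illustration, not the parameters of any Sandy Bridge or Bulldozer cache, and `locate` is a made-up helper name.

```python
LINE_SIZE = 64    # bytes per cache line (assumed)
NUM_LINES = 512   # lines in the cache (assumed, so 32 KiB in total)


def locate(address: int) -> tuple:
    """Split a memory address into (tag, index, offset) for a direct-mapped cache."""
    block = address // LINE_SIZE   # which line-sized block of memory this is
    offset = address % LINE_SIZE   # byte position inside the line
    index = block % NUM_LINES      # the one cache line this block may occupy
    tag = block // NUM_LINES       # identifies which block currently sits there
    return tag, index, offset


print(locate(0x12345))                          # (2, 141, 5)
print(locate(0x12345 + LINE_SIZE * NUM_LINES))  # (3, 141, 5)
```

Both addresses land on cache line 141 with different tags, so in a direct-mapped cache they would evict each other. Nothing in this lookup involves threads or the OS, which is the point being made above.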

Read properly and come back again to post in a relevant manner.
 

Extreme Gamer

僕はガンダム!
Vendor
Re: Hardware price list/spec sheet

The L3 cache is shared 2MB/module...

*upload.wikimedia.org/wikipedia/commons/e/ec/AMD_Bulldozer_block_diagram_%288_core_CPU%29.PNG

From your own comment-" But unlike the Core i series arch where the L3 cache is shared by all cores, in BD each module has its own Cache."
what the heck does this line mean?

I was referring to L3 cache only...

But thread assignment is not done by checking the cache type in use, not even on HT processors... there is no such thing as multiple cores being treated as HT cores because they share a single cache. Cache management logic is present inside the CPU die and it cannot be directly controlled by the OS. Every current-gen processor has dedicated cache as well as shared cache, and they are handled internally by hardware logic present inside the CPU, not by the OS. Cache management inside the CPU is a black box to the OS.

Cilus, OK, then maybe I understood it wrong. But I did not say Windows manages cache.

My whole concept will be clear if you answer this:

Can Windows see the cache (not saying it can manage it, never did...)?

So how the heck its related to hyperthreading or multithreading at all??

They are NOT treated as HT cores!! The END RESULT of THREAD ASSIGNMENT is the SAME!!
 

vickybat

I am the night...I am...
Re: Hardware price list/spec sheet

^^ Windows will show it as 8 MB shared and not 2 MB per module. It is shared through an internal interconnect and lies outside the modules. You can see it clearly in the block diagram.

Your concepts will be clear if you read the article Cilus provided on threading and read the Bulldozer article carefully (I provided one in the Bulldozer thread). Read them properly and have your doubts cleared.
 