AMD Bulldozer News and Discussion

max_snyper

Maximum Effort!!!!!!
Re: Hardware price list/spec sheet

^^@vickybat...@extremegamer....
First of all, caches are not handled by the OS but by the logic design of the CPU itself!
PERIOD.
Secondly, if you are so interested in processor design, then buy the book "Microprocessors and Interfacing" by Douglas Hall.
All your doubts and theories will be corrected.

And please discuss this somewhere else; this is the "Hardware price list/spec sheet" thread,
not a processor design section!
I wonder what the admins are doing...
 

vickybat

I am the night...I am...
Re: Hardware price list/spec sheet

@ max_snyper

I think you are right, buddy. I got a bit carried away, but I couldn't resist responding to incorrect posts and misleading info.

I will PM some moderators to move these posts to the Bulldozer discussion thread.
Thanks for pointing it out, mate. :smile:

Btw, in post #4844 I clearly stated what you said just now about caches not being handled by the OS. Cilus also said the same thing.
 

Cilus

laborare est orare
Re: Hardware price list/spec sheet

Max_Snyper, you're right. I got a little carried away too, as CPU architecture is my field of interest, as you know. This is my last post regarding it. Tomorrow I'll be moving all the related posts to the Bulldozer Discussion thread. BTW, please have a look at my explanation and share your opinion if you wish.


Extreme Gamer: Yes, the OS can see the CPU cache, but how data is placed inside the cache, in what sequence, and which lines need to be evicted and written back to memory when the cache is full, is handled by CPU hardware logic, not by the OS. The cache on the CPU die is managed entirely by CPU hardware, which implements various mechanisms for efficiency, like an LRU algorithm for cache replacement, cache coherency hardware, etc. To the OS, cache operation is a black box.
I hope you know what a black-box view is: you send the required inputs to a module and get the output, without interfering with or knowing the actual mechanism by which the output is produced. I mentioned this in my previous post; I never said that the OS is not aware of the cache.
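To make the black-box point concrete, here is a minimal Python sketch of LRU replacement with write-back. It is only a toy model and not how any real CPU implements its cache: the class name ToyLRUCache, the fully associative layout and the tiny capacity are all assumptions made up for illustration. The caller only ever uses read() and write(); which line gets evicted, and when it is written back, is decided inside.

```python
from collections import OrderedDict


class ToyLRUCache:
    """Toy fully associative write-back cache with LRU replacement.

    Hypothetical model for illustration only; real caches are
    set-associative and usually approximate LRU in hardware.
    """

    def __init__(self, capacity, memory):
        self.capacity = capacity      # number of cache lines
        self.memory = memory          # backing store: dict of address -> value
        self.lines = OrderedDict()    # address -> (value, dirty), oldest first
        self.writebacks = []          # addresses written back on eviction

    def _fill(self, addr, value, dirty):
        # Evict the least recently used line if a new line needs room.
        if addr not in self.lines and len(self.lines) >= self.capacity:
            victim, (v_value, v_dirty) = self.lines.popitem(last=False)
            if v_dirty:
                # Only modified lines have to go back to memory.
                self.memory[victim] = v_value
                self.writebacks.append(victim)
        self.lines[addr] = (value, dirty)
        self.lines.move_to_end(addr)  # mark as most recently used

    def read(self, addr):
        if addr in self.lines:        # hit: just refresh the LRU order
            self.lines.move_to_end(addr)
            return self.lines[addr][0]
        value = self.memory.get(addr, 0)   # miss: fetch from memory
        self._fill(addr, value, dirty=False)
        return value

    def write(self, addr, value):
        # Write-back policy: memory is updated only when the line is evicted.
        self._fill(addr, value, dirty=True)
```

From the outside, a program (or the OS) just reads and writes addresses and gets the right values back. The eviction order and the write-back timing never show up in that interface, which is the black-box view described above.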

And one thing... Thread assignment is never done on the basis of the cache handling algorithm.

For example, the L2 cache of a core in an HT-enabled Nehalem CPU like the i7 920/960, although shared by the two hyperthreaded logical cores, is partitioned statically to keep any one thread from monopolizing all the resources. It is the same in Bulldozer, where the L3 cache is statically partitioned at 2 MB per module.

But in Sandy Bridge, the two logical cores of a physical core can access any position of the shared L2 cache, as the division is dynamic and decided by the hardware logic at run time. It is the same as the 2 MB L2 cache per module in Bulldozer, shared dynamically by both cores.

So you see, the shared cache mechanism is completely different in Nehalem and Sandy Bridge HT processors, but does it pose any challenge for the OS when assigning two tightly coupled threads to the two logical cores of a physical one? The answer is no, because the entire thing is handled by the cache management logic of the CPU.
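A rough way to picture the two sharing policies is the Python sketch below. It is a toy model under my own assumptions (the class names, the 8-line capacity and the access trace are all invented) and it does not claim to reproduce the real Nehalem, Sandy Bridge or Bulldozer hardware. What it shows is that both policies expose the same access() call, so whoever assigns the threads does not need to know which one is in use.

```python
from collections import OrderedDict


class LRUPool:
    """A pool of cache lines with LRU replacement (toy model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, key):
        """Return True on a hit, False on a miss (the line is then filled)."""
        if key in self.lines:
            self.lines.move_to_end(key)
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # evict least recently used
        self.lines[key] = None
        return False


class StaticSharedCache:
    """Each thread owns a fixed, equal slice of the cache."""

    def __init__(self, capacity, threads):
        share = capacity // len(threads)
        self.pools = {t: LRUPool(share) for t in threads}

    def access(self, thread, addr):
        return self.pools[thread].access(addr)


class DynamicSharedCache:
    """All threads compete for one common pool of lines."""

    def __init__(self, capacity, threads):
        # 'threads' is unused; kept so both classes are built the same way.
        self.pool = LRUPool(capacity)

    def access(self, thread, addr):
        return self.pool.access((thread, addr))


def make_trace(rounds=3):
    """Hypothetical workload: thread A loops over 6 lines, thread B over 2."""
    trace = []
    for _ in range(rounds):
        trace += [("A", a) for a in range(6)]
        trace += [("B", a) for a in range(2)]
    return trace


def count_hits(cache, trace):
    return sum(cache.access(thread, addr) for thread, addr in trace)
```

With this particular trace, the static split leaves thread A thrashing in its 4-line slice while B's slice sits half empty, whereas the dynamic pool fits both working sets. A different trace could just as easily favour the static split (one thread flooding the pool), which is the monopolizing problem static partitioning is meant to prevent. Either way, the code calling access() is identical.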

Actually, one Bulldozer module lies somewhere between a true dual-core module and an SMT-enabled core. It has some independent units, mainly the integer execution units, and some shared units, like the front end (Fetch/Decode, OOO) and the FPU.
What you are saying, treating them as one module rather than two separate cores, will come with Windows 8.
Consider this example, and please make sure you read it:
Suppose we have two threads, T and T1, which are tightly coupled, so they share a lot of common resources and have interdependent instructions. Now consider a dual-module Bulldozer, having module M1 with cores C1 and C2, and M2 with C3 and C4.

To Windows 7, M1 and M2 do not exist; it only sees C1, C2, C3 and C4 as 4 independent cores and, say, assigns T to C1 and T1 to C4.
Now whenever C1 reaches an instruction from T which depends on some instruction(s) of T1, it has to stall it and push it to the waiting queue until C4 completes those instructions in T1 and writes the result back to the L3 cache, if it is available. Then C1 has to read that value from the L3 cache and proceed with the dependent instruction.
So it will waste lots of valuable CPU cycles because:

1. The L3 cache is far slower than the L2 cache, which is nearer to the CPU.

2. C4 has no knowledge that executing that specific instruction from T1's instruction queue first would speed up the execution of thread T on C1, even if it could do so.

3. C1 and C4, as they belong to different modules, have different front ends (Fetch/Decode/Out-of-Order Execution). As they have no knowledge of the state of the other core, their out-of-order logic cannot reschedule the instructions to speed up the execution of both threads.

4. Although T and T1 are interrelated, they are placed in two different modules. Because of that, the single-threaded performance of each module is poor due to the loss of valuable CPU cycles. But the worst thing is: as the threads are using the CPU resources of both modules, Turbo boost cannot kick in to raise single-threaded performance.

Now suppose Windows 8 can identify a Bulldozer module and treats C1 and C2 as part of M1, and C3 and C4 as part of M2. So C1 and C2 will be treated not as independent cores, but as something between an SMT core and a dual core.

Now the OS has full access to each of the threads, and by judging their nature, shared resources, the memory locations they operate on, or the CONTEXT of each thread, it can easily understand the interrelation between T and T1. So it will assign both of them to module M1, i.e. to C1 and C2. Now look at the improvements:

1. Common Fetch and Decode: Since both threads share resources, i.e. the same memory locations and the same operands, these need to be fetched only once. As you are minimizing slower memory access, an expensive process in terms of CPU cycles, performance will improve because the wasted CPU cycles are recovered.

2. Out-of-Order Rescheduling of Both Threads' Instructions: As you know, OOO logic reorders instructions to increase the number executed simultaneously. Now that it has two threads full of instructions, many of them interdependent, the out-of-order execution unit can reschedule them more effectively to increase parallel execution. In HT-enabled cores OOO does the same thing, but here, as two discrete integer execution units are available, the process will be far superior.
In computer science it is observed that OOO logic can normally fetch 2 parallel instructions from a single thread in one cycle, and one core can normally run 4 instructions in parallel using pipelining.

So here, at least for integer work, we have a maximum of four instructions from T and T1 and two cores. So you can understand the speed of execution.

3. Data Sharing using L2 cache: As C1 and C2 share the same dynamically partitioned L2 cache, data sharing between them through it will be far faster.

4. Turbo Boost: As you can see, module M2 is not in use here, so Turbo boost can put it to sleep and raise the frequency of M1 to increase processing speed.
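The two placements from the example can be sketched in Python like this. It is purely illustrative and not how Windows 7 or Windows 8 actually schedule: the function names are made up, and the list of coupled threads is simply handed in as input, since how a real OS would detect that coupling is an assumption of the argument above.

```python
# Topology from the example: two modules, two cores each.
MODULES = {"M1": ["C1", "C2"], "M2": ["C3", "C4"]}
CORE_TO_MODULE = {core: m for m, cores in MODULES.items() for core in cores}


def place_flat(threads, core_order=("C1", "C4", "C2", "C3")):
    """Module-unaware placement: cores are handed out in an arbitrary order.

    The default order reproduces the example (T on C1, T1 on C4).
    """
    return dict(zip(threads, core_order))


def place_module_aware(coupled_groups):
    """Place every group of coupled threads on the cores of one module."""
    free = {m: list(cores) for m, cores in MODULES.items()}
    placement = {}
    for group in coupled_groups:
        # First module that still has enough free cores for the whole group.
        module = next(
            (m for m, cores in free.items() if len(cores) >= len(group)), None
        )
        if module is None:
            raise ValueError("no module has enough free cores for %r" % (group,))
        for thread in group:
            placement[thread] = free[module].pop(0)
    return placement


def same_module(placement, a, b):
    """True if both threads sit in one module (shared front end and L2)."""
    return CORE_TO_MODULE[placement[a]] == CORE_TO_MODULE[placement[b]]


def idle_modules(placement):
    """Modules with no thread on them, i.e. candidates to be put to sleep."""
    used = {CORE_TO_MODULE[core] for core in placement.values()}
    return sorted(set(MODULES) - used)
```

With place_flat(["T", "T1"]) the two threads land in different modules and no module is idle. With place_module_aware([["T", "T1"]]) both land in M1 and M2 is reported idle, which matches points 3 and 4 above.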


Hope you understand now.
 

sukesh1090

Adam young
^^Brother, we don't need a new socket now. Let them work on PD and give us what we are expecting. There are a hell of a lot of things which need to be fixed. And the one thing they need to work on in the chipset department is their memory controller, which sucks like nothing.
 

happy17292

Ambassador of Buzz
I will upgrade my PC this December.
I have the Phenom II X4 955 in mind.
What will be the price of the FX 6100?

Is there any chance of getting any Bulldozer proccy in a 7-8k budget?
 

vaibhav23

In the zone
Well, the FX 6100 is priced at 10k.
Check smcinternational.com
Is the FX 4100 worth buying?
AMD FX-4100 Quad Core 3.6GHz Bulldozer Processor Review - The AMD FX-4100 CPU - Legit Reviews
With the stock cooler it reaches a stable OC of 4.6 GHz and shows a 27% improvement in Cinebench 11.5 scores.
In many tests it defeats the i3 2120, but it gets defeated in almost all of them by the A8 3850.
So is it worth buying for 6k?
 

Skud

Super Moderator
Staff member
For me, if you really want a BD, whether you are upgrading or buying a new system, get one of the octa-cores; both the FX 6100 and FX 4100 are not good IMO.
 

Skud

Super Moderator
Staff member
And it gets beaten by an A8 3850 and takes a beating from Phenom II. And even after OCing, its Cinebench score is still 16% lower than a Phenom II X4 980 running at stock (almost 1 GHz slower). Now take your pick. ;)
 

Skud

Super Moderator
Staff member
But you are getting much better performance with Phenom II and arguably the best IGP with A8. And in SMC, the A8 is priced just 850 bucks more than FX4100. That's money well spent IMO.

And regarding Phenom II, I think even the lower clocked and lower priced versions will be competitive with FX4100, if not beating it.

Of course, one negative point of the A8 is that not many upgrades are forthcoming, whereas BD and Phenom II still have Piledriver to come.
 

sukesh1090

Adam young
@happy17292,
If you are upgrading the processor for the sake of gaming and you are not upgrading your gfx card, then please wait for PD or Ivy. Your Pentium can easily handle that gfx card without any problem. Even if you add a Phenom II or BD, your graphics card will bottleneck the processor.
 

Skud

Super Moderator
Staff member
Try Acer.


Guys, posted some tidbits here on Trinity, have a look:-

*www.thinkdigit.com/forum/cpu-motherboards/148272-amd-trinity-apu-discussion.html
 

vaibhav23

In the zone
@piyush Only this is available and I like the performance of this lappy. Quite good for gaming.
Asus X Series X53TA-SX096D Laptop: Flipkart.com: Compare, Review Asus Notebook
@skud Want to know the P2 955BE score in Cinebench 11.5.
But of the 3 games tested, both the i3 and the A8 get defeated in two, and in the remaining game the FX defeats the A8 at lower resolution. I think the FX should be priced lower in the local market than SMC's price.
 

Skud

Super Moderator
Staff member
Point is, there's not enough performance for the money that I am spending. Nor as many features as the APU. And living in the shadow of future performance benefits from Windows 8 simply doesn't cut it.

And it doesn't look like a good upgrade option for existing socket AM3+ users.
 