Guys, some good news for AMD. I was just reading the AnandTech review and found something useful.
AnandTech also tested x264 encoding with a modified binary compiled to support the AVX and AMD XOP instruction sets. They found that in the 2nd pass (the pass that actually produces the final encoded video), the FX-8150 beats the i7-2600K.
This is really interesting because it shows how much an application optimized for AMD's Bulldozer architecture can benefit, with a serious performance boost. Hoping to see some patches released to improve Bulldozer's performance.
Check it out here. In the Windows 8 preview there is a performance improvement of 4% to 10% over Windows 7. AMD's explanation is that the Windows 7 scheduler is not aware of Bulldozer's module-based architecture: it places a thread on whichever core is free, without considering the state of that core's module. For example, suppose that at time t Bulldozer has two free modules (which the OS sees as 4 cores) and two threads are waiting to be scheduled. If the OS is not aware of the modules, it may assign both threads to the two cores of a single module, causing poor resource utilization and contention. As we all know, the two cores of a module are not fully independent: they share the fetch/decode front end, the FP unit and the L2 cache.
If the OS is aware of the modules, the two threads can be assigned to two different modules. Each thread then has a module's shared front end, FP unit and L2 cache to itself instead of competing with a sibling thread, so both finish faster.
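To make the difference concrete, here is a toy model of the two placement policies. This is only a sketch under my own assumptions, not how the Windows 7 or Windows 8 scheduler is actually implemented: it assumes the FX-8150's 8 cores are numbered so that cores 2m and 2m+1 belong to module m, and the function names (`naive_schedule`, `module_aware_schedule`) are made up for illustration. It reproduces the example above, where modules 0 and 1 are busy and two threads arrive.

```python
# Toy model (assumption): 4 modules x 2 cores = 8 cores seen by the OS,
# with cores 2m and 2m+1 sharing module m.
CORES_PER_MODULE = 2
NUM_MODULES = 4
ALL_CORES = range(NUM_MODULES * CORES_PER_MODULE)


def module_of(core):
    """Return the module index that a core belongs to."""
    return core // CORES_PER_MODULE


def naive_schedule(threads, busy=frozenset()):
    """Module-unaware: put each thread on the lowest-numbered free core."""
    busy = set(busy)
    placement = {}
    for t in threads:
        core = next(c for c in ALL_CORES if c not in busy)
        busy.add(core)
        placement[t] = core
    return placement


def module_aware_schedule(threads, busy=frozenset()):
    """Module-aware: prefer a core whose module is completely idle,
    falling back to any free core only when no idle module remains."""
    busy = set(busy)
    placement = {}
    for t in threads:
        busy_modules = {module_of(c) for c in busy}
        free = [c for c in ALL_CORES if c not in busy]
        on_idle_module = [c for c in free if module_of(c) not in busy_modules]
        core = (on_idle_module or free)[0]
        busy.add(core)
        placement[t] = core
    return placement


# Modules 0 and 1 are busy (cores 0-3); modules 2 and 3 are free.
print(naive_schedule(["T1", "T2"], busy={0, 1, 2, 3}))         # {'T1': 4, 'T2': 5}
print(module_aware_schedule(["T1", "T2"], busy={0, 1, 2, 3}))  # {'T1': 4, 'T2': 6}
```

The naive policy puts both threads on cores 4 and 5, which share module 2, while the module-aware policy spreads them to cores 4 and 6 on modules 2 and 3, so neither thread has to share a front end, FP unit or L2 with the other.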
Similarly, module awareness will also help Turbo Core performance. For example, if there are two threads, say Th1 and Th2, where Th2 depends on Th1, they should be assigned to the two cores of a single module to improve resource sharing. All the other modules then stay unused and can be powered down, which lets the single active module reach its peak turbo speed.
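The opposite policy, packing related threads together so the remaining modules can be powered down, can be sketched the same way. Again this is a hypothetical illustration under the same 4-module, 2-cores-per-module numbering assumption; `pack_for_turbo` is a made-up name, and the real Turbo Core and power-gating decisions are made by the hardware and firmware, not by a function like this.

```python
def pack_for_turbo(threads, num_modules=4, cores_per_module=2):
    """Hypothetical packing policy: fill module 0 completely before using
    module 1, and so on, so untouched modules stay idle.
    Returns (placement, idle_modules), where idle_modules are the modules
    that could be power-gated to give the active module more turbo headroom."""
    total_cores = num_modules * cores_per_module
    if len(threads) > total_cores:
        raise ValueError("more threads than cores")
    # Assign cores in order 0, 1 (module 0), 2, 3 (module 1), ...
    placement = {t: core for core, t in enumerate(threads)}
    used_modules = {core // cores_per_module for core in placement.values()}
    idle_modules = [m for m in range(num_modules) if m not in used_modules]
    return placement, idle_modules


# Th2 depends on Th1, so keep both on one module.
placement, idle = pack_for_turbo(["Th1", "Th2"])
print(placement)  # {'Th1': 0, 'Th2': 1}
print(idle)       # [1, 2, 3]
```

Both threads land on module 0, and modules 1 to 3 are left idle. A scheduler that cannot see module boundaries has no reliable way to make this choice.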
In Windows 7 this is not always possible because the OS cannot recognize the modules.
But if the OS is aware of the modules and the cores inside them, these problems can be addressed and scheduling can be done in a much more organized manner.