AuDioFreaK39
Guest
Yep, you guessed it. Another Nehalem article to satisfy your undying needs.
*the new info is in the second half of the overview*
Intel reveals some more Nehalem information at recent Core i7 presentation
Source: *www.overclock3d.net/news.php?/cpu_mainboard/intel_core_i7_presentation/1
Here's a summary of all the IMPORTANT information from the article. I didn't summarize this for nothing, so have a read:
Something new that Intel is bringing to us with this modular design, shown on the slide to the right, is the "uncore". In short, everything other than the cores and their own cache is in the "uncore", such as the integrated memory controller, QPI links and the shared L3 cache. All of the components in the "uncore" are completely modular and so can be scaled according to the section of the market each chip is being aimed at. Intel can add or remove cores, QPI links, integrated graphics (which Intel say will come in late 2009) and they could even add another integrated memory controller if they so wish.
*www.overclock3d.net/gfx/articles/2008/09/29180420622l.jpg
The L2 cache is a totally new design compared to what we see in the Core 2 CPUs of today... Each core within a Nehalem CPU will have its L1 & L2 cache integrated within the core itself.
*www.overclock3d.net/gfx/articles/2008/09/29183557399l.jpg
The L3 cache that is coming with Nehalem is totally new to Intel, and is also very similar in design to AMD's Phenom CPUs. It is an inclusive cache, which means that ALL of the data residing in the L1 or L2 caches within each core will also reside within the L3 cache.
> achieves better performance
> achieves lower power consumption
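To make "inclusive" a bit more concrete, here is a rough Python sketch. It is a toy model of my own, not Intel's design: the class name, the LRU eviction policy and the unbounded per-core caches are all assumptions made for illustration. The one rule it demonstrates is that when a line is evicted from the shared L3, it also has to be dropped from every core's private cache, so the L3 always holds a superset of what the cores hold.

```python
from collections import OrderedDict


class InclusiveL3:
    """Toy model (hypothetical): every line in a private cache is also in L3."""

    def __init__(self, l3_capacity, num_cores):
        self.l3 = OrderedDict()          # address -> None, oldest entry first
        self.l3_capacity = l3_capacity   # number of lines the L3 can hold
        # One set per core stands in for that core's L1 + L2 (unbounded here).
        self.private = [set() for _ in range(num_cores)]

    def load(self, core, addr):
        """Bring addr into the shared L3 and into the requesting core's cache."""
        if addr in self.l3:
            self.l3.move_to_end(addr)    # mark as most recently used
        else:
            if len(self.l3) >= self.l3_capacity:
                victim, _ = self.l3.popitem(last=False)  # evict oldest line
                # Back-invalidate: the victim may not stay in any private
                # cache, otherwise the L3 would stop being inclusive.
                for cache in self.private:
                    cache.discard(victim)
            self.l3[addr] = None
        self.private[core].add(addr)

    def is_inclusive(self):
        """True if every privately cached line is also present in the L3."""
        return all(addr in self.l3 for cache in self.private for addr in cache)


cache = InclusiveL3(l3_capacity=2, num_cores=2)
cache.load(0, 0x10)
cache.load(1, 0x20)
cache.load(0, 0x30)   # L3 is full, so 0x10 is evicted everywhere
print(cache.is_inclusive())
```

As far as I understand it, the practical upside is that a miss in the L3 tells you no core holds the line either, so there is no need to go and check each core's private caches.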
Simultaneous MultiThreading
With Nehalem, Intel is a hell of a lot more prepared for the implementation of HyperThreading than it was when it was last present in Intel's Pentium 4 processors. This is largely due to the massive memory bandwidth and larger caches available which aid in getting data to the core faster and more predictably.
Intel are also very happy to use this technology again, and to build on it in the future, due to its performance/die size ratio. Its performance gain percentage is actually larger than the percentage of die real estate it inhabits. Ronak explained that in general, when implementing a technology, they hope for a 1:1 ratio of performance gain to the die area it consumes. He also said that using HT was much more power efficient than adding an entire core.
*www.overclock3d.net/gfx/articles/2008/09/29190301867l.jpg
One other thing I feel I should mention is that Ronak explained that very bandwidth-hungry applications may not see a gain from HT at all. This is because the bandwidth is already saturated by the data from 4 cores, and adding more threads could actually be detrimental to performance.
(this really only applies to servers, so don't worry, desktop users)
QuickPath Interconnect
*www.overclock3d.net/gfx/articles/2008/09/29230708502l.jpg
One of the reasons it was necessary to move to QPI was the Integrated Memory Controller. QPI is also a requirement for efficient chip-to-chip communication, where one CPU needs to access data that is stored in memory attached to another CPU's memory controller.
Each QPI link is bi-directional supporting 6.4 GT/s (Giga Transfers) per link. Each link is 2-bytes wide so you get 12.8GB/s of bandwidth per link in each direction which equates to a total of 25.6GB/s of bandwidth on a single QPI link.
(to reword this, the 25.6GB/sec link speed you see is bi-directional, meaning that is the combined speed for data moving to and from the QPI link)
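If you want to check the QPI numbers yourself, the arithmetic is simple enough to put in a few lines of Python. This is only a sketch of the calculation quoted above; the function and parameter names are mine, and the defaults are the 6.4 GT/s and 2-byte figures from the article.

```python
def qpi_bandwidth_gb_s(transfers_gt_s=6.4, bytes_per_transfer=2):
    """Return (per-direction, combined) bandwidth of one QPI link in GB/s."""
    # Giga-transfers per second times bytes moved per transfer.
    per_direction = transfers_gt_s * bytes_per_transfer
    # The link carries traffic both ways at once, so the headline
    # figure is simply the two directions added together.
    combined = 2 * per_direction
    return per_direction, combined


one_way, both_ways = qpi_bandwidth_gb_s()
print(f"{one_way:.1f} GB/s each way, {both_ways:.1f} GB/s combined")
```

So the 25.6GB/s headline figure only applies when data is flowing in both directions at the same time; traffic going one way tops out at 12.8GB/s.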
*NEW INFO* - Power Control Unit (PCU)
What Intel revealed recently is that, rather than using simple algorithms for switching off the power planes of the new Nehalem cores as in previous CPUs, the Core i7 will feature a complete on-die microcontroller called the Power Control Unit (PCU). This unit consists of more than a million transistors, which is somewhere in the ballpark of the transistor count of the Intel 486 microprocessor!
This new controller, which has its own embedded firmware, is responsible for managing the power states of each core on a CPU and takes readings of temperature, current and power, as well as operating system requests.
Each Nehalem core has its own Phase Locked Loop (PLL), which basically means each core can be clocked independently, similarly to AMD’s fairly new Phenom line of processors. Also similar to Phenom is the fact that each core runs off the same core voltage. But this is where I'll stop comparing Nehalem to Phenom, as the difference is that Intel have implemented their integrated power gates.
> the PCU shifts control from hardware to embedded firmware
*www.overclock3d.net/gfx/articles/2008/09/30000204147l.jpg
When creating the power gate, Intel's architects had to work very closely with their manufacturing engineers to create a material that would be suitable to act as a barrier between any core and its source of voltage as seen in the slide below. At the time I couldn't really see what the slide was showing, and even now I struggle. However, it's relevant so it should be here.
*www.overclock3d.net/gfx/articles/2008/09/30002743836l.jpg
> individual cores transition to ~0 power state
> transparent to other cores, platform, software, and VR
essential for Dynamic Speed (turbo mode)??
In a little more detail, the power gates allow any number of cores on a CPU to be working at normal voltages, while any cores idling can have their power shut off completely which in turn reduces idle core power leakage dramatically. It was asked why this hasn't been seen on previous processors and Ronak said that there has never been a suitable and reliable material to use as a gate until now.
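Here is a tiny Python sketch of why gating matters for idle leakage. Intel gave no figures for this, so all the wattages are made-up numbers purely for illustration, and the function name is mine. The only thing it models is the claim above: an idle core that is power gated draws roughly nothing, while an idle core that is not gated still leaks.

```python
def core_power_w(core_busy, active_w, idle_leak_w, power_gated):
    """Sum core power for a list of busy/idle flags (toy model).

    active_w and idle_leak_w are hypothetical per-core wattages.
    """
    total = 0.0
    for busy in core_busy:
        if busy:
            total += active_w        # busy cores draw full power
        elif not power_gated:
            total += idle_leak_w     # idle but still powered: leakage
        # idle and gated: treated as ~0 W, so nothing is added
    return total


cores = [True, True, False, False]   # two busy cores, two idle cores
print(core_power_w(cores, active_w=20.0, idle_leak_w=3.0, power_gated=False))
print(core_power_w(cores, active_w=20.0, idle_leak_w=3.0, power_gated=True))
```

Whatever the real numbers turn out to be, the gap between the two results is power budget that is freed up by switching the idle cores off completely.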
Turbo Mode
Turbo Mode is a feature that has been spoken about quite a lot recently, but there have been many mixed claims about just how it works in Nehalem. Whilst it made its debut with mobile Penryn, it never really got a chance to actually work. The idea was that if, for instance, you had a dual-core mobile Penryn CPU running a single-threaded application, leaving one core totally idle, and the chip was running below the TDP it was designed for, then Turbo Mode would aim to increase the clock speed of the active core. The reason this didn't really work was that a lot of applications (starting with Vista) bumped the single-thread load around between cores, leaving them unable to initialise Turbo Mode for any length of time.
In Nehalem, this feature has been refined to work a whole lot better, in large part thanks to the CPU. The idea is pretty straightforward: if you have a quad-core CPU and only two of the cores are active, then as long as the CPU detects that the heat levels are OK and the power levels are under the TDP, the two idle cores can be shut down and the two remaining active cores will be overclocked.
Turbo Mode can also come into effect even if all four cores are active, so long as the CPU detects that heat and power levels are under their set limits. In this case all four cores would be given a boost as per the slide below (bottom right). All Nehalem processors will at least be able to go up a single clock step (133MHz) in Turbo Mode, even if all cores are active, just as long as the CPU detects that the TDP hasn’t been exceeded.
*www.overclock3d.net/gfx/articles/2008/09/30223005113l.jpg
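The Turbo Mode decision described above boils down to a simple check, which I've sketched in Python below. Treat it as my reading of the article rather than how the firmware actually works: the function name, the temperature limit and the example clock and power figures are all hypothetical, and the only number taken from the article is the 133MHz clock step.

```python
CLOCK_STEP_MHZ = 133  # one Turbo Mode clock step, per the article


def turbo_clock_mhz(base_mhz, power_w, tdp_w, temp_c, temp_limit_c, steps=1):
    """Return the clock the active cores would run at (toy model).

    The boost is granted only while both power and temperature are
    under their limits; otherwise the cores stay at the base clock.
    How many steps a real chip allows is an assumption (default 1).
    """
    if power_w < tdp_w and temp_c < temp_limit_c:
        return base_mhz + steps * CLOCK_STEP_MHZ
    return base_mhz


# Hypothetical chip: 2660MHz base clock, 130W TDP, 100C temperature limit.
print(turbo_clock_mhz(2660, power_w=90, tdp_w=130, temp_c=60, temp_limit_c=100))
print(turbo_clock_mhz(2660, power_w=135, tdp_w=130, temp_c=60, temp_limit_c=100))
```

With fewer cores active there is more headroom under the TDP, which is presumably why more steps could be available in that case; pass a larger `steps` value to model that.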