CHAPTER 2: Why single core CPUs are no longer "cool"

The end of the single core CPU?

Right now, the leakage problem is by far the most urgent problem. As profit margins are low and cost is, in most cases, the decisive factor for consumers, expensive cooling systems are not practical.

Past experience has shown that complex superscalar CPUs need about twice as many transistors to achieve roughly 40% better performance. The conclusion of many industry analysts and researchers is that the single-core CPU has no future. I quote Shekhar Borkar, Intel Fellow, Director:

"Multiprocessing, on the other hand, has potential to provide near linear performance improvement. Two smaller processors, instead of a large monolithic processor, can potentially provide 70-80% more performance, compare this to only 40% from the large monolithic processor."

Note the word "monolithic", a word with a rather pejorative meaning, which insinuates that the current single core CPUs are based on old technology. So, basically, the argument is that the single core CPU has no future because it improves performance by only 40% while doubling complexity and thus leakage. This reasoning explains why, all of a sudden, Intel marketing no longer talks about 10 GHz CPUs, but about the "era of thread parallelism".

It should be noted, though, that the 40% better performance of the "monolithic CPU" is achieved across a wide variety of applications, without the need for time-consuming software optimizations. The promised 70% to 80% of the multithreaded CPU can only be achieved easily in a small range of applications, while the other applications will need exponential investments in development time to achieve the same performance increase.
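To put a number on how demanding that 70% to 80% target is, here is a minimal sketch based on Amdahl's law. Amdahl's law is not mentioned in the quote above; it is brought in here as a simplifying assumption (a fixed fraction of the work scales perfectly across cores, the rest does not scale at all), and the function names are purely illustrative.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup over one core when only `parallel_fraction` of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def required_parallel_fraction(target_speedup, cores):
    """Invert Amdahl's law: fraction of the work that must run in parallel."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / cores)


# The 70-80% gain quoted for two smaller cores corresponds to 1.7x-1.8x.
for target in (1.7, 1.8):
    p = required_parallel_fraction(target, cores=2)
    print(f"{target:.1f}x on 2 cores needs {p:.0%} of the work to be parallel")

# An application that is only half parallel falls short of the 40% target.
print(f"50% parallel on 2 cores: {amdahl_speedup(0.5, 2):.2f}x")
```

Under this simplified model, a dual core only reaches the quoted range when well over 80% of the running time is parallel, which is consistent with the point that only a small range of applications gets there without serious development effort.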

 

Of course, we agree that multiprocessors have benefits. It is easier to turn off a complete CPU than to manage the energy consumption of the different parts of one big CPU.

You can also run a single-threaded application on CPU1 and turn CPU2 off. When CPU1 gets too hot, you let CPU2 take over the work while CPU1 cools down. As a result, you reduce the average temperature of each CPU core. As leakage decreases with lower die temperatures, this technique can reduce overall leakage power. The objective of using a dual core CPU is then primarily to reduce power consumption in situations where there is only one CPU-intensive application. This is probably the reason why Intel sees a great future for dual core CPUs in the mobile market, although the mobile market is probably the last market where we will be able to benefit from dual core power. The last thing that you want is twice as much power dissipation because both cores are active. In our humble opinion, dual core will only be dual when it is not working on battery power.
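The core-hopping idea can be illustrated with a toy simulation. This is only a sketch under loudly stated assumptions: every constant below is hypothetical, the die is modelled as a simple first-order thermal system, leakage is assumed to double for a fixed temperature rise, and a core that is switched off is assumed to leak nothing. None of these numbers come from Intel or from measurements.

```python
AMBIENT_C = 40.0         # hypothetical temperature of an idle core
LOAD_RISE_C = 50.0       # hypothetical steady-state rise of a loaded core
TAU_STEPS = 50.0         # hypothetical thermal time constant, in steps
LEAK_AT_AMBIENT_W = 5.0  # hypothetical leakage power at AMBIENT_C
LEAK_DOUBLING_C = 25.0   # hypothetical: leakage doubles every 25 degrees


def leakage_watts(temp_c):
    """Assumed leakage model: exponential growth with die temperature."""
    return LEAK_AT_AMBIENT_W * 2.0 ** ((temp_c - AMBIENT_C) / LEAK_DOUBLING_C)


def step_temperature(temp_c, active):
    """Move one step towards the loaded or the idle equilibrium temperature."""
    target = AMBIENT_C + (LOAD_RISE_C if active else 0.0)
    return temp_c + (target - temp_c) / TAU_STEPS


def average_leakage(steps, hop_threshold_c=None):
    """Average leakage of the active core; the other core is switched off."""
    temps = [AMBIENT_C, AMBIENT_C]
    active = 0
    total = 0.0
    for _ in range(steps):
        # Hand the thread to the cooler core once the active one gets too hot.
        if hop_threshold_c is not None and temps[active] >= hop_threshold_c:
            active = 1 - active
        temps = [step_temperature(t, i == active) for i, t in enumerate(temps)]
        total += leakage_watts(temps[active])
    return total / steps


single = average_leakage(2000)
hopping = average_leakage(2000, hop_threshold_c=70.0)
print(f"one core always on: {single:.1f} W average leakage")
print(f"hopping at 70 C:    {hopping:.1f} W average leakage")
```

In this toy model, the single thread always runs on a cooler core than it would on a lone CPU, so the average leakage drops. How large the saving is in practice depends entirely on the real thermal and leakage characteristics, which this sketch does not attempt to capture.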

Trendy

The second argument used by the people who are hyping the multithreaded CPU is that "the whole industry is moving towards multi-core CPUs". Considering that the server is the only market where non-x86 CPUs play an important role, this is not very surprising. For companies such as Sun and IBM, it is only natural to ignore single-threaded performance somewhat and to invest as much time as they can in designs that can work with as many threads as possible. The software that runs on these Sun and IBM machines, massive OLTP databases and HPC applications, is multi-threaded by nature.

Sun's Niagara CPU can run 32 threads at once, but it will not be the kind of CPU that you would like in your desktop. Single-threaded performance is most likely at the level of one of the early PIIIs. Sun's own demo [6] shows a Niagara to be more than 4 times slower in a single-threaded application than an unknown single-threaded CPU, which is, hopefully for Sun, one of the current top CPUs.

Delving deeper

So, while there are definite advantages to CPUs that exploit Thread Level Parallelism, if we want to understand what is really going on, we need to delve a little deeper. First, we look at whether leakage can really kill all progress of "monolithic" single core CPUs; secondly, we will study the prime example of a "classic" single core CPU that has crashed into a wall of leakage: the Intel Prescott.



65 Comments

  • Zak - Wednesday, August 22, 2007 - link

    I seem to remember reading somewhere, probably couple of years ago, about research being done on hyperconductivity in "normal" temperatures. Right now hyperconductivity occurs only in extremely low temperatures, right? If materials were developed that achieve the same in normal temperatures it'd solve lots of these issues, like wire delay and power loss, wouldn't it?

    Z.
  • Tellme - Monday, February 21, 2005 - link

    Carl what i meant was that soon we might not see much improved performance with multicores as well because the data comes too late to the processor for quick execution. (That is true for single cores as well).

    Did you check the link?
    Their idea is simple.
    "If you can't bring the memory bandwidth to the processor, then bring the processors to the memory."
    Interesting, no?
    Currently the processor spends most of its time waiting for data to process.

  • carl0ski - Saturday, February 19, 2005 - link

    #61 i thought p4 already had memory bandwidth problems,
    AMD has a temporary work around (on die memory controller) which aids in multiple CPU's/Dies using the same fsb to access the Ram.

    Intel has proposed multiple fsb's , one each CPU/die.

    Does anyone know if that means they will need separate RAM DIMMs for each FSB? Because that would prove an expensive system.
  • carl0ski - Saturday, February 19, 2005 - link

    [quote]59 - Posted on Feb 12, 2005 at 11:28 AM by fitten Reply
    #57 What was the performance comparison of the 1GHz Athlon vs. the 1GHz P3? IIRC, the Athlon was faster by some margin. If this was the case, then there was a little more than tweaking that went on in the Pentium-M line. Because they started out looking at the P3 doesn't mean that what they ended up with was the P3 with a tweak here or there. :)[/quote]

    #59 didnt P3 1ghz run 133mhz sdram? on a 133fsb?
    Athlon 1ghz had a nice DDR 266 fsb to support it.

  • Tellme - Monday, February 14, 2005 - link

    Nice article.

    I think dual cores will soon hit the wall, i.e. memory bandwidth.

    Hopefully memory and processors will be integrated in the near future.

    See
    http://www.ee.ualberta.ca/~elliott/cram/

  • ceefka - Monday, February 14, 2005 - link

    Though still a little too technical for me, it makes a good read.

    It's good to know that Intel has eaten their words and realized they had to go back to the drawing board.

    I believe rather sooner than later multicore will mean 4 - 8 cores providing the power to emulate everything that is not necessarily native, like running MAC OSX on an AMD or Intel box. Iow the CELL will meet its match.
  • fitten - Saturday, February 12, 2005 - link

    #57 What was the performance comparison of the 1GHz Athlon vs. the 1GHz P3? IIRC, the Athlon was faster by some margin. If this was the case, then there was a little more than tweaking that went on in the Pentium-M line. Because they started out looking at the P3 doesn't mean that what they ended up with was the P3 with a tweak here or there. :)
  • avijay - Friday, February 11, 2005 - link

    EXCELLENT Article! One of the very best I've ever read. Nice to see all the references at the end as well. Could someone please point me to Johan's first article at AT please. Thanks.
    Great Work!
  • fishbreath - Friday, February 11, 2005 - link

    For those of you who don't actually know this:

    1) The Dothan IS a Pentium 3. It was tweaked by Intel in Israel, but its heart and soul is just a PIII.

    1b) All P4's have hyperthreading in them, and always have had. It was a fuse feature that was not announced until there were applications to support them. But anyone who has HT and Windows XP knows that Windows simply has a smoother 'feel' when running on an HT processor!

    2) Complex array processors are already in the pipeline (no pun intended). However the lack of an operating system or language to support them demands they make their first appearance in dedicated applications such as h264 encoders.
  • blckgrffn - Friday, February 11, 2005 - link

    Yay for Very Large Scale Integration (more than 10,000 transistors per chip)! :) I wonder when the historians will put down in the history books that we have hit the fifth generation of computing org....
