The Virtualization Benchmarking Chaos

There are an incredible number of pitfalls in the world of server application benchmarking, and virtualization just makes the whole situation much worse. In this report, we want to measure how well the CPUs are coping with virtualization. That means we need to choose our applications carefully. If we use a benchmark that spends very little time in the hypervisor, we are mostly testing the integer processing power and not how the CPU copes with virtualization overhead. As we have pointed out before, a benchmark like SPECjbb does not tell you much, as it spends less than one percent of its time in the hypervisor.

How is virtualization different? CPU A that beats CPU B in native situations can still be beaten by the latter in virtualized scenarios. There are various reasons why CPU A can still lose, for example CPU A…

  1. Takes much more time for switching from the VM to hypervisor and vice versa.
  2. Does not support hardware assisted paging: memory management will cause a lot more hypervisor interventions.
  3. Has smaller TLBs; Hardware Assisted Paging (EPT, NPT/RVI) places much more pressure on the TLBs.
  4. Has less bandwidth; an application that needs only 20% of the maximum bandwidth will be bottlenecked if you run six VMs of the same application.
  5. Has smaller caches; the more VMs, the more pressure there will be on the caches.

To fully understand this, it helps a lot if you read our Hardware Virtualization: the nuts and bolts article. Indeed, some applications run with negligible performance impact inside a virtual machine while others are tangibly slower in a virtualized environment. To get a rough idea of whether or not your application belongs to the latter or former group, a relatively easy rule of thumb can be used: how much time does your application spend in user mode, and how much time does it need help from the kernel? The kernel performs three tasks for user applications:

 
  • System calls (File system, process creation, etc.)
  • Interrupts (Accessing the disks, NICs, etc.)
  • Memory management (i.e. allocating memory for buffers)

The more work your kernel has to perform for your application, the higher the chance that the hypervisor will need to work hard as well. If your application writes a small log after spending hours crunching numbers, it should be clear it's a typical (almost) "user mode only" application. The prime example of a "kernel intensive" application is an intensively used transactional database server that gets lots of requests from the network (interrupts, system calls), has to access the disks often (interrupts, system calls), and has buffers that grow over time (memory management).

However, a "user mode only" application can still lose a lot of performance in a virtualized environment in some situations:

  • Oversubscribing: you assign more CPUs to the virtual machines than physically available. (This is a very normal and common way to get more out of your virtualized server.)
  • Cache Contention: your application demands a lot of cache and the other virtualized applications do as well.

These kinds of performance losses are relatively easy to minimize. You could buy CPUs with larger caches, and assign (set affinity) certain cache/CPU hungry applications some of the physical cores. The other less intensive applications would share the CPU cores. In this article, we will focus on the more sensitive workloads out there that do quite a bit of I/O (and thus interrupts), need large memory buffers, and thus talk to the kernel a lot. This way we can really test the virtualization capabilities of the servers.

Index Independent Real-World Virtualization Benchmarking
Comments Locked

66 Comments

View All Comments

  • Bandoleer - Thursday, May 21, 2009 - link

    I have been running Vmware Virtual Infrastructure for 2 years now. While this article can be useful for someone looking for hardware upgrades or scaling of a virtual system, CPU and memory are hardly the bottlenecks in the real world. I'm sure there are some organizations that want to run 100+ vm's on "one" physical machine with 2 physical processors, but what are they really running????

    The fact is, if you want VM flexability, you need central storage of all your VMDK's that are accessible by all hosts. There is where you find your bottlenecks, in the storage arena. FC or iSCSI, where are those benchmarks? Where's the TOE vs QLogic HBA? Considering 2 years ago, there was no QLogic HBA for blade servers, nor does Vmware support TOE.

    However, it does appear i'll be able to do my own baseline/benching once vSphere ie VI4 materializes to see if its even worth sticking with vmware or making the move to HyperV which already supports Jumbo, TOE iSCSI with 600% increased iSCSI performance on the exact same hardware.
    But it would really be nice to see central storage benchmarks, considering that is the single most expensive investment of a virtual system.

  • duploxxx - Friday, May 22, 2009 - link

    perhaps before you would even consider to move from Vmware to HyperV check first in reality what huge functionality you will loose in stead of some small gains in HyperV.

    ESX 3.5 does support Jumbo, iscsi offload adapters and no idea how you are going to gain 600% if iscsi is only about 15% slower then FC if you have decent network and dedicated iscsi box?????
  • Bandoleer - Friday, May 22, 2009 - link

    "perhaps before you would even consider to move from Vmware to HyperV check first in reality what huge functionality you will loose in stead of some small gains in HyperV. "

    what you are calling functionality here are the same features that will not work in ESX4.0 in order to gain direct hardware access for performance.
  • Bandoleer - Friday, May 22, 2009 - link

    The reality is I lost around 500MBps storage throughput when I moved from Direct Attached Storage. Not because of our new central storage, but because of the limitations of the driver-less Linux iSCSI capability or the lack there of. Yes!! in ESX 3.5 vmware added Jumbo frame support as well as flow control support for iSCSI!! It was GREAT, except for the part that you can't run JUMBO frames + flow control, you have to pick one, flow control or JUMBO.

    I said 2 years ago there was no such thing as iSCSI HBA's for blade servers. And that ESX does not support the TOE feature of Multifunction adapters (because that "functionality" requires a driver).

    Functionality you lose by moving to hyperV? In my case, i call them useless features, which are second to performance and functionality.



  • JohanAnandtech - Friday, May 22, 2009 - link

    I fully agree that in many cases the bottleneck is your shared storage. However, the article's title indicated "Server CPU", so it was clear from the start that this article would discuss CPU performance.

    "move to HyperV which already supports Jumbo, TOE iSCSI with 600% increased iSCSI performance on the exact same hardware. "

    Can you back that up with a link to somewhere? Because the 600% sounds like an MS Advertisement :-).

  • Bandoleer - Friday, May 22, 2009 - link

    My statement is based on my own experience and findings. I can send you my benchmark comparisons if you wish.

    I wasn't ranting at the article, its great for what it is, which is what the title represents. I was responding to this part of the article that accidentally came out as a rant because i'm so passionate about virtualization.

    "What about ESX 4.0? What about the hypervisors of Xen/Citrix and Microsoft? What will happen once we test with 8 or 12 VMs? The tests are running while I am writing this. We'll be back with more. Until then, we look forward to reading your constructive criticism and feedback.

    Sorry, i meant to be more constructive haha...



  • JohanAnandtech - Sunday, May 24, 2009 - link

    "My statement is based on my own experience and findings. I can send you my benchmark comparisons if you wish. "

    Yes, please do. Very interested in to reading what you found.

    "I wasn't ranting at the article, its great for what it is, which is what the title represents. "

    Thx. no problem...Just understand that these things takes time and cooperation of the large vendors. And getting the right $5000 storage hardware in lab is much harder than getting a $250 videocard. About 20 times harder :-).


  • Bandoleer - Sunday, May 24, 2009 - link

    I haven't looked recently, but high performance tiered storage was anywhere from $40k - $80k each, just for the iSCSI versions, the FC versions are clearly absurd.

  • solori - Monday, May 25, 2009 - link

    Look at ZFS-based storage solutions. ZFS enables hybrid storage pools and an elegant use of SSDs with commodity hardware. You can get it from Sun, Nexenta or by rolling-your-own with OpenSolaris:

    http://solori.wordpress.com/2009/05/06/add-ssd-to-...">http://solori.wordpress.com/2009/05/06/add-ssd-to-...
  • pmonti80 - Friday, May 22, 2009 - link

    Still it would be interesting to see those central storage benchmarks or at least knowing if you will/won't be doing them for whatever reason.

Log in

Don't have an account? Sign up now