IOPS in a Windows boot storm

By Curtis Collicutt, Cloud Developer, Edmonton

What we are going to do

We are going to boot 30 Windows 7 virtual machines (VMs), each with one virtual CPU (vCPU) and two gigabytes of memory, in two "boot storm" tests: one where we boot the VMs 240 seconds apart, and a second where we boot them 20 seconds apart. We will watch the IOPS usage while that is happening.

What we have

  • A single Dell C6220 node
  • Two 512 GB Intel 520 solid-state drives in an mdadm stripe
  • 128 GB main memory
  • 32 threads
  • Ubuntu 12.04 and kernel-based virtual machine (KVM) virtualization
  • Instances are backed by qcow2 snapshots stored on the solid-state drive (SSD) stripe
  • Generic Windows 7 image

IOPS

IOPS is the number of input/output operations per second.

It's important to note that the set of striped SSDs has thousands of IOPS available, versus the mere hundreds (depending on what is cacheable by the RAID card) available through, for example, a RAID10 set of SATA drives. I use this example because that is how the servers I am using to test the SSDs were previously configured: the six available drive slots were filled with one-terabyte SATA drives in a RAID10 mdadm-based (i.e. software RAID) configuration.

As far as collecting IOPS data goes, I'm running iostat 5 and watching the particular striped drive set. The "5" means the IOPS are averaged over five-second intervals.
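For anyone who wants to reproduce the measurement without parsing iostat output, here is a minimal sketch of the same idea: read the completed-I/O counters from /proc/diskstats twice and divide the difference by the interval. This is not the tooling I used for the graphs. The device name md0 and the function names are assumptions for illustration; substitute whatever your stripe is called.

```python
import time


def read_completed_ios(device, diskstats_text):
    """Return (reads_completed, writes_completed) for one device.

    Expects text in /proc/diskstats format, where each line is:
    major minor name reads_completed reads_merged sectors_read ms_reading
    writes_completed ...
    """
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 8 and fields[2] == device:
            return int(fields[3]), int(fields[7])
    raise ValueError(f"device {device!r} not found")


def iops_between(before, after, interval_seconds):
    """Average (read, write, total) IOPS between two counter snapshots."""
    reads = (after[0] - before[0]) / interval_seconds
    writes = (after[1] - before[1]) / interval_seconds
    return reads, writes, reads + writes


def sample_iops(device="md0", interval_seconds=5, samples=3,
                path="/proc/diskstats"):
    """Yield one (read, write, total) IOPS tuple per interval.

    "md0" is an assumed device name, not necessarily the one on my server.
    """
    with open(path) as stats:
        previous = read_completed_ios(device, stats.read())
    for _ in range(samples):
        time.sleep(interval_seconds)
        with open(path) as stats:
            current = read_completed_ios(device, stats.read())
        yield iops_between(previous, current, interval_seconds)
        previous = current


# Example use on a Linux host (commented out so importing has no side effects):
# for reads, writes, total in sample_iops("md0", interval_seconds=5):
#     print(f"read={reads:.0f} write={writes:.0f} total={total:.0f}")
```

Keeping read and write counters separate costs nothing here, and it makes it easy to see which direction dominates during a boot storm.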

Starting instances

I have a script that starts instances using plain KVM. First, we'll try booting them four minutes apart, then 20 seconds apart. Note that the instances just boot; that is all they are doing, and they do so with no network access. There is no stress testing or anything like that.
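My actual script isn't reproduced here, but a rough sketch of the approach might look like the following. Everything specific in it is an assumption: the kvm wrapper binary (as shipped on Ubuntu 12.04; newer systems would typically use qemu-system-x86_64 with -enable-kvm), the image directory, the file naming, and the function names. It assumes one qcow2 snapshot per instance already exists.

```python
import subprocess
import time


def kvm_command(index, image_dir="/var/lib/bootstorm"):
    """Build the command line for one instance.

    The image directory and file naming are hypothetical; each instance is
    assumed to have its own qcow2 snapshot created beforehand.
    """
    name = f"win7-{index:02d}"
    disk = f"{image_dir}/{name}.qcow2"
    return [
        "kvm",
        "-name", name,
        "-smp", "1",          # one vCPU
        "-m", "2048",         # two gigabytes of memory
        "-drive", f"file={disk},format=qcow2",
        "-net", "none",       # no network access
        "-display", "none",   # headless
    ]


def boot_storm(count, interval_seconds,
               launch=subprocess.Popen, sleep=time.sleep):
    """Start `count` instances, waiting `interval_seconds` between starts.

    `launch` and `sleep` are injectable so the schedule can be dry-run
    without starting any virtual machines.
    """
    processes = []
    for index in range(1, count + 1):
        processes.append(launch(kvm_command(index)))
        if index < count:
            sleep(interval_seconds)
    return processes


# Slow storm: boot_storm(30, 240)
# Fast storm: boot_storm(30, 20)
```

Passing launch=print gives a dry run that only prints the command lines, which is a handy check before pointing it at real images.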

"Slow" boot storm

Slow boot storm… that phrase doesn't make a lot of sense, does it? But I wanted to start with a baseline to compare to, so in the first example, we'll boot instances four minutes apart and take a look at the IOPS usage. Booting 30 instances will therefore take two hours.

After taking note of the results of that baseline, we'll boot the same number of virtual machines at a faster rate.

Even in a slow boot storm we can use 5000 IOPS, which is a lot more than is available without a good SAN or solid-state drives.

As you can see, after a while the virtual machines begin to settle down and do not use many IOPS. It's important to note that the VMs were booting for the whole test, right up until near the end, where they level out to near zero.

"Fast" boot storm

Same exact thing, but this time booting instances 20 seconds apart instead of four minutes apart.

In this test we go over 7000 IOPS. However, it's important to note that all the VMs are up and running after about 120 five-second intervals, so by far the most IOPS are used well after the VMs are booted, i.e. from interval 120 to interval 1200. (Also note that this test happens over about half the time of the slow boot storm.)
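The interval counts follow from simple arithmetic on the boot spacing and the five-second iostat window. A small sketch, where the function name is made up for illustration and the launch window is approximated as count times spacing, the same way I did for the two-hour figure:

```python
def storm_schedule(vm_count, spacing_seconds, sample_seconds=5):
    """Approximate how long the launches take and how many samples that spans."""
    total_seconds = vm_count * spacing_seconds
    return {
        "total_seconds": total_seconds,
        "total_minutes": total_seconds / 60,
        "samples": total_seconds // sample_seconds,
    }


fast = storm_schedule(30, 20)    # 600 seconds, 120 five-second samples
slow = storm_schedule(30, 240)   # 7200 seconds, i.e. two hours
```

So in the fast test everything to the right of roughly sample 120 on the x-axis is activity from machines that have already been started.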

Conclusion

First, let me say that I have never taken a statistics class, so there certainly could be something wrong with my graphs and data, and obviously there are a lot of variables at play, including the usage of iostat. That said, it's a fairly simple test: how many IOPS does iostat see, in five-second average intervals, as 30 Windows 7 virtual machines boot 20 seconds apart and 240 seconds apart?

I think it is safe to conclude that the actual booting of VMs isn't the big user of IOPS here. It's what they do a few minutes after they've booted that seems to take up a lot of the storage system's resources, and that is probably something that can be configured.
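One way to put a number on that conclusion is to split the recorded samples at the point where the last VM was started and compare the two halves. This is a sketch of the comparison, not the analysis behind my graphs; the function names are hypothetical, and the input is assumed to be a plain list of per-interval IOPS values.

```python
def summarize(samples, sample_seconds=5):
    """Total I/O operations, mean IOPS and peak IOPS for a run of samples."""
    if not samples:
        return {"total_ios": 0, "mean_iops": 0.0, "peak_iops": 0}
    return {
        "total_ios": sum(samples) * sample_seconds,
        "mean_iops": sum(samples) / len(samples),
        "peak_iops": max(samples),
    }


def compare_phases(samples, boot_samples, sample_seconds=5):
    """Split samples into the launch window and everything after it.

    `boot_samples` is the number of intervals during which VMs were still
    being started (about 120 in the fast test).
    """
    return {
        "booting": summarize(samples[:boot_samples], sample_seconds),
        "after_boot": summarize(samples[boot_samples:], sample_seconds),
    }
```

If the after_boot totals dwarf the booting totals, as the fast test suggests, then whatever Windows does once it is up deserves more attention than the boot itself.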

In a future post I'll look at Ubuntu images and see what differences there are in terms of their usage of IOPS. It also might be interesting to keep track of write vs. read IOPS.