Running OpenStack in Production: Part 1: Hardware

Everett Toews, Senior Developer

For the inaugural Tech Radar post I’d like to dive right in and discuss running OpenStack in a production environment. Cybera chose OpenStack as our primary Infrastructure as a Service software stack and we’ve learned a lot about it over the past six months. Most recently, we’ve built a cloud using OpenStack for CANARIE’s DAIR project. This post is just the first in a series describing our experience putting OpenStack into production.

To start with, you’ll need hardware. If you have the time and inclination, the best thing to do might be to ask Rackspace Cloud Builders for some help spec’ing out the hardware for OpenStack. This is the route that Cybera went and we got some badly needed advice. Since you might not be able to go that route, I’ll tell you what we know.

At the end of the day we went with Dell, based on the Cloud Builders’ advice and our own due diligence. If you aren’t aware of it yet, Dell is supporting OpenStack in a big way. They have a number of pages dedicated to it here. There’s also a whitepaper that discusses hardware and network for OpenStack, if you feel like filling out the form.

We ordered four different types of servers (aka nodes): a management node (nova-api, nova-network, nova-scheduler, nova-objectstore), compute nodes (nova-compute, nova-volume), a proxy node (swift-proxy-server), and storage nodes (swift-object-*, swift-container-*, swift-account-*). All nodes were housed in Dell C6100 chassis. Here are the specs:

  Node        Processor  Sockets  Cores  Threads  RAM (GB)  Disk
  Management  E5620      2        8      16       24        8 x 300 GB
  Compute     X5650      2        12     24       96        6 x 500 GB
  Proxy       E5620      2        8      16       24        4 x 300 GB
  Storage     E5620      2        8      16       24        6 x 2 TB
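
For reference, here's the node-role-to-service layout described above expressed as a small Python mapping. It lists only the services named in this post, so treat it as a summary rather than a complete deployment manifest.

    # Role-to-service layout for our nodes, as described above.
    # Only the OpenStack/Swift services named in this post are listed.
    SERVICES_BY_ROLE = {
        "management": ["nova-api", "nova-network", "nova-scheduler", "nova-objectstore"],
        "compute": ["nova-compute", "nova-volume"],
        "proxy": ["swift-proxy-server"],
        "storage": ["swift-object-*", "swift-container-*", "swift-account-*"],
    }

    for role, services in SERVICES_BY_ROLE.items():
        print(f"{role}: {', '.join(services)}")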


The disk on the compute nodes is used for VMs and volumes, which is to say:

  • a portion can be used for VM instances, the files that back the VMs
  • a portion can be used for volumes, the files that back the virtual hard disks for the VMs (technically speaking it’s logical volumes that back the virtual hard disks but you can think of them as files). See Managing Volumes.

One caveat: if you’re using the compute nodes for your VM instances (as opposed to some big storage array mounted via NFS), you won’t be able to do live migrations (see Configuring Live Migrations). Live migrations off of compute nodes may be supported in the future (who knows when), and you could always switch to a big storage array later (with some effort) if live migration is an essential requirement for you right now. We use the compute nodes for VM instances, so no live migrations for us, but that’s okay: DAIR users will be transient, so we’ll have the opportunity to take machines down in the brief periods between users.
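
To make that split concrete, here’s a rough back-of-the-envelope sketch of carving up a compute node’s local disk between instance files and nova-volume storage. The 50/50 split is purely an assumption for illustration; it’s not our actual DAIR configuration.

    # Sketch: splitting a compute node's local disk between VM instance files
    # (typically under /var/lib/nova/instances) and the LVM storage used by
    # nova-volume. The 50/50 split is an assumed example, not our config.

    COMPUTE_DISKS = 6        # per the spec table above
    DISK_SIZE_GB = 500       # 6 x 500 GB per compute node
    INSTANCE_FRACTION = 0.5  # assumed share reserved for instance files

    raw_gb = COMPUTE_DISKS * DISK_SIZE_GB
    instances_gb = raw_gb * INSTANCE_FRACTION
    volumes_gb = raw_gb - instances_gb

    print(f"Per compute node: {raw_gb} GB raw")
    print(f"  instances: {instances_gb:.0f} GB, volumes: {volumes_gb:.0f} GB")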

Okay… storage. The first thing to do is check the System Requirements page. They actually recommend some specs there. We went with roughly half that spec. The most important thing to note is this:

RAID on the storage drives is not required and not recommended. Swift’s disk usage pattern is the worst case possible for RAID, and performance degrades very quickly using RAID 5 or 6.

No RAID! Now have a look at the Example Installation Architecture.

It’s also worthwhile to note that Object Storage is highly available and highly redundant, but that redundancy comes at a cost. For example, we have roughly 60 TB of raw storage but only 20 TB of usable storage, because we keep three copies of everything for redundancy. Rackspace recommended we go with five copies of everything (100 TB raw for 20 TB usable), but that just wasn’t necessary for a pilot project like DAIR. You can read a bit more about this in the Replication section of Understanding How Object Storage Works.
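
If you want to sanity-check those numbers yourself, the arithmetic is simply raw capacity divided by the replica count (ignoring filesystem and handoff overhead):

    # Rough usable Swift capacity: raw capacity divided by the replica count.
    # Ignores filesystem overhead, handoff space, and ring imbalance.

    def usable_tb(raw_tb, replicas):
        return raw_tb / replicas

    print(usable_tb(60, 3))   # ~20 TB usable from our 60 TB raw with 3 replicas
    print(usable_tb(100, 5))  # the 5-replica option: 100 TB raw for ~20 TB usable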

For networking gear, we went with 10 GigE connections. This meant two Dell 8024F switches, one for connecting the management and compute nodes and one for connecting the proxy and storage nodes. For the management (IPMI) network switch we chose the Dell 6248.

Is this the best mix of hardware possible for OpenStack? As always, the answer is, “It depends.” It depends primarily on the use cases for your cloud. We think we got a good mix of hardware, but time will truly tell if it was the best mix possible for DAIR.

Hope this helps and wasn’t too overwhelming 🙂
