Virtual Server vs. Real Server Disk Drive Speed

It’s important to understand the potential differences between virtual server disk drives and physical disk drives, so I wanted to post a very brief blog on the topic.  For this article I’ve chosen to compare the performance of an iSCSI SAN on Gigabit Ethernet to a single SATA disk drive.  The reason for this is two-fold.  First, it starkly highlights the relative performance difference between purchasing, say, a dedicated server with a single disk in a hosting environment and a virtual machine hosted in a cloud environment.  Second, internal private clouds and a lot of the newer cloud offerings are commonly built using an iSCSI SAN backend.

To be clear, the top three U.S. clouds do not use iSCSI SANs: Amazon’s EC2, Rackspace Cloud, and GoGrid all use local RAID subsystems.  This is common knowledge.  Of the early cloud pioneers, as far as I’m aware, it is mostly the U.K.-based clouds such as ElasticHosts and FlexiScale that use iSCSI SANs.  The latest set of new cloud entrants, such as Savvis, Terremark, and Hosting.com, all use either iSCSI or Fibre Channel-based SANs.  This is also commonly known.

Your Mileage May Vary on these performance numbers.  I’m not trying to highlight any ‘right’ way to build a cloud here.  I’m simply trying to show what the difference in performance is between a single SATA disk and a VM disk drive backed by an iSCSI SAN over a single Gigabit Ethernet link.

This is not a robust performance and benchmarking analysis.  It’s a simple “run the numbers and compare” blog posting.  These are by no means authoritative performance numbers and that’s not their purpose either.  Their purpose is to highlight how performance differs between a single spindle and many in a RAID configuration, even when that RAID is available via a SAN over Gigabit Ethernet.

Please avoid overly critiquing the testing technique here.  It’s not meant to be robust, so nitpicking it serves no purpose.

Setup & Methodology
This is a very simple test in the Cloudscaling hosting & cloud lab environment.  Both servers running the test are on the latest Ubuntu release, Jaunty Jackalope.  One is a physical server with a single SATA disk and the other is a VMware vSphere VM backed by an iSCSI LUN.  The iSCSI LUN is provided by a ZFS-based SAN product called NexentaStor from Nexenta Systems.  This is an OpenSolaris derivative and a very cost-effective alternative to, say, a NetApp or EqualLogic system.

The iSCSI SAN hardware is a simple Sun x2200 M2 with a Sun J4200 JBOD and six 15K RPM SAS drives.

The bonnie++ command line was as simple as possible:


bonnie++ -n 512


Note that the simplicity of the bonnie testing method may have caused some weird skewing of numbers.  See below for more.

Basic Numbers
Here is a basic high-level chart showing the numbers.

Figure 1. High level of SATA vs. VM disk

The first thing you will notice, of course, is the two big spikes for sequential and random file reads.  These numbers are artificially inflated: 325,000 IOPS for sequential reads and 460,000 IOPS for random reads are clearly ridiculous.  This is likely due to caching, either in the OS or in the controller on the physical box.  bonnie++ is supposed to account for this, but for some reason, in this instance it did not.  So it might be a little easier to evaluate the relative performance on a logarithmic scale:

Figure 2. Logarithmic Scale for High Level Results

Much better.  What is easier to notice here is that the VM generally performs better on both standard measures of disk speed: raw throughput and disk operations (I/O per second or IOPS) with the obvious exception of the two aberrant data points.
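As a rough illustration of why the logarithmic scale helps, here is a small Python sketch that uses the rounded IOPS figures quoted in this post.  The function name `log_positions` is my own, made up for the example; it simply shows where each value lands on a log10 axis compared to its share of a linear axis.

```python
import math

# Rounded IOPS figures quoted in this post; the two largest are the
# artificially inflated read results.
iops = {
    "SATA random seeks": 185,
    "SAN random seeks": 10_000,
    "sequential file reads (inflated)": 325_000,
    "random file reads (inflated)": 460_000,
}


def log_positions(values):
    """Return log10 of each value, i.e. its position on a logarithmic axis."""
    return {name: math.log10(v) for name, v in values.items()}


linear_max = max(iops.values())
for name, pos in log_positions(iops).items():
    # On a linear axis each bar is drawn as a share of the largest value.
    share = iops[name] / linear_max
    print(f"{name:35s} linear share {share:8.4%}  log10 {pos:5.2f}")
```

On a linear axis the 185 IOPS bar is a tiny fraction of the tallest bar, which is why it barely registers; on the log axis all four values sit within a few units of each other.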

Removing those two data points will give us an even clearer picture:

Figure 3. Normalized test results

Great.  Now this is very clear.  As you can see, the first half of the chart shows raw throughput (KB/second).  When reading blocks from the VM disk we’re nearly saturating the Gigabit Ethernet link, which tops out at a theoretical 125MBps (1Gbps divided by 8 bits per byte); we’re hitting 107MBps on average over 10 runs, so this is quite acceptable.  The SATA disk, in comparison, gets just over 60MBps, which is about right, even though the SATA spec and controller are capable of more.  Sustained block reads from SATA disks will typically be 60-80MBps in the real world.
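The link arithmetic can be sketched in a few lines of Python.  This is only a back-of-the-envelope check: the function names are made up for illustration, and it assumes decimal units and ignores TCP/IP and iSCSI protocol overhead.

```python
def link_capacity_mbytes_per_s(gigabits_per_s: float) -> float:
    """Convert a link speed in Gbit/s to MB/s (decimal units, 8 bits per byte).

    Assumption: protocol overhead (Ethernet, TCP/IP, iSCSI) is ignored.
    """
    return gigabits_per_s * 1000 / 8


def utilization(measured_mbytes_per_s: float, capacity_mbytes_per_s: float) -> float:
    """Fraction of the theoretical link capacity that was actually used."""
    return measured_mbytes_per_s / capacity_mbytes_per_s


capacity = link_capacity_mbytes_per_s(1.0)  # single Gigabit Ethernet link
vm_util = utilization(107, capacity)        # 107 MB/s measured on the VM disk
print(f"GigE capacity: {capacity:.0f} MBps, VM block read utilization: {vm_util:.1%}")
```

With the 107MBps figure above, the VM is using a bit more than 85% of the theoretical link capacity.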

Much more interesting is the number of IOPS.  Many real-world disk workloads, like a database, spend the majority of their time ‘seeking’ from one position on the disk to another, meaning lots of random file access.  They will bottleneck on waiting for the disk ‘head’ to move from one position to another and read new data.  It’s hard to tell the difference in the chart above because the SATA disk is so slow it barely registers.

If we change to a logarithmic scale again the data becomes much easier to read:

Figure 4. Normalized logarithmic scale test data

Now you can see that random seek performance (i.e. moving the head of the disk drive from one location to a new one to read a piece of data) is starkly different.  A single SATA disk gets about 185 IOPS while the set of 6 SAS disks in the SAN is right around 10,000 IOPS.  This is a huge performance difference, and there are several reasons for it.  One, a typical SATA disk has an average latency of about 8.5ms while a 15K SAS disk has only about 3ms.  Two, with 6 disks in a RAID configuration, I have 6x more disk heads to read with.
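To make the latency and spindle-count argument concrete, here is a naive estimate in Python.  It is a sketch under strong assumptions, not a model of the actual test: `naive_iops` is a hypothetical helper that assumes one I/O per average latency, perfectly linear scaling across spindles, and no caching at all.

```python
def naive_iops(avg_latency_ms: float, spindles: int = 1) -> float:
    """Naive random-I/O estimate: one operation per average latency, per spindle.

    Assumptions (illustrative only): no caching, no queuing effects, and
    perfectly linear scaling with the number of disks.
    """
    if avg_latency_ms <= 0 or spindles < 1:
        raise ValueError("latency must be positive and spindles must be >= 1")
    return spindles * 1000.0 / avg_latency_ms


sata = naive_iops(8.5)              # single SATA disk
san = naive_iops(3.0, spindles=6)   # six 15K SAS disks

print(f"single SATA disk: ~{sata:.0f} IOPS")
print(f"6 x 15K SAS:      ~{san:.0f} IOPS")
print(f"estimated ratio:  ~{san / sata:.0f}x")
```

The measured numbers (about 185 and about 10,000 IOPS) are both higher than this simple model predicts, so latency and spindle count explain the direction of the gap but probably not its full size; caching, as with the inflated read figures earlier, may well be contributing.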

It’s still a bit hard to see with this chart, but for most of the rest of the IOPS tests above, the SAN solution is roughly 3x the performance of the single disk.  For example, Sequential File deletion is 2,573 (SAN) vs. 840 (SATA).
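For anyone who wants to check the ratios, a minimal sketch using only the figures quoted in this post (the `speedup` helper is just a name chosen for the example):

```python
def speedup(san_iops: float, sata_iops: float) -> float:
    """How many times faster the SAN result is than the single SATA disk."""
    return san_iops / sata_iops


# Sequential file deletion: 2,573 (SAN) vs. 840 (SATA)
print(f"sequential file deletion: {speedup(2573, 840):.2f}x")

# Random seeks: about 10,000 (SAN) vs. about 185 (SATA)
print(f"random seeks:             {speedup(10_000, 185):.0f}x")
```

The deletion test works out to roughly 3x, while random seeks are the outlier at roughly 54x.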

Rather than going through the entire set of results, I recommend you download my simple spreadsheet.

Note that for Amazon, Rackspace, or GoGrid, local VM disk results will likely look very similar to the iSCSI SAN results for IOPS, and sequential read/write throughput (first half of chart) will be much higher.

Amazon’s Elastic Block Storage (EBS) would have similar performance characteristics to the iSCSI SAN above and hence you can see why it can be acceptable for running a database.

Summary
My point here is very simple.  I want to highlight the difference between purchasing a dedicated server with a single SATA disk (or a small number of them) vs. going with a cloud solution that uses a shared iSCSI SAN or local RAID on a single physical node.  Purchasing your own dedicated server solution with a RAID can be extremely costly compared to a similar cloud solution.

More importantly, for those workloads that require random I/O and file access, like database applications, RAID is clearly a winner.  That’s why using a shared RAID (via an iSCSI SAN or a local RAID) on a physical node for your cloud VM can be a clear advantage of the cloud today.

  • I can confirm from our internal experience here at Cloud Central that your numbers are right on the money. Our storage system employs iSCSI over multiple gig-e links backed by ZFS running on Sun storage hardware. We employ SSDs for read cache (L2ARC) and write logs (ZIL) as part of our hybrid storage pool; this allows us to leverage the benefits of SSDs (high IOPS / $) and hard disks (high capacity / $), whilst minimizing energy usage. The IOPS numbers we have seen from our storage system are very impressive (several thousand), whilst sequential reads & writes are in the order of 70-80MB/second, which is more than enough for 99% of DB and web workloads.

    The thing that people should keep in mind is that most workloads are dominated by random IO patterns, whilst very few workloads require very high sequential IO performance (with the exception of workloads such as video editing, which I'd suggest are best done in house for the time being). Therefore, generally speaking, the IOPS of your storage system is more important than the raw sequential read / write performance for real workloads.

    One additional interesting thing worth noting is the performance of iSCSI over gig-e vs Fibre Channel. Most enterprise users would turn their nose up at iSCSI, but is the performance gain offered by Fibre Channel really worth the additional cost? In my experience, the answer is no.

    Regards,
    Kris
  • I'd just add that the same misunderstandings exist with respect to network I/O as well as storage. In my tests, some of the providers with the best storage I/O had the worst network I/O and vice versa. Also, these things tend to vary a great deal according to instance type, and providers are generally not forthcoming about how I/O resources are apportioned. For example, I found that one major provider was applying fairly draconian network throttles to the smaller instance types, but good luck finding anything about that in their public documentation. On the storage side the problem is the opposite; there's nothing in the storage stack that's anything like the traffic-shaping functionality in the network stack, so if your "neighbor" decides to bang the hell out of the local storage you *will* be affected. SLAs mean nothing if the technical capabilities necessary to enforce them are absent.
  • You're right, it's true about network I/O, but in my experience very few apps are network bound, whereas most web apps (at least the DB portion) have a scaling constraint on the disk. For example, some of the largest folks on the Internet push 40Gbps out of a single datacenter. That's a lot of bandwidth, but that's across thousands of servers, so the average network utilization inside a DC is low per server. The bottlenecks tend to be the network on storage systems, not the individual servers themselves.

    There is some traffic shaping functionality and QoS in some of the SAN providers, but it's minimal at best. 3PAR and Compellent claim some, but as far as I can tell, only Pillar probably has something that works.

    Regardless, none of them were designed to do this 'right'. We need VM backing stores designed for these kinds of environments with proper QoS and the ability to tune by the frontend VM. For bonus points, the QoS, traffic-shaping, etc. would follow a VM when it 'migrates' as well.

    Anyway, that's a very long discussion. I think network I/O is important, but I haven't seen it be as impactful as disk except in those environments where the oversubscription rates are ludicrous, like container-based VPS offerings (e.g. OpenVZ/Virtuozzo on a box at 100:1 or more).
  • Is it local vs. remote (what you just said), or physical vs. virtual (the title)? I'd suggest that, whichever it is, the other should be held constant. It might also be more interesting to show results for like numbers of spindles in each case, since it's not quite a surprise that six spindles will outperform one except in very special circumstances. I realize that you didn't want to get too bogged down in methodological details, but statements like "sustained block reads from SATA disks will typically be 60-80MBps in the real world" lose their relevance if the methodological divergence is too great.

    Similarly, instead of saying that results on AWS etc. will "likely" look similar, why not actually test there? I've actually done that, it cost mere pocket change, and it revealed very significant differences not only between providers but among the same provider's options as well. I think it would be very interesting if we could collaborate on more fully characterizing the differences between the actual options that users have in this area.
  • The primary reason I haven't published any results using AWS is that the EULA can be read in such a way that posting said results might be in violation. So I'm playing it safe there.

    I am (we are) happy to collaborate on this. I think it's generally misunderstood and underrepresented in the blogosphere. People don't really understand the differences in the storage architectures for the different clouds, yet that is a major area of concern.

    I knew one startup that had designed their architecture heavily around a message bus. I did the math for them and they needed about 10,000 IOPS on a single message hub to keep up with their volume. They did not seem to realize that might be a challenge.
  • Please please *please* don't use Mbps for mega*bytes* per second. If you're talking about network speeds, use megabits with a lowercase b; if you're talking about storage use megabytes with a capital B. I've seen people led astray by exactly this error enough for ten careers already, and it tends to mark the person making it as a novice in one area or the other.

    Other points...

    Using NexentaStor, or any software-based target, won't really give you a true feel for the throughput capability of an initiator, and neither will 512-byte writes or comparing one disk vs. several. Bonnie(++) is universally recognized as a poor and outdated benchmark, including by your friends at Sun. The issue with caching is that guest VMs can't generally reach out and force the host to flush its cache at the "right" points. This is getting better, but in the meantime some providers have resorted to configuring their hypervisors so that *all* guest I/O is done synchronously. That also leads to invalid results/comparisons, apparently including some of those above.
  • Thanks for your input. I've updated the article to be clear that this is not a robust benchmarking test and to fix my typos on MBps.

    If you would like to provide a better methodology, I will be happy to revisit this in the New Year based on your methodology.

    In the meantime, the primary purpose of the article is served, which is to show folks that there is a significant difference in the characteristics of performance between a single local disk and remote RAID even over GigE.