Blog entries

Nicira & Citrix are Warming Up

Some exciting news on the open cloud front.  Nicira’s openvswitch (think: open source Cisco Nexus 1000V) made it in as the default vSwitch in the latest release of the Xen Cloud Platform.  For those who aren’t aware, the Xen Cloud Platform is an open source management framework for clouds, aimed at service providers.  The website says:


    Xen Cloud Platform offers ISVs and service providers a complete cloud
    infrastructure platform with a powerful management stack based on
    open, standards-based APIs, support for multi-tenancy, SLA guarantees
    and detailed metrics for consumption based charging.

I’ve mentioned Nicira before in public forums and videos made with John Willis, but I haven’t posted here about them.  Nicira is commercializing the OpenFlow switch specification.  OpenFlow is a very important change in the way we build, design, and manage network infrastructure.

From the website:


    In a classical router or switch, the fast packet forwarding (data path)
    and the high level routing decisions (control path) occur on the same
    device. An OpenFlow Switch separates these two functions. The data
    path portion still resides on the switch, while high-level routing decisions
    are moved to a separate controller, typically a standard server. The
    OpenFlow Switch and Controller communicate via the OpenFlow protocol,
    which defines messages, such as packet-received, send-packet-out,
    modify-forwarding-table, and get-stats.

What this means is that instead of letting each switch make its own routing/switching decisions, you can have centralized control of the entire network topology.  OpenFlow switches come in two types: software switches like the openvswitch, and firmware that can be loaded onto cheap switch hardware.  Combined, they let you create fully virtualized networking.  A single centralized control system that is integrated with your cloud layout can reprogram your logical network topology on demand.  A virtual server moves from one host to another?  Switches are reprogrammed dynamically and the move is never noticed.
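To make the split between data path and control path concrete, here is a minimal toy sketch in Python.  It is not the OpenFlow protocol or any real controller API: the `Switch` and `Controller` classes, the `place_vm` method, the port numbers, and the MAC address are all hypothetical names invented for illustration.  The switches only do table lookups, and a central controller rewrites those tables when a VM moves.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    """A toy switch: it only forwards using the table the controller installs."""
    name: str
    flow_table: dict = field(default_factory=dict)  # destination MAC -> output port

    def forward(self, dst_mac):
        # Data path: a plain table lookup; None stands for "no rule installed".
        return self.flow_table.get(dst_mac)


class Controller:
    """A toy central controller that knows where each VM currently lives."""

    def __init__(self, switches):
        self.switches = {s.name: s for s in switches}
        self.location = {}  # VM MAC -> (switch name, port)

    def place_vm(self, mac, switch_name, port, uplink_port=1):
        """Record the VM's location and reprogram every switch's table."""
        self.location[mac] = (switch_name, port)
        for sw in self.switches.values():
            # The local switch delivers to the VM's port; all others use the uplink.
            sw.flow_table[mac] = port if sw.name == switch_name else uplink_port


host_a, host_b = Switch("host-a"), Switch("host-b")
controller = Controller([host_a, host_b])

vm_mac = "00:16:3e:00:00:01"  # made-up address for the example
controller.place_vm(vm_mac, "host-a", port=5)
print(host_a.forward(vm_mac), host_b.forward(vm_mac))  # 5 1

# The VM migrates to host-b: one controller call rewrites both tables.
controller.place_vm(vm_mac, "host-b", port=7)
print(host_a.forward(vm_mac), host_b.forward(vm_mac))  # 1 7
```

A real deployment would push these table changes to the switches over the OpenFlow protocol rather than mutating objects in memory, but the division of labor is the same.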

This means you can create a fully multi-tenant, highly secure, extremely flexible cloud network topology that maps exactly to your requirements.  This contrasts starkly with cloud networking today, which is either extremely restrictive (Amazon’s EC2), has scaling problems (e.g. 802.1q VLAN tagging), or doesn’t give you complete control (Rackspace Cloud, et al).

Let me clarify what I mean by complete control before anyone is offended.  Rackspace Cloud does provide more control than EC2, but it doesn’t put you in the driver’s seat.  Imagine that instead of a fixed network architecture, where every customer has a ‘frontend public network’ and a ‘backend private network’, you have something that allows arbitrary network configurations.  Customers get a ‘private’ network by default and buy more networks as their applications need them.  Now having a separate network for database servers per PCI compliance (or other) rules is trivial.

Many other things are possible if you move towards an OpenFlow-based network architecture with a centralized control system, including:

  • Distributed firewall just like Amazon EC2’s distributed firewall
  • On-demand network introspection / tapping
  • On-demand in-line firewall / IPS
  • N-tier network topologies
  • Distributed Virtual Switch (a la Cisco Nexus 1000V)

There are many other possibilities.  The eventual promise here is network virtualization as good as storage or computing virtualization is today.

Way to go Nicira and Citrix!


Cloudscaling on a Tear – 2009 in Review

We’re a little late in posting this due to the holidays, but I have some exciting stats to share with you.  In 2009 the Cloudscaling blog became one of the hottest destinations for cloud knowhow.  A big part of that success was our unique perspective on cloud computing.  We aren’t a news aggregation site.  Instead we try to provide hard information and differentiated views on what cloud is, how it can help, and what people are doing with it today.

In particular, a number of articles posted here last year were extremely widely read.  In fact, the #1 article had well over 10,000 pageviews and almost 9,000 unique visitors.  3,500 of those pageviews came in the first week after posting (09/27/09 – 10/03/09).  That’s an average of 500 per day.

Here’s a chart showing our blog traffic growth over 2009:

[Chart: 2009 Cloudscaling blog stats]

As you can see we had tremendous growth and we’re expecting more in 2010.  Thanks for your readership and especially your comments.  We’re looking forward to even more conversation this year.

Here’s a list of our top ten blog posts in 2009 (in order of most read) if you want to go back and review.

  1. Amazon’s EC2 Generating 220M+ Annually
  2. VMware vs. Amazon … ROUND ONE … FIGHT!
  3. Why is Amazon’s SAS70 Audit Bogus?
  4. EngineYard uses Chef, a Puppet Alternative
  5. The “Open” Cloud is Coming
  6. VMware’s vCloud API Forces Cloud Standards
  7. Amazon Threatens VPS Market
  8. On Second Thought…How Big Is AWS Really?
  9. Infrastructure-as-a-Service Builder’s Guide v1.0
  10. Defining Infrastructure Clouds

    It’s worth pointing out that the Infrastructure-as-a-Service Builder’s Guide made #9 in the list even though it was posted on 12/19/09.  It made #9 in only 12 days’ time.  The actual white paper has been downloaded almost 1,000 times in less than one month.

    Again, thanks so much for your readership.

    Best,

    –Randy Bias, CEO, Cloudscaling


    How Clouds Enable Global Reach

    Over a year and a half ago, I mentioned that there were four key aspects to cloud computing: scalability, leverage, speed, and reach.  All of these still hold true today.  In particular, the one area that was underdeveloped was the notion of using clouds for global reach.

    As you know, quite a bit has changed since then.  Amazon’s Elastic Compute Cloud crossed the Atlantic to Europe, EC2 opened up a U.S. West Coast presence, AWS recently pre-announced their Asian expansion, and a number of other clouds have sprung up across the globe, including a very strong new Australian entrant, Cloud Central.[1]

    All of this goes to show that my prediction around the importance of reach in cloud computing is coming true.  One of the examples that brings this home that I enjoy talking about is Friendster.

    For those of you new to social networking, Friendster was one of the very first social networks.  They were a true first mover in the space, but due to some strategic and tactical errors, they quickly fell behind sites like MySpace, Facebook, and LinkedIn.  Except in the Asia-Pacific region!

    Friendster is still one of the largest social networking sites within that geographic region.  You can see how they have re-tooled their business to be friendly to the AsiaPac region by providing localization in many Asian languages.

    Now here’s the kicker: Friendster’s initial infrastructure was all in the United States.  What happens when your market changes underneath you?  How do you respond?  What tools are there to adapt?

    As cloud computing goes global, its very nature provides a whole new opportunity in how businesses think about responding to market shifts.  Now you can follow-the-sun, follow-the-moon, follow-the-law, or up and move your entire application to a new country with much less effort than ever before … and it will get even easier over time.

    Cloud computing is going global and it’s going to change the way we think about service delivery models completely.


    [1] DISCLOSURE: Cloud Central is a Cloudscaling customer. They are currently in private BETA and looking for folks to provide feedback. Please take a look if you have a moment!


    Infrastructure-as-a-Service Builder’s Guide v1.0

    Just in time for the New Year, we’re releasing a short 12 page whitepaper on building Infrastructure-as-a-Service (IaaS) clouds.  This whitepaper is targeted at folks building public or private clouds who want to understand our general take on clouds, cloud computing, and Infrastructure-as-a-Service.  In particular, we highlight some of the important areas to think about when you are planning and designing your infrastructure cloud.

    Of course, we welcome comments and feedback.  They will be incorporated into future revisions.  The paper itself does go into some technical depth in a few areas, but we can provide quite a bit more color in our workshops.

    For your reading pleasure, I present our first big technical whitepaper:

    Thanks!

    The Cloudscaling Team

    P.S. We realize the definition of ‘workload’ or ‘cloud workload’ is not as crisp as it could be and request your feedback and thinking on better nomenclature or definitions.  Credit will be given as appropriate.



    Virtual Server vs. Real Server Disk Drive Speed

    It’s important to understand the potential differences between virtual server disk drives and physical disk drives, so I wanted to post a very brief blog on the topic.  For this article I’ve chosen to compare the performance of an iSCSI SAN on Gigabit Ethernet to a single SATA disk drive.  The reason for this is two-fold.  First, it more starkly highlights the relative performance difference between purchasing, say, a single dedicated server with a single disk in a hosting environment and a virtual machine hosted in a cloud environment.  Second, internal private clouds and a lot of the newer cloud offerings are commonly built using an iSCSI SAN backend.

    To be clear, the top three U.S. clouds do not use iSCSI SANs: Amazon’s EC2, Rackspace Cloud, and GoGrid all use local RAID subsystems.  This is common knowledge.  Of the early cloud pioneers, as far as I’m aware, mostly the U.K.-based clouds such as ElasticHosts and FlexiScale use iSCSI SANs.  The latest set of new cloud entrants, such as Savvis, Terremark, and Hosting.com, all use either iSCSI or Fibre Channel-based SANs.  This is also commonly known.

    Your Mileage May Vary on these performance numbers.  I’m not trying to highlight any ‘right’ way to build a cloud here.  I’m simply trying to show what the difference in performance is between a single SATA disk and a VM disk drive backed by an iSCSI SAN over a single Gigabit Ethernet.

    This is not a robust performance and benchmarking analysis.  It’s a simple “run the numbers and compare” blog posting.  These are by no means authoritative performance numbers and that’s not their purpose either.  Their purpose is to highlight how performance differs between a single spindle and many in a RAID configuration, even when that RAID is available via a SAN over Gigabit Ethernet.

    Please avoid overly critiquing the testing technique here.  It’s not meant to be robust, so nitpicking it serves no purpose.

    Setup & Methodology
    This is a very simple test in the Cloudscaling hosting & cloud lab environment.  Both servers running the test are on the latest Ubuntu Jaunty Jackalope release.  One is a physical server with a single SATA disk and the other is a VMware vSphere VM backed by an iSCSI LUN.  The iSCSI LUN is provided by a ZFS-based SAN product called NexentaStor from Nexenta Systems.  This is an OpenSolaris derivative and a very cost effective alternative to, say, a NetApp or EqualLogic system.

    The iSCSI SAN hardware is a simple Sun x2200 M2 with a Sun J4200 JBOD and 6 15K RPM SAS drives.

    The bonnie++ command line was as simple as possible:


    bonnie++ -n 512


    Note that the simplicity of the bonnie testing method may have caused some weird skewing of numbers.  See below for more.

    Basic Numbers
    Here is a basic high-level chart showing the numbers.

    Figure 1. High level of SATA vs. VM disk


    The first thing you will notice, of course, is the two big spikes for sequential and random file reads.  These numbers are artificially inflated as clearly 325,000 IOPS for sequential and 460,000 IOPS for random reads are ridiculous.  This is likely due to caching either in the OS or the controller on the physical box.  bonnie++ is supposed to account for this, but for some reason, in this instance it did not.  So it might be a little easier to evaluate the relative performance on a logarithmic scale:

    Figure 2. Logarithmic Scale for High Level Results


    Much better.  What is easier to notice here is that the VM generally performs better on both standard measures of disk speed: raw throughput and disk operations (I/O per second or IOPS) with the obvious exception of the two aberrant data points.

    Removing those two data points will give us an even clearer picture:

    Figure 3. Normalized test results


    Great.  Now this is very clear.  As you can see, the first half of the chart shows raw throughput (Kbytes/second).  When reading blocks from the VM disk we’re nearly saturating the Gigabit Ethernet link, which should top out at a theoretical 125MBps, and we’re hitting 107MBps on average over 10 runs, so this is quite acceptable.  The SATA disk, in comparison, gets just over 60MBps, which is about right, even though the SATA spec and controller are capable of more.  Sustained block reads from SATA disks will typically be 60-80MBps in the real world.
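The link ceiling is simple unit arithmetic: a gigabit link carries 1,000,000,000 bits per second, which is 125 megabytes per second before any protocol overhead.  Here is a small Python sketch of that calculation using the throughput figures quoted above.  The function and variable names are just illustrative, and the calculation ignores Ethernet, TCP/IP, and iSCSI overhead, so the real usable ceiling is somewhat lower.

```python
GIGABIT_BITS_PER_SEC = 1_000_000_000


def link_ceiling_mbytes_per_sec(bits_per_sec):
    """Convert a raw link rate in bits/s to megabytes/s (8 bits per byte)."""
    return bits_per_sec / 8 / 1_000_000


ceiling = link_ceiling_mbytes_per_sec(GIGABIT_BITS_PER_SEC)

measured_vm = 107   # MBps, VM disk over iSCSI (figure from the text)
measured_sata = 60  # MBps, single SATA disk (figure from the text)

utilization = measured_vm / ceiling
print(f"{ceiling:.0f} MBps ceiling, VM block reads use {utilization:.1%} of it")
print(f"VM vs. SATA block reads: {measured_vm / measured_sata:.2f}x")
```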

    Much more interesting is the number of IOPS.  Many real world disk workloads, like a database, spend the majority of their time ‘seeking’ from one position on the disk to another, meaning lots of random file access.  They will bottleneck on waiting for the disk ‘head’ to move from one position to another on a disk drive and read new data.  It’s hard to tell the difference above because the SATA disk is so slow it barely registers on the chart.

    If we change to a logarithmic scale again the data becomes much easier to read:

    Figure 4. Normalized logarithmic scale test data


    Now you can see that random seek performance (i.e. moving the head of the disk drive from one location to a new one to read a piece of data) is starkly different.  A single SATA disk gets about 185 IOPS while a set of 6 SAS disks in the SAN is right around 10,000 IOPS.  This is a huge performance difference.  There are several reasons for this.  One, a typical SATA disk has an average latency of 8.5ms and a 15K SAS disk has only 3ms.  Also, with 6 disks in a RAID configuration, I have 6x more disk heads to read with.
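As a rough sanity check on those two reasons, here is a back-of-the-envelope Python sketch.  It assumes a deliberately naive model that is not from the test itself: each spindle completes one random I/O per average latency period, and spindles scale linearly.  The function name is illustrative.  The measured numbers come out higher than this model predicts for both setups, so treat it only as an illustration of why latency and spindle count matter; things the model ignores (caching is one possibility) presumably account for the rest.

```python
def iops_from_latency(avg_latency_ms, spindles=1):
    """Naive estimate: one random I/O per average latency period, per spindle."""
    return spindles * 1000.0 / avg_latency_ms


sata_estimate = iops_from_latency(8.5)             # single SATA disk
san_estimate = iops_from_latency(3.0, spindles=6)  # six 15K SAS disks

measured_sata, measured_san = 185, 10_000  # figures quoted in the text

print(f"model:    SATA {sata_estimate:.0f} IOPS, SAN {san_estimate:.0f} IOPS")
print(f"measured: SATA {measured_sata} IOPS, SAN {measured_san} IOPS")
print(f"measured ratio: {measured_san / measured_sata:.0f}x")
```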

    It’s still a bit hard to see with this chart, but for most of the rest of the IOPS tests above, the SAN solution is roughly 3x the performance of the single disk.  For example, Sequential File deletion is 2,573 (SAN) vs. 840 (SATA).

    Rather than going through the entire set of results, I recommend you download my simple spreadsheet.

    Note that for Amazon, Rackspace, or GoGrid, local VM disk results will likely look very similar to the iSCSI SAN results for IOPS, while sequential read/write throughput (first half of chart) will be much higher.

    Amazon’s Elastic Block Storage (EBS) would have similar performance characteristics to the iSCSI SAN above and hence you can see why it can be acceptable for running a database.

    Summary
    My point here is very simple.  I want to highlight the difference between purchasing a dedicated server with a single SATA disk (or a small number of them) vs. going with a cloud solution that uses a shared iSCSI SAN or local RAID on a single physical node.  Purchasing your own dedicated server solution with a RAID can be extremely costly compared to a similar cloud solution.

    More importantly, for those workloads that require random I/O and file access, like database applications, RAID is clearly a winner.  That’s why using a shared RAID (via an iSCSI SAN or a local RAID) on a physical node for your cloud VM can be a clear advantage of the cloud today.
