Five Things OpenStack Needs to Do … Now.

Posted by Randy Bias

At the recent OpenStack Silicon Valley 2015 event, I was invited to speak and offer a viewpoint on OpenStack’s future.  The year before, in 2014, I gave a talk entitled “Lie of the Benevolent Dictator,” which spawned the Product Working Group for OpenStack.  A blog posting from January of this year, The Future of OpenStack is Now, 2015, summed up that presentation and added some more color.  I have spoken at length on the topic of OpenStack’s future, so it can be challenging to take that discussion to the next level.  So this year I tried something just a little bit different.

I tried to spend more time than normal riffing and talking from the heart, and focused on delivering five simple slides in 20 minutes.  That’s 4-5 minutes per slide instead of my usual one minute per slide.  In other words, it may be difficult to get all of the value from the deck alone; you really need to watch the video itself.

The presentation was originally entitled “Three Things OpenStack Needs to Do … Now,” except that I couldn’t boil it down to three and settled for five.  The key slides are:

  1. Cloud is NOT “easy”
  2. COTS has significant cost
  3. OpenStack is NOT a “cheaper” VMware
  4. Differentiation == silos
  5. Infrastructure has zero value

Watching the video is the only way to get all of the value, and there are some real gems in here if you spend the time.  For example, I share some information about Cloudscaling’s production deployments that I have never shared publicly before.

However, for those with attention challenges, here’s the synopsis for each slide:

  1. Marketing teams who try to position “private cloud” or OpenStack as “easy” are hurting everyone
  • “Cheaper” hardware (e.g. from ODMs/CMs) and software (e.g. open source) has its own hidden costs
  3. OpenStack is designed for the Third Platform and using it for Second Platform workloads is a fool’s errand
  4. Adding secret sauce to OpenStack hurts everyone; not having a true base reference architecture for OpenStack hurts everyone
  5. Like electricity or cellular minutes, the bulk of value isn’t in the commoditized infrastructure layers, but the applications above that consume it

I hope you enjoy this as much as I enjoyed presenting it; I was more on fire than usual.[1]

[1] I am working on eliminating a certain phrase from my repertoire and replacing it with a new phrase: “belly up.”  I humbly apologize for its use in the video.  No slur is meant, I simply didn’t use care when choosing my words.  This will be rectified.

Posted in OpenStack

Cloud … You’re Doing it Wrong!

Posted by Randy Bias

I’ve been doing “cloud” for about as long as it’s been a “thing”.  It is safe to say that I’ve talked about every conceivable topic related to cloud and cloud computing.  Unfortunately, I still run into a common problem: the average enterprise looking to adopt cloud or build its own is usually doing it for the wrong reasons and in the wrong way.  So I frequently find myself retreading old territory.  This blog posting is an attempt to distill what you need to know about cloud as you start your journey.

Cloud Definition Value: Competitive Advantage

I’m half-tempted to define cloud, but honestly, that isn’t really necessary.  Regardless of what you think cloud is or isn’t, I’m seeing consistency in what people believe it will give them: agility, cost reduction, and future-proofing.  Put differently: move faster, cost less, and skate to the future.  Unfortunately, I still see the majority of businesses focused on only one of these aspects, reducing costs, which is arguably the least important.

What’s common across speed, cost, and future-proofing?  Gaining a competitive advantage.  Ask anyone today and it’s generally agreed that we are in a new, highly disruptive period, where business models, technology usage, and market position are all up for grabs. The vast majority of the Fortune 1000 have already turned over or are at risk, according to the American Enterprise Institute [1], Forrester [2], and IDC [3]. The adage “adapt or die” now applies to even the most entrenched monopolies.  Even huge telcos like AT&T, with massive barriers to entry protecting their markets, are making aggressive moves into cloud.

All of this leads me to conclude that:

The primary value of cloud is to generate competitive advantage.

If you aren’t thinking about cloud in this manner then you are simply doing it wrong.  Period.

A Recent Example

What kind of company does this look like to you?


One might be tempted to say it looks like one of the webscale cloud pioneers like Google, Amazon, Microsoft, or Facebook.  It’s not.  It’s a very traditional business: Citi (Citigroup).

In January of 2015, Deloitte put out an interesting blog posting entitled “software-defined everything.”[4]  It’s a bit overwrought as an article, but what I found interesting was the “My Take” by Greg Lavender of Citi.

Greg’s summary says it all:

Focusing IT on these three objectives—cloud scale, cloud speed, and cloud economics—has enabled Citi to meet our biggest challenge thus far: fostering organizational behavior and cultural changes that go along with advances in technology. We are confident that our software-defined data center infrastructure investments will continue to be a key market differentiator—for IT, our businesses, our employees, our institutional business clients, and our consumer banking customers.

Citi has embraced and adopted cloud in a very serious manner to help them create and maintain competitive advantage.  Choice pieces from this article:

Citi is not only embracing the most basic of cloud capabilities (VMs-on-demand), but also migrating towards more modern data storage and processing, scale-out-architectures, Platform-as-a-Service, DevOps, and application automation.

Most telling, they aren’t afraid to “re-architect” and “re-platform” existing applications, presumably moving them from the “second platform” to the “third platform”, a cloud-native approach.

Why do this unless there is a massive advantage to be gained?  Clearly Citi is all in.

You Can’t Cherry Pick Your Way to the Future

I know quite a few businesses that attempt to “adopt cloud” by building a VMs-on-demand system using VMware or OpenStack and then call it done.  The reality, though, is that you can’t cherry-pick parts of cloud and reap the benefits.  Quite the opposite.  You have to go “all-in on cloud” in order to create and maintain a competitive advantage.  To stay relevant.  To skate to the future.

Here are the warning signs if you are doing it wrong or trying to cherry pick a solution:

  • You are trying to “get a cheaper VMware”
  • Reducing your “datacenter costs” is what matters most to your executives
  • You are “doing private cloud first and public cloud later”
  • Your teams are engaging in CYA and stalling tactics
  • You are focused on “more automation” for second platform applications
  • You think the “third platform” is a “tiny fraction” of your application needs

This is bass ackwards.  Success in cloud means making IT more relevant to the business by driving competitive advantage.  Driving competitive advantage means speed, lowered TCO, and looking towards the future.  It means not being afraid to “re-platform” key applications.  It means extracting maximum value out of data.  It means looking to the webscale cloud pioneers for patterns that might be relevant to your business (e.g. Hadoop for analytics and ETL).

Your Cloud Journey

If you are going to do it, do it right.  Deploying your Vblock and calling it a day isn’t enough.  Doing a couple of DevOps projects or deploying some NoSQL isn’t enough.  You’ve got to play to win and that means getting your executive team onboard, being an educator and instigator.  Explaining, re-explaining, and explaining again.  Adapt or die isn’t a platitude, it’s a watchphrase of our times.

Repeat after me: “cost savings is a side effect”.  Say that over and over until you have internalized it.

Then, get out there and get it done.

[1] “Fortune 500 firms in 1955 vs. 2014; 89% are gone” – American Enterprise Institute.
[2] “Seventy percent of the companies that were on the Fortune 1000 list a mere 10 years ago have now vanished – unable to adapt to change.” – Craig Le Clair, Forrester
[3] “Last year, we predicted that 3rd Platform solutions would disrupt one-third of the leaders in every industry by 2018.” – IDC
[4] Copyright © 2014 Deloitte Development LLC. All rights reserved.

Posted in Cloud Computing

Killing the Storage Unicorn: Purpose-Built ScaleIO Spanks Multi-Purpose Ceph on Performance

Posted by Randy Bias

Collectively it’s clear that we’ve all had it with the cost of storage, particularly the cost to maintain and operate storage systems.  The problem is that data requirements, in terms of both capacity and IOPS, are growing exponentially, while the cost of storage operations and management grows in proportion to those data needs.  Historically the biggest culprit is “storage sprawl,” where we have pairs of arrays throughout the datacenter, each of which has to be managed individually.  Silo after silo, each requiring specialized training, its own HA and resiliency, monitoring, and so on.  It’s for this reason that many turned to so-called “unified storage.”  This, unfortunately, is a terrible idea for larger deployments and those running production systems.

Let me explain.

The Storage Unicorn & a Rational Solution

We all want what we can’t have: a single, globally distributed, unified storage system that is infinitely scalable, easy to manage, replicated between datacenters, and serves block devices, file systems, and object, all without hiccups.  Bonus points if you throw in tape as well!  This is the Storage Unicorn.  No such beast exists and never will.  Even before the EMC acquisition of Cloudscaling I was talking about these issues in my white paper: The Case for Tiered Storage in Private Clouds.

The nut of that white paper is that tier-1, mission-critical storage is optimized for IOPS, while tier-3 storage is optimized for long-term durability.  Think of flash vs. tape.  These are not the same technologies, nor can they serve the same purposes or use cases.

A multi-purpose tool is great in a pinch, but if you need to do real work, you need a purpose-built tool:

[Figure: a multi-purpose tool vs. a purpose-built tool]

The issue remains, though: how do we reduce storage management costs and manage the scaling of storage in a rational manner?  There is no doubt, for example, that the multi-purpose tool is lower overhead.  It’s simply fewer things to manage. That is both a pro and a con.

The challenge then is to walk away from the Unicorn.  You can’t have a single storage system to solve all of your woes.  However, you probably don’t have to live with tens or hundreds of storage systems either.

Dialing down your storage needs to a small handful of silos would probably have a dramatic effect on operational costs!  What if you only had something like this in each datacenter:

  • Scalable distributed block storage
  • Scalable distributed object storage
  • Scalable distributed file system storage
  • Scalable control plane that manages all of the above

There isn’t any significant advantage in driving these four systems down to one.  The real optimization is going from tens or hundreds of managed systems to a handful.  Everything else is unnecessary optimization at best or mental masturbation at worst.  Perhaps most tellingly, a gut check says that going from 100 managed systems to 4 is likely a 10x change in cost of management, while going from 4 to 1 yields only a small fraction of additional savings by comparison. [1]

Multi-Purpose Tool Example: Taking A Look at Ceph’s Underlying Architecture

The fundamental problem with any multi-purpose tool is that it makes compromises in each “purpose” it serves.  It has to, because the purposes it combines were designed with different requirements in mind.  If you want your screwdriver to also be a hammer, you have to make some kind of tradeoff.  Ceph’s tradeoff, as a multi-purpose tool, is the use of a single “object storage” layer.  You have a block interface (RBD), an object interface (RADOSGW), and a filesystem interface (CephFS), all of which talk to an underlying object storage system (RADOS).  Here is the Ceph architecture from their documentation:

What the diagram glosses over is that RADOS itself relies on an underlying filesystem to store its objects.  So the diagram should actually look like this:

[Figure: Ceph architecture, including the underlying Linux filesystem layer beneath RADOS]

So in a given data path, for example a block written to disk, there is a high level of overhead:

[Figure: data path for a block written to disk through Ceph’s layers]

In contrast, a purpose-built block storage system that does not compromise and is focused solely on block storage, like EMC ScaleIO, can be significantly more efficient:

[Figure: data path for a block written to disk through ScaleIO]

This allows ScaleIO to skip two steps, but perhaps more importantly, it avoids complications and additional layers of indirection/abstraction, since there is a 1:1 mapping between the ScaleIO client’s block and the block(s) on disk in the ScaleIO cluster.

By comparison, multi-purpose systems need to have a single unified way of laying out storage data, which can add significant overhead.  Ceph, for example, takes any of its “client data formats” (object, file, block), slices them up into “stripes”, and distributes those stripes across many “objects”, each of which is distributed within replicated sets, which are ultimately stored on a Linux filesystem in the Ceph cluster.  Here’s the diagram from the Ceph documentation describing this:

This is a great architecture if you are going to normalize multiple protocols, but it’s a terrible architecture if you are designing for high performance block storage only.
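
To make the striping above concrete, here is a simplified sketch of the kind of arithmetic involved in mapping a client byte offset onto striped objects.  The object size, stripe unit, and stripe count are values I picked for illustration, not defaults from any particular Ceph pool, and real Ceph adds CRUSH-based placement and replication of each object on top of this.

    # Simplified striping arithmetic: client offset -> (object number, offset in object).
    # OBJECT_SIZE, STRIPE_UNIT, and STRIPE_COUNT are illustrative assumptions only.

    OBJECT_SIZE = 4 * 1024 * 1024    # bytes per backing object (assumed)
    STRIPE_UNIT = 1 * 1024 * 1024    # bytes per stripe unit (assumed)
    STRIPE_COUNT = 4                 # objects a stripe is spread across (assumed)

    def locate(offset: int) -> tuple[int, int]:
        """Return (object_number, offset_within_object) for a client byte offset."""
        unit_index = offset // STRIPE_UNIT                 # which stripe unit overall
        units_per_object = OBJECT_SIZE // STRIPE_UNIT
        units_per_object_set = units_per_object * STRIPE_COUNT

        object_set = unit_index // units_per_object_set    # which group of objects
        index_in_set = unit_index % units_per_object_set
        object_in_set = index_in_set % STRIPE_COUNT        # round-robin across objects
        row_in_object = index_in_set // STRIPE_COUNT

        object_number = object_set * STRIPE_COUNT + object_in_set
        offset_in_object = row_in_object * STRIPE_UNIT + (offset % STRIPE_UNIT)
        return object_number, offset_in_object

    # A single 8KB client write must first be located in the right object, then
    # replicated, before it ever reaches the backing Linux filesystem.
    print(locate(10 * 1024 * 1024))   # e.g. a write at the 10MB mark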

Show Me the Money!

We recently put our performance engineering team to the task of comparing performance between Ceph and ScaleIO.  Benchmarks are always difficult.  A lot depends on context, configuration, etc.  We put our best performance engineers on the job, though, just to see whether our belief held up that a purpose-built block storage system beats a multi-purpose Swiss army knife.  The setup is summarized below, with a sketch of the fio workload after the list.

  • Goal: evaluate block performance between Ceph and ScaleIO as fairly as possible
  • Assumptions: same hardware (servers, drives, network, etc.) and same logical configuration (as best as possible)
  • Test cases: “SSD only”, “SSD+HDD”, and “hybrid mode” with SSD as cache for HDD
  • Workload: 70% reads / 30% writes with 8KB blocks
  • Test tool: FIO with shallow (1) and deep (240) queue depths
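
To make the workload above concrete, here is a minimal sketch of an equivalent fio invocation (70/30 random read/write, 8KB blocks, queue depths 1 and 240).  The target device path and runtime are placeholders, not our actual test configuration.

    # Sketch only: an fio run approximating the workload described above.
    # TARGET and the runtime are placeholders; adjust to your own environment.
    import subprocess

    TARGET = "/dev/rbd0"   # an RBD or ScaleIO block device (placeholder)

    def run_fio(queue_depth: int) -> None:
        cmd = [
            "fio",
            "--name=mixed-8k",
            f"--filename={TARGET}",
            "--direct=1",             # bypass the page cache
            "--ioengine=libaio",
            "--rw=randrw",            # mixed random read/write
            "--rwmixread=70",         # 70% reads / 30% writes
            "--bs=8k",                # 8KB blocks
            f"--iodepth={queue_depth}",
            "--time_based",
            "--runtime=300",          # placeholder runtime, in seconds
            "--group_reporting",
        ]
        subprocess.run(cmd, check=True)

    for depth in (1, 240):            # shallow and deep queue depths
        run_fio(depth)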

Ceph Configuration

  • Ceph “Giant” release 0.87
  • Publicly available Ceph packages from
  • (4) Storage Nodes, each configured with
    • 2 OSDs (SSD pool) = 2 x 800GB SSD
    • 12 OSDs (HDD pool) = 12 x 1TB HDD + 2 x 800GB SSD
  • (4) RBD Clients (volume creation is sketched after this list)
    • 1 x 200GB device (/dev/rbdX) (one small volume per client)
      • various pools, SSD, HDD, and HDD with Ceph writeback cache tiering, inflated prior to testing (not thin)
    • 1 x 4TB device (/dev/rbdY) (one large volume per client)
      • HDD pool with and without writeback cache tiering, inflated prior to testing (not thin)
  • (1) Monitor Node
    • Not required for ScaleIO
    • Three are usually recommended for HA purposes, but metadata syncing costs some performance, so only one was used for testing
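
As a companion to the client volume layout above, here is a minimal sketch of creating one of the 200GB RBD images using the Python rados/rbd bindings.  The pool and image names are illustrative, and the pools are assumed to already exist (e.g. created with “ceph osd pool create”).

    # Sketch: create a 200GB RBD image in an existing SSD-backed pool.
    # Pool and image names are illustrative, not the ones used in our tests.
    import rados
    import rbd

    TWO_HUNDRED_GB = 200 * 1024 ** 3

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx("ssd-pool")                 # assumed pool name
        try:
            rbd.RBD().create(ioctx, "client1-vol", TWO_HUNDRED_GB)
            # The client would then map it, e.g. "rbd map ssd-pool/client1-vol",
            # which exposes the image as a /dev/rbdX block device.
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()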

ScaleIO Configuration

  • Used v1.31 of ScaleIO
  • (4) ScaleIO Storage Server Nodes (SDS), each configured with
    • 1 SSD Storage Pool = 2 x 800GB SSD
    • 1 HDD Storage Pool = 12 x 1TB HDD
      • HDD Storage Pool uses the SSD storage pool for caching only in the final tests
      • For the cases where SSD was used as cache, we waited for the cache to warm up; because the volume was smaller than the cache, all of the I/Os were served from cache by the end
  • (4) ScaleIO Data Clients (SDC)
    • 1 x 200GB volume mapped to SSD pool
    • 1 x 4TB volume mapped to HDD pool

Summary Findings: ScaleIO vs. Ceph (IOPS)

As you can see from the following diagram, in terms of raw throughput ScaleIO absolutely spanks Ceph, clocking in at dramatically higher IOPS [2].

[Figure: summary IOPS comparison, ScaleIO vs. Ceph, across the test cases]

In terms of latency, the situation is much grimmer: Ceph’s latency is incredibly poor, almost certainly due to its architectural compromises.  Even with SSDs, Ceph’s latency is worse than what you would expect from a single HDD (~7-10ms); moreover, Ceph’s latency on SSD is actually worse than ScaleIO’s on HDD. This is unbelievably poor.  An SSD-based storage system should have latency in the ballpark of an SSD, and even if there is some kind of performance penalty, it shouldn’t be more than 10x.

[Figure: latency comparison, ScaleIO vs. Ceph, across the test cases]

It’s About More Than Performance

There are other problems besides performance with a multi-purpose system.  The overhead I outlined above also means the system has a hard time being lean and mean.  Every new task or purpose it takes on adds overhead in terms of business logic, processing time, and resources consumed.  In most common configurations, ScaleIO, being purpose-built, takes as little as 5% of the host system’s resources, such as memory and CPU.  In most of our testing we found that Ceph takes significantly more, sometimes as much as 10x the resources of ScaleIO, making it a very poor choice for “hyperconverged” or semi-hyperconverged deployments.

This means that if you built two separate configurations of Ceph vs. ScaleIO that are designed to deliver the same performance levels, ScaleIO would have significantly better TCO, just factoring in the cost of the more expensive hardware required to support Ceph’s heavyweight footprint.  I will have more on this in a future blog posting, but again, it reinforces that purpose-built systems can be better not only on performance, but also on cost.

In the meantime, here’s the brief summary table I put together:

[Table: summary comparison of ScaleIO and Ceph configurations, server counts, raw storage, and cost per usable TB]

The numbers in the table are based on the following basic assumptions:

  • Target capacity of 100TB usable storage
  • Use enough “storage servers” to meet capacity demands
  • Run the application on storage servers (hyperconverged)
  • Add additional application servers to meet apps processing and memory requirements as needed
    • These are the 12+11 and 8+5 numbers above
  • Ceph configured for “performance” workloads, so using replication instead of erasure coding

In this configuration, ScaleIO requires 43% fewer servers, uses 29% less raw storage to reach the target usable capacity, and costs 34% less per usable TB.  All of this while providing 6.5x the IOPS.

Obviously, the numbers here get a lot worse if you target total IOPS instead of usable capacity.
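
To show the shape of this arithmetic ahead of the calculator post, here is a toy sketch.  Every input below (per-server capacity, server cost, replication factor) is a hypothetical placeholder, not a number from our actual calculator; only the structure of the calculation is meant to be illustrative.

    # Toy sketch of the usable-capacity, server-count, and cost-per-TB arithmetic.
    # ALL inputs are hypothetical placeholders, not our benchmark or pricing data.
    import math

    TARGET_USABLE_TB = 100        # from the assumptions above
    RAW_TB_PER_SERVER = 12        # hypothetical: 12 x 1TB HDD per storage server
    REPLICATION_FACTOR = 2        # replication (not erasure coding) for performance
    COST_PER_SERVER = 10_000      # hypothetical, in dollars

    def servers_needed(usable_tb: float) -> int:
        raw_needed = usable_tb * REPLICATION_FACTOR      # usable -> raw
        return math.ceil(raw_needed / RAW_TB_PER_SERVER)

    def cost_per_usable_tb(n_servers: int, usable_tb: float) -> float:
        return n_servers * COST_PER_SERVER / usable_tb

    n = servers_needed(TARGET_USABLE_TB)
    print(f"{n} storage servers, ${cost_per_usable_tb(n, TARGET_USABLE_TB):,.0f} per usable TB")
    # A real comparison also adds application servers to cover the CPU/RAM
    # consumed by the storage software itself, which differs between systems.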

There will be a more in-depth blog posting looking at our calculator in late August.  I will be happy to share the spreadsheet at that time as well.


This blog posting isn’t about “Ceph bad, ScaleIO good”, although it will certainly be misconstrued as such.  This is about killing Unicorns.  Unicorns are bad if people think they exist, and worse when they are being peddled by vendors.  Unicorns are good if they stay in the realm of platonic ideals and hopeful goals.  I will point you again to my previous white paper, which partially takes Ceph to task on their unified storage marketing message.  Google still backs up to tape.  Amazon has spinning disks and SSDs.

Information Technology (IT) is fundamentally about making tradeoffs.  There is no silver bullet, and taking the blue pill will only lead to failure.  Real IT is about understanding these tradeoffs and making educated and informed decisions.  You may need a Swiss army knife.  Perhaps your requirements for performance are low.  Perhaps you are building smaller systems that don’t need much scalability.  Perhaps you don’t care about your total cost of ownership (TCO).

However, if you want to build a relatively low cost, high performance, distributed block storage system that supports bare metal, virtual machines, and containers, then you need something purpose built for block storage.  You need a system optimized for block, not a multi-tool.

If you haven’t already, check out ScaleIO, which is free to download and use at whatever size you want.  Run these tests yourself.  Report the results if you like. [3]

[1] Not only that, but if you are serious, you can’t get down to 1 storage system anyway.  Where will you back it up to?  You need tape.  Or at least another Ceph cluster.  The DreamHost team, who run Ceph in production and were the original authors of the codebase, run two Ceph clusters, taking backup snapshots from one and storing them in the other cluster and vice versa.  So your minimum footprint is 2 storage systems, possibly 3 if you decide to build a unified management system as well.  C’est la vie?

[2] Here “Null” or “no disk limit” refers to performing a baseline test removing all physical disk I/O from the equation and providing some idea of the absolute performance of the software itself, including network overhead.  More here in the 3rd party ESG ScaleIO tests.

[3] The current ScaleIO EULA may preclude sharing this information publicly.  I’m working with the EMC legal team and the ScaleIO product team to have this fixed.  It looks like I will at least be able to get this done “with permission.”  Email me if you have any questions.

Posted in Cloud Computing

A CoprHD Status Update

Posted by Randy Bias

I wanted to provide you with a big update on Project CoprHD and a mea culpa.  As many of you know, EMC launched CoprHD during EMC World 2015 and made the code generally available on June 5th.  Unfortunately, we are learning the hard way about proper follow-through when open sourcing a project.  As you probably noticed, since June 5th we have “gone dark” and there hasn’t been a lot of information.  Our bad.  That wasn’t intentional; we simply focused on execution and not so much on communication.

So this blog posting is an attempt to get us back on track.  We’ll also be trying to communicate better on the mailing lists and project website.  We do believe in openness and transparency, we just aren’t very good at it yet.  :)

You can find out more about CoprHD at my previous blog postings: introducing CoprHD and CoprHD’s architecture.

CoprHD Update

One of the biggest problems to date is that we haven’t published a clear timeline for the follow-through on the original open sourcing of the project.  We’re working on providing more regular updates via the Google Group.  Meanwhile, here’s a quick list of what happened over the past 45 days and where we are headed in the future. This includes the big news that as of today (July 31st, 2015) we can accept outside pull requests!

  • Our development team has swapped over and is developing off of the public repository
  • The Jira ticketing system is open, but you’ll have to create an account to create new tickets
  • July 31st: open for bidness (i.e. external contributions)
  • August 13th: open architectural discussion for CoprHD projects
  • April is the next planned major release
  • Expect a tutorial video to be posted every Friday for at least the next several months
    • These will range from simply how to request your dev accounts and get CoprHD up and running, to whiteboarding the CoprHD architecture and walking through our directory structure

We also have some very exciting upcoming events:

  • August 17-19th: Join us at the Intel Developer Forum in San Francisco for a demonstration of CoprHD with Intel running CoprHD on Intel’s Rack Scale Architecture (RSA) and managing EMC’s distributed block storage software system, ScaleIO.
  • September 1st-3rd: CoprHD’s first ever developer meetup in Cambridge, MA; we are actively soliciting as many folks to come as possible, particularly other storage vendors who want to help with this initiative.  Meet the core developers face to face, ask questions, and join what will probably be some focused “hacking” sessions, particularly on parts of the code that need to change to allow a “bigger tent”.


Again, a brief mea culpa.  We didn’t quite mean to take so long to give everyone an update.  We are super excited over here.  So far everything has been a success and I think we’ve been caught up in the headiness of it all.  Next step is to get as many more people on board as possible, particularly our fellow storage companies.

Posted in Cloud Computing

Project CoprHD’s Architecture

Posted by Randy Bias

Unless you had your head in the sand, you probably saw my blog post talking about Project CoprHD (“copperhead”), EMC’s first open source product. Exciting times are ahead when one of the world’s largest enterprise vendors embraces open source in a big way. Does it get any bigger than picking your flagship software-defined storage (SDS) controller and open sourcing it? But there is a bigger story here. The story of CoprHD itself and specifically its architecture. CoprHD is a modern application architecture from the so-called “third platform” or “cloud native” school of thought. You probably didn’t know this and you may even be a little curious what goes into CoprHD.

So I thought I would give you a basic overview on CoprHD and then a short comparison to OpenStack Cinder, which is the closest analog to CoprHD out there today. CoprHD is meant to work with Cinder, but in the same way that OpenDaylight plays nice with OpenStack Neutron (both “software-defined networking controllers”). There is overlap, but the current OpenStack cultural environment doesn’t really allow swapping out individual components without a lot of blowback, so most software-defined-* controllers simply integrate to existing OpenStack projects to reduce friction and encourage adoption. CoprHD is no different.

Let’s get to it.

Introduction to Software-Defined-* Controllers

SDS or SDN can seem a little confusing at first, but for me it’s easy. There are two parts, the control plane and the data plane:

[Figure: the control plane and data plane of a software-defined system]

These two parts represent the two fundamental components of anything that is “software-defined”, which is programmability (“software-defined”) and the abstracted resources driven through the API (“virtualization” is the commonly used term, but I will reserve that for compute only).

Frequently, the control plane and data plane are separated, although there is a movement around “hyperconverged” to collapse them, which I have an issue with. When talking about SDS or SDN, the separated control plane is referred to as the “controller”. So you will see folks talk about “SDN controllers” in the networking space and they are referring to things like OpenStack Neutron, OpenDayLight, the OpenContrail Controller, and the VMware NSX Controller.

As you might infer, given the name “software-defined”, the control plane is one of the more critical components in an SDS architecture. To date there have been few SDS controllers in the marketplace other than OpenStack Cinder and ViPR Controller. So the open sourcing of ViPR into Project CoprHD is extremely important.

An SDS controller is your primary tool for managing storage systems, be they legacy storage arrays or modern scale-out software-only storage solutions; and regardless of whether they are block, file, or object. SDS controllers should:

  1. reduce complexity
  2. decrease provisioning times
  3. provide greater visibility and transparency into the managed storage systems
  4. reduce cost of storage operations and management
  5. be architecturally scalable and extensible

This last item is what I really want to talk about today, because while CoprHD gives you #1-4, #5 is where it really shines. #5 is also a large part of why we chose to open source CoprHD.

CoprHD’s Cloud Native Architecture

Now that CoprHD is out you can check out the code yourself. We’re still in the process of getting all of the pieces together to take pull requests and move our documentation into the public. Something like this isn’t simple, but there is some initial documentation that I want to walk you through.

CoprHD’s design principles are:

  • CoprHD must be scalable and highly available from the very beginning
    • Scalability should not be an add-on feature
  • It must use Java on Linux [1]
    • A good server platform with good support for concurrency and scalability
  • Must use solid and popular open source components extensively
    • Allowing for focus on the real issues without reinventing the wheel
  • Must avoid redundant technologies as much as possible
    • Adding new technology often adds new problems
  • The system console must be for troubleshooting only
    • All operations must have corresponding REST calls

That seems like a solid set of initial design goals.

Let’s take a look at a typical CoprHD deployment:

Here we can see where CoprHD sits relative to the other components in a modern cloud or “software-defined datacenter” (SDDC).

In the box labeled CoprHD above, which represents a CoprHD Cluster (3 or 5 identical servers) are:

  • Reverse web proxy & load balancer (nginx)
  • Robust GUI framework (play framework)
  • REST API and GUI (EMC-written Java code)
  • SDS controller business logic (EMC-written Java code)
  • Distributed data store (Cassandra)
  • Distributed coordinator (Zookeeper)

Here is a more precise diagram showing more of the individual services each running on identical nodes in a 3-node cluster.

I’m certain you can figure out the mapping from the components in the list above to the boxes in the diagram. Or refer to the formal documentation for more details.

Each CoprHD server is identical, using Cassandra to replicate data between nodes, Zookeeper to coordinate services across nodes, nginx as proxy and load-balancer, and VRRP for service failover.

I love this approach because it uses a bunch of well-baked technologies and doesn’t get overly experimental. VRRP and Zookeeper’s quorum are tried and true. Cassandra powers some of the world’s largest cloud services. Most importantly, this is clearly a “cloud native” pattern, using scale out techniques and distributed software. There are no single points of failure and you can run up to 5 servers according to the documentation, although I don’t see any inherent flaws here that would stop you from running 7, 9, 11, or more (always an odd number to make certain the Zookeeper quorum works).
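
A quick sketch of the quorum arithmetic behind the “always an odd number” rule: a ZooKeeper ensemble stays available as long as a strict majority of nodes is up, so adding a node to make an even-sized ensemble raises the quorum size without increasing the number of failures it can survive.

    # Quorum arithmetic for a ZooKeeper-coordinated cluster of n nodes.
    def quorum(n: int) -> int:
        """Smallest strict majority of an n-node ensemble."""
        return n // 2 + 1

    def tolerated_failures(n: int) -> int:
        """Nodes that can fail while a strict majority remains up."""
        return n - quorum(n)

    for n in (3, 4, 5, 7):
        print(f"{n} nodes: quorum={quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
    # 3 and 4 nodes both tolerate only 1 failure, which is why odd sizes are preferred.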

CoprHD’s Pluggable Architecture

The documentation unfortunately isn’t up yet, but CoprHD supports managing a wide variety of traditional storage arrays and cloud storage software, including EMC hardware systems (VNX, VMAX, etc), EMC software systems (ScaleIO), NetApp, and Hitachi Data Systems. This means the CoprHD system and the controllersvc specifically have been designed to be extensible and have a pluggable architecture.

In addition, by integrating to OpenStack Cinder, CoprHD can support the management of additional hardware vendors. The major downside here, of course, is that it’s difficult to deploy Cinder by itself. You’ll need a fair bit of the rest of OpenStack in order to get it up and running.

CoprHD’s Scalable Control Interfaces: REST API & Non-Blocking Async GUI

One of the signatures of the third platform is that the control systems need to scale well. This is because systems are highly automated, and the new API-centric model means that there will be large numbers of API calls. Relatedly, mobile applications and interfaces take advantage of technologies such as WebSockets to allow a more interactive GUI than before. Another interesting challenge is that for both the API and GUI, some interactions are inherently synchronous (you take an action and see an immediate result) and some are asynchronous (take an action and wait for a result to be sent back [push] or query regularly for a response [poll]).
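
As a generic illustration of the asynchronous [poll] pattern, here is a minimal sketch of a client kicking off a long-running operation and polling a task resource until it reaches a terminal state.  The endpoint, paths, and field names are hypothetical, chosen for illustration; they are not CoprHD’s actual REST API.

    # Generic async-REST polling sketch. URLs and JSON fields are hypothetical;
    # consult the CoprHD REST documentation for the real resources and states.
    import json
    import time
    import urllib.request

    BASE = "https://controller.example.com:4443"     # placeholder endpoint

    def get_json(path: str) -> dict:
        with urllib.request.urlopen(BASE + path) as resp:
            return json.load(resp)

    def wait_for_task(task_id: str, interval: float = 2.0, timeout: float = 300.0) -> dict:
        """Poll a hypothetical task resource until it reports a terminal state."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            task = get_json(f"/tasks/{task_id}")              # hypothetical path
            if task.get("state") in ("ready", "error"):       # hypothetical states
                return task
            time.sleep(interval)
        raise TimeoutError(f"task {task_id} did not finish within {timeout}s")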

CoprHD uses a standard shared-nothing architecture, and the only state resides in the Cassandra cluster. This means that the REST API can be run on all active nodes and effectively scales with the number of nodes. The GUI is built on top of the Play Framework, a distributed, non-blocking, asynchronous web framework that is mobile friendly. This is the same GUI framework that powers massive websites like LinkedIn.

Combined, CoprHD’s built-in API and GUI are designed for true scale.

Cassandra as Distributed Data Store

Cassandra is covered elsewhere, but I always like to point to this slide I used to describe Netflix’s testing of it. What you see here is near perfect linear scale moving from 50 to 300 servers. And I think that says pretty much all you need to know about Cassandra scalability.

[Figure: Netflix Cassandra benchmark showing near-linear scaling from 50 to 300 nodes]

Comparing CoprHD vs. ScaleIO, Ceph, Cinder, and Manila

The last thing I think we need to do is a quick comparison against some of what are considered CoprHD’s contemporaries. I don’t really see them that way, but folks will draw parallels and I’d like to tackle those head on. Right off the bat we should be clear: CoprHD is designed to be an independent SDS controller, whereas the rest of these tools are not. CoprHD is also vendor neutral and designed for file, object, and block.

So here’s a quick comparison chart to make things clear. This comparison is purely about the SDS controller aspects of these technologies, not about storage capabilities themselves.

[Figure: SDS controller comparison chart, CoprHD vs. ScaleIO, Ceph, Cinder, and Manila]

I am not going to spend a lot of time on this chart. Some may argue about its contents, but probably any such arguments will come down to semantics. The takeaway here is that CoprHD is the only true scale-out pure-play SDS controller out there that is open source, vendor neutral, and designed for extensibility.

Summing Up CoprHD

CoprHD is written for the third platform and designed for horizontal scalability. It can act as a bridge to manage both legacy “second platform” storage systems and more modern “third platform” storage systems. It is open source and is building a community of like-minded individuals. CoprHD delivers on the criteria required to be a successful SDS controller and has an architecture that is scalable and future-proofed. It should be at the center of any SDS strategy you are designing for your private or public cloud.

[1] Some folks are allergic to Java. I’m not a fan, but I’ve never figured out why they take issue other than its verbosity. C++ isn’t exactly terse. For those critical of Java’s place in the “third platform”, you should be aware that large chunks of Amazon Web Services (AWS) are written in Java. It’s still one of the most robust language virtual machines out there, and that’s why next generation languages like Scala and Clojure run on top of it, eh?


Posted in Cloud Computing

EMC and Canonical expand OpenStack Partnership

Posted by Randy Bias

As you saw at last week’s OpenStack Summit, EMC® is expanding its partnership with Canonical, among others. I want to take a moment to talk specifically about our relationship with Canonical. We see it as a team-up between the world’s #1 storage provider and the world’s #1 cloud Linux distribution.

For the last two years, EMC has been a part of Canonical’s Cloud Partner Program and OpenStack Interoperability Lab (OIL). During this time EMC created a new Juju Charm for EMC VNX technology, which enables deployment by Canonical’s Juju modeling software. This past week, we announced the availability of a new OpenStack solution with Ubuntu OpenStack and Canonical as part of the Reference Architecture Program announced last November in Paris. The solution was built in close collaboration with Canonical in EMC labs, then tested, optimized, and certified.

Cloud workloads are driving storage requirements, making storage a crucial part of any OpenStack deployment. Companies look for scalable systems that leverage the features of advanced enterprise storage while also avoiding complexity. EMC and Canonical created an easily modeled reference architecture using EMC storage platforms (VNX® and EMC XtremIO™), Ubuntu OpenStack, and Juju. This allows for repeatable, automated cloud deployments.
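
As a rough sketch of what “modeled with Juju” means in practice, a deployment is declared as charms and relations rather than imperative install steps.  The snippet below drives the Juju CLI from Python; the core OpenStack charm names are real ones from the Ubuntu charm store, but the EMC VNX charm name shown is a placeholder, and the actual charm, options, and relations may differ.

    # Illustrative sketch of a Juju-modeled deployment driven from Python.
    # The "emc-vnx-storage" charm name is a placeholder; check the charm store
    # for the real EMC charm name and its supported relations and options.
    import subprocess

    def juju(*args: str) -> None:
        subprocess.run(["juju", *args], check=True)

    # Deploy core OpenStack charms (these exist in the Ubuntu charm store).
    for charm in ("keystone", "nova-cloud-controller", "nova-compute", "glance", "cinder"):
        juju("deploy", charm)

    # Deploy the storage backend charm and relate it to Cinder.
    juju("deploy", "emc-vnx-storage")                 # placeholder charm name
    juju("add-relation", "cinder", "emc-vnx-storage")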

According to the OpenStack User Survey, 55% of production clouds today run on Ubuntu. Many of these deployments have stringent requirements for enterprise quality storage. EMC and Canonical together fulfill these requirements by providing a reference architecture combining the world’s #1 storage, #1 cloud Linux distribution, and tools for repeatable automated deployments.

We will be releasing an XtremIO (our all flash array) Charm and eventually ScaleIO (our software-only distributed block storage) as well. ScaleIO is a member of EMC’s Software Defined Storage portfolio, has been proven at massive scale, and is a great alternative to Ceph. You will soon be able to download a free, unsupported and unlimited version of ScaleIO to evaluate yourself.  Look for these products and others, such as ViPR Controller, to be available in Canonical’s Charm Store and through Canonical’s Autopilot OpenStack deployment software later this year.

This work is in support of eventually making all of EMC’s storage solutions available via OpenStack drivers for use with Ubuntu OpenStack. Given the wide acceptance of Ubuntu within the OpenStack community, EMC will use Ubuntu internally and in future products. We believe that these efforts, coupled with the quality professional services and support customers have come to expect from us, will give enterprise customers peace of mind. This will accelerate adoption of OpenStack cloud solutions in the enterprise.

With EMC storage and Canonical solutions, customers realize these benefits:

  • A repeatable deployable cloud infrastructure
  • Reduced operating costs
  • Compatibility with multiple hardware and software vendors
  • Advanced storage features only found with enterprise storage solutions

Our reference architecture takes the Ubuntu OpenStack distribution and combines it with EMC VNX or XtremIO arrays and Brocade 6510 switches. Automated with Juju, time to production for OpenStack is dramatically reduced.

The solution for Canonical can be found at this link and a brief video with John Zannos can be found here on EMCTV. The EMC and Canonical  architecture is below for your perusal.

EMC and Canonical Ubuntu OpenStack Reference Architecture

This reference architecture underscores EMC’s commitment to providing customers with choice. EMC customers can now choose to build an Ubuntu OpenStack cloud based on EMC storage and use Juju for deployment automation.

It’s an exciting time for Stackers as the community and customers continue to demand reference architectures, repeatable processes, and support for existing and future enterprise storage systems.

Posted in OpenStack

State of the Stack v4 – OpenStack In All Its Glory

Posted by Randy Bias

Yesterday I gave the seminal State of the Stack presentation at the OpenStack Summit.  This is the 4th major iteration of the deck.  This particular version took a very different direction for several reasons:

  1. Most of the audience is well steeped in OpenStack and providing the normal “speeds and feeds” seemed pedantic
  2. There were critical unaddressed issues in the community that I felt needed to be called out
  3. It seemed to me that the situation was becoming more urgent and I needed to be more direct than usual (yes, that *is* possible…)

There are two forms in which you can consume this: the Slideshare and the YouTube video from the summit.  I recommend the video first and then the Slideshare.  The reason is that in the video I provide a great deal of additional color, assuming you can keep up with my rapid-fire delivery.  Color in this case can be construed several different ways.

I hope you enjoy. If you do, please distribute widely via twitter, email, etc. :)

The video:

The Slideshare:

State of the Stack v4 – OpenStack in All Its Glory from Randy Bias

Posted in OpenStack

OpenStack Self-Improvement Mini-Survey

Posted by Randy Bias

Want to help make OpenStack great?  I’ve put together a very quick survey to get some deeper feedback than is provided by the User Survey.  The intention is to provide some additional information around the State of the Stack v4 I’m giving next week at the summit.

I would really appreciate it if you took 2-3 minutes out of your day to answer it honestly.

Click here for the survey.

UPDATE: Horizon was missing from the survey and I have added it.  Heartfelt apologies to all of the Horizon committers.  An honest mistake on my part.  OpenStack is almost too big to keep track of.  :)

Posted in OpenStack

Introducing CoprHD (“copperhead”), the Cornerstone of a Software-Defined Future

Posted by Randy Bias

You’ve probably been wondering what I’ve been working on post-acquisition, and yesterday you saw some of the fruits of my (and many others’) labor in the CoprHD announcement.  CoprHD, pronounced “copperhead” like the snake, is EMC’s first ever open source product.  That EMC would announce open sourcing a product is probably as big a surprise to many EMCers as it may be to you, but more importantly it’s a sign of the times.  It’s a sign of where customers want to take the market.  It’s also the sign of a company willing to disrupt itself and its own thinking.

This is not your father’s EMC.  This is a new EMC and I hope that CoprHD, a core storage technology based on EMC’s existing ViPR Controller product, will show you that we are very serious about this initiative.  It’s not a me too move.

This move is partly in direct response to enterprise customer requests and our own assessment of where the market is headed.  Perhaps more importantly, this move drives freedom of choice and the maintenance of control on the part of our customers.  Any community member (partner, customer, competitor) is free to add support for any storage system.  CoprHD is central to a vendor neutral SDS controller strategy.

For those of you not familiar with ViPR Controller, it is a “software-defined storage” (SDS) controller, much like OpenDaylight is a software-defined networking (SDN) controller.  This means that ViPR can control and manage a variety of storage platforms; in fact, it is already multi-vendor today, supporting not only EMC, but also NetApp, Hitachi, and many others.  ViPR Controller has REST APIs, the ability to integrate with OpenStack Cinder APIs, and a pluggable backend, and it is truly the only software stack I’ve seen that fulfills the hopes and dreams of a true SDS controller, providing not only heterogeneous storage management but also metering, a storage service catalog, resource pooling, and much, much more.

CoprHD is the open source version of ViPR Controller.  A comparison:

[Figure: comparison of CoprHD and ViPR Controller]

What is “Non-essential, EMC-specific code”?  In this case, it’s simply the part of the code that enables “phone home” support to EMC, which has no relevance to users of CoprHD’s SDS services with non-EMC data stores.  CoprHD is in every way ViPR Controller and the two are interchangeable, delivering on the promise of vendor neutrality and providing customers control, choice, and community.  A quick caveat: please be aware that at this time, although this is the same code base and APIs, a clean installation is required to convert CoprHD to ViPR Controller or vice versa.  There is no “upgrade” process and it’s unclear that it ever makes sense to create one, although we might eventually create a migration tool depending on customer demand for one.

The rest of this blog post seeks to answer the key questions many have about this initiative:

  • Why ViPR Controller?
  • Why now?
  • Why would EMC do this?

Exciting times.  Let’s walk through it!

The Emerging Strategy for Enterprise: Open Source First

More and more we’re seeing from customers that proprietary software solutions have to be justified.  Today, the default is to first use open source software and open APIs to solve 80% of the problem and only to move to proprietary software when it is truly required.  This reflects the growing awareness of traditional enterprises that it is in their best interests to maintain control of their IT capabilities, reduce costs and increase agility.  This strategy, of course, mirrors what newer webscale enterprises such as Amazon and Google already knew.  Webscale players have been estimated to be as much as 80-90% open source software internally, compared to traditional enterprises which can be closer to 20-30% [1].

We heard from many enterprise customers that they were reluctant to adopt ViPR Controller, despite it being proven in production, simply because it was not open source.  No one wants “lock-in”, by which they really mean that they desire vendor neutrality and want to maintain control.

Businesses also want to know that not only could they switch vendors for support of an open source project, but perhaps more importantly, that they could directly affect the prioritization of roadmap features, by providing their own development resources or hiring outside engineering firms.

Finally, part of any open source first strategy is the need and desire to have like-minded consumers of the same project around the table.  Businesses want to know that others like them are close by and available in public forums such as bug tracking systems, code review tools, and Internet Relay Chat (IRC).

This then is the “control” provided by an open source first strategy:

  1. Vendor neutrality and choice of support options
  2. Direct influence and contribution to the roadmap
  3. Ability to engage with like-minded businesses through public forums

You’ll probably notice that none of these equate to “free”.  Nowhere in our dialogues with customers has there been an overt focus on free software.  Certainly every business wants to cut costs, but all are willing to pay for value.

EMC Puts Customers and Business Outcomes First

EMC is renowned for being the world’s leader in storage technology, but more than a storage business, EMC is an information management business.  We put a premium on helping customers succeed even when that means that there may be an impact to our business.  If you look at today’s EMC, it is organized in such a way that an entire division, Emerging Technologies Division, is dedicated to disrupting the old way of doing things.  Software-only technologies such as ScaleIO, ViPR, and ECS (the non-appliance version) exist here.  Software that can run on anyone’s hardware, not just EMC’s.  All-flash technologies like XtremIO were birthed here.  ETD has led EMC’s community development with EMC{code} and is also leading the way in helping EMC become more involved with open source initiatives and delivering open source distributions of some of its products.

Our product strategy is to meet the customer where they are at and to be “flexible on the plan, while firm on the long term mission.”  Our broader strategy is to drive standardization and clarity in the industry around “Software-Defined Storage” (SDS), to help establish open and standard APIs, and to ease the management of storage through automation and vendor neutral management systems.  This means continually evolving and adjusting our business and our products.  It also implies a need to do more than storage (hence Emerging Technologies and not Emerging Storage Technologies Division) but more on that at a later date.

Achieving this vision requires leadership and forethought.  CoprHD is a sign of our willingness to go the distance, adapt and change, and disrupt ourselves.  Software-defined infrastructure and software-defined datacenters are a critical part of EMC II’s future and CoprHD is vital to enabling the SDS layer of any SDDC future.

CoprHD Is Leading The Way in SDS

Make no mistake, CoprHD (code available in June) is leading the way in SDS.  EMC welcomes everyone who wants to participate, and we have already heard from customers who will ask their vendors to come to the party by adding support for their products to the open source project.  A truly software-defined future awaits, and EMC is using its deep storage roots and focus on software to deliver on that future.

Again, this is NOT your father’s EMC.  This is a new EMC.

Thank-Yous Are In Order

Finally, although I acted as a “lightning rod” to drive organizational change, I mostly educated, where others acted.  I want to thank a number of EMCers without whom the CoprHD open source project simply wouldn’t have happened.  A short and incomplete list of amazing people who made this possible follows:

  • Jeremy Burton: executive buy-in and sponsorship
  • Manuvir Das: engineering leadership
  • Salvatore DeSimone: architecture, thought-leadership, and co-educator
  • James Lally: project management
  • The entire ViPR Controller software team for being willing to make a change
  • Intel for stepping up and helping us become a better open source company
  • Canonical for validating our direction and intentions
  • EMC{code} team for encouragement and feedback


[1] An estimate from my friends at Black Duck Software.

Posted in Cloud Computing

What AWS Revenues Mean for Public Cloud and OpenStack More Generally

Posted by Randy Bias

At the risk of sounding like “I told you so”, I wanted to comment on the recent Amazon 10-Q report.  If you were paying attention you likely saw it, as it was the first time that AWS revenues were reported broken out from the rest of Amazon, ending years of speculation.  The net of it is that AWS revenues for Q1 2015 were $1.566B, putting AWS on a run rate of just over $6B this year, which is almost on the money for what I predicted in the 2011 Cloud Connect keynote I gave [ VIDEO, SLIDES ].  Predictions in cloud pundit land are tricky, as we’re about as often wrong as we are right; however, I do find it somewhat gratifying to have gotten this particular prediction correct, and I will explain why shortly.
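
For reference, the run-rate figure is just the reported quarter annualized; a quick sketch of the arithmetic:

    # Naive annualized run rate from one quarter of reported revenue.
    q1_2015_aws_revenue_b = 1.566            # $B, from the 10-Q
    run_rate_b = q1_2015_aws_revenue_b * 4   # ignores quarter-over-quarter growth
    print(f"~${run_rate_b:.2f}B annual run rate")   # ~$6.26B, i.e. "just over $6B"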

The 2015 Q1 AWS 10-Q

If you don’t want to wade through the 10-Q, there are choice pieces in here that are quite fascinating.  For example, as pointed out here, AWS is actually the fastest-growing segment of Amazon by a long shot.  It is also the most profitable in terms of gross margin, according to the 10-Q.  I remember having problems convincing people that AWS was operating at a significant profit over the last 5 years, but here it is laid out in plain black and white numbers.

Other interesting highlights include:

  • Growth from Q1 2014 -> Q1 2015 is 50% y/o/y, matching my original numbers of 100% y/o/y growth in the early days scaling down to 50% in 2015/2016
  • Goodwill + acquisitions is 760M, more than is spent on the (retail) business internationally and a third of what is spent in North America
  • 1.1B spent in Q1 2015 “majority of which is to support AWS and additional capacity to support our fulfillment operations”
  • AWS y/o/y growth is 49% compared to 24% for retail in North America, and AWS accounts for 7% of ALL Amazon sales

Here is a choice bit from the 10-Q:

Property and equipment acquired under capital leases were $954 million and $716 million during Q1 2015 and Q1 2014. This reflects additional investments in support of continued business growth primarily due to investments in technology infrastructure for AWS. We expect this trend to continue over time.

The AWS Public Cloud is Here to Stay

I’ve always been bullish on public cloud and I think these numbers reinforce that it’s potentially a massively disruptive business model. Similarly, I’ve been disappointed that there has been considerable knee-jerk resistance to looking at AWS as a partner, particularly in OpenStack land [1].

What does it mean now that we can all agree that AWS has built something fundamentally new?  A single business comparable to all the rest of the U.S. hosting market combined?  A business focused almost exclusively on net new “platform 3” applications that is growing at an unprecedented pace?

It means we need to get serious about public and hybrid cloud. It means that OpenStack needs to view AWS as a partner and that we need to get serious about the AWS APIs.  It means we should also be looking closely at the Azure APIs, given it appears to be the second runner-up.

As the speculation ceases, let’s remember, this is about creating a whole new market segment, not about making incremental improvements to something we’ve done before.

[1] If you haven’t yet, make sure to check out the latest release we cut of the AWS APIs for OpenStack

Posted in Cloud Computing, OpenStack

