My Endorsements for the 2015 OpenStack Individual Director Elections

Posted on by Randy Bias

If you are voting this year in the individual director elections, and I sincerely hope you are, I would appreciate it if you would give special consideration to the following candidates, each listed with a super brief “why”:

  • Kavit Munshi – international and Indian representation
  • Tim Bell – user representative and continuity with user committee and user survey
  • Jesse Proudman – operator representative and independent voice
  • Haiying Wang – international and Chinese representation

There are many other fantastic candidates running including Monty Taylor, Rob Hirschfeld, Alex Freedland, Sean Winn, and Ken Hui.  However, I decided to cut this down to a very short list that was stack ranked as follows:

  • International representation (we need more)
  • User representation (we need more)
  • Operator representation (we need more)

Good luck to everyone.


Posted in OpenStack

The Future of OpenStack is Now, 2015

Posted on by Randy Bias

This year will be a crucial year in OpenStack history.  This is the year we fix much of how OpenStack is structured or die trying.  By structure, I mean the vision, the project structure, the integrated release cycle, and the board and TC’s role in driving direction.

First, stop and read this blog posting by Thierry Carrez, release manager for OpenStack.  Then, if you haven’t already, make certain you watch my related keynote (preso w/no video is here) from the OpenStackSV meeting in September of last year.

The problem can be stated simply, if somewhat brutally: OpenStack is at risk of collapsing under its own weight.

The original vision had the OpenStack community delivering a tightly integrated release focused on basic infrastructure services on a 6-month release cycle.  The problem is that this shared vision was at odds with two things: 1) the inherent inclusivity of the OpenStack community and 2) people’s wildly differing interpretations of the word “cloud”.  To be honest, the latter is endemic to the cloud space and has been since its inception, but the mission statement of OpenStack doesn’t clarify, instead using the hopelessly abused word “cloud”.

Let’s examine the challenges I outlined and a way forward, ending with a plea from me on how you can help.


When OpenStack launched in summer of 2010, I and many others saw the immediate value of an open source infrastructure-as-a-service stack with a vibrant community.  Something Eucalyptus and CloudStack had both failed to achieve.  In fact, the very hallmark of OpenStack was its inclusivity.  If you joined the community, played by the rules, and wanted to make something happen, it was clear how to do so and you were actively encouraged to go for it.  This is an important facet of why OpenStack grew so fast and had so many amazing participants.

However, there was one dark spot in this inclusivity.  Namely, competing OpenStack projects were actively discouraged, as was adding projects written predominantly in non-Python languages.  The attitude in the former case was one of “why don’t you fix what is already broken?” and in the latter, one of “we want to allow developers to move easily between projects!”  Both laudable goals, but both ultimately unwieldy and misguided.

Although unspoken, another source of this tension was the process by which the “integrated release” was delivered every 6 months, where in theory we:

  1. release code
  2. have a summit where we discuss the next major release
  3. work for months including mid-cycle meetups to get new code ready
  4. test, test, and re-test all of the code together
  5. release code and begin again

Now with only two projects, Nova and Swift, in the beginning, this was not a problem, but as the number of projects grew, significant organizational issues began to arise.  Thierry did the most eloquent job explaining so I am just going to try and fill in the gaps and talk about this more from a product and business perspective.

The issues with the growth in projects were deeply compounded by many of the new projects being “cloud” but in entirely different areas of cloud, such as Platform-as-a-Service, Database-as-a-Service, etc.  Many of these don’t need to be part of an integrated test release every 6 months and in fact should probably be developed on their own product cycles.

All together this meant several things:

  • The integrated code cycle demanded a rethink
  • The importance of delivering a tightly integrated release was in question
  • OpenStack as a single monolithic “cloud operating system” was clearly untenable
  • The idea that developers could move seamlessly between projects was dubious
  • Delivering inclusivity is probably the “killer app” for OpenStack and its Foundation

OpenStack by Design

In my OpenStackSV Keynote, Lie of the Benevolent Dictator, I highlighted what I saw as a critical gap in the organizational structure of the community.  Namely that we needed real product management and product strategy leadership.  During the 2014 Atlanta Spring Summit, the Board and the Technical Committee had their first joint session.  From that meeting it became clear that the TC was focused on managing the 6-month integrated release cycle and was focused exclusively on tactics.  At the same time, the Board and Foundation did not feel that they had the remit to drive technical product requirements.

The result is that there is a lack of cohesive long term (2-5 year) planning around OpenStack from a product perspective.  Instead, we rely on the grassroots level organization that may or may not happen as each developer or company contributes code.  In effect, we suffer from the Tragedy of the Commons.

We asked for this when we encouraged an inclusive environment and I certainly don’t want to do away with a key strength of OpenStack; however, the bent for inclusivity needs to be tempered with better long term product planning.

We need OpenStack by Design and Intent, not by accident.

OpenStack’s Way Forward

Again, Thierry’s extremely eloquent outline of how to move forward from a technical point of view is fantastic, but perhaps there is room for improvement?  If inclusivity and the community are the most important aspects of OpenStack, then perhaps we should operate as such.

I believe there are a number of key items that need to happen this year at the Board, TC, and Foundation level.  Fundamentally, we need to look more like a set of loosely-coupled independent projects that MAY be put together in a variety of ways. [1]

These are the key items that need to be addressed this year:

  • Reorganize to a more scalable model, like the Apache Software Foundation
  • Discard the integrated release process, in favor of interoperability testing
  • Promote DefCore and the CI system for delivering interoperability between projects
  • Explicitly encourage non-Python OpenStack projects
  • Re-position OpenStack in the minds of the market and community
  • Create an ongoing educational process to help reinforce this re-positioning
  • Develop “integration streams” for interrelated OpenStack projects that need interop
  • Re-imagine the TC as an integration and architecture team not SDLC management
  • Plug product management kung fu into the TC

Your Help is Necessary To Enable This Vision in 2015

I have been working behind the scenes, along with many other board and TC members, to help educate on these issues.  I have worked with the DefCore and RefStack teams, helped to encourage formation of the product management working groups, and brought up key issues at board meetings.  I believe that our collective efforts helped us get to the point where change is possible, and Thierry’s article shows that the appetite and willingness to change are here.

I want to continue representing the community as a whole on the OpenStack Foundation Board of Directors.  I know that I can represent your interests and help guide OpenStack down the right path.  This year is a formative year for OpenStack and I know that my particular flair for breaking the glass will be critical in encouraging change.

For the first time I’m running as an individual representative who wants to create the best OpenStack possible for everyone.  I am focusing on inclusivity, revitalizing the OpenStack community and process, and driving towards a model that ultimately is the best for vendors, customers, end-users, developers, operators, and all other stakeholders within OpenStack.

I want your vote!  Thank you.



[1] Hopefully this will get rid of the banal requests from unwitting customers for “vanilla OpenStack”, something that has never existed anyway.

Posted in OpenStack

The EMC Federation Joins the OpenStack Foundation

Posted on by Randy Bias

Last week a major set of milestones was reached for the EMC Federation’s involvement with OpenStack. First, EMC and its affiliated companies and brands (VMware, VCE, Pivotal, RSA, Cloudscaling) determined a cohesive strategy for engagement with the OpenStack Foundation Board. Second, EMC appointed a VMware employee, Sean Roberts (@sarob), as the official representative of EMC and hence the EMC Federation generally. This means that I am no longer the EMC (Cloudscaling) OpenStack Foundation Gold Director.

The why of this may be confusing so I will briefly explain the background and then provide some more details on what exactly transpired.

By and large the OpenStack bylaws have stood the test of time quite well at this point. Most of the upcoming proposed changes are simply things we could only have known in hindsight. One area that I think the bylaws got right is the articles that limit participation by “Affiliated” companies:

2.5 Affiliation Limits. Gold Members and Platinum Members may not belong to an Affiliated Group. An Affiliated Group means that for Members that are business entities, one entity is “Controlled” by the other entity. “Controlled” or “Control” means one entity owns, directly or indirectly, more than 50% of the voting securities of the Controlled entity which vote for the election of the board of directors or other managing body of an entity, or which is under common control with the Controlled entity. An Affiliated Group does not apply to government agencies, academic institutions or individuals.

What this means, in essence, is that if there are two companies with a relationship like parent/child or joint venture, in which one owns more than 50% of the other, only ONE of the companies can join the OpenStack Foundation as a Gold or Platinum Member. This is a good measure to prevent a group of companies from “stacking the deck” within the OpenStack Foundation and using that as leverage to control or dominate OpenStack, which is something no one wants. I also need to note that any company may also have one to two Individual Members represent them. Two Directors from any single affiliated group is the maximum representation on the OpenStack Board of Directors. This works out to one Gold or Platinum Director plus one Individual Director OR two Individual Directors. This is why I am allowed to run as an Individual Director in 2015. Of course, I would very much appreciate your support in this endeavour!

So, things became very interesting upon EMC’s acquisition of Cloudscaling, as it inherited the Gold Member status of Cloudscaling while VMware also retained their Gold Member status, creating an edge case where we were technically in violation of the Bylaws. This required EMC and VMware to work closely with the Foundation staff to resolve the situation.

This is why VMware resigned their Gold Member status and why EMC appointed a VMware employee as a representative for EMC and hence the EMC Federation.

Which means we should quickly explain what the EMC Federation is.

EMC Federation
The EMC Federation is composed of a number of different entities, from security companies, to storage, to Platform-as-a-Service, big data, virtualization, converged infrastructure, and now OpenStack via the Cloudscaling acquisition. Members of the EMC Federation are already representatives on the OpenStack Foundation Board of Directors, OpenStack Foundation Gold Members, OpenStack Foundation Corporate Sponsors, and have deepening ties to OpenStack generally.

In April of 2013, EMC and VMware launched Pivotal and created a federation of its businesses. EMC is the majority owner, by a large margin, of VMware and Pivotal, and RSA is a wholly owned subsidiary. Recently, VCE, the leader in converged infrastructure, joined the Federation. Federation messaging and joint solutions were prominent during EMC World 2014. The following diagram gives you some idea of how the Federation is organized.


When asked about why the Federation model is needed and what differentiates the companies from competitors, the answer is “choice”. While VMware is the leading hypervisor, EMC also desires the opportunity to forge alliances and solutions with Microsoft, Citrix, and others. Conversely, VMware desires to support and work with a variety of storage and security solutions.

Similarly, members of the Federation desire to operate and support OpenStack’s mission in different manners (converged infrastructure, appliance models, and software distributions) while also supporting the joint goals of empowering and promoting OpenStack within the enterprise.

Wikibon covers the EMC Federation Model extensively here:

The EMC Federation OpenStack Strategy
As a group, the EMC Federation strongly desires to play by the rules of the OpenStack community, while deepening our commitments and contributions. As a group we are already the #6 contributor to the latest release and we aspire to go even further. OpenStack is a critical strategy for the Federation as a whole, even for members like Pivotal who see a significant increase in the number of enterprises who wish to run CloudFoundry on top of OpenStack.

What this meant for us when resolving the Bylaws issue is that we wanted to have the entire EMC group represented as a whole, such that others like VMware, VCE, and Pivotal, could all be a part of the picture. The Bylaws however require that the Gold Member selected is an actual legal entity.

Our final resolution was then to have VMware resign their Gold Membership and have EMC retain the Cloudscaling Gold Membership. In order to show EMC Federation coordination, EMC is appointing Sean Roberts to represent EMC, and hence the entire Federation, as our Gold Member representative. Finally, all of the branding on the OpenStack Foundation website will be Federation-oriented branding (EMC2).

Meanwhile, behind the scenes, I’m working closely with Sean Roberts of VMware, Josh McKenty of Pivotal, Jay Cuthrell of VCE, and others to make sure that we have cohesion across the Federation.

Hopefully this helps explain these recent changes.

Posted in OpenStack

An OpenStack Dream Team: EMC + Cloudscaling

Posted on by Randy Bias

If you’re following the buzz surrounding the EMC acquisition of Cloudscaling, you might wonder:

Is this a mismatch, or am I missing something?

Yes. You’re missing something. Let me explain.

First, you’ll want to take a closer look at the announcements by EMC today [1]. We will join the EMC Emerging Technologies Division led by CJ Desai. Cloudscaling OCS is now a core part of EMC’s Enterprise Hybrid Cloud Solutions. The Enterprise Hybrid Cloud Solutions powered by OpenStack will be available in 2015. But this only paints half of the bigger picture to help you understand why this is a match made in heaven.

The other half of the backstory can be found in the Cloudscaling blog and in a few of my more notable presentations. Here is the synopsis:

  • Cloud computing is NOT virtualization on demand
  • Cloud computing was created by web-scale pioneers like Google and AWS
  • Cloud computing is a completely new kind of computing, fundamentally different from legacy enterprise computing

A couple of solid resources that will help deepen your understanding of these three points can be found in this presentation for NIST and in an older interview with Adrian Cockcroft of Netflix.

This view of the world is the cornerstone of Cloudscaling’s product strategy. Essentially, we believe that two kinds of clouds are needed for two different types of applications:

[Figure: slide from the Cloudscaling OCS product deck]

Yes, part of this was driven by pragmatism. VMware is king of the hill in enterprise virtualization and it’s hard to imagine a universe where that changes. But once-nascent “cloud-native” applications are emerging very rapidly. Cloud-native apps manage their own availability and uptime, and they’re designed for scale-out with minimal or zero human engagement. This is where DevOps comes in. It’s the primary vehicle by which enterprises can successfully build cloud-native applications. It also brings into focus the prevalent “pets vs. cattle” meme that describes in a nutshell what a cloud-native application is: treating servers as disposable field replaceable units (FRUs) rather than critical pieces of infrastructure that must never fail.

EMC’s strategy is consistent with this approach to infrastructure and applications. They use a term that IDC coined called “third platform apps” to describe these new cloud-native applications. The speed at which the third platform is growing is almost unprecedented:

[Figure: slide from the “6 Requirements for Enterprise-grade OpenStack” supporting material]

We didn’t realize it during our first conversations back in 2012, but EMC and Cloudscaling were slowly moving towards each other, even though we didn’t quite see eye to eye on how to get there. EMC made some impressive moves in pursuing its scale-out architecture portfolio, including the acquisitions of Isilon and ScaleIO, and internal developments such as ViPR.

Through our interactions with EMC it became clear to our leadership team that EMC was more closely aligned philosophically to Cloudscaling than anyone else.

EMC Understands Disruption
Smart companies disrupt themselves. Apple and Amazon get this. EMC does, too.

This becomes clear when you look at EMC’s philosophy of purchasing companies like VMware and Greenplum, spinning off Pivotal, and acquiring its own all-flash arrays with XtremIO. It demonstrates an appetite and willingness to take on risk and move boldly into the future.

It’s here that you see proof of EMC’s belief in the rise of “third platform” or “cloud-native” or “scale-out” applications, three terms describing the same phenomenon. Cloudscaling’s delivery of the first enterprise-grade OpenStack-powered cloud operating system with Open Cloud System surely did not go unnoticed. EMC saw the value in our shared vision and that a deep collaboration could mean great things.

Delivering on Greatness the EMC Way
As the Innovator’s Dilemma illustrates, a well-run company could pay too much attention to existing customers at the expense of identifying, adopting, and nurturing new technologies that new market entrants could use to disrupt existing business lines. EMC knows this, and it’s planning for it.

Look at the recent reorganization of all of EMC’s scale-out technologies under CJ Desai in the Emerging Technologies Division (ETD). This group now includes:

  • XtremIO, the leader in all flash arrays
  • ViPR, software defined storage
  • ScaleIO, scale-out block storage
  • Atmos, scale-out object storage
  • Cloudscaling, scale-out cloud operating system powered by OpenStack

What you will notice about ETD is that it is fundamentally about incubating and delivering new technologies that are potentially disruptive to the existing EMC product lines.

For these reasons, Cloudscaling finds itself in excellent company.

The Cloudscaling and EMC Dream Team
This is why EMC+Cloudscaling makes sense. Both companies are planning for cloud-native apps to be embraced by the enterprise. And OpenStack will be key to delivering the infrastructure to support these apps.

The mission, vision and go-to-market execution proof-points driving EMC toward cloud computing are perfectly aligned with Cloudscaling. The quality of EMC’s leadership team and the company’s commitment to making things happen impress me every day.

Here’s to the future. It’s going to be bright!





—Randy Bias

[1] Also make sure to check out Ken Hui’s blog posting What EMC is up to with OpenStack solutions

Posted in Cloud Computing

Public Cloud Economies of (Web-)Scale Aren’t About Buying Power

Posted on by Randy Bias

As you no doubt heard this week, Rackspace has announced the intention to focus on managed cloud.  Inevitably this brought observations from many about RAX’s, and others’, ability to compete effectively against the web scale public cloud giants: Amazon, Microsoft, and Google. One of the commenters was Mike Kavis (twitter link), a long time cloud pundit and someone whose opinion I respect. Mike wrote up a fairly interesting article that he posted on Forbes that I encourage you to read in full. Unfortunately, Mike falls into one of the older cloud tropes that I thought was well and truly dead. Today I seek to clarify and hopefully amplify much of what he said.

First, we need to address the so-called “economies of scale” that large public cloud providers enjoy. Simply put, economies of scale are structural cost advantages that come from sufficient size, greater speed, enhanced productivity, or scale of operation. Unfortunately, many folks, including Mike, fall into the trap of assuming that “economies of scale” == “buying power”. Buying power can be an element of achieving scale, but it is seldom a structural or sustainable advantage, certainly not against other large businesses who can command similar quantities of capital.

No, the real economies of scale that are relevant here are the tremendous investments in R&D that have led to technological innovations that directly impact the cost structures of Amazon Web Services, Google Cloud Platform, and Microsoft Azure. Here are some examples of what I mean:

These are just the first three items that are A) public and B) come to my mind without a lot of additional research. There are hundreds of other innovations. Most of these innovations share a central theme: reduction of cost through greater efficiency or being able to deploy lower cost hardware. For example, Google’s G-Scale Network uses inexpensive Taiwanese ODM switches.

As I have mentioned previously, the rate of innovation and development at these public clouds is where the true economies of scale reside.

Much of this is alluded to when Mike covers the level of investment in infrastructure and R&D from Amazon, Microsoft, and Google.  Unfortunately Mike mixes infrastructure investment (CapEx) with R&D investment (OpEx).  We actually don’t know what the levels of R&D investment are at the big three, although we know that they literally have thousands of developers at each working diligently on new capabilities.

And this is where we go off the rails because Mike throws IBM’s hat in the ring as a real contender because they are planning to invest $1.2B in new datacenters. This is actually uninteresting and mostly irrelevant when it comes to measuring the probability of success in the public cloud game. New datacenters and hardware won’t provide a true structural cost advantage. That can only come through investment in R&D and a proven track record of innovation in public cloud, neither of which IBM is clearly succeeding in. Perhaps they will and perhaps they are a true public cloud contender, but it’s hard for me to see that given that so much of this is about a cultural and organizational structure that can encourage innovation.

What does it take to change? As many of you know, I pooh-poohed Microsoft’s chances for quite a while because I had no belief in their eventual success at delivering online services, mostly because I felt the organization as a whole struggled with the operating system boat anchor and couldn’t let go. Of course this was before Satya Nadella took over the helm and declared a focus on cloud and mobile. In essence he empowered and enabled the Online/Live teams to become the new Microsoft. The Live teams have learned “web scale” the hard way, over many many years and through the spilling of much red ink. See this article from 2011 on MSFT’s Online services operating income.  Microsoft has spent 10 years and many billions of dollars to become a credible player against the likes of Amazon and Google.

In that light, how can anyone else even pretend to the throne without similar levels of investment? Buying a hosting company is not going to get you there. This isn’t a game of buying power or outsourcing. It’s an innovation game and that’s it. The number of players who can pull this off is vanishingly small.

You want “economies of scale?” You’re going to pay for it and at this point it’s probably too little too late.

Posted in Cloud Computing

Voting for the Fall OpenStack Summit in Paris, France is now open!

Posted on by Randy Bias

The Cloudscaling team has, once again, submitted an outstanding array of talks and we would all appreciate it if you took the time to vote for our presentation submissions. We’ve taken the time to summarize our presentations for you below, as well as provide you with an easy link to cast your vote:

There’s no doubt that Cloudscaling would not be as great as it is without our customers and other Stacker friends.  That’s why we humbly ask you to please take the time to vote for these submissions which include user stories from companies who are using OpenStack to achieve agility.

Customer Use Cases:

How Lithium used Cloudscaling OCS to bring IT into the Modern Cloud era

This session will include a case study by Randy Bias, CEO Cloudscaling and Joe Sandoval, Lithium about the key benefits experienced by Lithium in their journey: increased agility, “open” architecture, application modernization, improved DevOps efficiency and a foundation for the future, just to name a few.

No Wait IT Keeps Developers Productive at Ubisoft

This session will include a case study by Randy Bias, CEO Cloudscaling and Marc Heckmann, Enterprise Cloud Architect, Ubisoft about the key benefits experienced by Ubisoft in their journey: agility with “control”, satisfying the LOB’s need for speed, increased IT efficiency, application modernization, and improved DevOps efficiency.

Service Provider Achieves Ultra-Agile Infrastructure using Cloudscaling OCS

This session will include a case study by Randy Bias, CEO Cloudscaling and Matt Kinney about the key benefits experienced by Canadian Web Hosting in their journey: increased agility, services that are cost-competitive with major public clouds like AWS and a slew of new, dynamic cloud applications that customers love.

Panels on OpenStack, Hybrid Cloud, and Other Business Cases:

The OpenStack Thunderdome

“…five highly opinionated Stackers will lock horns over the best way to bring about global dominance of OpenStack as the default cloud-building platform.”

Top Hybrid Cloud Myths Debunked

Four experts from diverse industries including public and private cloud, systems integration and cloud management will square off and separate reality from assumption around hybrid cloud myths and top trends.

Hybrid Cloud War Stories: Expecting the Unexpected

Whatever you didn’t expect to go wrong, does.  What you hoped for does not materialize.  Building a private or public cloud is hard enough, but putting them together is not for the faint of heart.  Four experts who have extensive experience in using, delivering, and architecting hybrid clouds will chime in on what to look for when going for the gold.

Compliance Slows Us Down While Cloud Speeds Us Up; Or Does It?

Compliance and governance give the appearance of slowing down IT, while Cloud gives us the hope of moving faster and faster.  Can managing down risk co-habitate with greater agility and flexibility?  We’ll ask that question and more.

Adopting Cloud? Unlearn everything you know about traditional enterprise architecture first.

Cloud is about a lot more than VMs-on-demand.  Traditional enterprise IT approaches are being disrupted by “web scale” techniques.  As the cloud changes everything, we need to understand how web scale ultimately dovetails with traditional enterprise requirements.  How can we interpret the lessons learned from the big guys for your everyday enterprise?

Next-Gen Organizational Design – Growth hacking with “BusDevOps”

The silos between dev and ops are coming down, but is it possible to extend that thinking to the rest of the business?  Can we achieve the impossible?  Bringing together the best of dev, ops, business development, sales, and product functions, we’ll discuss how this might play out and why it could be important to the next generation of businesses.

Technology Oriented Presentations

Scale-Out Ceph: Rethinking How Distributed Storage is Deployed (Speakers: Randy Bias, Tushar Kalra)

At Cloudscaling, we believe that unification isn’t the answer. Good solid tiered storage architecture is the answer. In this session, see how Ceph can shine as part of a considered storage strategy rather than as ‘the only answer’.

Cloud Operations Dashboard Demo: Cloudscaling OCS User Interface

Step inside and we’ll give you a tour of Cloudscaling’s Open Cloud System cloud operator GUI, API, and CLI tools. By the operator, for the operator. Power up now!

Tales From the Field: A Day in the Life of Cloud Operations

Cloudscaling has been supporting 24×7 production clouds since before OpenStack existed. In this session, we will discuss the kinds of problems folks run into in typical OpenStack deployments and pull out a couple of interesting incidents to perform a deep dive on.

Tempest Testing for Hybrid and Public Cloud Interoperability

In this presentation we will take a closer look at DefCore, its origins and intentions, how the initiative can be extended, how RefStack can be used as the basis not only for OpenStack interoperability testing but also public cloud interop testing, and finally give a demonstration of wrapping this all up into an actionable package.

OpenStack Design Guide Panel

Bring your real-world questions and be prepared to talk OpenStack architecture with a panel of experts from across multiple disciplines and companies. We’ll be drawing on real architecture and design problems taken from real-world experiences working with, and developing solutions built on, OpenStack.

Virtual Private Cloud (VPC) Powered by Cloudscaling OCS & OpenContrail

We’ll explore how the Cloudscaling VPC cloud solution leverages SDN technology which has been well-tested in the telecommunications and service provider industries to build overlay networks that scale – both within and across data centers. We’ll also take a closer look at the VPC API updates to the OpenStack EC2 API and how the development work done there is providing real fidelity with leading public cloud providers enabling true hybrid cloud solutions.

OpenStack Reference Architecture: Scaling to Infinity and Beyond

We’ll explore some of the basic principles of creating a reference architecture and discuss real-world examples that demonstrate why implementing a reference architecture allows scaling from one to thousands of racks with relative ease. We will also touch upon why your reference architecture choices can directly affect interoperability between OpenStack clouds and between OpenStack and major public clouds.

We hope that you can take the time to vote for all of our presentations and we certainly hope to see you in Paris!

Also make sure you check out the list of fantastic presentations at the Mirantis blog.

Posted in OpenStack

Tarkan Maner Joins Cloudscaling Board of Directors

Posted on by Randy Bias

Today I am extremely pleased to welcome Tarkan Maner, CEO of Nexenta and previously CEO of Wyse Technology, to Cloudscaling’s board of directors. The full press release can be found here.

I met Tarkan during our search for a new CEO and found him to be a passionate entrepreneur with a strong desire to help Cloudscaling succeed. Tarkan and I share similar characteristics. We are both super high energy personalities. We are both intensely keen on cloud. However, we also come from two different worlds: sales/marketing vs. technology. And that contrast makes all the difference.

One thing I pride myself on is finding new members of the team who fill in existing capability gaps, or who bring completely new capabilities and skills to the table. Here, I think Tarkan’s track record speaks for itself. He has built and led high growth businesses, like Wyse, into the cloud future. I am very much looking forward to learning from Tarkan’s experience and wisdom. His desire to join our board, with so many others courting him, is a massive vote of confidence in myself, my co-founder Adam Waters, our business, and the rest of the Cloudscaling team.

Great to have you onboard, Tarkan!



Posted in Company | Leave a comment

The 6 Requirements of Enterprise-grade OpenStack, Part 3

Posted on by Randy Bias

In part 1 and part 2 of this series I introduced the core ideas around defining the requirements and then discussed the first four.  Today we’ll discuss the final two requirements and tie it all together.

Onwards and upwards!

Requirement #5 – Scalable, Elastic, and Performant

Enterprise-grade has to mean something. In the past, enterprise-grade related to a certain quality of a system that made it highly reliable, scalable, and performant. More and more, enterprise-grade is beginning to mean “cloud-grade” or “web scale.” What I mean by that is that as the move to next generation applications happens and enterprises adopt a new IT model, we will see major changes in the requirements for delivering a high quality system.

The example I love to use is Hadoop. Hadoop comes with a reference architecture that says: use commodity servers, commodity disk drives, and NO RAID. When was the last time you saw an enterprise infrastructure solution with no data protection at the hardware layer? Of course, it doesn’t make sense to run Hadoop on high end blades attached to a Fibre Channel SAN, although I have seen it. Even Microsoft Exchange has begun recommending a move from RAID towards JBODs, depending on the application software layer to route around failure.

Let’s talk about these three requirements for enterprise-grade OpenStack.

Scalability & Performance

Scalability is the property of a system to continue to work as it increases in size and workload demands. Performance is the measurement of the throughput of an individual resource of the system rather than the system itself. Perhaps Werner Vogels, CTO of Amazon, said it best:

A service is said to be scalable if when we increase the resources in a system, it results in increased performance in a manner proportional to resources added.
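One way to make that definition concrete is to compare how much throughput grew against how many resources were added. The sketch below is only an illustration: the function name and the node and request counts are hypothetical, not measurements from any OpenStack deployment.

```python
def scaling_efficiency(base_resources: float, base_throughput: float,
                       new_resources: float, new_throughput: float) -> float:
    """Return observed throughput growth divided by resource growth.

    A value near 1.0 means performance grew in proportion to the
    resources added; a value well below 1.0 means the system is not
    scaling in the sense of the definition quoted above.
    """
    resource_ratio = new_resources / base_resources
    throughput_ratio = new_throughput / base_throughput
    return throughput_ratio / resource_ratio


# Hypothetical numbers: doubling from 4 to 8 nodes.
well_scaled = scaling_efficiency(4, 1000, 8, 1900)
poorly_scaled = scaling_efficiency(4, 1000, 8, 1100)
print(f"well scaled:   {well_scaled:.2f}")
print(f"poorly scaled: {poorly_scaled:.2f}")
```

Tracking a ratio like this as you add racks is a simple way to notice when a configuration or plugin choice has stopped scaling.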

In many ways, OpenStack itself is a highly scalable system. It was designed around a loosely-coupled, message passing architecture, something tried and true for mid to large scale, while also able to scale down to much smaller deployments. The problem, unfortunately, lies in the design decisions you make when configuring and deploying OpenStack. Some of its default configurations and many of the vendor plugins and solutions have not been designed with scale in mind.[1] As discussed in the previous installment, having a reference architecture is critical to delivering hybrid cloud. Make certain you pick an enterprise-grade product with a reference architecture that cares about scale and performance while using well-vetted components and configuration choices.

A complete examination of the scale and performance issues that might arise with OpenStack is beyond the scope of this series; however, let’s tackle the number one issue that most people run into: network performance and scalability.

OpenStack Default Networking is a Bust

OpenStack Compute (Nova) has three built-in default networking models: flat, single_host, and multi_host. All three of these networking models are completely unacceptable for most enterprises. In addition to these default networking models, you have the option of deploying OpenStack Networking (Neutron), which has a highly pluggable architecture that supports a number of different vendor plugins both to manage physical devices and also network virtualization systems (so called Software-defined Networking or SDN).

A very brief explanation of the default networking models’ shortcomings is in order. I will keep it very simple, but am happy to follow up later with more details. The flat and multi_host networking models require a single shared VLAN for all elastic (floating) IP addresses. This requires running Spanning Tree Protocol (STP) across your switch fabric, a notoriously dangerous approach if you want high network uptime. I’ve asked the question at multiple conferences to a full room: “How many of you think STP is evil?” and have had nearly everyone in the room raise their hand.

Perhaps more importantly, both flat and multi_host networking models require you to route your public IP addresses from your network edge down to the hypervisor nodes. This is really not an acceptable approach in any modern enterprise. Here’s a diagram showing multi_host mode:

OpenStack multi_host mode

It’s probably also worth noting that if you want multi_host mode, you need to be able to load code on your hypervisor. That means if you like ESX or Hyper-V you are probably out of luck.

By contrast, single_host mode suffers from the singular sin of trying to make a single x86 server the core networking hub through which all traffic between VLANs and the Internet runs.[2] Basically, take your high performance switch fabric and throw it in the trash because your maximum bandwidth is whatever a Linux server can push. Again, this is not an acceptable or even credible approach to networking. To be fair though, OpenStack competitors also took a similar approach to this. Here’s a diagram on single_host mode:

OpenStack single_host mode
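To put a rough number on the single_host bottleneck, here is a back-of-the-envelope sketch. The 10 Gbit/s gateway NIC and the count of 500 active VMs are assumptions chosen purely for illustration, and the function name is hypothetical.

```python
def per_vm_bandwidth_mbps(gateway_nic_mbps: float, active_vms: int) -> float:
    """Fair-share bandwidth per VM when all routed traffic crosses one gateway.

    This ignores protocol overhead and kernel forwarding limits, so the
    real figure would be lower still.
    """
    if active_vms <= 0:
        raise ValueError("active_vms must be positive")
    return gateway_nic_mbps / active_vms


# Assumed values: one 10 Gbit/s NIC on the gateway server, 500 busy VMs.
share = per_vm_bandwidth_mbps(10_000, 500)
print(f"Each VM gets roughly {share:.0f} Mbit/s of routed bandwidth")
```

However fast the switch fabric is, the ceiling in this model is set by the one server in the middle.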

All of these approaches have fundamental scalability and performance issues. Which brings us to OpenStack Neutron.

Depending on OpenStack Neutron is NOT for the Faint of Heart

As of September 2013, it seems like Neutron still had significant issues as seen in this critical posting from Scott Devoid of Argonne National Labs (ANL) to the OpenStack operators mailing list. As of this writing, OpenStack Neutron supports single_host and flat modes, but not multi_host. Apparently, we may see a replacement for multi_host in the Juno timeframe, although this capability has been missing for a while now.

That being said, Neutron has made a lot of progress and to be honest, many of the issues folks have reported are more likely to stem from poorly written and adapted plugins. What this means is that in order to deliver success with OpenStack Neutron you need a version of Neutron plus accompanying plugins that have been designed for scale and performance.[3] Plus, your cloud operating system vendor should have some proven deployments at scale and have really beaten the crap out of the networking using exhaustive testing frameworks.

I could say much more about the performance and scalability of an enterprise-grade, OpenStack-powered product; however, it should give you a starting point in pinning down your vendors to make sure they have addressed these and related issues.

Most important: regardless of your OpenStack vendor they must be able to provide a detailed, multi-rack reference network architecture.

Without a reference network architecture, your ability to scale past a single rack is purely based on hand-waving and assurances from your vendor that may or may not have any validity.


Infrastructure can’t ever be truly elastic, but its properties can enable elastic applications running on top of it. An elastic cloud is one that has been designed such that the individual cost of resources such as VMs, block storage, and objects is as low as possible. This is directly related to the Jevons Paradox, which states that as a technology progresses, the increase in efficiency leads towards an increase in the rate of consumption of that technology:

Jevons Paradox

Simply put, as the relative cost of components in the system reduces, applications can not only consume more, which enables routing around failures, but also consume more for the purposes of scaling application needs up and down based on demand. In essence, you can make the pool larger and buy more capacity if the individual components and resources are as cheap as possible.

Major elastic public clouds like Google, Amazon, and Microsoft are providing these kinds of properties, and it’s what you need to provide inside your four walls to enable hybridization.

Enterprise-grade OpenStack will help lead you into the future by providing scalability and performance while supporting elastic applications. Beware the OpenStack-powered cloud operating system that wants you to use a Fibre Channel SAN and blade servers. Those days have passed, as we can see with Hadoop.

Requirement #6 – Cloud Support, Training & Services Model for Global Enablement

Chances are you are a global organization and are planning to deliver 24x7x365 next generation, cloud native applications on top of your private, public, and hybrid clouds. You want partners who can support you globally, who have international experience, and most of all who are comfortable with supporting 24x7x365 environments.

Train Your IT Administrators to be the New Cloud Administrators

IT administrators are in the process of transitioning into cloud administrators. This evolution will be a deep and lasting change inside the enterprise. Entirely new sets of skills need to be developed and other skills refreshed and realigned to the new cloud era. When evaluating your enterprise-grade OpenStack partner, you should be looking for one with significant capabilities in training, both on generic OpenStack and on their specific cloud operating system product. Most importantly, when evaluating a partner who can help you upgrade your team’s cloud skills, make certain they aren’t just going to show you how to develop on OpenStack or install OpenStack.

What you really need is operator-centric training that focuses on:

  • Typical OpenStack architectures and specific product architectures
  • Pros and cons of various architecture and plugin/driver choices
  • Scalability, interoperability and performance issues and options
  • Troubleshooting common “full stack” problems
  • Introduction to how your developers will use your cloud
  • Understanding the cloud-native application model

Cloud Support Model

No matter how good your IT team is, you will need a trusted support team to back you up — a team that can support your entire system end-to-end. Make certain you ask your Enterprise-grade OpenStack-powered cloud operating system vendor if their support team has supported high transaction 24×7 environments before. Be certain that they have so-called “full stack” support capabilities. Can they troubleshoot the Linux kernel, your hypervisor of choice, networking architecture and performance issues, and do they understand storage at a deep level? Clouds are integrated systems and compute, storage, and networking all touch each other in fundamental ways. Your vendor needs to know a lot more than how to configure or develop for OpenStack. They need to be cloud experts at all levels of the stack. Demand it.

Global Service Delivery

Delivering a cloud internationally is no small feat, whether large or small.[4] It requires more than just reach. It requires cultural sensitivity and the ability to understand the unique requirements that arise in particular geographies. For example, did you know that while most data centers are more concerned about power than space, in Japan it’s still usually just the opposite? Space winds up being one of the single largest premiums. This space requirement is unique to their particular environment.

Your cloud operating system vendor should have a track record of successful international delivery and a partner network that can assist in a particular location.


OpenStack is an amazing, scalable foundation for building a next generation elastic cloud, but it’s not perfect.  None of the open source solutions it competes with are perfect either.  Instead, each of these tools is really a cloud operating system “kernel” that can be used to deliver a more complete, vetted, Enterprise-grade cloud operating system (cloudOS).  You will need an experienced enterprise vendor to deliver your cloudOS of choice and whether it’s OpenStack or another similar project I hope you will keep these requirements in mind.

I hope you enjoyed this whirlwind tour through the 6 Requirements of Enterprise-grade OpenStack. As a reminder, we covered these SIX requirements:

  1. 99.999% Uptime APIs & Scalable Control Plane
  2. Robust Management & Security Models
  3. Open Architecture
  4. Hybrid Cloud Interoperability
  5. Scalable & Elastic Architecture
  6. Global Support & Services

As you are out there evaluating the right vendor to help with your OpenStack adoption process and the move towards hybrid cloud, make certain you find out how many, if any, of these requirements they can meet.

For some related white papers, check out:

[1] It’s also fair to say that sometimes people are using the messaging systems in an inappropriate manner.  Sometimes, plain old UDP is still best for fire and forget high throughput systems, like logs.

[2] Before you cry foul, others such as Eucalyptus also went down this unfortunate path.  I believe Eucalyptus 4.0 finally fixes this.  It’s a common mistake for people without networking experience to go down this path.

[3] So far the only one we have tested extensively is OpenContrail, which shows great promise, but we have to get it running in some larger deployments before we declare victory.

[4] We know.  Building the KT uCloud in 2010 and 2011 was a huge task for an early stage startup.

Posted in OpenStack | Leave a comment

The 6 Requirements of Enterprise-grade OpenStack, Part 2

Posted on by Randy Bias

In part 1 of this series earlier this week, I introduced The 6 Requirements of Enterprise-grade OpenStack.  Today, I’m going to dig into the next two major requirements: Open Architectures and Hybrid Cloud Interoperability.

Let’s get started.

Requirement #3 – Open Architectures & Reducing Vendor Lock-in

We already covered building a robust control plane and cloud management system. One of the attractions of OpenStack is removing vendor lock-in by moving to an open source platform. This is a key value proposition and deserves a complete dialog about what is realistic and what is not in an Enterprise-grade version of OpenStack.

“No Vendor Lock-in” is Snake Oil Salesmanship

Are you being promised that OpenStack provides “no lock-in?” No vendor lock-in is a platonic ideal – something that can be imagined as a perfect form, but never achieved. Any system always has some form of lock-in. For example, many of you probably use Red Hat Enterprise Linux (RHEL), a 100% open source Linux operating system, as your default Linux within your business. Yet, RHEL is a form of lock-in. RHEL is a specific version of Linux designed for a specific goal. You are locked into their particular reference architecture, packaging systems, installers/bootloaders, etc., even though it is open source.

In fact, with many customers I have seen less of a fear about lock-in and more of a concern about “more lock-in.” For example, one customer, who will remain anonymous, was concerned about adopting our block storage component due to lock-in concerns, even though it was 100% open source. When probed, it became clear that what the customer wanted was to use their existing storage vendors (NetApp and Hitachi Data Systems) and did not want to have to train up their IT teams on a completely new storage offering. Here the lock-in concerns were predominantly about absorbing more lock-in rather than removing it entirely.

What is most important is assessing the risks your business can take. Moving to OpenStack, as with Linux before it, means that you are mitigating certain risks in terms of training your IT teams on the new stack and hedging your bets by being able to get multiple vendors in-house to support your open source software.

In other words, OpenStack can certainly reduce lock-in, but it won’t remove it. So, demand an open architecture, but expect an enterprise product.

Lock-in Happens, Particularly with Enterprise Products

I wish it didn’t, but lock-in does happen, as you can see from above. That means that rather than planning for no lock-in, start planning for what lock-in you are comfortable with. An Enterprise-grade version of OpenStack will provide a variety of options via an open architecture so you can hedge your bets. However, a true cloud operating system and enterprise product cannot ever provide an infinite variety of options. Why not? Because then the support model is not sustainable and that vendor goes out of business.  Not even the largest vendors can provide all options to all people.

If you want to build your own customized cloud operating system[1] built around OpenStack, go ahead, but that isn’t a product. That’s a customized professional services path.  Like those who rolled their own Linux distributions for a while, it leads to a path of chaos and kingdom-building that is bad for your business. Doing it yourself is also resource intensive. You’ll need 20-30 Python developers with a deep knowledge of infrastructure (compute, storage, and network) who can hack full time on your bastardized version of OpenStack.  A team that looks something like this:

Cloud OS Development Team Requirements

So, ultimately, you’re going to have to pick a vendor to bet on if you want enterprise-grade OpenStack-powered cloud solutions.

Requirement #4 – Hybrid Cloud Interoperability

Hybrid is the new cloud. Most customers we talk to acknowledge the reality of needing to provide their developers with the best tool for the job. Needs vary, requirements vary, concerns vary, compliance varies. Every enterprise is a bit unique. Some need to start on public cloud, but then move to private cloud over time. Some need to start on private, but slowly adopt public. Some start on both simultaneously.  RightScale’s recent State of the Cloud 2014 report has some great survey data backing this up:


Let’s talk about why your enterprise-grade OpenStack-powered cloud operating system vendor had better have a great hybrid cloud story.

A Hybrid-first Cloud Strategy

Every enterprise needs a hybrid-first cloud strategy. Meaning, hybrid cloud should be your first and primary requirement. Then, plan around hybrid with a single unified governance model that delivers the best of both worlds for your constituencies. Finally, plan on a process where you will triage your apps/needs and determine which cloud is right for the job.  The following diagram highlights this process, but your mileage may vary as criteria are different from business to business:

Triaging and Mapping Apps to the Right Cloud

Understanding Cloud Interoperability & Its Role In Hybrid Cloud

I have been through quite a number of interoperability efforts, the most painful of which was IPsec for VPNs. Interoperability between vendors is not free, usually takes a fairly serious effort, and ultimately is worth the pain. Unfortunately, interoperability is deeply misunderstood, particularly as it applies to public/private/hybrid cloud.

The challenge in hybrid cloud is about addressing the issues of application portability. If you want a combination of public and private clouds (hybrid) where an application can be deployed on either cloud, moved between the clouds, or cloudbursted from one cloud to another, then application portability is required. When you pick up and move an app and its cloud native automation framework[2] from one cloud to another, a number of key things need to remain the same:

  • Performance must be at relative parity
  • Underlying network, storage, and compute architectures must be the same or similar
  • Your automation framework must support API compatibility with both clouds
  • The TCO of running the app must be within ½-2x of each other
  • There must be behavioral compatibility, meaning non-API “features” are matched
  • You must support API compatibility with the relevant public clouds

Here is a slide I used in a recent webinar to help explain these requirements.

Hybrid Cloud Interoperability Requirements

Of course, you must also have been thoughtful when designing your application and avoided any lock-in features of a particular cloud system, such as a reliance on DynamoDB on AWS, HA/DRS on VMware, iRules on F5 load balancers, etc.

If you don’t meet these requirements, interoperability is not possible and application portability attempts will fail. The application performance will be dramatically different and one cloud will be favored; there will be missing features that cause the app to not function on one cloud or another; and your automation framework may fail if behavioral compatibility doesn’t exist. For example, perhaps it has timers in it that assume a VM comes up in 30 minutes, but on one of your clouds it takes 1-2 hours (I’ve seen this happen).
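The timer problem above is easy to picture in code. The following is a minimal sketch of an automation step that waits for a VM to become active, with the timeout supplied per cloud rather than hard-coded. The names `wait_until_active` and `BOOT_TIMEOUT_SECONDS`, the cloud labels, and the timeout values are all hypothetical; a real framework would plug in its own status check.

```python
import time
from typing import Callable

# Hypothetical per-cloud boot timeouts, in seconds. A single hard-coded
# value tuned for one cloud is exactly what breaks on the other.
BOOT_TIMEOUT_SECONDS = {
    "private-cloud": 30 * 60,
    "public-cloud": 2 * 60 * 60,
}


def wait_until_active(is_active: Callable[[], bool],
                      timeout_s: float,
                      poll_interval_s: float = 5.0,
                      clock: Callable[[], float] = time.monotonic,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll is_active() until it returns True or timeout_s elapses.

    Returns True if the VM became active in time, False on timeout.
    clock and sleep are injectable so the logic can be tested without
    actually waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if is_active():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

A caller would look up `BOOT_TIMEOUT_SECONDS[cloud_name]` and pass it as `timeout_s`, so the same automation behaves sensibly on both clouds even when their boot times differ.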

All of these issues must be addressed in order to achieve hybrid cloud interoperability.

OpenStack Needs a Reference Architecture

The Linux kernel needs a reference architecture. In fact, each major distribution of Linux in essence creates its own reference architecture and now we have distinct flavors of Linux OS. For example, there are the Red Hat/Fedora/CentOS flavors and the Debian/Ubuntu flavors. These complete x86 operating systems have fully-baked reference architectures and anyone moving within one of the groups will find it relatively trivial to move between them, whereas a Red Hat admin moving to Debian may initially be lost until they come up to speed on the differences. OpenStack is no different.

OpenStack, and in fact most of its open source brethren, has no reference architecture. OpenStack is really the kernel for a cloud operating system. This is actually its strength and weakness. The same holds for Linux. You can get Cray Linux for a supercomputer and you can get Android for an embedded ARM device. Both are Linux, yet both have radically different architectures, making application portability impossible. OpenStack is similar, in that to date most OpenStack clouds are not interoperable, because each has its own reference architecture. Every cloud with its own reference architecture is doomed to be an OpenSnowFlake.

Enterprise-grade cloud operating systems powered by OpenStack must have commonly held reference architectures. That way you can be assured that every deployment is interoperable with every other deployment. These reference architectures have yet to arise. However, given that there is already a single reference architecture in Amazon Web Services (AWS) and Google Cloud Platform (GCP), (we call it “elastic cloud reference architecture”) and given that these two companies will be major forces in public cloud, it’s hard to see how OpenStack can avoid supporting at least one reference architecture that looks like the AWS/GCP model.

To be clear, however, there may be a number of winning reference architectures. I see emerging flavors in high performance computing (HPC) and possibly other verticals like oil & gas.

Enterprise-grade Reference Architecture

Ultimately, you have to place your own bet on where you think OpenStack lands, but existing data says that out of the top 10 public clouds, only a couple are based on OpenStack[3]:

RightScale State of the Cloud 2014 on Top Public Clouds Used by Businesses

If enterprises desire agility, flexibility, and choice, it seems obvious that OpenStack needs to support an enterprise-grade reference architecture that is focused on building hybrid clouds with the ultimate winners in public cloud. It’s just my opinion, but right now that looks like Amazon, Google, and Microsoft.

Enterprise-grade OpenStack means an enterprise-grade reference architecture that enables hybrid cloud interoperability and application portability.

Part 2 Summary

An open architecture designed for hybrid cloud interoperability is a foregone conclusion at this point.  Mostly what folks argue about is how that will be achieved, but for those of us who are pragmatists, it’s certain that public cloud will have a wide variety of winners and that the top 10 public clouds are already dominated by non-OpenStack contenders.  So plan accordingly.

Most importantly, remember to ask for an open architecture, while expecting an enterprise product.

In the next installment we’ll tackle what it means to deliver a performant, scalable, and elastic infrastructure designed for next gen apps.

Next Installment:

UPDATE: Added a clarifying footnote due to some Twitter feedback that seemed unclear on what a “cloud operating system” was and its relationship to OpenStack and similar open source projects.

[1] No, Eucalyptus, OpenNebula, CloudStack, <insert your cloud software du jour>, are NOT complete cloud operating systems.  They are all roughly at parity with OpenStack, although certainly you could argue that one is above or below the others.  Why aren’t they complete?  That’s a whole other blog posting series, but suffice it to say: when was the last time you saw an operating system that couldn’t install itself on bare metal?  Or didn’t provide system metrics and logging capabilities?  Or was missing key components (e.g. databases)?  A cloud operating system is a non-trivial task and most of these tools have simply handled the easy part of a cloud: placing a VM on a hypervisor (big whoop).

[2] By definition any cloud native next gen app manages itself via an automation framework. It might be a low level approach like Chef, Puppet, SaltStack; it might be a higher order abstraction like Scalr, RightScale, Dell Cloud Manager; it might even be a PaaS framework, but it’s *something*. Or it’s not a cloud-native app.

[3] Be sure to read the caveats on the VMware vCHS data in the actual report itself.

Posted in OpenStack | Leave a comment

The 6 Requirements of Enterprise-grade OpenStack, Part 1

Posted on by Randy Bias


OpenStack is an amazing foundation for building an enterprise-grade private cloud. The great OpenStack promise is to be the cloud operating system kernel of a new generation. Unfortunately, OpenStack is not a complete cloud operating system, and while it might become one over time, it’s probably best to look at OpenStack as a kernel, not an OS. [1]

In order to become widely adopted by the enterprise, OpenStack must ultimately be delivered via robust, enterprise-grade products that close the gap on the key areas where OpenStack has challenges. These products are delivered today by businesses that can provide support, ease-of-installation, tools for day-to-day management, and all of the other pieces necessary for achieving acceptance. Without these vendors who have a stake in enterprise adoption, OpenStack can never be widely adopted. OpenStack isn’t MySQL. It’s the Linux kernel, and like the Linux kernel, you need a complete operating system to create success.

So what’s required? There are 6 key elements:

  1. 99.999% Uptime APIs & Scalable Control Plane
  2. Robust Management & Security Models
  3. Open Architecture
  4. Hybrid Cloud Interoperability
  5. Scalable & Elastic Architecture
  6. Global Support & Services

6 Requirements for Enterprise-grade OpenStack

If your business requires an enterprise-ready OpenStack solution, read on to learn more about what a true enterprise-grade OpenStack-powered private cloud can – and should – offer. Over the next two weeks, I’m going to do a multi-part blog series entitled “6 Requirements of Enterprise-grade OpenStack” – where I will detail these six requirements.

To get started, let’s look at OpenStack’s place in the enterprise.

OpenStack in the Enterprise Data Center

Agility is the new watchword for cloud and DevOps is seen as the path to realizing agility. OpenStack provides the ideal platform for delivering a new developer experience inside the enterprise, just as Linux provided a new experience for web applications and Internet adoption. If OpenStack were just a “cheaper VMware,” then it would have little or no real value to the enterprise. Instead, OpenStack provides a shining example of how to build a private elastic cloud like major public clouds such as Amazon Web Services (AWS) and Google Cloud Platform (GCP). Just as Hadoop brought Google’s MapReduce (plus its reference architecture) to the masses, OpenStack brings the AWS/GCP-style Infrastructure-as-a-Service (IaaS) offering to everyone. This is what makes DevOps inside the enterprise ultimately shine.

Any discussion about DevOps, like so many of the recent buzzwords, can quickly become mired in semantic arguments. However, the one truism we can all agree on is that the traditional barriers between application developers and IT infrastructure operators need to be broken.

Time after time, I hear a similar story from our customers that goes like this: “We went to the infrastructure teams with our long list of requirements for our new application. They told us it would take 18 months and $10M before it would be ready. So we went to Amazon Web Services. We didn’t get our list of infrastructure requirements and we had to change our application model, but we got to market immediately.” That’s because the inherent value of AWS has less to do with cost and more to do with the on-demand, elastic and developer-centric delivery model.

OpenStack levels the playing field inside the enterprise. Private clouds can be built on the public cloud model, enabling developers while simultaneously giving centralized IT control and governance. In essence, it’s the best of both worlds, which is the true value of OpenStack-powered private clouds.

Why Does Agility Matter?

While I think it’s self-evident that agility is the driving light behind cloud computing, it’s worth a quick refresh. The need for businesses to move now has driven ridiculous growth for AWS (see growth below and notice this is a log chart):

Netcraft AWS Growth Relentless

This growth is all net new applications, or what Microsoft calls next generation applications. The vast majority of these new applications are focused on creating entirely new business value, typically around mobile, social, web applications, and big data. In fact, this category of application is growing so fast that analysts such as IDC and Gartner have started tracking it [2]:

Agility Driven by Next Gen Apps

At their current rate of growth, next generation cloud applications will equal the size of all existing applications by 2018:

Next generation applications are the source for future competitiveness for most enterprises, which has led them to accelerate their cloud adoption process and rethink their cloud strategy.

Observing this phenomenon is what led Forrester analyst Craig Le Clair to say:

Seventy percent of the companies that were on the Fortune 1000 list a mere 10 years ago have now vanished – unable to adapt to change …

We have now entered an adapt or die moment for enterprises, and OpenStack will be key to agility, adaptation, and the successful support of DevOps.

Over the next few weeks leading up to the OpenStack Summit I’m going to cover all 6 Requirements of Enterprise-grade OpenStack in detail.  Today I am going to handle the first two requirements: high uptime APIs and robust management of your cloud.

Requirement #1 – 99.999% Uptime Control Plane: High Uptime Apps Require a High Uptime API

Continuing our discussion around delivering enterprise-grade OpenStack, let’s discuss how critical API availability and scaling out the cloud control plane are to delivering next gen applications.

Cloud API Uptime & Availability

A critical capability for moving to a new cloud and DevOps model is the ability of cloud native applications to route around failures in an elastic cloud. These applications know that any given server, disk drive, or network device could fail at any time. They look for these failures and handle them in real-time. That’s how Amazon Web Services (AWS) and Google Cloud Platform (GCP) work and why they can run these services at a low cost structure and with greater flexibility. For an application to adapt in real-time to the normal failures of individual components, the cloud APIs must have higher than normal availability.
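To make "routing around failures" concrete, here is a minimal Python sketch of an application that tries several replicas of a service and uses the first one that answers. The names (`ReplicaDown`, `call_with_failover`, `healthy`, `failed`) are hypothetical and exist only for illustration; a real cloud native application would typically add timeouts, backoff, and health checks on top of this.

```python
import random


class ReplicaDown(Exception):
    """Raised by a replica that is currently unavailable (hypothetical)."""


def call_with_failover(replicas, request):
    """Try each replica in random order and return the first successful reply.

    `replicas` is a list of callables. Each one either returns a response
    or raises ReplicaDown. This interface is an assumption for the sketch.
    """
    candidates = list(replicas)
    random.shuffle(candidates)  # spread load instead of always hitting the first
    errors = []
    for replica in candidates:
        try:
            return replica(request)
        except ReplicaDown as exc:
            errors.append(exc)  # note the failure and move on to the next replica
    raise ReplicaDown(f"all {len(candidates)} replicas failed: {errors}")


def healthy(request):
    """Stand-in for a working server."""
    return f"ok: {request}"


def failed(request):
    """Stand-in for a server, disk, or network path that has failed."""
    raise ReplicaDown("server unreachable")


result = call_with_failover([failed, healthy, failed], "GET /status")
print(result)
```

The application only gives up when every replica is down. To replace a failed replica, it has to ask the cloud API for a new one, which is why the API itself needs to be more available than any single component.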

Your Cloud Control Plane’s Throughput

API uptime isn’t the only measurement of success. Your cloud’s control plane throughput is also critical. Think of the control plane as the command center of your cloud. It is most of the centralized[3] intelligence and orchestration layer. Your APIs are a subset of the control plane, which for OpenStack also includes all of the core OpenStack projects, your day-to-day cloud management software (usually part of a vendor’s Enterprise-grade OpenStack distribution), and all of the ancillary services required such as databases, OpenStack vendor plugins, etc. Your cloud control plane needs to scale out as your cloud grows bigger. That means that in aggregate, you have more total throughput for API operations (object push/pull, image uploads/downloads, metadata updates, etc.).

This is where a proper cloud operating system comes in.

99.999% Uptime APIs and Scale-out Control Plane

In essence, if you say that you can build a four or five 9 app (99.99-99.999%) on a two and a half 9 infrastructure (99.5%), then the API that the app depends on must also have four or five 9s of uptime (99.999%). As most of you know, delivering five 9s of availability is a non-trivial task, as this allows only 5.26 minutes of unplanned downtime per year. Typical high availability approaches, such as active/passive or master election systems, can take several minutes to fail over, leaving your cloud API endpoints unavailable in the meantime.
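The downtime budget behind each count of 9s is simple arithmetic, and it is worth seeing the numbers side by side. The sketch below assumes a 365-day year; the function name `downtime_seconds_per_year` is mine, chosen for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year


def downtime_seconds_per_year(availability_percent):
    """Return the yearly downtime allowed by an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return SECONDS_PER_YEAR * unavailable_fraction


# Print the yearly budget for the availability levels discussed in this post.
for target in (99.5, 99.99, 99.999, 99.9999):
    seconds = downtime_seconds_per_year(target)
    print(f"{target}% -> {seconds / 60:.2f} minutes ({seconds:.1f} seconds) per year")
```

For 99.999% this works out to about 5.26 minutes per year, so a single failover that takes several minutes can consume the whole annual budget.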

An enterprise-grade cloud operating system can provide guarantees of sub-minute or even sub-second failover and deliver 99.999% or possibly even 99.9999% (that’s six 9s or 31.5 seconds of downtime per year) uptime. This kind of design is achievable at a relatively low price point using classic load balancing style techniques where your control plane and APIs are running active/active/active/active/… to N where N is however many you need as your cloud grows:

Load Balancing vs. Simple HA

Which brings me to the second part of the equation: you need your control plane to grow as your cloud grows. You don’t want to re-architect your system as it grows, and you don’t want to resort to old school scale up techniques for your API endpoints. When you run active/passive or with a master election system for high availability, only one API endpoint is available at a time. That means that you are fundamentally bottlenecked by the total throughput of a single server, which is unacceptable in today’s scale-out cloud world.

Instead, use a load balancing pattern so you can run multi-master (N-way) active API endpoints, scale your control plane horizontally and simultaneously achieve a very high uptime. This is the best of all worlds, allowing your cloud native applications to route around problems in real-time.
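A back-of-the-envelope model shows why the N-way active pattern helps on both fronts. The sketch below rests on two simplifying assumptions that I want to flag loudly: endpoint failures are independent, and throughput scales linearly with the number of active endpoints. It also ignores the availability of the load balancer itself. The per-endpoint figure of 500 operations per second is hypothetical.

```python
def combined_availability(endpoint_availability, n):
    """Availability of n active endpoints, assuming independent failures.

    The API stays up as long as at least one endpoint is up.
    """
    return 1 - (1 - endpoint_availability) ** n


def aggregate_throughput(per_endpoint_ops, n):
    """Idealized linear scaling: every active endpoint serves traffic."""
    return per_endpoint_ops * n


PER_ENDPOINT_AVAILABILITY = 0.995  # the "two and a half 9s" figure used above
PER_ENDPOINT_OPS = 500             # hypothetical API operations per second

for n in (1, 2, 3, 4):
    availability = combined_availability(PER_ENDPOINT_AVAILABILITY, n)
    throughput = aggregate_throughput(PER_ENDPOINT_OPS, n)
    print(f"N={n}: {availability * 100:.6f}% available, {throughput} ops/sec")
```

Under these assumptions, two active endpoints already reach roughly 99.9975%, and each additional endpoint adds both availability and capacity. An active/passive pair, by contrast, never serves more than one endpoint's worth of traffic, and its real-world availability also depends on how long each failover takes.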

Now let’s talk about day-to-day management of and securing your cloud.

Requirement #2 – Robust Management: Managing and Securing Your Cloud is Not Free

You probably know this already, but building a robust, manageable, and secure infrastructure in the enterprise isn’t easy. The notion that an enterprise-grade private cloud can be delivered in an afternoon and in production that evening doesn’t wash with the realities of the datacenter. Still, time is of the essence and if you want a cloud that doesn’t suck and you want it (relatively) fast, then it will help if the version of OpenStack you choose has been designed with deployment, daily management, and security in mind. Let’s take a deeper look at what that entails.

Robust Management

Installation is only the beginning when it comes to managing OpenStack. A true cloud operating system provides a suite of operator-centric cloud management tools designed to allow the infrastructure team to be successful at service delivery. These management tools provide:

  • Repeatable architectural model, preferably using pods or blocks wired together with a reference network architecture
  • Initial cloud installation & deployment
  • Typical day-to-day cloud operator tools for logging, system metrics, and correlation
  • Cloud operator command line interface (CLI) and API for integration and automation
  • Cloud operator GUI for visualization and analysis

Many attempts to solve the private cloud management challenges stop at installation. Installation is just the beginning of your journey, and how easy it is doesn’t matter if your cloud is then hard to manage on a daily basis. As we all know, running a production system is not easy. In fact, private clouds are significantly more complex than traditional infrastructure approaches in many aspects. To simplify the issue at scale, the cloud pioneers, such as Google, Amazon, and Facebook, have all adopted a pod, cluster, or block based approach to designing, deploying, and managing their cloud. Google has clusters; Facebook has triplets; but it’s all ultimately the same: a Lego brick-like repeatable approach to building your cloud and datacenter.[4] Enterprise-grade OpenStack-powered cloud operating systems will need to provide a similar approach to cloud organization.

Once the cloud is up and running, cloud operators need a variety of tools to maintain the cloud on a regular basis, including event logs and system metrics. Sure, in an elastic cloud, events that used to be critical (e.g. server or switch failure) are no longer high priority. However, your cloud can’t be a black box. You need information on how it’s operating on a daily basis so you can troubleshoot specific issues as required and, most importantly, keep an eye out for recurring issues using correlation tools. An individual server failure might not be a problem, but any kind of common issue that is affecting large amounts of resources needs to be sought out and quickly addressed.
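As a rough illustration of what a correlation tool does, the Python sketch below groups events by a failure signature and reports only the signatures seen on several distinct hosts. The event format, the host names, the signatures, and the `min_hosts` threshold are all hypothetical; real tools work on much richer log and metric data.

```python
from collections import defaultdict


def find_recurring_issues(events, min_hosts=3):
    """Group events by signature and return those seen on many hosts.

    `events` is an iterable of (host, signature) pairs. This format is an
    assumption made for the sketch, not a real logging schema.
    """
    hosts_by_signature = defaultdict(set)
    for host, signature in events:
        hosts_by_signature[signature].add(host)
    return {
        signature: sorted(hosts)
        for signature, hosts in hosts_by_signature.items()
        if len(hosts) >= min_hosts
    }


events = [
    ("node-01", "disk_failure"),
    ("node-02", "nic_firmware_timeout"),
    ("node-03", "nic_firmware_timeout"),
    ("node-07", "nic_firmware_timeout"),
    ("node-04", "psu_failure"),
]

recurring = find_recurring_issues(events)
print(recurring)
```

The single disk and power supply failures are ignored as routine, while the issue that shows up on three hosts is surfaced for investigation.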

What is your cloud doing? Not only do you need to know, but your other tools and groups may need to know as well. Integration with existing systems is critical to broad adoption. Any complete solution will have an API and command line interface (CLI) to allow you to integrate and automate. A CLI and API for just OpenStack administrative needs is not enough. What about your physical server plant or management of your blocks or pods? How about being able to retrieve system metrics and logging data on demand from not only OpenStack, but Linux and other non-OpenStack applications? You need a single, unified interface for cloud operations and management. Obviously, if you have this API, a GUI should also be provided for those unique cloud operator tasks that require visualizations such as looking for patterns in system and network metrics.

Security Model

Cloud turns the security model on its head. A complete discussion of this topic is far beyond the scope of this blog, but I do know one thing: enterprises want a private cloud with an understandable security model, particularly for the control plane. As I covered in the previous installment of this series, your cloud control plane’s API uptime and throughput are critical to allowing next generation applications to route around trouble. Similarly, the security of your cloud’s control plane should not be taken for granted.

It can be easy to get caught up in the move towards a decentralized model, but decentralized and scale-out are not the same thing. You can actually mix centralization and scale-out techniques and this is the default model that cloud pioneer Google uses. Keeping your cloud control plane in one place allows you to:

  • Have a single go-to location for troubleshooting
  • Always know where your control plane is located rather than having to guess
  • Apply security policies/zones to your control plane
  • Keep your control plane data (the system of record) completely separated from data plane data

This last item is perhaps most important. You don’t want your OpenStack database to reside on the same storage system as your virtual machines. What if someone breaks into a VM through the hypervisor? Or, conversely, what happens if someone breaks into the control plane via an API?

Best practices in the enterprise have long comprised an approach of zoning (usually with VLANs) of different components into different security areas with differing policies applied. Zoning slows an attacker down and gives you time to detect them and respond. Being able to take a similar approach to your private cloud security model is vital to making certain your cloud is secure.
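The separation rule above is easy to check mechanically once you have an inventory. The Python sketch below flags any storage system shared by control plane and data plane components. The inventory structure, its keys (`name`, `plane`, `storage`), and the component names are hypothetical; they are not taken from OpenStack or any vendor tool.

```python
def shared_storage_violations(components):
    """Return storage systems used by both control plane and data plane.

    `components` is a list of dicts with the hypothetical keys
    'name', 'plane' ('control' or 'data'), and 'storage'.
    """
    planes_by_storage = {}
    for component in components:
        planes = planes_by_storage.setdefault(component["storage"], set())
        planes.add(component["plane"])
    return sorted(
        storage
        for storage, planes in planes_by_storage.items()
        if {"control", "data"} <= planes  # both planes share this system
    )


inventory = [
    {"name": "openstack-db", "plane": "control", "storage": "san-a"},
    {"name": "vm-disks-pod1", "plane": "data", "storage": "san-b"},
    {"name": "vm-disks-pod2", "plane": "data", "storage": "san-a"},
]

violations = shared_storage_violations(inventory)
print(violations)
```

In this made-up inventory, `san-a` holds both the OpenStack database and virtual machine disks, so it is reported as a violation of the zoning policy.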

Cloud Management and Security

As I said, your journey begins with the installation of the cloud. After that, you need a set of tools and a security model that allows you to confidently manage your cloud day by day. An Enterprise-grade, OpenStack-powered cloud operating system should deliver as many of these capabilities as possible.

Part 1 Summary

OpenStack is a strong foundation for building a next generation private cloud designed for next generation cloud applications.  Unfortunately, it isn’t a complete cloud operating system and you will need a partner to provide you with that solution.  This series is covering The 6 Requirements of Enterprise-grade OpenStack and in today’s blog posting I covered the need for a high uptime, scale-out control plane and robust, secure management tools.

In the next installment I will cover building around an open architecture and reducing vendor lock-in.  That will be followed by the closing posting covering the need for filling in the gaps around scalability and performance and choosing a partner who can provide global services & support.

Second Installment:

Final Installment:

UPDATE: Added references for the growth of net new apps.

[1] In fact, even the OpenStack Foundation is about to refresh its messaging to help clarify this.

[2] The original source for this is EMC, both at this blog posting and via a private presentation by Joe Tucci, CEO of EMC, at a Research Board event.

[3] Yes, centralized. Something can be centralized and still scale-out. Centralization is necessary for proper security policies and zones to be enforced.

[4] Google’s cluster architecture has quite a bit of detail here.
