Cloudscaling Presentation Roundup for 2011

After the NIST presentation I gave last week in Washington, D.C., I received a large number of requests for the presentation itself. Rather than reply to each request individually, I thought I would direct folks to the NIST presentation while also providing a quick roundup of key talks and presentations I gave over the past year. Most, but not all, of my decks are on SlideShare. I try to keep SlideShare updated, so you can follow me there and get new decks as they are posted.

Here is a list of key materials you should check out and direct links to those presentations in chronological order:

  • Cloud Can Increase Korea’s Global Competitiveness (SlideShare) (Video)
    • Cloud Frontiers 2011 (Dec’10), Seoul, Korea
  • Enterprise Cloud Myth(s) (SlideShare) (Video)
    • Cloud Connect 2011, Santa Clara, USA
  • Carrier Cloud Opportunity (SlideShare)
    • TMForum Management World (May’11), Dublin, Ireland
  • Clouds, Open-ness, and IT Patterns (SlideShare)
    • NIST Cloud Computing Forum and Workshop IV (Nov’11), Washington, D.C., USA

That’s a pretty good introduction to our thinking and evolution of thinking over the last year.  Some of the themes and data are carried throughout, but each of them covers slightly different ground.  The two I would recommend the most are Enterprise Cloud Myth(s) and the latest one at NIST.

Hope you enjoy.

Posted in Cloud Computing | View Comments

Want to Build the World’s Largest Clouds?

As we continue building the world’s largest clouds for service providers, we’re growing fast, and we’re looking for engineering and marketing talent. OpenStack Nova and Swift, Linux, Python, Chef, Puppet, AWS, and Layer 2 / Layer 3 networking are just a few of the skills on our shopping list. You can see the full list of open positions at our Careers Page, but we’ve also listed them below:

  • Cloud Solutions Architect
  • Director of Marketing
  • Quality Engineering Lead – Cloud Development
  • Cloud Services Engineer
  • Cloud Engineer – Systems Automation
  • Cloud Engineer

Our engineering team is making big strides every week, so if you want to work with smart people who are creating the future of IT, let’s talk.

Posted in Company | View Comments

Is Open Compute Ready for Prime Time?

I just returned from the Open Compute Project (OCP) Summit in NYC. It was an eye-opening experience. I thought I would share my takeaways and talk about what I perceive as a core issue: can OCP grow beyond Facebook? The challenge right now is clear. Most of the OCP Summit seemed to be Facebook telling vendors what it needed. That’s great for Facebook, but it doesn’t advance the ball for the rest of us.

For those of you who may not have heard of the Open Compute Project, it’s an attempt to ‘open source’ hardware design and specifications, in much the way that software is open sourced.  Here is the OCP mission statement:

The Open Compute Project Foundation is a rapidly growing community of engineers around the world whose mission is to design and enable the delivery of the most efficient server, storage and data center hardware designs for scalable computing. We believe that openly sharing ideas, specifications and other intellectual property is the key to maximizing innovation and reducing operational complexity in the scalable computing space. The Open Compute Project Foundation provides a structure in which individuals and organizations can share their intellectual property with Open Compute Projects.

Facebook hopes to gain by seeding a community with its own designs so that others will contribute theirs to that community in an open manner. In this way Facebook will be able to learn and grow from the greater community over time. No one has ever really tried this before, so whether it will succeed is a big question mark.

My money is on the success of OCP, and Cloudscaling has designs we hope to push back. But will Facebook accept designs of others? What if they aren’t ‘efficient’ enough or ‘cutting edge’ enough? Much of what Facebook has seeded the OCP with is impractical in today’s datacenters. For OCP to thrive, it’s going to have to spread its arms wide while maintaining the high level of quality it appears to expect.

Let’s get into it.

OCP Summit Takeaways
The OCP Summit seemed to be a much better organized event than some of the other early open source events I attended. Most likely this is the effect of having a single primary driver in Facebook. There were a number of keynotes in the morning followed by breakout sessions in the afternoon. I am sure these are covered in other blogs, but what you should know is that the keynotes were given by outstanding speakers from Facebook, AWS, Red Hat, and Dell, while the breakout sessions appeared to be led entirely, or mostly, by Facebook employees.

I had some observations while seeing the various presentations and Q&A sessions:

  • Facebook seems to have a very FB-centric world view, which is to be expected; for example, thinking on virtualization was completely missing. While it’s unlikely Facebook uses much virtualization in its own systems, any IaaS provider will find the current OCP designs seriously lacking
  • OCP, in general, wants to set a high bar of professionalism and rigor; they want participants who participate in action, not just name; “we aren’t a marketing umbrella”
  • Pragmatic thinking about the millions and millions of unused or underused square feet of already existing datacenter space was completely missing; how to retrofit, or how to get ‘part of the way’ towards the current OCP designs was missing
  • Some of the designs being discussed are useful in the 3-5 or even 10 year time frame, but not now
  • OCP is providing very clear guidelines on how to participate and incubate new projects, which is awesome
  • There is some danger this community will be only Facebook and hardware vendors who want to sell to them

One thing that really struck me was how, on one side, folks would pan blade servers but then advocate very advanced designs like Open Rack, which are just another kind of specialized design. Certainly the Open Rack project is not blade servers, but if there is one thing I feel we should have learned from ATM, FDDI, Token Ring, blade servers, and a host of other specialized technologies, it’s that ‘good enough’ + ‘cheap’ usually wins.

Now, the counterargument would be that OCP is attempting to get the vendors to follow its lead and manufacture at scale such that the cost of these specialized solutions is competitive with standard 1U/2U rackmount servers. However, this hasn’t worked before, so why would it now? This is a key chicken-and-egg problem and part of why rackmount servers are still the standard Field Replaceable Unit (FRU) for most large scale datacenters.

This is not to say I think the Open Rack design is bad; it’s just very forward thinking, and I don’t think it has a lot of practical application any time soon.

Remember, 57% of large scale datacenter costs are servers, so paying *any* premium above what a rackmount server costs can’t be justified easily.
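
To make that arithmetic concrete, here is a minimal Python sketch. It assumes the 57% server share quoted above and holds every other cost line fixed; the function name and the premium values are hypothetical, chosen only for illustration.

```python
# Share of large-scale datacenter cost attributed to servers (figure quoted above).
SERVER_COST_SHARE = 0.57


def total_cost_increase(server_premium, server_share=SERVER_COST_SHARE):
    """Fractional rise in total cost when servers cost `server_premium` more.

    Simplifying assumption: every other cost line stays unchanged.
    """
    return server_share * server_premium


# Hypothetical premiums over a standard rackmount server.
for premium in (0.05, 0.10, 0.20):
    increase = total_cost_increase(premium)
    print(f"{premium:.0%} server premium -> {increase:.2%} higher total cost")
```

Under these assumptions, a 10% premium on servers alone works out to roughly 5.7% on the total bill.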

Practical Compute
Given the above, I’d like to call for those involved with OCP to start thinking about not just ‘open compute’, but ‘practical compute’. We have customers now who have many thousands of square feet of datacenter space that is a sunk cost. These facilities are largely paid for and can be retrofitted to some degree, but there are limits. DC space is 4-5% of overall costs, which is not insignificant, especially if that facility is already paid for.

What’s more important? Designing a new 1.07 PUE datacenter for 100M or retrofitting 3-5 existing smaller DCs and getting them down from 2-3 PUE to 1.25-1.5 PUE for closer to 10M per DC [1]? To really bring this home, during James Hamilton’s keynote at OCP Summit, he specifically said that hot aisle containment could drop a 3 PUE facility to 2.5 or even 2 PUE.
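
A rough way to frame the comparison: PUE is total facility power divided by IT equipment power, so for a fixed IT load the overhead (cooling, power distribution, and so on) scales with PUE minus 1. The Python sketch below assumes a hypothetical 1,000 kW IT load and reuses the illustrative PUE figures from the paragraph above; the function names are mine, not from any OCP material.

```python
def overhead_kw(it_load_kw, pue):
    """Non-IT power drawn by a facility for a given IT load.

    Uses the standard definition: PUE = total facility power / IT power.
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_load_kw * (pue - 1.0)


def overhead_saved_kw(it_load_kw, pue_before, pue_after):
    """Overhead power removed by moving from one PUE to another."""
    return overhead_kw(it_load_kw, pue_before) - overhead_kw(it_load_kw, pue_after)


IT_LOAD_KW = 1000.0  # hypothetical IT load, for illustration only

# (label, PUE before, PUE after), using the illustrative figures from the text
steps = [
    ("retrofit an existing DC", 2.5, 1.5),
    ("go from retrofit to new build", 1.5, 1.07),
]

for label, before, after in steps:
    saved = overhead_saved_kw(IT_LOAD_KW, before, after)
    print(f"{label}: PUE {before} -> {after} saves {saved:.0f} kW of overhead")
```

For the same IT load, the retrofit step from 2.5 to 1.5 removes 1,000 kW of overhead, while the further step from 1.5 to 1.07 removes 430 kW. Most of the gain comes from the retrofit, which is the diminishing-returns point.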

My view is that over the 5-10 year time frame, retrofitting existing empty DCs to have much better power, cooling, and more efficient servers is a more practical approach. There are a lot of customers who fit this bill. Unfortunately, the current OCP designs assume that everyone will build new DCs that are designed to wring the last ounce of efficiency out of every step of the process.

In some ways, this reminds me of how people approach ‘availability’ or ‘security’. There is almost an infinite amount of resource you can dump into either bucket, but you quickly reach a point of diminishing returns. Smart business people look very carefully at the risk/reward tradeoffs of availability and security and find a ‘sweet spot’ for those diminishing returns.

For Facebook that point is very different from many of the others who can benefit from the Open Compute Project (OCP).

OCP’s Future is Bright … I Hope
I believe OCP’s future is bright. As I said, we’re working on some designs we hope to push back. I am concerned that the focus on Facebook’s needs, the high bar that is being set, and some of the focus on forward thinking designs may blind the OCP community to the need for pragmatic designs for today’s datacenters. On the other hand, it is important for someone to hold us all to a higher standard when it comes to energy efficiency. We all want to be good citizens. Can OCP strike the right balance and become more broadly meaningful? I believe it can if it keeps in mind that retrofitting existing datacenters is an important short to medium term goal that needs attention.

Big props to Facebook for continuing to open up OCP’s processes and providing guidance on how to participate. Here I think is the seed for how OCP can be wildly successful. We need more constituents with clear pragmatic designs that are based on real world usage and use cases, not overly futuristic thinking.

My message to the greater OCP community is that this is a great start, but the sooner there is a broad base of customers who are not Facebook within the Open Compute Project, the better. Let’s get some real practical computing designs out there for today’s datacenters. OCP could be a tremendous vehicle for change with more active participants with real world problems.


[1] Yes, I’m making these numbers up. It’s more about framing how to evaluate which approach to take than about any particular facts or data I have. What’s important is that it’s a business evaluation, not an assumption that we need yet more datacenter space.

Posted in Cloud Computing, Uncategorized | View Comments

Webinar: Web-Scale Cloud Building with Arista Networks and Quanta

We spend a lot of time on stage at conferences talking about building web-scale clouds. This Friday, we’re partnering with Quanta and Arista to deliver a webinar for engineers and anyone else wanting to dig deeper into the details of architecture, equipment selection, and other technical issues that are sometimes ignored in similar presentations. We’ll also spend some time talking about successful cloud business models based on these design approaches. Here’s the info:

Friday, 21 October 2011
10:00 am PDT
Duration: 1 hour
Register here

Enterprise technology providers have been positioning their solutions for public and private clouds with meager results. It is fundamentally impossible to build profitable cloud services when paying the enterprise computing, storage, and virtualization taxes to companies whose business model does not support the operation of profitable cloud services. Arista has partnered to deliver several successful clouds to market. This session will go in depth into the architectures and equipment selected, how to cost-effectively scale the operation, and mistakes to avoid, and will discuss successful business models for transformation into a cloud provider.

Spread the word to anyone that wants to learn about building web-scale clouds from folks who’ve done it.

November 1, 2011 UPDATE: Below is a replay of the webinar.

Posted in Cloud Computing, Technology | View Comments

Carriers Catching on to Commodity Cloud: David Bernstein Talks With Ian Scales

Cloudscaling’s David Bernstein spent some time earlier this week with Ian Scales of Telecom TV while in London to speak at an IEEE event. In the short segment, David and Ian explore the five key points that carriers and large service providers are beginning to figure out when it comes to their cloud strategies:
  1. Commodity Cloud is Winning. A growing list of carriers are beginning to realize that building with commodity-based hardware architectures is the only way to build large systems that are fault tolerant and cost efficient enough to be competitive with their non-carrier competitors.
  2. Simplicity Scales. Large, fast, simple systems produce reliable and cost-effective platforms. Complexity does not.
  3. Open Systems are Winning. The timing of Oracle’s public cloud launch compared to the sellout crowd in Boston for the OpenStack Summit is a perfect contrast that illustrates this. Proprietary systems are expensive and offer a questionable value proposition. Open source offers short-term risk to be sure, but a more promising future with limited lock-in and licensing overhead.
  4. Building at Web Scale Requires New Thinking. In large systems, hardware is going to fail, regardless of how expensive it is. What makes a big cloud reliable and scalable is software, not hardware. Carriers are realizing that they can leave behind their old, expensive, legacy infrastructure when they build web-scale commodity clouds. They get a more competitive cloud with lower capex that’s easier to operate and has less operational baggage.
  5. Carriers Have Big Advantages Over Google and AWS. Carriers have nearly limitless, cheap bandwidth. They have deep expertise in network architecture and operations. They own the wired and wireless broadband networks. And they can easily connect mobile and tablet apps to OSS/BSS systems that can help developers get paid and manage customer relationships.

Check it out. Let us know what you think in the comments below.
Posted in Cloud Computing | View Comments