Infrastructure-as-a-Service Builder's Guide v1.0

Just in time for the New Year, we’re releasing a short 12-page whitepaper on building Infrastructure-as-a-Service (IaaS) clouds. This whitepaper is aimed at folks building public or private clouds who want to understand our general take on clouds, cloud computing, and Infrastructure-as-a-Service. In particular, we highlight some of the important areas to think about when you are planning and designing your infrastructure cloud.

Of course, we welcome comments and feedback, and will incorporate them into future revisions. The paper goes into some technical depth in a few areas, but we can provide quite a bit more color in our workshops.

For your reading pleasure, I present our first big technical whitepaper:

Thanks!

The Cloudscaling Team

P.S. We realize the definition of ‘workload’ or ‘cloud workload’ is not as crisp as it could be, and we would appreciate your feedback and thinking on better nomenclature or definitions. Credit will be given as appropriate.


This entry was posted in Cloud Computing.
  • http://topsy.com/tb/is.gd/5rpRm Tweets that mention Infrastructure-as-a-Service Builder’s Guide v1.0 | Cloudscaling — Topsy.com

    [...] This post was mentioned on Twitter by Randy Bias and Randy Bias, Cloudscaling. Cloudscaling said: Infrastructure-as-a-Service Builder's Guide: http://cloudscaling.com/blog/cloud-computing/infrastructure-as-a-service-builders-guide-v1-0 [...]

  • http://www.ubervu.com/conversations/cloudscaling.com/blog/cloud-computing/infrastructure-as-a-service-builders-guide-v1-0 uberVU – social comments

    Social comments and analytics for this post…

    This post was mentioned on Twitter by Cloudscaling: Infrastructure-as-a-Service Builder’s Guide: http://cloudscaling.com/blog/cloud-computing/infrastructure-as-a-service-builders-guide-v1-0...

  • http://johngannonblog.com/ John Gannon

    Randy — this is a nice primer. Question: At what size (measured in VMs, hosts, apps, or whatever metric you like) are today's CCS's likely to sweat and what are the factors that cause a CCS to hit scalability limits? Put another way, what 'resource' in the cloud management infrastructure (be it technical, people, or process) is likely to be the bottleneck as you grow your IaaS cloud?

  • http://cloudscaling.com randybias

    John, apologies for the delay. It depends on the CCS. It looks like the low end for a pod is about 50 physical servers running ~30 VMs each, and the high end is probably more like 500 physical servers running ~30 VMs each. A CCS could conceivably manage a fairly large number of pods without too much trouble; I expect in the thousands. Any CCS that is designed in a loosely-coupled fashion should be able to scale horizontally using regular techniques. In the end, most of them are simple batch processing systems.
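    To make the arithmetic concrete, here is a rough sketch of the pod sizes above. The function names are made up for illustration, and the 1,000-pod figure is only an assumed stand-in for "in the thousands". At ~30 VMs per server, a 50-server pod holds about 1,500 VMs and a 500-server pod about 15,000.

```python
def pod_vm_capacity(servers_per_pod: int, vms_per_server: int = 30) -> int:
    """Approximate VM capacity of one pod at a uniform VM density."""
    return servers_per_pod * vms_per_server


def cloud_vm_capacity(pods: int, servers_per_pod: int, vms_per_server: int = 30) -> int:
    """Approximate VM capacity of a CCS managing `pods` identical pods."""
    return pods * pod_vm_capacity(servers_per_pod, vms_per_server)


# Low-end and high-end pods from the estimate above.
print(pod_vm_capacity(50))
print(pod_vm_capacity(500))

# Assumption for illustration only: 1,000 high-end pods under one CCS.
print(cloud_vm_capacity(1_000, 500))
```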

    I don't know that there is a single resource constraint in scaling an IaaS cloud. The biggest issue is more one of scaling factor. As the margins get thinner, the ability to manage 10, 100, or 1,000 servers per operator will be crucial, but it will also reach a point of diminishing returns. The per-server savings from spreading a single operator across 100 servers instead of 10 are big, but the savings from going from 1,000 to 10,000 servers are pretty marginal.

    Or to put it slightly differently, IaaS providers need to optimize their cost structures, and that will be the primary source of any 'bottleneck' in that it will directly impact scalability. But over-optimization is dangerous. At some point those resources are better spent on sales & marketing.
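    The diminishing-returns point can be sketched in a few lines. The $100,000 annual operator cost below is purely an assumed figure for illustration, not a number from the whitepaper, and the helper name is hypothetical. What matters is the shape of the curve: each tenfold increase in servers per operator saves a tenth of what the previous one did.

```python
ASSUMED_ANNUAL_OPERATOR_COST = 100_000.0  # assumption, for illustration only


def operator_cost_per_server(annual_cost: float, servers_per_operator: int) -> float:
    """Annual operator cost amortized across the servers one operator manages."""
    return annual_cost / servers_per_operator


previous = None
for servers in (10, 100, 1_000, 10_000):
    cost = operator_cost_per_server(ASSUMED_ANNUAL_OPERATOR_COST, servers)
    if previous is None:
        print(f"{servers:>6} servers/operator: ${cost:,.2f} per server")
    else:
        # Savings per server compared with the previous, smaller ratio.
        print(f"{servers:>6} servers/operator: ${cost:,.2f} per server "
              f"(saves ${previous - cost:,.2f})")
    previous = cost
```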

  • http://debaer.org/blog jdebaer

    Good read. Just to confirm some of the definitions you guys use:

    - a “pod” is basically any arbitrary grouping of VM hosts. Maybe it's based on physical infrastructure boundaries (the VLAN example), maybe it's based on end customer identity (all VMs of customer A have to go on that pod), maybe it's based on workload type (all Apaches have to go there).

    - an “availability zone” is a collection of one or more pods, where you have protection against individual VM host crashes (H/A), but not against “disasters” in the sense of traditional DR. In case of such a disaster you better have a replica in another zone.

    Are these definitions in line with your thinking? If so, would you agree then that in most cases an availability zone will map onto one physical data center?

  • http://cloudscaling.com randybias

    Yes. That's correct. Mostly. A pod isn't an arbitrary grouping though. It's a grouping based on scale, which is related to architecture decisions made in designing the pod. Google's pods for their infrastructure are 10,000 servers, because they rely on all of the servers in a given pod being on the same switch (they build their own custom switches for this purpose). It's both a design decision and a scaling constraint.

    VMware pods will almost certainly be designed around Virtual Center, which has a stated limit of 256 ESX hosts, but most folks I've talked to say realistically it's 50. I've also heard inklings that if you use DRS this number is much, much smaller. So if you decide that DRS is a requirement for a VMware-based IaaS offering, then your pod size might be only 30 ESX hosts (or fewer).

    Another scaling constraint (business, not technical this time) is capex. You might have a design that allows for 1,000+ nodes, but design a pod at a smaller size initially due to the realities of how much you can build out at once.

    I would say that, when well designed, an availability zone == a datacenter, but I'm not certain that is always the case. It's fairly likely that over time folks will have more than one availability zone within a single datacenter, assuming each availability zone is isolated in power, network, and cooling.

    The primary idea here is that availability zone is cribbed directly from Amazon's usage: facility infrastructure is guaranteed to be redundant, but not the facility itself. For a redundant facility you would need to be A) in a different building and B) have that additional building far enough away to be unaffected by acts of god. That range varies, but my personal number is 250 miles.
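    A quick way to sanity-check the distance rule for two candidate sites is a great-circle calculation. This is only a sketch: it uses the standard haversine formula, the 250-mile default is just the personal number mentioned above, and the helper names and example coordinates are illustrative assumptions. Straight-line distance is also a simplification, since shared fault lines, flood plains, or power grids can matter more than mileage.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def great_circle_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def far_enough(site_a: tuple, site_b: tuple, min_miles: float = 250.0) -> bool:
    """True if the two (lat, lon) sites are at least `min_miles` apart."""
    return great_circle_miles(*site_a, *site_b) >= min_miles


# Example sites (approximate city-center coordinates, for illustration only).
san_francisco = (37.7749, -122.4194)
san_jose = (37.3382, -121.8863)
los_angeles = (34.0522, -118.2437)

print(far_enough(san_francisco, san_jose))
print(far_enough(san_francisco, los_angeles))
```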

  • http://blog.opennebula.org/?p=282 blog.opennebula.org » Archives » A Flexible and Interoperable Cloud Operating System

    [...] computing is about integration, one solution does not fit all. Moreover, as pointed out in the CloudScaling “Infrastructure-as-a-Service Builder’s Guide“, the right configuration and components in a Cloud architecture also depend on the execution [...]

  • http://www.quora.com/Is-there-a-technical-whitepaper-available-for-Cloudscale-Where#ans441186 Quora

    Is there a technical whitepaper available for Cloudscale? Where?…

    cloudscaling.com has a good guide to start off with: Infrastructure-as-a-Service Builder’s Guide v1.0 http://cloudscaling.com/blog/cloud-computing/infrastructure-as-a-service-builders-guide-v1-0...
