vCloud Director Architecture Primer

By | January 31, 2014

I do some consulting for a company whose legacy business is in decline, and they are working to transition towards new Internet-based services. Good plan.  So they’ve hired a whole army of new developers to write these new Internet-facing applications and services.

And these developers are looking for virtual machines…  Lots of virtual machines.

They want to spin up new VMs all day long, install application run-times, write some code, tear it down and start again, until they’ve decided what run-times they will finally use.  Tomcat?  JBoss?  PHP?  MySQL?  PostgreSQL?  IIS?  MSSQL?  Then, once they’ve decided, they’ll want to spin up VMs for dev, test, UAT, production, A/B testing, and all of that for each application.

OK fine, they’ve got a process.  For each VM you want, fill out a web form and one of their engineers will see the request and create the VM for you.  You’ll have your VM in a day or two, depending on how busy they are.

“What?  No, no, we need lots of VMs and we need them now!”  Sound familiar?  Well, they do have a point.  Until the developers have built the new products, the company’s revenues are going to continue to decline.  They really DO need them now.

Now the company isn’t new to Internet-facing apps.  They’ve already got lots of them.  And they are all multi-tier, with a VLAN for every tier and firewalls in between.  The result is that they’ve burned through nearly 4000 VLANs and with all these new apps, they’re likely to run out.  Sure, we could stand up another network switching infrastructure and connect it to their existing switches via a router.  That would give us another 4094 usable VLANs to burn, but that seriously alters the beautiful network design that they currently have.  The network gurus will not like the idea of bolting a side car on their Ferrari.

So in summary, we’ve got a VM provisioning process that’s too slow, and a network that isn’t scalable enough.

Enter the Cloud

Sounds like a case for private cloud.  They’re a VMware shop, so let’s have a look at vCloud Director.

vCloud Director is VMware’s Infrastructure as a Service (IaaS) private cloud software.  What that means is, it will give you self-service deployment of virtual machines, networks and storage.  OK, for you cloud evangelists out there, I don’t want to start an argument, nor do I want to start a conversation about what cloud computing is. However, I will spell out the standard tenets of cloud and mention how vCloud Director addresses them.  They are:

  • Measured Service
  • Rapid Elasticity
  • Resource Pooling
  • On-Demand Self-Service
  • Broad Network Access

In my opinion, some of these are not very applicable to a private (on-premise) cloud.  Broad network access, for example, is a function more of the web applications deployed on the cloud being useful on many types of devices (e.g. mobile devices).  Resource pooling is something they already have with their virtualization platform.  Elasticity is the perception of infinite resources, and the ability to quickly expand and contract the resources we’re using. Yes, the users can do that in a private cloud, but we will have limited capacity overall.

So the tenets that are very applicable are On-Demand Self-Service and Measured Service.  Self-Service is the one that will help us deliver a very fast provisioning process.  Measured service will enable us to charge the users for the resources they consume (or at least show them the cost).  This will (hopefully) incentivize the users to terminate VMs they’re no longer using, and not to over-build them with too much memory, disk space, etc.  The goal is to get them to spin up when they need to, and spin down when they’re done.  That’s elasticity.  The alternative is that your capacity fills up and stays full, like it is today, right?  That’s what measured service really does for you in a private cloud.
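To make the incentive concrete, here is a minimal sketch of the kind of arithmetic a chargeback (or showback) report boils down to.  The hourly rates, the VmUsage class and the VM names are all made up for illustration; they are not taken from vCenter Chargeback Manager, and a real cost model would come from your own finance people.

```python
from dataclasses import dataclass

# Hypothetical hourly rates, for illustration only.
RATE_PER_VCPU_HOUR = 0.02
RATE_PER_GB_RAM_HOUR = 0.01
RATE_PER_GB_DISK_HOUR = 0.0002


@dataclass
class VmUsage:
    name: str
    vcpus: int
    ram_gb: int
    disk_gb: int
    hours_deployed: float  # how long the VM existed during the month


def monthly_charge(vm: VmUsage) -> float:
    """Charge = (size of each resource x its rate) x time deployed."""
    hourly = (vm.vcpus * RATE_PER_VCPU_HOUR
              + vm.ram_gb * RATE_PER_GB_RAM_HOUR
              + vm.disk_gb * RATE_PER_GB_DISK_HOUR)
    return round(hourly * vm.hours_deployed, 2)


# An over-built VM left running all month vs. a right-sized one torn down after use.
oversized = VmUsage("dev-oversized", vcpus=8, ram_gb=32, disk_gb=500, hours_deployed=720)
right_sized = VmUsage("dev-right-sized", vcpus=2, ram_gb=8, disk_gb=100, hours_deployed=160)

for vm in (oversized, right_sized):
    print(f"{vm.name}: {monthly_charge(vm):.2f}")
```

Whatever the actual rates are, the shape is the same: cost scales with both size and time, so the two levers the user controls are exactly the ones we want them to pull.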

vCloud Architecture

OK, enough talk.  The basic components of vCloud Director are a web portal and an orchestration engine.  The user requests VMs via the web portal, and the orchestration engine reaches out to vSphere to create those VMs.  The web portal and orchestration engine run on what’s known as a cell server.  This is the core of vCloud Director.  A cell server runs on Red Hat Enterprise Linux (RHEL).  For high-availability, you can have two (or more) cell servers behind a load balancer.

The cell server uses a database (as most VMware products do), so you probably already have a SQL server that hosts various VMware-related databases.  You can deploy the vCloud databases there, or build another SQL instance, your call.

Then there’s a virtual BSD-based appliance called vCNS Manager.  This will manage virtual NAT devices that will connect our virtual networks.  And finally, there’s vCenter Chargeback Manager (running on Windows) that will collect usage statistics and assign costs to the resources.  That’s our vCloud Director management layer.  In summary:

  • One or more cell servers
  • A SQL instance
  • vCNS Manager Appliance
  • Chargeback Manager

Note that you’ll need to provide the RHEL licenses for the cell servers and the Windows licenses for the SQL and Chargeback servers.

Now we deploy these VMs on our standard vSphere platform.  The management VMs are themselves NOT managed by vCloud Director so they shouldn’t live on the vCloud-managed platform.  We’ll build a new vSphere cluster (and a new vCenter instance) to host cloud-deployed VMs.  This new vCenter instance will also live on our standard vSphere platform.

The new vSphere cluster (we’ll call it a resource cluster) will represent a new “virtual datacenter” that we, the “provider”, own: a Provider vDC (PvDC).  It represents a bunch of capacity (CPU’s, memory, and the storage that is attached to the cluster).  The resulting architecture looks like this:

[Figure: Reference Architecture]

You can build additional Provider vDC’s, in different sites, or perhaps with different up-time SLA’s, different performance, to give you service offerings at various costs and deployment locations.

Next, we create what’s known as an Organization, or Org for short.  An Org is an administrative object that contains users, usually belonging to the same group, business unit, or customer.  An Org is typically tied to Active Directory, so that membership in Active Directory groups controls access to the Org.

Once Orgs are created, you then create Org vDC’s to allocate to the Org.  An Org vDC is a subset of the resources contained in the Provider vDC.  The Org vDC is essentially a resource pool within the PvDC’s cluster.  When you define an Org vDC, you specify how many resources to allocate and how much of those resources to reserve, so that they remain guaranteed during times of resource contention.

In a simple setup, you might create one Org vDC per Org, but you might want more than one.  Again, you might want to create an Org vDC at the other site (in the other PvDC), or you might want to create multiple Org vDC’s with different allocation models.  There are three allocation models:  allocation pool, reservation pool, and pay-as-you-go.  The difference is essentially the amount of resources reserved.  The allocation pool allocates a fixed amount of resources and reserves some percentage of it.  The reservation pool reserves all of the allocated resources, and the pay-as-you-go pool reserves resources on a per-VM basis only.
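Here is a rough sketch of how the three models differ in what they reserve.  This is my own simplification for illustration, not vCloud Director’s actual logic: the function name, the GHz figures and the idea of applying one guarantee percentage to every pay-as-you-go VM are all assumptions.

```python
from enum import Enum


class AllocationModel(Enum):
    ALLOCATION_POOL = "allocation pool"
    RESERVATION_POOL = "reservation pool"
    PAY_AS_YOU_GO = "pay-as-you-go"


def reserved_ghz(model, allocated_ghz=0.0, guarantee=0.0, vm_ghz=()):
    """CPU (in GHz) reserved in the provider cluster under each model.

    allocated_ghz: size of the Org vDC (unused for pay-as-you-go)
    guarantee:     fraction between 0.0 and 1.0 that is reserved
    vm_ghz:        sizes of the deployed VMs (used for pay-as-you-go only)
    """
    if model is AllocationModel.RESERVATION_POOL:
        # Everything that is allocated is reserved.
        return allocated_ghz
    if model is AllocationModel.ALLOCATION_POOL:
        # A fixed allocation, of which only a percentage is reserved.
        return allocated_ghz * guarantee
    # Pay-as-you-go: nothing is reserved up front, only per deployed VM.
    return sum(size * guarantee for size in vm_ghz)


print(reserved_ghz(AllocationModel.RESERVATION_POOL, allocated_ghz=20))
print(reserved_ghz(AllocationModel.ALLOCATION_POOL, allocated_ghz=20, guarantee=0.5))
print(reserved_ghz(AllocationModel.PAY_AS_YOU_GO, guarantee=0.5, vm_ghz=[2, 2, 4]))
```

The trade-off is the usual one: the more you reserve, the more predictable the Org’s performance, and the less you can over-subscribe the Provider vDC.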

Anyway, we (the provider) create two Orgs each with its own Org vDC as shown below:

[Figure: Org Layout]

As far as networking is concerned, our resource cluster is similar to any standard vSphere cluster.  We’ve got various networks connected to the hosts (via multiple NICs or VLAN trunks).  These networks might include a management network, a vMotion network, and one or more networks for VM traffic.

The VM networks will be known as Provider networks in vCloud Director.  They will be associated with IP subnets on your corporate network.  Orgs will have their own networks, known as Org networks, that connect to the provider networks.

Org networks can be either directly attached (bridged) to provider networks or connected via a NAT firewall, known as a vCNS Edge Gateway.  The gateway will serve two purposes.  It will protect the VMs behind it, and it will limit the usage of corporate IP addresses and VLANs.

An Org can have multiple Org networks, perhaps some direct-attached and some NAT’d.  In the figure below, we show an Org vDC that has one of each:


Finally, we come to the VMs themselves, but we first need to introduce vApps.  A vApp is the unit of deployment in vCloud Director.  A vApp is a container that holds one or more VMs, has its own network, and has other attributes like lease duration and ownership.  When a user wants to deploy a VM, they deploy a vApp with one or more VM’s in it.  Again, the vApp network can be direct-attached to its Org network or connected via a NAT gateway, resulting in deployments that look like this:

[Figure: vApp Networking]

As you can see in the diagram, one deployment is ultimately direct-attached to the provider network, while the other traverses two gateways before it reaches the provider network. You can mix and match as needed. The idea here is that the Org can be protected from the provider network (and from other Orgs) if desired, and vApps themselves can be protected from other vApps if desired. North-south firewalls and east-west firewalls, if you will.
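If it helps to think of the layering as a data structure, here is a small sketch.  Each network points at the network above it, and we count how many NAT gateways sit between a vApp network and the provider network.  The Network class and all the network names are invented for this example; they are not vCloud Director objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Network:
    name: str
    parent: Optional["Network"] = None  # None means this is a provider network
    routed: bool = False                # True: reaches its parent through a NAT gateway


def gateways_to_provider(net: Network) -> int:
    """Count the NAT gateways between a network and the provider network."""
    hops = 0
    while net.parent is not None:
        if net.routed:
            hops += 1
        net = net.parent
    return hops


provider = Network("provider-vm-net")

# Direct-attached all the way up: vApp network -> Org network -> provider network.
org_direct = Network("org-direct", parent=provider)
vapp_direct = Network("vapp-direct", parent=org_direct)

# NAT'd at both levels: two gateways before traffic reaches the provider network.
org_nat = Network("org-nat", parent=provider, routed=True)
vapp_nat = Network("vapp-nat", parent=org_nat, routed=True)

for net in (vapp_direct, vapp_nat):
    print(net.name, gateways_to_provider(net))
```

Mixing and matching just means setting the routed flag independently at the Org level and the vApp level.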

Note (and I’m sorry this isn’t clear in the diagram) that the two VM’s shown on the NAT’d network may well be running on separate hosts in the vSphere cluster.  So how do the two VM’s on that network communicate?  Well, we need to provide a transport VLAN that vCloud Director will use to transport traffic for isolated networks through the physical network.  This transport VLAN is used by the vCNS edge gateways, and the gateways use one of two overlay technologies to tunnel the traffic over this common transport VLAN.

[Figure: vApp Networking Transport]

The two technologies are VCNI (vCloud Network Isolation) and VXLAN (Virtual Extensible LAN).  VCNI uses a VMware proprietary MAC-in-MAC encapsulation technique to tunnel the traffic between hosts over a layer-2 VLAN.  VXLAN encapsulates the traffic in UDP packets, using IP multicast to deliver broadcast and unknown-destination traffic, so it can be carried over layer 2 or layer 3.  VXLAN is far more scalable than VCNI, supporting millions of networks compared to only 1000 for VCNI.  VXLAN also has the potential to inter-operate with physical network gear that supports VXLAN.  In either case, the MTU size will need to be increased on the upstream switches to support the larger packet size resulting from the encapsulation, and if you use VXLAN, your network team may also have to do some multicast configuration.
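For the numerically inclined, the scalability and MTU points come down to a few lines of arithmetic.  The field sizes below are the standard 802.1Q and VXLAN header sizes; the 1500-byte guest MTU is an assumption, and I’ve left VCNI out because I don’t want to guess at its exact overhead.

```python
# Identifier space: how many networks each technology can tell apart.
VLAN_ID_BITS = 12    # 802.1Q VLAN ID field
VXLAN_VNI_BITS = 24  # VXLAN Network Identifier field

usable_vlans = 2 ** VLAN_ID_BITS - 2   # IDs 0 and 4095 are reserved
vxlan_segments = 2 ** VXLAN_VNI_BITS

print("usable VLANs:  ", usable_vlans)
print("VXLAN segments:", vxlan_segments)

# MTU: the guest's whole Ethernet frame becomes payload inside an outer IP packet.
GUEST_MTU = 1500           # assumed guest setting
INNER_ETHERNET_HEADER = 14
VXLAN_HEADER = 8
OUTER_UDP_HEADER = 8
OUTER_IPV4_HEADER = 20

overhead = (INNER_ETHERNET_HEADER + VXLAN_HEADER
            + OUTER_UDP_HEADER + OUTER_IPV4_HEADER)
required_transport_mtu = GUEST_MTU + overhead

print("encapsulation overhead:", overhead)
print("minimum transport MTU: ", required_transport_mtu)
```

In practice you’d round the transport MTU up (1600 is a common choice) to leave headroom for things like an inner VLAN tag.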

vApp Workflow

Like I said earlier, the vApp is the unit of deployment in vCloud Director.  The user selects a vApp from their Organization’s catalog.  So the next step is to create some vApps for the catalog.  This process is similar to creating a VM template in vSphere.  An Administrator, or an Org user that has been granted enough rights, uploads an ISO image of an Operating System, builds a new VM, preps it for generalization, packages it in a vApp and uploads it into the public catalog.

[Figure: vApp Creation]

Next, an Org user logs on, and copies the new vApp into their Org’s catalog (this only has to happen once).

[Figure: vApp Org Copy]

Then finally they, and other users, can deploy the vApp as many times as they need, within their allocated capacity.

[Figure: vApp Deployment]

When a user deploys a vApp, they must fill in some information, such as the computer names to assign to the VM’s, which Org network to connect it to, and some sizing parameters (number of vCPU’s, RAM, disk space, etc.). These questions are determined at vApp creation time.  After the questions are answered, the vApp deployment is fully automated, and the user should have their new VMs up and running in a matter of minutes.
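To tie the workflow together, here is a toy model of a deployment request being checked against the Org vDC’s allocated capacity.  None of this is the vCloud Director API; the classes, the quota fields and the template and network names are hypothetical, just enough to show what the user supplies and where the capacity limit bites.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VmSpec:
    computer_name: str
    vcpus: int
    ram_gb: int


@dataclass
class DeployedVApp:
    template: str      # catalog item the vApp was deployed from
    org_network: str   # Org network the user chose to connect to
    vms: List[VmSpec]


@dataclass
class OrgVdc:
    name: str
    vcpu_quota: int
    ram_gb_quota: int
    vapps: List[DeployedVApp] = field(default_factory=list)

    def used_vcpus(self) -> int:
        return sum(vm.vcpus for vapp in self.vapps for vm in vapp.vms)

    def used_ram_gb(self) -> int:
        return sum(vm.ram_gb for vapp in self.vapps for vm in vapp.vms)

    def deploy(self, template: str, org_network: str, vms: List[VmSpec]) -> DeployedVApp:
        """Deploy a vApp if it still fits within the allocated capacity."""
        if self.used_vcpus() + sum(vm.vcpus for vm in vms) > self.vcpu_quota:
            raise ValueError("not enough vCPU capacity left in " + self.name)
        if self.used_ram_gb() + sum(vm.ram_gb for vm in vms) > self.ram_gb_quota:
            raise ValueError("not enough RAM capacity left in " + self.name)
        vapp = DeployedVApp(template, org_network, list(vms))
        self.vapps.append(vapp)
        return vapp


dev_vdc = OrgVdc("dev-org-vdc", vcpu_quota=8, ram_gb_quota=32)
dev_vdc.deploy("tomcat-mysql", "org-nat",
               [VmSpec("web01", vcpus=2, ram_gb=4), VmSpec("db01", vcpus=2, ram_gb=8)])
print(dev_vdc.used_vcpus(), dev_vdc.used_ram_gb())
```

A second request that would push the Org vDC past its quota raises an error instead of deploying, which is the “within their allocated capacity” part of the story.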

Those are the basics of vCloud Director.  If you’re familiar with vSphere, it’s not much more complicated than that, and it gives you the self-service provisioning and network scalability that address the problems I mentioned at the beginning of the post.

Now you developers, get to work!

