OpenStack High Availability – Booting from a Ceph Volume

By Brian Seltzer | May 15, 2014

In this article, we’ll modify our OpenStack Icehouse compute node to use Ceph storage for instances. That means we’ll be able to boot an instance directly off of a volume stored on our Ceph cluster. If you’ve been following along with the entire series, we’ve already built the rest of our high availability stack in the earlier articles.

Now we will modify our first compute node to include access to our Ceph cluster. We built this compute node in the article: OpenStack High Availability – First Compute Node. The node is called:

  • icehouse-compute1 (192.168.1.45)

and in the article: OpenStack High Availability – Ceph Storage for Cinder and Glance we built our Ceph cluster. The cluster nodes are:

  • ceph1 (192.168.1.39)
  • ceph2 (192.168.1.40)
  • ceph3 (192.168.1.41)

To set up our compute node to access Ceph, we’ll need to get a few things from the Ceph cluster, so ssh over to ceph1 for these steps. We’ll push the ceph.conf and the access key for our Ceph user over to the compute node:

ssh 192.168.1.45 "sudo mkdir -p /etc/ceph; sudo tee /etc/ceph/ceph.conf" < /etc/ceph/ceph.conf
ceph auth get-key client.icehouse | ssh 192.168.1.45 tee client.icehouse.key
ceph auth get-or-create client.icehouse | ssh 192.168.1.45 sudo tee /etc/ceph/ceph.client.icehouse.keyring
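
As a quick sanity check (optional, and assuming the paths used above), you can confirm from ceph1 that the conf file, keyring, and key all landed on the compute node:

ssh 192.168.1.45 "ls -l /etc/ceph; cat client.icehouse.key"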

Now let’s go back to the compute node and install the Ceph packages:

apt-get install ceph-common python-ceph
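
Before moving on, it’s worth confirming that the compute node can actually reach the cluster as the icehouse user we just pushed over. A minimal check, assuming client.icehouse has read access to the monitors and to the datastore pool we’ll use below (adjust the pool name if yours differs):

ceph --id icehouse -s
rbd --id icehouse ls datastore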

Now comes the brain teaser. In order for the compute node to boot an instance from a Ceph volume, we need to store the Ceph access key as a libvirt secret object. Libvirt will then use the Ceph key stored in the secret object whenever it performs Ceph access. To create the secret, we build an XML file. To guarantee that the secret is unique, we’ll generate a new uuid to use for the secret. We’ll then use that uuid in the XML file, define the secret, and insert our Ceph key. Above, we pushed a copy of that key to our home directory on the compute node, so make sure you’re in the home directory for this step:

on the first compute node only:

cd ~
UUID=$(uuidgen)
cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
  <uuid>$UUID</uuid>
  <usage type='ceph'>
    <name>client.icehouse secret</name>
  </usage>
</secret>
EOF
sudo virsh secret-define --file secret.xml
sudo virsh secret-set-value --secret $UUID --base64 $(cat client.icehouse.key)
echo $UUID

Notice the uuid echoed at the bottom of the output of the last command. Grab it, as we’ll need it in the next step as well as on any additional compute nodes. On other compute nodes, we’ll modify the step above to use the same uuid:

on additional compute nodes only:

cd ~
UUID=[uuid from last command]
cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
  <uuid>$UUID</uuid>
  <usage type='ceph'>
    <name>client.icehouse secret</name>
  </usage>
</secret>
EOF
sudo virsh secret-define --file secret.xml
sudo virsh secret-set-value --secret $UUID --base64 $(cat client.icehouse.key)
echo $UUID

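Either way, it’s a good idea to confirm that libvirt is actually holding the key before we touch Nova. The secret should appear in the list, and the value it returns should match the contents of client.icehouse.key:

sudo virsh secret-list
sudo virsh secret-get-value $UUID
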
Now we’ll configure Nova to use Ceph. We’ll edit the /etc/nova/nova.conf on the compute node and add the following lines to the [DEFAULT] section:

/etc/nova/nova.conf

[DEFAULT]
...
libvirt_images_type=rbd
libvirt_images_rbd_pool=datastore
libvirt_images_rbd_ceph_conf=/etc/ceph/ceph.conf
rbd_user=icehouse
rbd_secret_uuid=[uuid from the last command]
libvirt_inject_password=false
libvirt_inject_key=false
libvirt_inject_partition=-2
...

and restart the nova-compute service:

service nova-compute restart
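
If anything is wrong with the Ceph settings, nova-compute will usually complain at startup, so it’s worth a quick look at the service and its log (standard Ubuntu 14.04 package locations assumed):

service nova-compute status
tail -n 50 /var/log/nova/nova-compute.log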

With that all done, we can test. The process here is that we will create a Cinder volume from a Glance image, then create a Nova instance that boots from that Cinder volume. The Glance image should be in raw format, so let’s head over to our controller node (192.168.1.35), get the good old cirros image and check its format:

# wget http://download.cirros-cloud.net/0.3.1/cirros-0.3.1-x86_64-disk.img
# file cirros-0.3.1-x86_64-disk.img

cirros-0.3.1-x86_64-disk.img: QEMU QCOW Image (v2), 41126400 bytes

Notice that it’s a QCOW image, so we need to convert it to raw format:

qemu-img convert -f qcow2 -O raw cirros-0.3.1-x86_64-disk.img cirros-0.3.1-x86_64-disk.raw
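
Re-running file against the new .raw file, or using qemu-img info, should now report a plain raw image rather than QCOW:

qemu-img info cirros-0.3.1-x86_64-disk.raw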

Now, let’s upload the raw image into Glance, not forgetting to source our credentials file of course:

source credentials
glance image-create --name cirrosRaw --is-public=true --disk-format=raw --container-format=bare < cirros-0.3.1-x86_64-disk.raw
glance image-list

# glance image-list
+--------------------------------------+------------+-------------+------------------+----------+--------+
| ID                                   | Name       | Disk Format | Container Format | Size     | Status |
+--------------------------------------+------------+-------------+------------------+----------+--------+
| be5c4f9e-b8c8-4ed8-91e7-cef6eaf64e0a | cirros     | qcow2       | bare             | 13147648 | active |
| 262fb084-8b5d-4567-8247-be11102dec8a | cirrosRaw  | raw         | bare             | 41126400 | active |
+--------------------------------------+------------+-------------+------------------+----------+--------+

Note the ID of the raw image. We can create a Cinder volume based on that image, either from within the web portal or via the command line. I’ll create a 4GB volume called cephVolume1:

cinder create --image-id 262fb084-8b5d-4567-8247-be11102dec8a --display-name cephVolume1 4

Warning: There was a bug in cinder (Cannot create volume from glance image without checksum) that will cause this command to fail. Although this bug has been fixed, the fix has not made it into the Ubuntu 14.04 cloud archive as of this writing. To apply the fix, download the fixed version of the glance.py module (https://review.openstack.org/cat/90644%2C1%2Ccinder/image/glance.py%5E0) for cinder and copy it to /usr/lib/python2.7/dist-packages/cinder/image/glance.py on both of your controller nodes (or wherever you have cinder installed).
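
Creating the volume can take a little while, since the image data generally has to be copied into the Ceph pool. Watch for the volume to go from creating to available before trying to boot from it:

cinder list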

Finally, we can build our Nova instance that boots from the new volume:

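From the command line, that looks something like the sketch below. The flavor, the network id, and the instance name cephInstance1 are placeholders for your own environment (drop or adjust the --nic argument to match your networking setup), and the volume id comes from cinder list; the key point is that we boot from the volume rather than from an image. If your novaclient doesn’t support --boot-volume, the older --block-device-mapping vda=[volume id]:::0 form does the same thing.

nova boot --flavor m1.tiny --boot-volume [volume id of cephVolume1] --nic net-id=[your network id] cephInstance1
nova list
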
How does this help us achieve high availability? Well, if our compute node goes belly up, the instance will of course die, but its boot volume is safe on our Ceph cluster. So to revive the instance, create a new one on another compute node and attach it to the same volume. Boot it up and you’re back in business!
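
A rough command-line sketch of that recovery (the host name icehouse-compute2 is hypothetical, and forcing a host with --availability-zone nova:[host] requires admin credentials):

nova delete [dead instance id]
nova boot --flavor m1.tiny --boot-volume [same volume id] --nic net-id=[your network id] --availability-zone nova:icehouse-compute2 cephInstance1
nova list

If the volume still shows as in-use after the old instance is gone, you may need to reset its state in Cinder (cinder reset-state) before the new boot will succeed.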

21 thoughts on “OpenStack High Availability – Booting from a Ceph Volume”

  1. Samuel Yaple

    If you have more than one compute node it is very important that you define the libvirt secret on every compute node with the _same_ UUID.

    1. Brian Seltzer

      Thanks for that one Samuel. The Ceph documentation states: “You don’t necessarily need the UUID on all the compute nodes. However from a platform consistency perspective it’s better to keep the same UUID” but that’s good enough for me. I’ve updated the article to show how to apply the same UUID to additional compute nodes.

      1. Samuel Yaple

        That is technically correct, you don’t need it to be the same. However, you would have to modify nova.conf with the uuid specific for each compute node. Not updating the UUID won’t cause any problems until you try and use a ceph volume. And it is easily forgotten.

        Since we are pushing around the ceph keys anyway, there is no benefit to randomized UUID on each node, it just complicates administration in my opinion.

  2. CDOT

    What we want is that the instances running on a compute node should shift automatically to other compute node(s) in case of the failure of the first compute node while still being attached to the respective volumes. How can we achieve this Instance Level HA?

  3. martin

    Hello Brian, I’ve set up my ceph openstack based on the official doc from the ceph website. Then I saw your howto and verified my setup. I must have made an error somewhere in a conf file but I can’t find it.
    If I create a volume in cinder based on an image in glance, the volume gets created but I can’t spawn an instance out of it. I get this on one of the compute nodes:

    2015-02-04 08:30:38.160 13419 TRACE nova.compute.manager [instance: 484892be-49d5-4ce6-a1c7-dd9d3781e38b] File “/usr/lib/python2.7/site-packages/eventlet/tpool.py”, line 122, in execute
    2015-02-04 08:30:38.160 13419 TRACE nova.compute.manager [instance: 484892be-49d5-4ce6-a1c7-dd9d3781e38b] six.reraise(c, e, tb)
    2015-02-04 08:30:38.160 13419 TRACE nova.compute.manager [instance: 484892be-49d5-4ce6-a1c7-dd9d3781e38b] File “/usr/lib/python2.7/site-packages/eventlet/tpool.py”, line 80, in tworker
    2015-02-04 08:30:38.160 13419 TRACE nova.compute.manager [instance: 484892be-49d5-4ce6-a1c7-dd9d3781e38b] rv = meth(*args, **kwargs)
    2015-02-04 08:30:38.160 13419 TRACE nova.compute.manager [instance: 484892be-49d5-4ce6-a1c7-dd9d3781e38b] File “/usr/lib64/python2.7/site-packages/libvirt.py”, line 728, in createWithFlags
    2015-02-04 08:30:38.160 13419 TRACE nova.compute.manager [instance: 484892be-49d5-4ce6-a1c7-dd9d3781e38b] if ret == -1: raise libvirtError (‘virDomainCreateWithFlags() failed’, dom=self)
    2015-02-04 08:30:38.160 13419 TRACE nova.compute.manager [instance: 484892be-49d5-4ce6-a1c7-dd9d3781e38b] libvirtError: internal error: process exited while connecting to monitor: Warning: option deprecated, use lost_tick_policy property of kvm-pit instead.
    2015-02-04 08:30:38.160 13419 TRACE nova.compute.manager [instance: 484892be-49d5-4ce6-a1c7-dd9d3781e38b] qemu-kvm: -drive file=rbd:volumes/volume-bfe0dab2-ae11-432d-9436-cfdd5a05677e:id=nova:key=AQCL3tBUsD4EDRAAFH5WHRuEDoDHHzmIiB60iQ==:auth_supported=cephx\;none:mon_host=10.3.1.61\:6789,if=none,id=drive-virtio-disk0,format=raw,serial=bfe0dab2-ae11-432d-9436-cfdd5a05677e,cache=none: could not open disk image rbd:volumes/volume-bfe0dab2-ae11-432d-9436-cfdd5a05677e:id=nova:key=AQCL3tBUsD4EDRAAFH5WHRuEDoDHHzmIiB60iQ==:auth_supported=cephx\;none:mon_host=10.3.1.61\:6789: Unknown protocol

  4. martin

    The error is here:
    qemu-kvm: -drive file=rbd:volumes/volume-bfe0dab2-ae11-432d-9436-cfdd5a05677e:id=nova:key=AQCL3tBUsD4EDRAAFH5WHRuEDoDHHzmIiB60iQ==:auth_supported=cephx\;none:mon_host=10.3.1.61\:6789,if=none,id=drive-virtio-disk0,format=raw,serial=bfe0dab2-ae11-432d-9436-cfdd5a05677e,cache=none: could not open disk image rbd:volumes/volume-bfe0dab2-ae11-432d-9436-cfdd5a05677e:id=nova:key=AQCL3tBUsD4EDRAAFH5WHRuEDoDHHzmIiB60iQ==:auth_supported=cephx\;none:mon_host=10.3.1.61\:6789: Unknown protocol

    [root@compute1 qemu]# /usr/libexec/qemu-kvm --drive format=?
    Supported formats: vvfat vpc vmdk vhdx vdi sheepdog sheepdog sheepdog raw host_cdrom host_floppy host_device file qed qcow2 qcow parallels nbd nbd nbd iscsi gluster gluster gluster gluster dmg cow cloop bochs blkverify blkdebug

    …doesn’t show support for rbd backend volume…
    Do you know if this can be compiled in?

    1. Brian Seltzer Post author

      It sounds as if you haven’t installed the ceph client on the compute node, or added the ceph configuration to nova.conf.

      1. martin

        CentOS 7 doesn’t support the rbd backend. Maybe another kernel is available that has proper support.

        1. Brian Seltzer Post author

          Yeah this article was written for Icehouse and Ubuntu 14.04. I’ve got some newer articles using CentOS 7 and Juno, but no Ceph integration on those… yet!

          1. Upendra

            I am having the same issue with CentOS 7. I was able to create volumes and upload images to glance, but I am not able to boot a VM from an rbd volume. It gets hung at the build state.

  5. bgyako

    Getting error: Error: Failed to launch instance "test2_Ub12_sm": Please try again later [Error: internal error: process exited while connecting to monitor: qemu-system-x86_64: -drive file=rbd:tst_datastore/883fe236-d35a-4c9a-b816-6edbd0f5d30d_disk:id=tst_ceph:key=AQD9BtVUiLPFNRAA7gcdq6bzDQwQ1uyVtCR3kw==:auth_supported=cephx\;none:mon_host=192.168.8.].

    I have Glance using swift and cinder using ceph, is that the problem?
    Also, I noticed that no configuration was made on the controllers. Is that correct?

  6. bgyako

    Brian, not sure if this makes sense, but I got it to work.
    I needed to remove these (I think specifically libvirt_images_rbd_ceph_conf was the culprit):
    libvirt_images_type=rbd
    libvirt_images_rbd_pool=tst_datastore
    libvirt_images_rbd_ceph_conf=/etc/ceph/ceph.conf

    and add instead:
    rbd_user=tst_ceph
    rbd_pool=tst_datastore

  7. Mike

    Hi Brian, excellent walk thro 🙂 Just noticed that there is a copy/paste error from the ceph docs into your walk thro.

    ceph auth get-key client.icehouse | ssh 192.168.1.45 tee client.cinder.key

    As used in your walk thro this should be

    ceph auth get-key client.icehouse | ssh 192.168.1.45 tee client.icehouse.key

    The virsh secret-set-value will be wrong if used as currently specified

    BTW are you going to be adding anything about ceilometer and mongoDB, as this enables autoscaling in heat etc

  8. 1mike

    Hi Brian,

    I have 3 compute nodes, and Ceph is running on those same servers. Do I still need to do the configuration above?

    Please advise me.

    Regards,
    MIKE

    1. Brian Seltzer Post author

      Yes, in order to boot an instance from a Ceph volume, you need to provision a cinder volume from a glance image, with ceph as the backing storage behind cinder, as per the article.

      1. 1mike

        I have a problem at the start. The error comes up as:

        $ sudo virsh secret-define --file secret.xml
        error: Failed to set attributes from secret.xml
        error: (definition_of_secret):1: Start tag expected, '<' not found

        cat > secret.xml <<EOF

        $UUID

        client.icehouse secret

        EOF

        Please advice.
        Regards,
        MIKE

        1. Brian Seltzer Post author

          Your secret.xml should look like this (UUID will be different of course):

          <secret ephemeral='no' private='no'>
          <uuid>59081FF0-7D38-4D8E-81A2-D3B58F94FE92</uuid>
          <usage type='ceph'>
          <name>client.icehouse secret</name>
          </usage>
          </secret>

  9. venkat bokka

    Hi Brian,
    I am using OpenStack Kilo on Debian, Ceph firefly (0.80.7), libvirt 1.2.9 and qemu 2.3.

    I am able to create a cinder volume with ceph as the backend, but when I try to attach the volume to a running instance it fails with this error:
    libvirtError: internal error: unable to execute QEMU command 'device_add': Property 'virtio-blk-device.drive' can't find value 'drive-virtio-disk2'
    Can you please help with this error?

    Thanks & Regards,
    Naga Venkata

    1. Brian Seltzer Post author

      This post is based on Icehouse, not Kilo, so I can’t directly address your issue; however, I’d suggest making sure you have configured your compute nodes correctly to access ceph block devices.

