Ceph in OpenStack

阿凡达  2018-08-14 09:26

Three types of data in OpenStack

In an OpenStack cloud platform, data can be roughly divided into three categories:


Instance images

In OpenStack, instance images are managed by the glance service, and the storage backend is configurable: images can be stored locally or on other backends such as rbd, s3, sheepdog, or swift.


Instance system disks (/data disks)

The system disk an instance runs on is managed by the nova service, which currently supports raw, qcow2, lvm, iscsi, rbd, and other formats. With raw, qcow2, and lvm, the system disk data lives on the hypervisor hosting the instance; with iscsi it lives on a SAN backend; and with rbd it lives on the corresponding Ceph cluster.


Cloud volumes

Cloud volumes, as part of IaaS, are generally used together with instances: a user can create a volume and attach it to an instance. In OpenStack, volumes are managed by the cinder service, which supports lvm, gfs, sheepdog, rbd, and other storage backends.

Looking at these three kinds of data together, Ceph RBD block storage can serve as a unified storage backend for OpenStack, keeping instance images, instance system disks, and cloud volumes all in one Ceph cluster.


OpenStack + Ceph

With Ceph as OpenStack's unified storage backend, the following scenarios illustrate how Ceph is used within OpenStack.

NOTE: three pools are created in Ceph to hold the corresponding data: images (for glance images), vms (for instance disks), and volumes (for cinder volumes).
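In practice these pools are usually created with the ceph CLI; as a minimal sketch (assuming the python-rados bindings and a reachable cluster), the same can be done programmatically:

import rados

# Connect using the standard cluster config file.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    # Create the three pools referenced throughout this article.
    for pool in ('images', 'vms', 'volumes'):
        if not cluster.pool_exists(pool):
            cluster.create_pool(pool)
finally:
    cluster.shutdown()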


Uploading an image

An rbd image is created in the images pool, the image file is written into it, and once the write I/O completes a snapshot of the image is taken.

  • create an rbd image (rbd.RBD().create() --> librbd.rbd_create())
  • open rbd image (librbd.rbd_open())
  • write rbd image (librbd.rbd_write())
  • image.create_snap()
  • image.protect_snap()
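Put together, these calls correspond roughly to the following sketch with the python-rbd bindings. The chunk size and the image_file/image_id variables are illustrative assumptions, not glance's actual code:

import os
import rados
import rbd

image_file = '/tmp/cirros.raw'                     # hypothetical local image file
image_id = '5f8fd142-6483-439d-ab1b-98cae45c9c1f'  # glance image UUID

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('images')
try:
    # Create a format-2 rbd image with layering enabled so it can be cloned later.
    rbd.RBD().create(ioctx, image_id, os.path.getsize(image_file),
                     old_format=False, features=rbd.RBD_FEATURE_LAYERING)
    image = rbd.Image(ioctx, image_id)
    try:
        # Stream the image file into the rbd image in 8 MB chunks.
        offset = 0
        with open(image_file, 'rb') as f:
            for chunk in iter(lambda: f.read(8 * 1024 * 1024), b''):
                image.write(chunk, offset)
                offset += len(chunk)
        # Snapshot the image and protect the snapshot, matching the 'snap'
        # shown in the listing below.
        image.create_snap('snap')
        image.protect_snap('snap')
    finally:
        image.close()
finally:
    ioctx.close()
    cluster.shutdown()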
After the upload, a corresponding rbd block device appears in Ceph's images pool:
ubuntu@devstack-ntse:~$ nova image-list
+--------------------------------------+--------------------------+--------+--------+
| ID                                   | Name                     | Status | Server |
+--------------------------------------+--------------------------+--------+--------+
| 5f8fd142-6483-439d-ab1b-98cae45c9c1f | cirros-ceph              | ACTIVE |        |
+--------------------------------------+--------------------------+--------+--------+

ubuntu@devstack-ntse:~$ rbd ls --pool images
5f8fd142-6483-439d-ab1b-98cae45c9c1f
ubuntu@devstack-ntse2:~$ rbd snap ls images/5f8fd142-6483-439d-ab1b-98cae45c9c1f
SNAPID NAME    SIZE 
     2 snap 9532 kB


Creating an instance from a file-based image

Havana (H) implementation

  • Download the image locally: glance downloads the qcow2 image to a local qcow2 file
  • Convert the local qcow2 file to a local raw image file
  • rbd import the raw image file, producing an rbd image in the vms pool
  • libvirt boots the instance by accessing that rbd image over the rbd protocol
2015-05-06 15:39:38.596 DEBUG nova.image.glance [req-928625f6-e9cc-4400-9cfc-bf8d421783bd admin admin] fetching image 5f8fd142-6483-439d-ab1b-98cae45c9c1f from glance get_remote_image_service /opt/stack/nova/nova/image/glance.py:582

qemu-img convert -O raw /opt/stack/data/nova/instances/_base/1df301678856178b53ef8ab83e93686d2e0e803e.part /opt/stack/data/nova/instances/_base/1df301678856178b53ef8ab83e93686d2e0e803e.converted

os.rename(/opt/stack/data/nova/instances/_base/1df301678856178b53ef8ab83e93686d2e0e803e.converted, /opt/stack/data/nova/instances/_base/1df301678856178b53ef8ab83e93686d2e0e803e)

rbd import --pool vms /opt/stack/data/nova/instances/_base/1df301678856178b53ef8ab83e93686d2e0e803e instance-00000006_disk --new-format --id cinder --conf /etc/ceph/ceph.conf

instance.xml:
<domain>
...
  <devices>
    <disk type="network" device="disk">
      <driver type="raw" cache="none"/>
      <source protocol="rbd" name="vms/instance-00000006_disk">
        <host name="10.166.224.18" port="6789"/>
        <host name="10.166.224.19" port="6789"/>
        <host name="10.166.224.21" port="6789"/>
      </source>
      <auth username="cinder">
        <secret type="ceph" uuid="457eb676-33da-42ec-9a8c-9293d545c337"/>
      </auth>
      <target bus="virtio" dev="vda"/>
    </disk>
   ...
  </devices>
...
</domain>


Optimized implementation in Juno (J) and later

The Juno release optimized the above flow:

  • If the image is not in raw or iso format, the original flow is used.
  • If the image is in raw format, the image's protected rbd snapshot is used to clone an rbd image into the vms pool (a copy-on-write clone, so no data is copied up front).
  • libvirt boots the instance by accessing that rbd image over the rbd protocol.
rbd clone flow (python-ceph package):
rbd.RBD().clone():
    -> self.librbd.rbd_clone()
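
For the cirros image shown earlier, the clone boils down to something like the following sketch with the python-rbd bindings (pool and image names follow the earlier listings; this is illustrative, not nova's actual code):

import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
src_ioctx = cluster.open_ioctx('images')   # pool holding the glance image
dst_ioctx = cluster.open_ioctx('vms')      # pool holding instance disks
try:
    # Clone the protected snapshot of the glance image into the vms pool.
    # This is copy-on-write: no data moves until the child is written to.
    rbd.RBD().clone(src_ioctx, '5f8fd142-6483-439d-ab1b-98cae45c9c1f', 'snap',
                    dst_ioctx, 'instance-00000006_disk',
                    features=rbd.RBD_FEATURE_LAYERING)
finally:
    src_ioctx.close()
    dst_ioctx.close()
    cluster.shutdown()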


Creating an image from an instance

  • Suspend the instance
  • Export the snapshot file from the instance's system disk:
    qemu-img convert -O raw rbd:vms/instance-00000006_disk /opt/stack/data/nova/instances/snapshots/tmplZTUTN/5b4630ccfce54027ad4a6793f3b7fb56

  • Resume the instance
  • Upload the exported snapshot file to glance (see the sketch after this list)
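
The upload step can be sketched with python-glanceclient (v2 API). This is a hedged, illustrative sketch: GLANCE_ENDPOINT, AUTH_TOKEN, and the image name are placeholders, not values from the original flow.

from glanceclient import Client

# GLANCE_ENDPOINT and AUTH_TOKEN are placeholders for real auth plumbing.
glance = Client('2', endpoint=GLANCE_ENDPOINT, token=AUTH_TOKEN)

# Register a new raw image record, then stream the exported file into it.
img = glance.images.create(name='instance-snapshot', disk_format='raw',
                           container_format='bare')
with open('/opt/stack/data/nova/instances/snapshots/tmplZTUTN/'
          '5b4630ccfce54027ad4a6793f3b7fb56', 'rb') as f:
    glance.images.upload(img['id'], f)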


Creating a bootable system volume from an image

  • cinder calls the glance API to download the qcow2 image
  • qemu-img converts the image to raw format
    qemu-img convert -O raw {image_file} /tmp/tmpFb0wW7

  • The empty volume that cinder initially created on the Ceph backend is deleted
  • cinder runs rbd import to load the image file into a new volume (named to match the volume record in the cinder database)
    rbd import --pool volumes /tmp/tmpFb0wW7 volume-fc4fdcb9-a48b-4065-bb0e-e0a6d706b2ba --new-format

  • The end result is an rbd image in the volumes pool:
    ubuntu@devstack-ntse2:~$ rbd ls --pool volumes
    volume-fc4fdcb9-a48b-4065-bb0e-e0a6d706b2ba
    ubuntu@devstack-ntse:~$ cinder list
    +--------------------------------------+-----------+--------------------+------+-------------+----------+-------------+
    |                  ID                  |   Status  |    Display Name    | Size | Volume Type | Bootable | Attached to |
    +--------------------------------------+-----------+--------------------+------+-------------+----------+-------------+
    | fc4fdcb9-a48b-4065-bb0e-e0a6d706b2ba | available | test_bootable_ceph |  1   |     ceph    |   true   |             |
    +--------------------------------------+-----------+--------------------+------+-------------+----------+-------------+
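
From the client side, the whole flow above is triggered by a single request; a hedged sketch with python-cinderclient, where the authenticated client object is elided:

# 'cinder' is an authenticated python-cinderclient Client instance (elided).
vol = cinder.volumes.create(1,                     # size in GB
                            display_name='test_bootable_ceph',
                            volume_type='ceph',
                            imageRef='5f8fd142-6483-439d-ab1b-98cae45c9c1f')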
    


Booting an instance from a system volume

  • libvirt boots the instance by accessing the rbd volume over the rbd protocol
instance.xml:
<domain>
...
  <devices>
    <disk type="network" device="disk">
      <driver name="qemu" type="raw" cache="none"/>
      <source protocol="rbd" name="volumes/volume-fc4fdcb9-a48b-4065-bb0e-e0a6d706b2ba">
        <host name="10.166.224.18" port="6789"/>
        <host name="10.166.224.19" port="6789"/>
        <host name="10.166.224.21" port="6789"/>
      </source>
      <auth username="cinder">
        <secret type="ceph" uuid="457eb676-33da-42ec-9a8c-9293d545c337"/>
      </auth>
      <target bus="virtio" dev="vda"/>
      <serial>fc4fdcb9-a48b-4065-bb0e-e0a6d706b2ba</serial>
    </disk>
   ...
  </devices>
...
</domain>


Creating a snapshot of a volume

  • rbd creates the snapshot:

    client, ioctx = driver._connect_to_rados(pool)
    volume = driver.rbd.Image(ioctx, str(name),
                              snapshot=ascii_str(snapshot),
                              read_only=read_only)
    volume.create_snap(snap)
    if hasattr(driver.rbd, 'RBD_FEATURE_LAYERING'):
        volume.protect_snap(snap)
    
  • The resulting snapshot:
ubuntu@devstack-ntse2:~$ rbd snap ls volumes/volume-a6dc411a-c735-47d2-b9de-46ddc7e79708
SNAPID NAME                                             SIZE 
     2 snapshot-1b5ccbe5-cc22-4ad0-bd96-64928f1c8350 1024 MB 
ubuntu@devstack-ntse2:~$ cinder snapshot-list
+--------------------------------------+--------------------------------------+-----------+--------------------+------+
|                  ID                  |              Volume ID               |   Status  |    Display Name    | Size |
+--------------------------------------+--------------------------------------+-----------+--------------------+------+
| 1b5ccbe5-cc22-4ad0-bd96-64928f1c8350 | a6dc411a-c735-47d2-b9de-46ddc7e79708 | available | test-ceph-snapshot |  1   |
+--------------------------------------+--------------------------------------+-----------+--------------------+------+


Restoring a volume from a volume snapshot

  • Clone a new volume from the snapshot, flatten it if rbd_flatten_volume_from_snapshot is enabled (copying out the parent data so the new volume no longer depends on the snapshot), then resize it to the requested size.
with RADOSClient(self, src_pool) as src_client:
    with RADOSClient(self) as dest_client:
        self.rbd.RBD().clone(src_client.ioctx,
                             encodeutils.safe_encode(src_image),
                             encodeutils.safe_encode(src_snap),
                             dest_client.ioctx,
                             encodeutils.safe_encode(volume['name']),
                             features=src_client.features)
if self.configuration.rbd_flatten_volume_from_snapshot:
    with RBDVolumeProxy(self, volume_name, pool) as vol:
        vol.flatten()
if int(volume['size']):
    with RBDVolumeProxy(self, volume['name']) as vol:
        vol.resize(size)

The volume created by the restore:

ubuntu@devstack-ntse2:~$ cinder create --snapshot-id 1b5ccbe5-cc22-4ad0-bd96-64928f1c8350 --display-name vol-from-snapshot 1 --volume-type ceph
+---------------------+--------------------------------------+
|       Property      |                Value                 |
+---------------------+--------------------------------------+
|     attachments     |                  []                  |
|  availability_zone  |                 ntse                 |
|       bootable      |                false                 |
|      created_at     |      2015-05-12T07:01:57.526374      |
| display_description |                 None                 |
|     display_name    |          vol-from-snapshot           |
|          id         | 79a9eab2-ed8e-49bd-a8fa-147177636e5d |
|       metadata      |                  {}                  |
|         size        |                  1                   |
|     snapshot_id     | 1b5ccbe5-cc22-4ad0-bd96-64928f1c8350 |
|     source_volid    |                 None                 |
|        status       |               creating               |
|     volume_type     |                 ceph                 |
+---------------------+--------------------------------------+
ubuntu@devstack-ntse2:~$ cinder list
+--------------------------------------+-----------+--------------------+------+-------------+----------+--------------------------------------+
|                  ID                  |   Status  |    Display Name    | Size | Volume Type | Bootable |             Attached to              |
+--------------------------------------+-----------+--------------------+------+-------------+----------+--------------------------------------+
| 79a9eab2-ed8e-49bd-a8fa-147177636e5d | available | vol-from-snapshot  |  1   |     ceph    |  false   |                                      |
| a6dc411a-c735-47d2-b9de-46ddc7e79708 | available |     test_ceph      |  1   |     ceph    |  false   |                                      |
+--------------------------------------+-----------+--------------------+------+-------------+----------+--------------------------------------+
ubuntu@devstack-ntse2:~$ rbd ls --pool volumes
volume-79a9eab2-ed8e-49bd-a8fa-147177636e5d
volume-a6dc411a-c735-47d2-b9de-46ddc7e79708


Creating an instance from a system-volume snapshot

  • Restore a system volume from the system-volume snapshot
  • Boot the instance from the restored system volume (see the sketch below)
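
Both steps can be driven through the API clients. A hedged sketch: the authenticated 'cinder'/'nova' client objects and the flavor id are placeholders, and the legacy block_device_mapping value format is '<volume_id>:<type>:<size>:<delete_on_terminate>':

# 'cinder' and 'nova' are authenticated client instances (elided).
vol = cinder.volumes.create(1, snapshot_id='1b5ccbe5-cc22-4ad0-bd96-64928f1c8350',
                            display_name='boot-vol-from-snapshot',
                            volume_type='ceph')

# Once vol.status reaches 'available', boot from the volume (no image needed):
server = nova.servers.create(name='vm-from-vol-snapshot',
                             image=None,
                             flavor='1',
                             block_device_mapping={'vda': vol.id + ':::0'})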



This article comes from the NetEase Practitioner Community (网易实践者社区) and is published with the authorization of its author, 管强.