Ceph storage is really part of the underlying infrastructure and should be set up before the Kubernetes cluster is deployed, so that it can provide storage services to the k8s cluster, for example for Pod data, Docker images, and log data.
Ceph is a distributed storage system that uniquely provides object storage, block storage, and file storage from one unified system, the Ceph Storage Cluster. The storage cluster is built on RADOS and scales enormously, serving thousands of clients accessing petabytes and even exabytes of data. Ceph nodes are built from commodity hardware plus intelligent daemons; the storage cluster organizes a large number of nodes that communicate with each other to replicate data, while the CRUSH algorithm dynamically redistributes it.
Ceph has quite a few terms of its own, and knowing them is important for understanding the Ceph architecture. The common ones are listed below.
Term | Description |
---|---|
RADOSGW | Object gateway daemon |
RBD | Block storage |
CEPHFS | File storage |
LIBRADOS | The basic library for interacting with RADOS. Ceph talks to RADOS over a native protocol, which is wrapped in the librados library so that you can build your own clients on top of it |
RADOS | The storage cluster itself |
OSD | Object Storage Device, the RADOS component that stores the data |
Monitor | The RADOS component that maintains the global state of the whole Ceph cluster |
MDS | The Ceph Metadata Server, which stores metadata for the Ceph file system |
A Ceph distributed storage cluster is made up of several components: Ceph Monitor, Ceph OSD, and Ceph MDS. If you only use object storage and block storage, the MDS is not required; it only has to be installed when you want CephFS. We do need CephFS here, so we will install the MDS.
Does Ceph RBD support being mounted by multiple Pods at the same time? The official documentation says no: a Persistent Volume backed by Ceph RBD only supports two access modes, ReadWriteOnce and ReadOnlyMany; ReadWriteMany is not supported.
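To illustrate, here is a minimal sketch of an RBD-backed PersistentVolume manifest; the pool, image, and Secret names are hypothetical, and only the two supported access modes can be requested:

```
# Sketch only: "rbd", "ceph-image" and "ceph-secret" are hypothetical names
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ceph-rbd-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce            # ReadWriteOnce or ReadOnlyMany only, per the note above
  rbd:
    monitors:
      - 172.24.10.20:6789
      - 172.24.10.21:6789
      - 172.24.10.22:6789
    pool: rbd
    image: ceph-image
    user: admin
    secretRef:
      name: ceph-secret        # Secret holding the client.admin key
    fsType: ext4
EOF
```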
Ceph's installation model is somewhat similar to k8s: a deploy node operates on the other nodes remotely to create, prepare, and activate the Ceph components on each of them.
Resources are limited, so the Ceph cluster is deployed on the k8s cluster nodes; the hosts below are reused from the k8s cluster, which may make them a bit harder to tell apart.
Node name | Hostname | Node IP | Spec | Role |
---|---|---|---|---|
ceph-mon-0 | node-01 | 172.24.10.20 | centos7.4 | admin node, monitor, mds |
ceph-mon-1 | node-02 | 172.24.10.21 | centos7.4 | monitor, mds, client |
ceph-mon-2 | node-03 | 172.24.10.22 | centos7.4 | monitor, mds |
ceph-osd-0 | node-01 | 172.24.10.20 | 20G | storage node, osd |
ceph-osd-1 | node-02 | 172.24.10.21 | 20G | storage node, osd |
ceph-osd-2 | node-03 | 172.24.10.22 | 20G | storage node, osd |
ceph-osd-3 | node-04 | 172.24.10.23 | 20G | storage node, osd |
ceph-osd-4 | node-05 | 172.24.10.24 | 20G | storage node, osd |
ceph-osd-5 | node-06 | 172.24.10.25 | 20G | storage node, osd |
```
~]# cat /etc/hosts
127.0.0.1    localhost localhost.localdomain localhost4 localhost4.localdomain4
::1          localhost localhost.localdomain localhost6 localhost6.localdomain6
172.24.10.20 node-01
172.24.10.21 node-02
172.24.10.22 node-03
172.24.10.23 node-04
172.24.10.24 node-05
172.24.10.25 node-06
```
Also set up passwordless ssh-key logins from the admin node to the other nodes.
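A minimal sketch of doing that from the admin node, assuming root access and the hostnames from the table above:

```
# Generate a key pair on the admin node and push it to every node
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
for host in node-01 node-02 node-03 node-04 node-05 node-06; do
    ssh-copy-id root@$host
done
```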
```
~]# yum install epel-release -y && yum upgrade -y
~]# rpm -Uvh https://download.ceph.com/rpm-luminous/el7/noarch/ceph-release-1-1.el7.noarch.rpm
~]# ansible ceph -a 'rpm -Uvh https://download.ceph.com/rpm-luminous/el7/noarch/ceph-release-1-1.el7.noarch.rpm'   # install on all nodes in one go

# The official repo is too slow and package installs kept timing out later while
# building the cluster, so switch to the Aliyun mirror:
[Ceph]
name=Ceph packages for $basearch
baseurl=http://mirrors.aliyun.com/ceph/rpm-luminous/el7/$basearch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc

[Ceph-noarch]
name=Ceph noarch packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-luminous/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc

[ceph-source]
name=Ceph source packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-luminous/el7/SRPMS
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc

# ansible ceph -m copy -a 'src=/etc/yum.repos.d/ceph.repo dest=/etc/yum.repos.d/ceph.repo'
```
```
# Create the password hash
python -c "from passlib.hash import sha512_crypt; import getpass; print sha512_crypt.encrypt(getpass.getpass())"
Password: ceph
$6$rounds=656000$PZshbGs2TMKtUgB1$LTdZj9xxHsJH5wRNSLYQL8CH7bAaE4415g/aRZD39RJiRrPx.Bzu19Y5/aOqQuFUunr7griuDN7BAlcTOkuw81

# On the admin node: sudo visudo
Defaults:ceph timestamp_timeout=-1
ceph ALL=(root) NOPASSWD:ALL

# Ansible playbook to create the user, password, sudo config and ssh key
~]# vim user.yml
- hosts: ceph
  remote_user: root
  tasks:
    - name: add user
      user: name=ceph password='$6$rounds=656000$PZshbGs2TMKtUgB1$LTdZj9xxHsJH5wRNSLYQL8CH7bAaE4415g/aRZD39RJiRrPx.Bzu19Y5/aOqQuFUunr7griuDN7BAlcTOkuw81'
    - name: sudo config
      copy: src=/etc/sudoers dest=/etc/sudoers
    - name: sync ssh key
      authorized_key: user=ceph state=present exclusive=yes key='{{lookup('file', '/home/ceph/.ssh/id_rsa.pub')}}'

# Run the playbook
ansible-playbook user.yml
```
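Once the playbook has run, it is worth confirming that the ceph user can log in with the key and escalate without a password, for example:

```
# Key-based login check and passwordless sudo check against the "ceph" inventory group
ansible ceph -m ping -u ceph
ansible ceph -b -u ceph -a 'whoami'    # should return "root" on every host
```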
Do the following on the admin node.
```
~]$ sudo yum install ceph-deploy
~]$ mkdir ceph-cluster
~]$ cd ceph-cluster
ceph-cluster]$ ceph-deploy new node-01 node-02 node-03
```
```
$ cat ceph.conf
[global]
fsid = 64960081-9cfe-4b6f-a9ae-eb9b2be216bc
mon_initial_members = node-01, node-02, node-03
mon_host = 172.24.10.20,172.24.10.21,172.24.10.22
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
# default number of replicas for new pools (we have 6 OSDs)
osd pool default size = 6

[mon]
# allow pools to be deleted in this cluster
mon_allow_pool_delete = true

[mgr]
mgr modules = dashboard
```
```
~]$ ceph-deploy install --no-adjust-repos node-01 node-02 node-03 node-04 node-05 node-06
# Without --no-adjust-repos, ceph-deploy keeps rewriting the repos back to its own
# default upstream source, which is a real trap.
```
Initialize the monitors and gather all the keys.
```
cd ceph-cluster/
ceph-deploy mon create-initial
```
Error 1:
```
ceph-mon-2 Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon-2.asok mon_status
ceph-mon-2 admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
ceph_deploy.mon mon.ceph-mon-2 monitor is not yet in quorum, tries left: 1
ceph_deploy.mon waiting 20 seconds before retrying
ceph_deploy.mon Some monitors have still not reached quorum:
ceph_deploy.mon ceph-mon-0
ceph_deploy.mon ceph-mon-1
ceph_deploy.mon ceph-mon-2

# Look at the /var/run/ceph directory
]$ ls /var/run/ceph/
ceph-mon.k8s-master-01.asok    # the socket is named after the node's hostname

# Remove the broken mon deployment
]$ ceph-deploy mon destroy node-01 node-02 node-03
```

Still no luck: every host needs a unique hostname.
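If you run into the same problem, give every node a unique hostname that matches the name used with ceph-deploy before retrying, e.g.:

```
# Run on each node (adjust the name per host); the monitor socket name is derived from this
sudo hostnamectl set-hostname node-01
```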
Clean up the environment
```
$ ceph-deploy purge node-01 node-02 node-03 node-04 node-05 node-06      # removes everything related to ceph
$ ceph-deploy purgedata node-01 node-02 node-03 node-04 node-05 node-06
$ ceph-deploy forgetkeys
```
Error 2:
```
[node-03][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node-03.asok mon_status
[ceph_deploy.mon][WARNIN] mon.node-03 monitor is not yet in quorum, tries left: 5
[ceph_deploy.mon][WARNIN] waiting 5 seconds before retrying
[node-03][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node-03.asok mon_status
[ceph_deploy.mon][WARNIN] mon.node-03 monitor is not yet in quorum, tries left: 4
[ceph_deploy.mon][WARNIN] waiting 10 seconds before retrying
[node-03][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node-03.asok mon_status
[ceph_deploy.mon][WARNIN] mon.node-03 monitor is not yet in quorum, tries left: 3
[ceph_deploy.mon][WARNIN] waiting 10 seconds before retrying
[node-03][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node-03.asok mon_status
[ceph_deploy.mon][WARNIN] mon.node-03 monitor is not yet in quorum, tries left: 2
[ceph_deploy.mon][WARNIN] waiting 15 seconds before retrying
[node-03][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node-03.asok mon_status
[ceph_deploy.mon][WARNIN] mon.node-03 monitor is not yet in quorum, tries left: 1
[ceph_deploy.mon][WARNIN] waiting 20 seconds before retrying
[ceph_deploy.mon][ERROR ] Some monitors have still not reached quorum:
[ceph_deploy.mon][ERROR ] node-02
[ceph_deploy.mon][ERROR ] node-03
[ceph_deploy.mon][ERROR ] node-01
```
Solution
The iptables rules were blocking the traffic. Either flush the rules or allow the default monitor listening port, 6789.
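A sketch of both options; 6789 is the monitor port and 6800-7300 is the default range used by the OSD and MGR daemons:

```
# Option 1: simply flush the rules on every node
sudo ansible ceph -a 'iptables -F'

# Option 2: only open the Ceph ports
sudo iptables -A INPUT -p tcp --dport 6789 -j ACCEPT        # monitors
sudo iptables -A INPUT -p tcp --dport 6800:7300 -j ACCEPT   # OSD / MGR daemons
```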
Check the running services
```
$ ps -ef | grep ceph
ceph  4693  1  0 16:45 ?  00:00:00 /usr/bin/ceph-mon -f --cluster ceph --id node-01 --setuser ceph --setgroup ceph

# how to stop it manually: see below
```
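On CentOS 7 the daemons are managed by systemd, so a monitor can be stopped and started by its id (the hostname in this setup):

```
# Stop / start a single monitor instance
sudo systemctl stop ceph-mon@node-01
sudo systemctl start ceph-mon@node-01
# Or act on every monitor instance on the node
sudo systemctl stop ceph-mon.target
```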
From the admin node, log in to each OSD node and create the OSD data directory (old-version procedure):
```
# osd-0
ssh node-01
sudo mkdir /var/local/osd0
sudo chown -R ceph.ceph /var/local/osd0
# osd-1
ssh node-02
sudo mkdir /var/local/osd1
sudo chown -R ceph.ceph /var/local/osd1
# osd-2
ssh node-03
sudo mkdir /var/local/osd2
sudo chown -R ceph.ceph /var/local/osd2
# osd-3
ssh node-04
sudo mkdir /var/local/osd3
sudo chown -R ceph.ceph /var/local/osd3
# osd-4
ssh node-05
sudo mkdir /var/local/osd4
sudo chown -R ceph.ceph /var/local/osd4
# osd-5
ssh node-06
sudo mkdir /var/local/osd5
sudo chown -R ceph.ceph /var/local/osd5
```
On the admin node, run the command to prepare each OSD (old-version procedure):
```
ceph-deploy osd prepare node-01:/var/local/osd0 node-02:/var/local/osd1 node-03:/var/local/osd2 node-04:/var/local/osd3 node-05:/var/local/osd4 node-06:/var/local/osd5   # --overwrite-conf
```
Activate each OSD node (old-version procedure):
```
ceph-deploy osd activate node-01:/var/local/osd0 node-02:/var/local/osd1 node-03:/var/local/osd2 node-04:/var/local/osd3 node-05:/var/local/osd4 node-06:/var/local/osd5
```
Add and activate the OSD disks (old-version procedure):
```
ceph-deploy osd create --bluestore node-01:/var/local/osd0 node-02:/var/local/osd1 node-03:/var/local/osd2 node-04:/var/local/osd3 node-05:/var/local/osd4 node-06:/var/local/osd5
```
Newer versions of ceph-deploy simply use `create`, which is equivalent to prepare + activate + `osd create --bluestore`:
```
ceph-deploy osd create --data /dev/sdb node-01
ceph-deploy osd create --data /dev/sdb node-02
ceph-deploy osd create --data /dev/sdb node-03
ceph-deploy osd create --data /dev/sdb node-04
ceph-deploy osd create --data /dev/sdb node-05
ceph-deploy osd create --data /dev/sdb node-06
```
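Since the commands only differ in the hostname, a loop does the same job, assuming every node has an empty /dev/sdb:

```
for host in node-01 node-02 node-03 node-04 node-05 node-06; do
    ceph-deploy osd create --data /dev/sdb $host
done
```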
From the admin node, copy the configuration file and admin keyring to the admin node itself and to all Ceph nodes:
```
ceph-deploy admin node-01 node-02 node-03 node-04 node-05 node-06
```
On every node, make ceph.client.admin.keyring readable:
```
sudo ansible ceph -a 'chmod +r /etc/ceph/ceph.client.admin.keyring'
```
```
$ ceph -s
  cluster:
    id:     64960081-9cfe-4b6f-a9ae-eb9b2be216bc
    health: HEALTH_WARN
            clock skew detected on mon.node-02, mon.node-03

  services:
    mon: 3 daemons, quorum node-01,node-02,node-03
    mgr: node-01(active), standbys: node-02, node-03
    osd: 6 osds: 6 up, 6 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   6337 MB used, 113 GB / 119 GB avail
    pgs:
```
Fixing the health warning
```
health: HEALTH_WARN
        clock skew detected on mon.node-02, mon.node-03
```

This is caused by the clocks being out of sync:

```
$ sudo ansible ceph -a 'yum install ntpdate -y'
$ sudo ansible ceph -a 'systemctl stop ntpdate'
$ sudo ansible ceph -a 'ntpdate time.windows.com'
$ ceph -s
  cluster:
    id:     64960081-9cfe-4b6f-a9ae-eb9b2be216bc
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum node-01,node-02,node-03
    mgr: node-01(active), standbys: node-03, node-02
    mds: cephfs-1/1/1 up {0=node-02=up:active}, 2 up:standby
    osd: 6 osds: 6 up, 6 in

  data:
    pools:   2 pools, 192 pgs
    objects: 21 objects, 2246 bytes
    usage:   6354 MB used, 113 GB / 119 GB avail
    pgs:     192 active+clean
```
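ntpdate only corrects the clock once, so the skew will come back; one way to keep the monitors in sync is to schedule it (or run ntpd/chronyd instead). A sketch using the ansible cron module:

```
# Re-sync every 10 minutes on all nodes; time.windows.com as above, any NTP server works
sudo ansible ceph -m cron -a "name='ntpdate sync' minute='*/10' job='/usr/sbin/ntpdate time.windows.com >/dev/null 2>&1'"
```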
Check the status
```
$ ceph osd tree
ID  CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF
 -1       0.11691 root default
 -3       0.01949     host node-01
  0   hdd 0.01949         osd.0        up  1.00000 1.00000
 -5       0.01949     host node-02
  1   hdd 0.01949         osd.1        up  1.00000 1.00000
 -7       0.01949     host node-03
  2   hdd 0.01949         osd.2        up  1.00000 1.00000
 -9       0.01949     host node-04
  3   hdd 0.01949         osd.3        up  1.00000 1.00000
-11       0.01949     host node-05
  4   hdd 0.01949         osd.4        up  1.00000 1.00000
-13       0.01949     host node-06
  5   hdd 0.01949         osd.5        up  1.00000 1.00000
```
Check the mounts
```
$ df -Th
Filesystem              Type      Size  Used Avail Use% Mounted on
/dev/mapper/centos-root xfs        17G  1.5G   16G   9% /
devtmpfs                devtmpfs  478M     0  478M   0% /dev
tmpfs                   tmpfs     488M     0  488M   0% /dev/shm
tmpfs                   tmpfs     488M  6.6M  482M   2% /run
tmpfs                   tmpfs     488M     0  488M   0% /sys/fs/cgroup
/dev/sda1               xfs      1014M  153M  862M  16% /boot
tmpfs                   tmpfs      98M     0   98M   0% /run/user/0
tmpfs                   tmpfs     488M   48K  488M   1% /var/lib/ceph/osd/ceph-0

]$ cat /var/lib/ceph/osd/ceph-0/type
bluestore
```
Since Ceph 12, the manager daemon is mandatory. You should add a mgr for every machine that runs a monitor, otherwise the cluster stays in the WARN state.
```
$ ceph-deploy mgr create node-01 node-02 node-03
ceph config-key put mgr/dashboard/server_addr 172.24.10.20
ceph config-key put mgr/dashboard/server_port 7000
ceph mgr module enable dashboard
```

http://172.24.10.20:7000/
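Whether the dashboard module is actually serving can be checked from any node with the admin keyring; the output of `ceph mgr services` should list the dashboard endpoint configured above:

```
# List the service endpoints published by the active mgr
ceph mgr services
```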
http://docs.ceph.com/docs/master/rados/operations/placement-groups/
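The PG counts used below roughly follow the rule of thumb from that page: total PGs per pool is about (number of OSDs * 100) / replica count, rounded up to the next power of two. For this cluster:

```
# 6 OSDs with "osd pool default size = 6" replicas -> 100, rounded up to a power of two -> 128
echo $(( 6 * 100 / 6 ))
```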
```
$ ceph-deploy mds create node-01 node-02 node-03
$ ceph osd pool create cephfs_data 128
$ ceph osd pool create cephfs_metadata 64
$ ceph fs new cephfs cephfs_metadata cephfs_data
$ ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]
$ ceph mds stat
cephfs-1/1/1 up {0=node-02=up:active}, 2 up:standby
```

Although running multiple active MDS daemons in parallel is supported, the official documentation recommends keeping a single active MDS and leaving the others as standbys.
The client role is planned for node-02.
On a physical machine CephFS can be mounted with the mount command, with mount.ceph (apt-get install ceph-fs-common), or with ceph-fuse (apt-get install ceph-fuse). We will start with the mount command.
```
$ sudo mkdir /data/ceph-storage/ -p
$ sudo chown -R ceph.ceph /data/ceph-storage
$ ceph-authtool -l /etc/ceph/ceph.client.admin.keyring
[client.admin]
        key = AQAEKJFa54MlFRAAg76JDhpwlHD1F8J2G76baQ==
$ sudo mount -t ceph 172.24.10.21:6789:/ /data/ceph-storage/ -o name=admin,secret=AQAEKJFa54MlFRAAg76JDhpwlHD1F8J2G76baQ==
$ df -Th
Filesystem              Type      Size  Used Avail Use% Mounted on
/dev/mapper/centos-root xfs        17G  1.5G   16G   9% /
devtmpfs                devtmpfs  478M     0  478M   0% /dev
tmpfs                   tmpfs     488M     0  488M   0% /dev/shm
tmpfs                   tmpfs     488M  6.7M  481M   2% /run
tmpfs                   tmpfs     488M     0  488M   0% /sys/fs/cgroup
/dev/sda1               xfs      1014M  153M  862M  16% /boot
tmpfs                   tmpfs      98M     0   98M   0% /run/user/0
tmpfs                   tmpfs     488M   48K  488M   1% /var/lib/ceph/osd/ceph-1
tmpfs                   tmpfs      98M     0   98M   0% /run/user/1000
172.24.10.21:6789:/     ceph      120G  6.3G  114G   6% /data/ceph-storage
```
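Passing the key on the command line leaves it in the shell history; for a permanent mount the key can go into a secret file referenced from /etc/fstab instead (a sketch, the file path is arbitrary):

```
# Store only the key value in a root-readable file
echo 'AQAEKJFa54MlFRAAg76JDhpwlHD1F8J2G76baQ==' | sudo tee /etc/ceph/admin.secret
sudo chmod 600 /etc/ceph/admin.secret

# /etc/fstab entry so the mount comes back after a reboot
# 172.24.10.21:6789:/  /data/ceph-storage  ceph  name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev  0 0
```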