A simple DRBD shared-storage configuration for high availability

  1. Host 1: server5.example.com 172.25.254.5

     Host 2: server6.example.com 172.25.254.6

  2. Install DRBD

  3. yum install gcc flex rpm-build kernel-devel -y
    mkdir -p ~/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}    # create the directory tree rpmbuild needs in the home directory
    cp drbd-8.4.0.tar.gz rpmbuild/SOURCES/
    tar zxf drbd-8.4.0.tar.gz
    cd drbd-8.4.0
    ./configure --enable-spec --with-km
    rpmbuild -bb drbd.spec
    # builds the drbd rpm packages
    rpmbuild -bb drbd-km.spec    # builds the drbd kernel-module rpm
    cd ~/rpmbuild/RPMS/x86_64
    rpm -ivh *
    Copy the generated rpm packages to the other host and install them there:
    scp ~/rpmbuild/RPMS/x86_64/* 172.25.254.6:/root
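
    The scp above only copies the packages to the other host; a minimal sketch of installing them there (assuming they landed in /root on 172.25.254.6 as above and that root ssh access is available):
    # run from server5: install every copied rpm on server6
    ssh 172.25.254.6 'rpm -ivh /root/*.rpm'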

    Configure the drbd.res resource file under /etc/drbd.d:

  4. resource mysqldata {
            meta-disk internal;
            device /dev/drbd1;
            syncer {
                    verify-alg sha1;
            }
    on server5.example.com { # must be the real hostname of the node; changing it only in name resolution does not work
            disk /dev/vdb;   # the backing disk DRBD will use
            address 172.25.254.5:7789;
    }
    on server6.example.com {
            disk /dev/vdb;
            address 172.25.254.6:7789;
    }
    }
    Run on both hosts:
    drbdadm create-md mysqldata
    /etc/init.d/drbd start
    cat /proc/drbd        # check the status
    Next, promote server5 to the primary node and start the initial sync (run the following command on server5):
    drbdsetup /dev/drbd1 primary --force
    Watch the synchronization progress on both hosts:
    watch cat /proc/drbd
    Once synchronization has finished, create a filesystem:
    mkfs.ext4 /dev/drbd1
    Mount the filesystem:
    mount /dev/drbd1 /var/lib/mysql
    Any new files created under the mount point are now stored on the DRBD device.
    To make the data available on the other server,
    first unmount /dev/drbd1 on server5
    and demote server5 to secondary:
    drbdadm secondary mysqldata
    then promote server6 and mount the device there:
    drbdadm primary mysqldata
    mount /dev/drbd1 /var/lib/mysql
    The database data can now be used on server6.
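
    A consolidated failover sketch of the steps just described, assuming /var/lib/mysql as the mount point and mysqldata as the resource name (both as configured above); run each half on the host indicated:
    # on server5 (current primary): release the device
    umount /var/lib/mysql
    drbdadm secondary mysqldata
    # on server6 (new primary): take over and mount
    drbdadm primary mysqldata
    mount /dev/drbd1 /var/lib/mysql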


DRBD: introduction, how it works, and split-brain recovery



1. DRBD basics

   DRBD (Distributed Replicated Block Device) provides distributed block-device replication: put simply, it keeps two equally sized devices on different nodes mirrored at the block level. DRBD consists of a kernel module plus a set of supporting scripts and is used to build highly available clusters.

   In a high-availability (HA) solution, DRBD can take the place of a shared disk-array storage device. Because the data exists simultaneously on the local host and the remote host, when a failover is needed the remote host simply uses its own copy of the data and keeps providing the service.


2. DRBD architecture and how it works

[Figure: DRBD primary/secondary replication architecture]

   As the figure shows, DRBD works in a primary/secondary fashion, which is conceptually similar to MySQL master/slave replication. The DRBD device on the master node is promoted to Primary and accepts writes. When a write reaches the DRBD module, one copy continues down the stack and is written to the local disk for persistence, while at the same time a copy of the data is sent over TCP to the DRBD device on the other host (the Secondary node), which then writes the received data to its own disk. This really is similar to MySQL replication via the binary log, but there are differences: a MySQL slave cannot be written to but can be read, whereas a DRBD secondary can be neither read nor mounted.

   So for a given device, DRBD only allows read and write access on the primary node; the secondary can neither read nor write. That may feel like a waste of host resources, and an HA architecture does trade some resources for redundancy, but you can define two DRBD resources on the two hosts shown above and make each node primary for one of them (a sketch of such a cross-primary layout follows below), so that both machines are put to work, at the cost of a more complex configuration. Then again, using DRBD as a low-cost shared storage device saves a great deal of money compared with a dedicated storage network, and its performance and stability are quite decent.
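
   A hedged sketch of such a cross-primary layout: two independent resources, each normally primary on a different node. The host names, devices, disks, and ports below are illustrative assumptions, not taken from the setup above.

   resource r0 {                       # normally primary on node1
           device    /dev/drbd0;
           meta-disk internal;
           on node1 {
                   disk     /dev/sda5;
                   address  192.168.1.101:7789;
           }
           on node2 {
                   disk     /dev/sda5;
                   address  192.168.1.102:7789;
           }
   }

   resource r1 {                       # normally primary on node2
           device    /dev/drbd1;
           meta-disk internal;
           on node1 {
                   disk     /dev/sda6;
                   address  192.168.1.101:7790;   # each resource needs its own TCP port
           }
           on node2 {
                   disk     /dev/sda6;
                   address  192.168.1.102:7790;
           }
   }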


3. DRBD replication modes (protocols)

   Protocol A:

       Asynchronous replication. A write is considered complete as soon as the local disk write has finished and the packet has been placed in the send queue. If a node fails, data loss is possible because data destined for the remote node may still be sitting in the send queue. The data on the failover node is consistent, but not up to date. This mode has the highest throughput, but the data is not safe: writes can be lost.

   Protocol B:

       Memory-synchronous (semi-synchronous) replication. A write is considered complete on the primary once the local disk write has finished and the replication packet has reached the peer node. Data loss can occur only if both participating nodes fail at the same time, because data in transit may not yet have been committed to the peer's disk.

   Protocol C:

       Synchronous replication. A write is considered complete only after both the local and the remote disk have confirmed it. No data is lost, which makes this the popular mode for cluster nodes, but I/O throughput depends on network bandwidth. It is the safest mode, at the cost of lower throughput.
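
   The protocol is selected in the configuration. As the global_common.conf example further below also shows, a minimal way to pick protocol C for all resources looks roughly like this:

       common {
               protocol C;        # synchronous replication: safest, throughput limited by the network
       }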


4. DRBD installation and configuration

   4.1 Install: sudo apt-get install drbd8-utils

   4.2 Preparation on both nodes

       Keep the clocks of node1 and node2 synchronized;

       prepare a partition of the same size on each node;

       set up SSH mutual trust so the two hosts can log in to each other without a password (a sketch follows below).
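
       A minimal sketch of the mutual-trust setup mentioned above, assuming root logins between node1 and node2 (hostnames as used in this article):

       # on node1: generate a key pair and push the public key to node2
       ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
       ssh-copy-id root@node2
       # then repeat the same two commands on node2, targeting node1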

   4.3 DRBD configuration file layout

       /etc/drbd.conf                        main configuration file

       /etc/drbd.d/global_common.conf        defines the global and common sections

       /etc/drbd.d/*.res                     defines the resources

   4.4 DRBD configuration

       4.4.1 global_common.conf

           global {

               usage-count no;       # whether to take part in DRBD's usage statistics

           }

           common {

               protocol C;            # which replication protocol to use

               handlers {

                       # handler scripts; /usr/lib/drbd/ ships a large number of them, but they are not all equally reliable

               }

               startup {

                       # startup timeouts and related settings

               }

               disk {

                       # common disk settings, e.g. I/O behaviour and what to do when a disk fails

               }

               net {

                       # network transport, authentication and encryption algorithms, etc.

               }

               syncer {

                       rate 1000M;         # network transfer (resync) rate

               }

           }
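
           The block above only describes what each section is for; a filled-in example of what these sections often look like in practice is sketched below. Every value here is an illustrative assumption, not part of the original configuration.

           common {
                   protocol C;
                   handlers {
                           split-brain "/usr/lib/drbd/notify-split-brain.sh root";   # notify root on split brain
                   }
                   startup {
                           wfc-timeout 120;          # at boot, wait at most 120s for the peer
                           degr-wfc-timeout 60;
                   }
                   disk {
                           on-io-error detach;       # detach the backing device on I/O errors
                   }
                   net {
                           cram-hmac-alg "sha1";     # authenticate the peer
                           shared-secret "drbdsecret";
                   }
                   syncer {
                           rate 100M;
                   }
           }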

       4.4.2 Resource configuration (*.res)

           resource name {

               meta-disk internal;      # settings common to node1 and node2 can be hoisted to the top of the resource

               on node1 {

                   device    /dev/drbd0;

                   disk      /dev/sda6;

                   address   192.168.1.101:7789;

             }

             on node2 {

                   device    /dev/drbd0;

                   disk      /dev/sda6;

                   address   192.168.1.102:7789;

             }

           }

       4.5 These files must be identical on both nodes, so the files just configured can be copied to the other node over ssh:

             # scp -p /etc/drbd.d/* node2:/etc/drbd.d/

       4.6 Start and test

            1) Initialize the resource; run on both Node1 and Node2:

           # sudo drbdadm create-md mydata


            2) Start the service; run on both Node1 and Node2:

           # sudo service drbd start


            3) Check the status:

           # cat /proc/drbd


            4) The output above shows that both nodes are currently in the Secondary state, so the next step is to promote one of them to Primary. On the node that is to become Primary, run:

            # sudo drbdadm -- --overwrite-data-of-peer primary all    (only needed the first time)

           # sudo drbdadm primary --force mydata


            After this has been run once, whenever a node needs to be made primary later you can use the simpler command:

            # /sbin/drbdadm primary mydata    (or /sbin/drbdadm primary all)


            5) Monitor the data synchronization:

           # watch -n1 'cat /proc/drbd'


            6) Once synchronization has completed, create a filesystem on the DRBD device and mount it:

           # sudo mke2fs -t ext4 /dev/drbd0

            # sudo mount /dev/drbd0 /mnt

           # ls -l /mnt

[Screenshot: listing of /mnt after mounting /dev/drbd0]

        Test OK~
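
        A few quick status checks that may come in handy at this point (drbd-overview and these drbdadm subcommands ship with drbd8-utils; the exact output varies by version):

            # sudo drbd-overview              # one-line summary per resource
            # sudo drbdadm role mydata        # e.g. Primary/Secondary
            # sudo drbdadm dstate mydata      # e.g. UpToDate/UpToDate
            # sudo drbdadm cstate mydata      # e.g. Connected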


5. Split-brain recovery

[Screenshot: /proc/drbd on both nodes showing the connection state StandAlone]

   While testing a Corosync+DRBD highly available MySQL cluster, the nodes unexpectedly stopped recognizing each other: the connection state was StandAlone and the primary and secondary could no longer communicate, as shown in the screenshot above.


   The following is the manual DRBD split-brain recovery procedure (treating node1's data as authoritative and discarding node2's unsynchronized changes):

   1) Make Node1 the primary node and mount it for a check; mydata is the resource name defined earlier

       # drbdadm primary mydata

       # mount /dev/drbd0 /mydata

       # ls -lh /mydata   # inspect the files


   2) Make Node2 the secondary node and discard its copy of the resource data

       # drbdadm secondary mydata

       # drbdadm -- --discard-my-data connect mydata


   3) On the primary node Node1, manually reconnect the resource

       # drbdadm connect mydata


   4) Finally, check the state on each node; the connection is back to normal

       # cat /proc/drbd

       The result is shown in the screenshot below (fault repaired):

[Screenshot: /proc/drbd showing the connection re-established after recovery]
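
   For completeness, DRBD can also resolve certain split-brain situations automatically through policies in the net section. A hedged sketch follows; these are standard DRBD 8.x options, but they are not part of the setup above and can silently discard data, so use them with care:

       net {
               after-sb-0pri discard-zero-changes;   # no primaries: keep the node that changed nothing
               after-sb-1pri discard-secondary;      # one primary: drop the secondary's changes
               after-sb-2pri disconnect;             # two primaries: do not resolve automatically
       }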


6. Other DRBD reference material (excerpted from the documentation):

   6.1 Meaning of the various DRBD status fields

The resource-specific output from /proc/drbd contains various pieces of information about the resource:

  • cs (connection state). Status of the network connection. See the section called “Connection states” for details about the various connection states.

  • ro (roles). Roles of the nodes. The role of the local node is displayed first, followed by the role of the partner node shown after the slash. See the section called “Resource roles” for details about the possible resource roles.

  • ds (disk states). State of the hard disks. Prior to the slash the state of the local node is displayed, after the slash the state of the hard disk of the partner node is shown. See the section called “Disk states” for details about the various disk states.

  • ns (network send). Volume of net data sent to the partner via the network connection; in Kibyte.

  • nr (network receive). Volume of net data received by the partner via the network connection; in Kibyte.

  • dw (disk write). Net data written on local hard disk; in Kibyte.

  • dr (disk read). Net data read from local hard disk; in Kibyte.

  • al (activity log). Number of updates of the activity log area of the metadata.

  • bm (bit map). Number of updates of the bitmap area of the metadata.

  • lo (local count). Number of open requests to the local I/O sub-system issued by DRBD.

  • pe (pending). Number of requests sent to the partner, but that have not yet been answered by the latter.

  • ua (unacknowledged). Number of requests received by the partner via the network connection, but that have not yet been answered.

  • ap (application pending). Number of block I/O requests forwarded to DRBD, but not yet answered by DRBD.

  • ep (epochs). Number of epoch objects. Usually 1. Might increase under I/O load when using either the barrier or the none write ordering method. Since 8.2.7.

  • wo (write order). Currently used write ordering method: b (barrier), f (flush), d (drain) or n (none). Since 8.2.7.

  • oos (out of sync). Amount of storage currently out of sync; in Kibibytes. Since 8.2.6.

   6.2 DRBD connection states

A resource may have one of the following connection states:

  • StandAlone. No network configuration available. The resource has not yet been connected, or has been administratively disconnected (using drbdadm disconnect), or has dropped its connection due to failed authentication or split brain.

  • Disconnecting. Temporary state during disconnection. The next state is StandAlone.

  • Unconnected. Temporary state, prior to a connection attempt. Possible next states: WFConnection and WFReportParams.

  • Timeout. Temporary state following a timeout in the communication with the peer. Next state: Unconnected.

  • BrokenPipe. Temporary state after the connection to the peer was lost. Next state: Unconnected.

  • NetworkFailure. Temporary state after the connection to the partner was lost. Next state: Unconnected.

  • ProtocolError. Temporary state after the connection to the partner was lost. Next state: Unconnected.

  • TearDown. Temporary state. The peer is closing the connection. Next state: Unconnected.

  • WFConnection. This node is waiting until the peer node becomes visible on the network.

  • WFReportParams. TCP connection has been established, this node waits for the first network packet from the peer.

  • Connected. A DRBD connection has been established, data mirroring is now active. This is the normal state.

  • StartingSyncS. Full synchronization, initiated by the administrator, is just starting. The next possible states are: SyncSource or PausedSyncS.

  • StartingSyncT. Full synchronization, initiated by the administrator, is just starting. Next state: WFSyncUUID.

  • WFBitMapS. Partial synchronization is just starting. Next possible states: SyncSource or PausedSyncS.

  • WFBitMapT. Partial synchronization is just starting. Next possible state: WFSyncUUID.

  • WFSyncUUID. Synchronization is about to begin. Next possible states: SyncTarget or PausedSyncT.

  • SyncSource. Synchronization is currently running, with the local node being the source of synchronization.

  • SyncTarget. Synchronization is currently running, with the local node being the target of synchronization.

  • PausedSyncS. The local node is the source of an ongoing synchronization, but synchronization is currently paused. This may be due to a dependency on the completion of another synchronization process, or due to synchronization having been manually interrupted by drbdadm pause-sync.

  • PausedSyncT. The local node is the target of an ongoing synchronization, but synchronization is currently paused. This may be due to a dependency on the completion of another synchronization process, or due to synchronization having been manually interrupted by drbdadm pause-sync.

  • VerifyS. On-line device verification is currently running, with the local node being the source of verification.

  • VerifyT. On-line device verification is currently running, with the local node being the target of verification.
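
  As a small practical complement to the state list above, a hedged one-liner that flags any resource whose connection state in /proc/drbd is not Connected (adjust to taste):

  # print every /proc/drbd status line whose cs: field is not Connected
  awk '/cs:/ && !/cs:Connected/ {print "not connected ->", $0}' /proc/drbd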
