Restoring MySQL GTID master-slave replication after a failure, without downtime

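Master-slave data synchronization with GTID

A GTID is a global transaction ID assigned on the originating MySQL server to a transaction that has been committed there. It consists of the server's UUID and a transaction ID. The ID is unique not only on the originating server but on every MySQL server in the replication topology. This property makes master-slave replication simpler to set up and makes consistency between the databases easier to maintain.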

 

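Introduction

GTID concepts

  1.  GTID stands for global transaction identifier
  2.  Each GTID corresponds to exactly one transaction and is globally unique
  3.  A given GTID is executed only once on a server, which prevents the data corruption that repeated execution would cause
  4.  Replication is no longer started with MASTER_LOG_FILE + MASTER_LOG_POS; MASTER_AUTO_POSITION=1 is used instead
  5.  Supported in MySQL 5.6.5 and later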

 

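Structure of a GTID

GTID = server_uuid:transaction_id

server_uuid: the unique identifier of a MySQL server. To view it, run in the mysql client: show variables like '%server_uuid%';

transaction_id: a sequence number for the transactions committed on that server. It starts at 1 and increases by one per transaction.

Example: c9fba9e2-db3b-11eb-81d4-000c298d8da1:1-5 (a GTID set covering transactions 1 through 5 from that server)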
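To make the server_uuid:transaction_id layout concrete, here is a minimal sketch that splits the example set into its parts. The function name split_gtid_set is made up for illustration, and it assumes a set with a single server_uuid and a single interval; real GTID sets can contain several of each.

```shell
# Split a single-UUID, single-interval GTID set such as "uuid:1-5".
# Hypothetical helper for illustration; not part of MySQL.
split_gtid_set() (
  gtid_set=$1
  uuid=${gtid_set%%:*}        # everything before the first colon
  interval=${gtid_set#*:}     # everything after the first colon
  first=${interval%%-*}       # first transaction_id in the interval
  last=${interval##*-}        # last transaction_id (same as first if no dash)
  echo "uuid=${uuid} first=${first} last=${last} count=$(( $last - $first + 1 ))"
)

split_gtid_set 'c9fba9e2-db3b-11eb-81d4-000c298d8da1:1-5'
# prints: uuid=c9fba9e2-db3b-11eb-81d4-000c298d8da1 first=1 last=5 count=5
```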

 

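Advantages of GTID

  1.  Replication is simpler to set up: there is no need to look up log_file and log_pos as before
  2.  It is safer than file-and-position replication, because each server tracks which transactions it has already applied
  3.  GTIDs from one server are consecutive with no gaps, which makes it easy to check whether a slave is missing any transaction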

 

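How GTID replication works

  1. When the master changes data, it generates a GTID ahead of the transaction and writes both to the binlog
  2. The slave's I/O thread writes the received binlog events to its local relay log
  3. The SQL thread reads the GTID from the relay log and checks whether the slave's own binlog already records it (which is why a MySQL 5.6 slave must have binlog enabled)
  4. If it is recorded, the transaction has already been executed and the slave skips it
  5. If it is not recorded, the slave executes the transaction from the relay log and records it in its binlog
  6. When applying the changes, the slave looks up rows by primary key if the table has one, otherwise by a secondary index, otherwise by a full table scan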
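The skip-or-apply decision in steps 3 to 5 can be sketched as a range check. This is only an illustration of the idea, not how the server implements it: gtid_action is a hypothetical name, and the slave's executed set is assumed to be one contiguous interval for the master's UUID.

```shell
# Decide what a slave would do with an incoming transaction.
# $1: interval already executed on the slave, e.g. "1-80"
# $2: transaction_id of the incoming GTID
# Hypothetical helper; assumes a single contiguous interval.
gtid_action() (
  first=${1%%-*}
  last=${1##*-}
  txn_id=$2
  if [ "$txn_id" -ge "$first" ] && [ "$txn_id" -le "$last" ]; then
    echo skip     # already executed here, so ignore it
  else
    echo apply    # not seen yet, so execute it and record it
  fi
)

gtid_action 1-80 42   # prints: skip
gtid_action 1-80 81   # prints: apply
```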

 

Configuring GTID replication

Master: 192.168.152.253   CentOS 7

Slave: 192.168.152.252   CentOS 8

Test database: vfan

Test table: student

 

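1. Edit the MySQL server configuration file, add the following parameters, then restart:

server-id=100    # server ID
log-bin=/var/lib/mysql/mysql-bin    # enable the binlog and set where it is stored
expire_logs_days=10    # keep logs for 10 days
gtid_mode=on    # turn GTID mode on
enforce_gtid_consistency=on    # enforce GTID consistency; required when gtid_mode is on
binlog_format=row    # binlog format: STATEMENT, ROW, or MIXED; the default is ROW since MySQL 5.7.7 and STATEMENT before that
skip_slave_start=1    # do not start replication automatically when MySQL starts

The master and slave use the same settings, except that server-id must be different on each server.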
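Before restarting, it can help to confirm that the two GTID settings are actually present in the file. The following is a rough sketch: check_gtid_cnf is a hypothetical helper, and it only recognises the underscore spelling of the option names written as key=on.

```shell
# Report which of the two required GTID settings are missing from a config file.
# Returns 0 when both are present, 1 otherwise. Hypothetical helper.
check_gtid_cnf() (
  cnf=$1
  missing=0
  for key in gtid_mode enforce_gtid_consistency; do
    # Accept "key=on" or "key = ON", optionally followed by a comment.
    if ! grep -Eiq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*on([[:space:]]|#|\$)" "$cnf"; then
      echo "missing: ${key}=on"
      missing=1
    fi
  done
  return "$missing"
)
```

For example, check_gtid_cnf /etc/my.cnf, where the path is an assumption and depends on the installation.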

 

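2. On the master, create the user that the slave connects with

CREATE USER 'copy'@'192.168.152.252' IDENTIFIED BY 'copy';
GRANT REPLICATION SLAVE ON *.* TO 'copy'@'192.168.152.252';
flush privileges;

After creating the user, remember to test that the slave host can log in with it.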

 

3. Use mysqldump to bring the two databases in sync

On the master:
mysqldump -uroot -proot1 vfan > dump2.sql
scp dump2.sql 192.168.152.252:/data/

On the slave:
mysql> source /data/dump2.sql

 

The master and slave now hold identical data:

mysql> select * from student;
+----+------+-----+
| id | name | age |
+----+------+-----+
|  1 | Tony |  18 |
|  2 | Any  |  17 |
|  3 | Goy  |  20 |
|  4 | Baly |  18 |
|  5 | Heg  |  19 |
|  6 | hhh  | 100 |
|  7 | lll  |  99 |
+----+------+-----+
7 rows in set (0.01 sec)

 

4. Start replication (on the slave)

mysql> CHANGE MASTER TO MASTER_HOST='192.168.152.253',MASTER_USER='copy',MASTER_PASSWORD='copy',MASTER_PORT=3306,MASTER_AUTO_POSITION=1;
Query OK, 0 rows affected, 2 warnings (0.04 sec)

mysql> start slave;
Query OK, 0 rows affected (0.01 sec)

## Check the slave status
mysql> show slave status\G
*************************** 1. row ***************************
  Slave_IO_State: Waiting for master to send event
  Master_Host: 192.168.152.253
  Master_User: copy
  Master_Port: 3306
  Connect_Retry: 60
  Master_Log_File: mysql-bin.000014
  Read_Master_Log_Pos: 897
  Relay_Log_File: kubenode2-relay-bin.000002
  Relay_Log_Pos: 416
  Relay_Master_Log_File: mysql-bin.000014
  Slave_IO_Running: Yes
  Slave_SQL_Running: Yes
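The two fields that matter in this output are Slave_IO_Running and Slave_SQL_Running. As a sketch, the check can be scripted by reading the status text from standard input; slave_threads_ok is a hypothetical helper name, and it assumes the field names printed by show slave status as used in this article.

```shell
# Read "show slave status\G" text on stdin.
# Prints the two thread states and succeeds only if both are Yes.
# Hypothetical helper for illustration.
slave_threads_ok() (
  status=$(cat)
  # Match the exact field names so Slave_SQL_Running_State is not picked up.
  io=$(printf '%s\n' "$status" | awk -F': *' '$1 ~ /Slave_IO_Running$/ {print $2}')
  sql=$(printf '%s\n' "$status" | awk -F': *' '$1 ~ /Slave_SQL_Running$/ {print $2}')
  echo "IO=${io} SQL=${sql}"
  [ "$io" = Yes ] && [ "$sql" = Yes ]
)
```

One way to use it on the slave would be mysql -uroot -p -e 'show slave status\G' | slave_threads_ok, with credentials adjusted to the environment.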

 

5. Check that data is replicated

Insert data on the master:
mysql> INSERT INTO student(name,age) VALUES('gogoo',50),('zhazha',25);
Query OK, 2 rows affected (0.03 sec)
Records: 2  Duplicates: 0  Warnings: 0

Read it on the slave:
mysql> select * from student;
+----+--------+-----+
| id | name   | age |
+----+--------+-----+
|  1 | Tony   |  18 |
|  2 | Any    |  17 |
|  3 | Goy    |  20 |
|  4 | Baly   |  18 |
|  5 | Heg    |  19 |
|  6 | hhh    | 100 |
|  7 | lll    |  99 |
|  8 | gogoo  |  50 |
|  9 | zhazha |  25 |
+----+--------+-----+
9 rows in set (0.00 sec)

The data has been replicated, so the basic master-slave setup is complete.

 

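Now simulate a scenario in this setup where the slave fails partway through and stops replicating from the master. The requirement is to repair the slave and bring the data back in sync without stopping business traffic.

1. First, simulate a workload that keeps inserting data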

vim insert.sh

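#!/usr/bin/env bash
# Use directory names found under /usr/ as sample values for the name column
values=(`find /usr/ -type d | awk -F '/' '{print $NF}' | sort -u`)
while true
do
    age=$(( $RANDOM%100 ))
    name=${values[$(( $RANDOM%6 ))]}    # pick one of the first six names
    mysql -h127.1 -P3306 -uroot -proot1 -e "INSERT INTO vfan.student(name,age) VALUES('${name}',${age});" &> /dev/null
    sleep $(( $RANDOM%5 ))    # wait 0 to 4 seconds before the next insert
done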

Run the script; rows are inserted at random intervals of less than 5 seconds.

Current data on the master:

mysql> select * from student;
+-----+---------------------+-----+
| id  | name                | age |
......
|  97 | _                   |   2 |
|  98 | 00bash              |  15 |
|  99 | 00bash              |  52 |
| 100 | 00bash              |  43 |
| 101 | _                   |  65 |
| 102 | 00                  |  67 |
+-----+---------------------+-----+
102 rows in set (0.01 sec)

 

2. While data is still being inserted, simulate a crash or fault on the slave (here simply by running stop slave;)

mysql> stop slave;
Query OK, 0 rows affected (0.01 sec)

 

3. The master keeps growing while the slave is no longer replicating. Data on the slave:

mysql> select * from student;
+----+---------------------+-----+
| id | name                | age |
......
| 82 | 00bash              |  50 |
| 83 | 00systemd-bootchart |  36 |
| 84 | 00bash              |  48 |
| 85 | 00systemd-bootchart |  41 |
| 86 | 00                  |  72 |
+----+---------------------+-----+
86 rows in set (0.00 sec)

 

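4. Recover the data on the slave

Approach:

 First take a full backup of the current data with mysqldump. Because the business must not be affected, the dump must not lock tables, so that writes can continue.

 Because data is still being written while mysqldump runs, the dump will be missing the most recent rows. So after importing the dump, mark the GTID transactions it contains as ones to skip, set up the master-slave configuration again, and start the slave threads. The slave then fetches the remaining transactions and stays in sync.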

 

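1) Back up with mysqldump without locking tables

mysqldump -uroot -proot1 --single-transaction --master-data=2 -R vfan > dump4.sql

The option that matters here is --single-transaction, which dumps InnoDB tables from a consistent snapshot instead of locking them. The dump is written uncompressed because the following steps grep and source dump4.sql directly.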

 

2) Check the GTID set recorded in the dump

[root@TestCentos7 data]# grep GLOBAL.GTID_PURGED dump4.sql
SET @@GLOBAL.GTID_PURGED=/*!80000 '+'*/ 'c9fba9e2-db3b-11eb-81d4-000c298d8da1:1-228';

Here c9fba9e2-db3b-11eb-81d4-000c298d8da1:1-228 is the set of GTID transactions the master had executed when the dump was taken.
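Since this value is needed again later, it may be convenient to pull it out of the dump programmatically. A minimal sketch follows; extract_gtid_purged is a hypothetical helper, and it assumes the whole GTID set sits on the same line as the SET statement, as in the output above (a set with several UUIDs may be wrapped across lines, which this does not handle).

```shell
# Print the GTID set from the "SET @@GLOBAL.GTID_PURGED=...;" line of a dump.
# Hypothetical helper; assumes a single-line statement.
extract_gtid_purged() (
  grep 'GLOBAL.GTID_PURGED' "$1" \
    | grep -o "'[0-9a-f-]\{36\}:[0-9:-]*'" \
    | tr -d "'"
)
```

For the dump above, extract_gtid_purged dump4.sql would be expected to print c9fba9e2-db3b-11eb-81d4-000c298d8da1:1-228. The 36-character pattern matches the UUID, so the quoted '+' in the version comment is ignored.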

 

3) Import the dump on the slave

scp dump4.sql 192.168.152.252:/data

In the mysql client:
mysql> source /data/dump4.sql

Data on the slave now:
mysql> select * from student;
| 230 | 00                  |  53 |
| 231 | 00bash              |  66 |
| 232 | _                   |  18 |
| 233 | 0.33.0              |  98 |
| 234 | 00bash              |  14 |
+-----+---------------------+-----+
234 rows in set (0.00 sec)

Data on the master:
| 454 | _                   |  46 |
| 455 | 03modsign           |  59 |
| 456 | 00systemd-bootchart |  77 |
| 457 | 03modsign           |   6 |
| 458 | 0.33.0              |  88 |
+-----+---------------------+-----+
458 rows in set (0.00 sec)

The slave has been restored up to row 234, while the master is still growing and already has 458 rows.

 

4) The dump already contains transactions 1-228 executed on the master, so when the slave resumes replication it must skip those transactions instead of replaying them. Otherwise it fails with an error like this:

mysql> show slave status\G
*************************** 1. row ***************************
  Slave_IO_State: Waiting for master to send event
  Master_Host: 192.168.152.253
  Master_User: copy
  Master_Port: 3306
  Connect_Retry: 60
  Master_Log_File: mysql-bin.000002
  Read_Master_Log_Pos: 137827
  Relay_Log_File: kubenode2-relay-bin.000002
  Relay_Log_Pos: 417
  Relay_Master_Log_File: mysql-bin.000002
  Slave_IO_Running: Yes
  Slave_SQL_Running: No
  Last_Errno: 1062
  Last_Error: Could not execute Write_rows event on table vfan.student; Duplicate entry '87' for key 'student.PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000002, end_log_pos 10588

 

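To skip these GTIDs by setting gtid_purged, the slave's gtid_purged (and gtid_executed) must be empty. Otherwise the new value overlaps transactions the slave has already recorded and the statement can be rejected. Check the current values:

mysql> show global variables like '%gtid%';
+----------------------------------+-------------------------------------------------------------------------------------+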
| Variable_name                    | Value                                                                               |
+----------------------------------+-------------------------------------------------------------------------------------+
| binlog_gtid_simple_recovery      | ON                                                                                  |
| enforce_gtid_consistency         | ON                                                                                  |
| gtid_executed                    | b30cb2ff-32d4-11eb-a447-000c292826bc:1-2, c9fba9e2-db3b-11eb-81d4-000c298d8da1:1-80 |
| gtid_executed_compression_period | 1000                                                                                |
| gtid_mode                        | ON                                                                                  |
| gtid_owned                       |                                                                                     |
| gtid_purged                      | c9fba9e2-db3b-11eb-81d4-000c298d8da1:1-70                                           |
| session_track_gtids              | OFF                                                                                 |
+----------------------------------+-------------------------------------------------------------------------------------+
8 rows in set (0.02 sec)

 

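gtid_purged is currently not empty, so clear it first. Run this on the slave only: reset master deletes all binary logs of the server it runs on and empties that server's GTID state.

mysql> reset master;
Query OK, 0 rows affected (0.05 sec)

mysql> show global variables like '%gtid%';
+----------------------------------+-------+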
| Variable_name                    | Value |
+----------------------------------+-------+
| binlog_gtid_simple_recovery      | ON    |
| enforce_gtid_consistency         | ON    |
| gtid_executed                    |       |
| gtid_executed_compression_period | 1000  |
| gtid_mode                        | ON    |
| gtid_owned                       |       |
| gtid_purged                      |       |
| session_track_gtids              | OFF   |
+----------------------------------+-------+
8 rows in set (0.00 sec)

 

5) With gtid_purged empty, reset the slave

mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)

mysql> reset slave all;
Query OK, 0 rows affected (0.02 sec)

 

6) After the reset, set the GTIDs to skip and point the slave at the master again

mysql> SET @@GLOBAL.GTID_PURGED='c9fba9e2-db3b-11eb-81d4-000c298d8da1:1-228';
Query OK, 0 rows affected (0.01 sec)

mysql> CHANGE MASTER TO MASTER_HOST='192.168.152.253',MASTER_USER='copy',MASTER_PASSWORD='copy',MASTER_PORT=3306,MASTER_AUTO_POSITION=1;
Query OK, 0 rows affected, 2 warnings (0.04 sec)
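The statements from steps 4) through 7) can be collected into one script so that they are not typed by hand during an incident. The sketch below only prints the SQL and does not run it. build_resync_sql is a hypothetical helper; it stops the replication threads before reset master, which is a slightly more cautious order than the one shown above, and it hardcodes port 3306 as in this article.

```shell
# Print the SQL that re-attaches a slave after a dump has been imported.
# $1: GTID set taken from the dump   $2: master host
# $3: replication user               $4: replication password
# Hypothetical helper; the output is meant to be reviewed before it is run.
build_resync_sql() (
  gtid_set=$1 master_host=$2 repl_user=$3 repl_pass=$4
  cat <<SQL
STOP SLAVE;
RESET SLAVE ALL;
RESET MASTER;
SET @@GLOBAL.GTID_PURGED='${gtid_set}';
CHANGE MASTER TO MASTER_HOST='${master_host}',MASTER_USER='${repl_user}',MASTER_PASSWORD='${repl_pass}',MASTER_PORT=3306,MASTER_AUTO_POSITION=1;
START SLAVE;
SQL
)
```

The output could then be piped to the mysql client on the slave, for example build_resync_sql 'c9fba9e2-db3b-11eb-81d4-000c298d8da1:1-228' 192.168.152.253 copy copy | mysql -uroot -p. It must never be run against the master, because RESET MASTER deletes the binary logs of the server it runs on.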

 

7) Start the slave threads and check the replication status

mysql> start slave;
Query OK, 0 rows affected (0.01 sec)

mysql> show slave status\G
*************************** 1. row ***************************
  Slave_IO_State: Waiting for master to send event
  Master_Host: 192.168.152.253
  Master_User: copy
  Master_Port: 3306
  Connect_Retry: 60
  Master_Log_File: mysql-bin.000002
  Read_Master_Log_Pos: 137827
  Relay_Log_File: kubenode2-relay-bin.000002
  Relay_Log_Pos: 84993
  Relay_Master_Log_File: mysql-bin.000002
  Slave_IO_Running: Yes
  Slave_SQL_Running: Yes
  Replicate_Do_DB:
  Replicate_Ignore_DB:
  Replicate_Do_Table:
  Replicate_Ignore_Table:
  Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
  Last_Errno: 0
  Last_Error:
  Skip_Counter: 0
  Exec_Master_Log_Pos: 137827
  Relay_Log_Space: 85206
  Until_Condition: None
  Until_Log_File:
  Until_Log_Pos: 0
  Master_SSL_Allowed: No
  Master_SSL_CA_File:
  Master_SSL_CA_Path:
  Master_SSL_Cert:
  Master_SSL_Cipher:
  Master_SSL_Key:
  Seconds_Behind_Master: 0
  Master_SSL_Verify_Server_Cert: No
  Last_IO_Errno: 0
  Last_IO_Error:
  Last_SQL_Errno: 0
  Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
  Master_Server_Id: 100
  Master_UUID: c9fba9e2-db3b-11eb-81d4-000c298d8da1
  Master_Info_File: mysql.slave_master_info
  SQL_Delay: 0
  SQL_Remaining_Delay: NULL
  Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
  Master_Retry_Count: 86400
  Master_Bind:
  Last_IO_Error_Timestamp:
  Last_SQL_Error_Timestamp:
  Master_SSL_Crl:
  Master_SSL_Crlpath:
  Retrieved_Gtid_Set: c9fba9e2-db3b-11eb-81d4-000c298d8da1:229-519
  Executed_Gtid_Set: c9fba9e2-db3b-11eb-81d4-000c298d8da1:1-519
  Auto_Position: 1
  Replicate_Rewrite_DB:
  Channel_Name:
  Master_TLS_Version:
  Master_public_key_path:
  Get_master_public_key: 0
  Network_Namespace:
1 row in set (0.00 sec)

As shown, replication is running normally.
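Note that Retrieved_Gtid_Set starts at 229, directly after the 1-228 covered by the dump: the slave fetched only what the dump did not contain. That relationship can be sketched with a small calculation. remaining_gtids is a hypothetical helper that assumes both sets share one UUID and are single intervals starting at 1; on a real server, MySQL's built-in GTID_SUBTRACT() function handles the general case.

```shell
# Print the GTID interval a slave still has to fetch after loading a dump.
# $1: GTID set contained in the dump      e.g. "uuid:1-228"
# $2: GTID set executed on the master     e.g. "uuid:1-519"
# Hypothetical helper; prints nothing when the dump is already up to date.
remaining_gtids() (
  dump_set=$1 master_set=$2
  uuid=${dump_set%%:*}
  dump_last=${dump_set##*-}       # number after the last dash
  master_last=${master_set##*-}
  if [ "$master_last" -gt "$dump_last" ]; then
    echo "${uuid}:$(( $dump_last + 1 ))-${master_last}"
  fi
)

remaining_gtids 'c9fba9e2-db3b-11eb-81d4-000c298d8da1:1-228' \
                'c9fba9e2-db3b-11eb-81d4-000c298d8da1:1-519'
# prints: c9fba9e2-db3b-11eb-81d4-000c298d8da1:229-519
```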

 

8) Finally, check that the master and slave data match

Data on the master: SELECT * FROM student;
| 520 | 00systemd-bootchart |  18 |
| 521 | 00systemd-bootchart |  44 |
| 522 | 03modsign           |  98 |
| 523 | 00systemd-bootchart |  45 |
| 524 | 00                  |  90 |
| 525 | 03modsign           |  21 |
+-----+---------------------+-----+
525 rows in set (0.00 sec)

Data on the slave: SELECT * FROM student;
| 519 | 0.33.0              |  99 |
| 520 | 00systemd-bootchart |  18 |
| 521 | 00systemd-bootchart |  44 |
| 522 | 03modsign           |  98 |
| 523 | 00systemd-bootchart |  45 |
| 524 | 00                  |  90 |
| 525 | 03modsign           |  21 |
+-----+---------------------+-----+
525 rows in set (0.00 sec)

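The rows inserted while we were repairing the slave have also been replicated. The data is fully consistent, and master-slave replication is repaired.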
