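While tuning SQL you find that a table is missing an index. Add it right away? Could that hang? Leave it out? Then the SQL performs badly. This raises a question: how should DDL operations be carried out?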
show variables like 'innodb%max%'; innodb_online_alter_log_max_size | 134217728
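If the production workload is update-heavy, raise this value, for example set global innodb_online_alter_log_max_size = 256*1024*1024 (SET does not accept the M suffix, and 128M is just the default shown above). This is a global variable, so set it in my.cnf as well.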
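A minimal sketch of turning an option-file style size into the plain byte count that SET GLOBAL expects. The helper names `to_bytes` and `set_global_stmt` are made up for this illustration; only the variable name comes from the text above.

```python
import re

# Multipliers for the K/M/G suffixes used in option files such as my.cnf.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(size: str) -> int:
    """Convert a size such as '256M' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]?)\s*", size.upper())
    if match is None:
        raise ValueError(f"unrecognised size: {size!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def set_global_stmt(variable: str, size: str) -> str:
    """Build a SET GLOBAL statement with the size spelled out in bytes."""
    return f"SET GLOBAL {variable} = {to_bytes(size)};"


print(set_global_stmt("innodb_online_alter_log_max_size", "256M"))
```

The suffixed form (for example `innodb_online_alter_log_max_size = 256M`) can still be used as-is in my.cnf.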
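1. Two parameters: LOCK and ALGORITHM
Lock modes:
Mode | Meaning |
---|---|
default | MySQL picks the least restrictive mode the operation supports, to allow maximum concurrency |
none | No lock; reads and writes are not blocked |
shared | Shared mode, same as fast index creation in 5.1: reads allowed, DML not supported |
exclusive | Exclusive mode; neither reads nor writes are allowed |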
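Algorithms:
Algorithm | Meaning |
---|---|
default | Chosen according to old_alter_table; OFF means the new algorithm, i.e. inplace |
inplace | Shared lock; supports only adding and dropping indexes (in 5.5 and earlier) |
copy | Requires copying the data; inefficient |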
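Whichever mode is used, online DDL takes a brief exclusive lock before it starts and again before it finishes. So make sure no large transaction is running before the operation; otherwise severe blocking will occur.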
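2. Steps for adding an index under the two algorithms (version 5.5)
- | copy | inplace |
---|---|---|
1 | Create a temporary table that includes the index | Create the index's data dictionary entry (secondary indexes only; for a primary key, inplace is converted to copy even if specified) |
2 | Lock the original table: DML forbidden, queries allowed | Take a shared lock: DML forbidden, queries allowed |
3 | Copy the original table's data into the temporary table | Read the clustered index, build the new index entries, sort them and insert them into the new index |
4 | Upgrade the shared lock to exclusive, block reads and writes, and rename (a data dictionary change, very fast) | Wait for all read-only transactions that have the table open to commit |
5 | Index creation complete | Index creation finished |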
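3. Syntax:
alter table tb_name ... lock = xxx, algorithm = xxx
Note: it is recommended to put multiple DDL operations into a single statement; this is more efficient than running them separately.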
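As a sketch of the syntax above, the helper below combines several changes into one ALTER TABLE and appends explicit ALGORITHM and LOCK clauses. Spelling them out means MySQL (5.6 and later) reports an error when the operation cannot be done that way, instead of quietly falling back to a more restrictive mode. The function `build_alter` and the index names are hypothetical.

```python
VALID_ALGORITHMS = {"DEFAULT", "INPLACE", "COPY"}
VALID_LOCKS = {"DEFAULT", "NONE", "SHARED", "EXCLUSIVE"}


def build_alter(table, changes, algorithm="INPLACE", lock="NONE"):
    """Combine several DDL changes into a single ALTER TABLE statement."""
    algorithm, lock = algorithm.upper(), lock.upper()
    if algorithm not in VALID_ALGORITHMS:
        raise ValueError(f"unknown algorithm: {algorithm}")
    if lock not in VALID_LOCKS:
        raise ValueError(f"unknown lock mode: {lock}")
    if algorithm == "COPY" and lock == "NONE":
        # The copy algorithm cannot run with concurrent writes.
        raise ValueError("ALGORITHM=COPY needs at least LOCK=SHARED")
    if not changes:
        raise ValueError("at least one change is required")
    clauses = list(changes) + [f"ALGORITHM={algorithm}", f"LOCK={lock}"]
    return f"ALTER TABLE {table} " + ", ".join(clauses) + ";"


# Two index additions sent as one statement rather than two.
print(build_alter("tb_name", ["ADD INDEX idx_a (a)", "ADD INDEX idx_b (b)"]))
```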
tips:
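The analysis above applies to 5.5 and earlier, when only adding and dropping indexes avoided copying the original table, and even then DML was not allowed.
Online DDL includes two algorithms, copy and inplace.
Changing a column's type and dropping the primary key use copy.
inplace is further divided into rebuild and no-rebuild.
rebuild has to rebuild the table and change the record format; adding and dropping columns use rebuild.
no-rebuild leaves the table data in place; adding and dropping indexes, renaming a column and changing a column default use no-rebuild.
Compared with no-rebuild, rebuild essentially involves one more DDL execution phase.
First, restrictions such as naming and length are checked.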
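- | prepare | ddl | commit |
---|---|---|---|
1 | Server layer creates a temporary frm | Downgrade the exclusive MDL lock, allowing reads and writes (copy does not allow writes) | Upgrade to the exclusive MDL lock, blocking reads and writes |
2 | Hold the exclusive MDL lock, blocking reads and writes | Scan every record of the original table's clustered index | Apply the remaining log produced in row_log |
3 | Decide the execution method from the alter type (copy, inplace-rebuild, inplace-norebuild) | Traverse the new table's clustered index and secondary indexes | Update InnoDB's data dictionary |
4 | Update the in-memory data dictionary objects | Build the corresponding index entries from each record | Commit the transaction (flush its redo log) |
5 | Allocate the row_log object to record incremental changes | Insert the index entries into sort_buffer blocks | Update statistics |
6 | InnoDB layer creates a temporary ibd file (rebuild case) | Insert the sort_buffer blocks into the new index | Rename the temporary ibd and frm files |
7 | Commit the data dictionary transaction and release the lock | Process the incremental changes produced during the DDL (rebuild case) | Change complete |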
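Parameter | Description |
---|---|
old_alter_table | Default OFF, i.e. inplace mode is used |
tmpdir | Directory used for sorting during index creation when memory is insufficient |
innodb_online_alter_log_max_size | Maximum size of the row_log |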
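The role of row_log in the ddl and commit phases can be pictured with a toy model. This is only a sketch in plain Python, not InnoDB's implementation: tables are dictionaries keyed by primary key, and the limit is counted in log entries rather than bytes, standing in for innodb_online_alter_log_max_size. All names are invented for the illustration.

```python
def online_rebuild(table, concurrent_dml, max_log_entries=1000):
    """Toy model of an inplace rebuild that tolerates concurrent DML.

    table          -- dict mapping primary key to row (the original table)
    concurrent_dml -- list of (op, pk, row) applied while the scan runs
    """
    # ddl phase: scan the clustered index in key order into the new table.
    new_table = {pk: table[pk] for pk in sorted(table)}

    # DML keeps running against the original table; each change is also
    # appended to the row_log so it can be replayed later.
    row_log = []
    for op, pk, row in concurrent_dml:
        if op == "delete":
            table.pop(pk, None)
        else:  # insert or update
            table[pk] = row
        row_log.append((op, pk, row))
        if len(row_log) > max_log_entries:
            # When the log outgrows its limit the whole DDL fails.
            raise RuntimeError("row_log limit exceeded, DDL aborted")

    # commit phase: under the exclusive MDL lock, replay what is left.
    for op, pk, row in row_log:
        if op == "delete":
            new_table.pop(pk, None)
        else:
            new_table[pk] = row
    return new_table


original = {1: "a", 2: "b"}
changes = [("update", 1, "a2"), ("insert", 3, "c"), ("delete", 2, None)]
rebuilt = online_rebuild(original, changes)
print(rebuilt == original)
```

The overflow branch is the reason for raising innodb_online_alter_log_max_size on write-heavy tables before a long rebuild.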
tips:
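② In online DDL, inplace is the preferred option. ALGORITHM=COPY always copies the table and leaves it read-only; ALGORITHM=INPLACE may also copy the table, but still allows concurrent DML (thanks to row_log).
③ 5.6 still does not support these DDL operations online: changing a column's data type, dropping the primary key, changing the table character set.
④ inplace supports DML better, but costs more than copy.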
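① Data integrity ---> row_log
② Online and data consistency ---> a brief MDL during prepare and commit; online almost the whole time
③ Consistency between server and InnoDB ---> during prepare the server creates the frm and InnoDB creates a temporary ibd; during ddl the original table is copied into the ibd and row_log is applied to it; during commit InnoDB updates the data dictionary and commits, and finally InnoDB and the server rename the ibd and frm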
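Problem:
Adding an index online has one issue: master-slave delay (MySQL uses logical replication; Oracle's physical replication does not have this problem).
Cause:
An alter table is sent to the slave only after it has finished executing (as a transaction); the slave then executes it in order.
With a copy-style online DDL, once the slave reaches this DDL, the other parallel DML statements have to wait until the DDL completes before continuing (see the mechanism above), as shown below: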
How the master-slave delay arises:
+--------+
| master |  o_ddl_5min
+--------+
    |
  |log|  The binary log is what gets synchronized; it is sent only after the transaction has finished executing, unlike a physical log
    |
+--------+
| slave  |  o_ddl_5min
+--------+
So even though 5.7 no longer blocks reads and writes for more and more DDL operations, alter table is rarely used to run DDL in real production.
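The size of that delay can be estimated with a small back-of-the-envelope model. It assumes a single-threaded replica that applies binlog events strictly in order, and that the DDL takes as long on the slave as on the master; the function name and the numbers are illustrative only.

```python
def replica_lag(events):
    """Return the lag in seconds for each binlog event.

    events -- list of (master_commit_time, apply_duration) in binlog order.
    Lag is the time between the commit on the master and the moment the
    slave has finished applying the same event.
    """
    clock = 0  # time at which the slave becomes free again
    lags = []
    for committed_at, duration in events:
        start = max(clock, committed_at)  # cannot start before it is logged
        clock = start + duration
        lags.append(clock - committed_at)
    return lags


# A 5-minute DDL that finishes on the master at t=300, followed by two
# one-second DML transactions committed right after it.
print(replica_lag([(300, 300), (301, 1), (302, 1)]))
```

In this model every transaction queued behind the DDL inherits roughly the full DDL duration as lag, which is why splitting the work into many small steps keeps the delay low.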
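A tool we commonly use today is pt-osc.
This tool performs online DDL with very small master-slave delay. It does not operate on the table directly; it works gradually through triggers, and it has dedicated parameters for controlling delay.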
yum install -y perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker perl-Time-HiRes perl-DBI perl-DBD-MySQL
cd /usr/local/src
wget https://www.percona.com/downloads/percona-toolkit/3.0.4/binary/tarball/percona-toolkit-3.0.4_x86_64.tar.gz
tar zxvf percona-toolkit-3.0.4_x86_64.tar.gz
cd percona-toolkit-3.0.4
perl Makefile.PL
make
make install
pt-online-schema-change --alter "convert to character set utf8mb4" D=test,t=a --dry-run
This only shows the steps (the tool requires either --dry-run or --execute); to actually run the change, use --execute:
pt-online-schema-change --alter "add index index_a (a)" D=test,t=a --execute
The whole process is split into many small steps that are passed to the slave one by one, so the delay is small; the drawback is that it takes a long time.
tips:
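The most useful tool in percona toolkit is pt-online-schema-change; the official utilities package (MySQL Utilities) already covers the other tools, so prefer the official ones where possible. The official side is also working on an osc of its own.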
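Test plan:
Step | Action |
---|---|
step1 | Use sysbench to load test data into table sbtest1 in the test database |
step2 | Enable general_log and send its output to the mysql.general_log table |
step3 | Use osc to add an index on column c of sbtest1 (--execute can be replaced with --dry-run) |
step4 | Analyze the general log |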
step1: omitted
step2:
(root@localhost) [(none)]> truncate mysql.general_log;
Query OK, 0 rows affected (1.65 sec)
(root@localhost) [(none)]> set global general_log = 1;
Query OK, 0 rows affected (0.00 sec)
(root@localhost) [(none)]> set global log_output = 'table';
Query OK, 0 rows affected (0.01 sec)
step 3:
pt-online-schema-change --alter "add index index_c (c)" --socket=/tmp/mysql.sock --user=root --password=123 D=test,t=sbtest1 --execute
No slaves found. See --recursion-method if host VM_221_162_centos has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
  analyze_table, 10, 1
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Altering `test`.`sbtest1`...
Creating new table...
Created new table test._sbtest1_new OK.
Altering new table...
Altered `test`.`_sbtest1_new` OK.
2017-11-30T18:28:19 Creating triggers...
2017-11-30T18:28:19 Created triggers OK.
2017-11-30T18:28:19 Copying approximately 493200 rows...
2017-11-30T18:28:41 Copied rows OK.
2017-11-30T18:28:41 Analyzing new table...
2017-11-30T18:28:41 Swapping tables...
2017-11-30T18:28:41 Swapped original and new tables OK.
2017-11-30T18:28:41 Dropping old table...
2017-11-30T18:28:41 Dropped old table `test`.`_sbtest1_old` OK.
2017-11-30T18:28:41 Dropping triggers...
2017-11-30T18:28:41 Dropped triggers OK.
Successfully altered `test`.`sbtest1`.
The output above already gives a rough outline of the process.
step 4: the log is analyzed below in five parts:
(root@localhost) [(none)]> set global log_output = 'file';
Query OK, 0 rows affected (0.00 sec)
(root@localhost) [(none)]> set global general_log = 0;
Query OK, 0 rows affected (0.01 sec)
(root@localhost) [mysql]> select argument from mysql.general_log;
root@localhost on test using Socket
set autocommit=1
SHOW VARIABLES LIKE 'innodb\_lock_wait_timeout'
SET SESSION innodb_lock_wait_timeout=1
SHOW VARIABLES LIKE 'lock\_wait_timeout'
SET SESSION lock_wait_timeout=60
SHOW VARIABLES LIKE 'wait\_timeout'
SET SESSION wait_timeout=10000
SELECT @@SQL_MODE
SET @@SQL_QUOTE_SHOW_CREATE = 1/*!40101, @@SQL_MODE='NO_AUTO_VALUE_ON_ZERO,ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'*/
SELECT @@server_id /*!50038 , @@hostname*/
Notes:
1. Session-level variables are set:
SET SESSION innodb_lock_wait_timeout=1
SET SESSION lock_wait_timeout=60
SET SESSION wait_timeout=10000
-----------------------------------------
SHOW VARIABLES LIKE 'version%'
SHOW ENGINES
SHOW VARIABLES LIKE 'innodb_version'
SHOW VARIABLES LIKE 'innodb_stats_persistent'
SELECT @@SERVER_ID
SHOW GRANTS FOR CURRENT_USER()
SHOW FULL PROCESSLIST
SHOW SLAVE HOSTS
SHOW GLOBAL STATUS LIKE 'Threads_running'
SHOW GLOBAL STATUS LIKE 'Threads_running'
SELECT CONCAT(@@hostname, @@port)
SHOW TABLES FROM `test` LIKE 'sbtest1'
SELECT VERSION()
SHOW TRIGGERS FROM `test` LIKE 'sbtest1'
/*!40101 SET @OLD_SQL_MODE := @@SQL_MODE, @@SQL_MODE := '', @OLD_QUOTE := @@SQL_QUOTE_SHOW_CREATE, @@SQL_QUOTE_SHOW_CREATE := 1 */
USE `test`
SHOW CREATE TABLE `test`.`sbtest1`
/*!40101 SET @@SQL_MODE := @OLD_SQL_MODE, @@SQL_QUOTE_SHOW_CREATE := @OLD_QUOTE */
EXPLAIN SELECT * FROM `test`.`sbtest1` WHERE 1=1
SELECT table_schema, table_name FROM information_schema.key_column_usage WHERE referenced_table_schema='test' AND referenced_table_name='sbtest1'
SHOW VARIABLES LIKE 'wsrep_on'
/*!40101 SET @OLD_SQL_MODE := @@SQL_MODE, @@SQL_MODE := '', @OLD_QUOTE := @@SQL_QUOTE_SHOW_CREATE, @@SQL_QUOTE_SHOW_CREATE := 1 */
Notes:
1. Variables, the current user's privileges, slave information, version information and so on are inspected
2. sbtest1 is checked for existing triggers
3. The execution plan is examined
4. sbtest1 is checked for foreign key relationships
-----------------------------------------
USE `test`
SHOW CREATE TABLE `test`.`sbtest1`
/*!40101 SET @@SQL_MODE := @OLD_SQL_MODE, @@SQL_QUOTE_SHOW_CREATE := @OLD_QUOTE */
CREATE TABLE `test`.`_sbtest1_new` ( `id` int(11) NOT NULL AUTO_INCREMENT, `k` int(11) NOT NULL DEFAULT '0', `c` char(120) NOT NULL DEFAULT '', `pad` char(60) NOT NULL DEFAULT '', PRIMARY KEY (`id`), KEY `k_1` (`k`) ) ENGINE=InnoDB AUTO_INCREMENT=500001 DEFAULT CHARSET=latin1
ALTER TABLE `test`.`_sbtest1_new` add index index_c (c)
/*!40101 SET @OLD_SQL_MODE := @@SQL_MODE, @@SQL_MODE := '', @OLD_QUOTE := @@SQL_QUOTE_SHOW_CREATE, @@SQL_QUOTE_SHOW_CREATE := 1 */
USE `test`
SHOW CREATE TABLE `test`.`_sbtest1_new`
/*!40101 SET @@SQL_MODE := @OLD_SQL_MODE, @@SQL_QUOTE_SHOW_CREATE := @OLD_QUOTE */
SELECT TRIGGER_SCHEMA, TRIGGER_NAME, DEFINER, ACTION_STATEMENT, SQL_MODE, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, EVENT_MANIPULATION, ACTION_TIMING FROM INFORMATION_SCHEMA.TRIGGERS WHERE EVENT_MANIPULATION = 'DELETE' AND ACTION_TIMING = 'AFTER' AND TRIGGER_SCHEMA = 'test' AND EVENT_OBJECT_TABLE = 'sbtest1'
SELECT TRIGGER_SCHEMA, TRIGGER_NAME, DEFINER, ACTION_STATEMENT, SQL_MODE, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, EVENT_MANIPULATION, ACTION_TIMING FROM INFORMATION_SCHEMA.TRIGGERS WHERE EVENT_MANIPULATION = 'UPDATE' AND ACTION_TIMING = 'AFTER' AND TRIGGER_SCHEMA = 'test' AND EVENT_OBJECT_TABLE = 'sbtest1'
SELECT TRIGGER_SCHEMA, TRIGGER_NAME, DEFINER, ACTION_STATEMENT, SQL_MODE, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, EVENT_MANIPULATION, ACTION_TIMING FROM INFORMATION_SCHEMA.TRIGGERS WHERE EVENT_MANIPULATION = 'INSERT' AND ACTION_TIMING = 'AFTER' AND TRIGGER_SCHEMA = 'test' AND EVENT_OBJECT_TABLE = 'sbtest1'
SELECT TRIGGER_SCHEMA, TRIGGER_NAME, DEFINER, ACTION_STATEMENT, SQL_MODE, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, EVENT_MANIPULATION, ACTION_TIMING FROM INFORMATION_SCHEMA.TRIGGERS WHERE EVENT_MANIPULATION = 'DELETE' AND ACTION_TIMING = 'BEFORE' AND TRIGGER_SCHEMA = 'test' AND EVENT_OBJECT_TABLE = 'sbtest1'
SELECT TRIGGER_SCHEMA, TRIGGER_NAME, DEFINER, ACTION_STATEMENT, SQL_MODE, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, EVENT_MANIPULATION, ACTION_TIMING FROM INFORMATION_SCHEMA.TRIGGERS WHERE EVENT_MANIPULATION = 'UPDATE' AND ACTION_TIMING = 'BEFORE' AND TRIGGER_SCHEMA = 'test' AND EVENT_OBJECT_TABLE = 'sbtest1'
SELECT TRIGGER_SCHEMA, TRIGGER_NAME, DEFINER, ACTION_STATEMENT, SQL_MODE, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, EVENT_MANIPULATION, ACTION_TIMING FROM INFORMATION_SCHEMA.TRIGGERS WHERE EVENT_MANIPULATION = 'INSERT' AND ACTION_TIMING = 'BEFORE' AND TRIGGER_SCHEMA = 'test' AND EVENT_OBJECT_TABLE = 'sbtest1'
CREATE TRIGGER `pt_osc_test_sbtest1_del` AFTER DELETE ON `test`.`sbtest1` FOR EACH ROW DELETE IGNORE FROM `test`.`_sbtest1_new` WHERE `test`.`_sbtest1_new`.`id` <=> OLD.`id`
CREATE TRIGGER `pt_osc_test_sbtest1_upd` AFTER UPDATE ON `test`.`sbtest1` FOR EACH ROW BEGIN DELETE IGNORE FROM `test`.`_sbtest1_new` WHERE !(OLD.`id` <=> NEW.`id`) AND `test`.`_sbtest1_new`.`id` <=> OLD.`id`;REPLACE INTO `test`.`_sbtest1_new` (`id`, `k`, `c`, `pad`) VALUES (NEW.`id`, NEW.`k`, NEW.`c`, NEW.`pad`);END
CREATE TRIGGER `pt_osc_test_sbtest1_ins` AFTER INSERT ON `test`.`sbtest1` FOR EACH ROW REPLACE INTO `test`.`_sbtest1_new` (`id`, `k`, `c`, `pad`) VALUES (NEW.`id`, NEW.`k`, NEW.`c`, NEW.`pad`)
Notes:
1. A new table is created from the original table's structure
2. The index on column c is added to the new table; this still uses alter
3. The triggers on the original table are checked; before 5.7 a table cannot have two triggers for the same action
4. Three triggers (DELETE, UPDATE and INSERT) are created on the original table to propagate changes to the new table (look closely at the three trigger bodies)
-----------------------------------------
EXPLAIN SELECT * FROM `test`.`sbtest1` WHERE 1=1
SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `test`.`sbtest1` FORCE INDEX(`PRIMARY`) ORDER BY `id` LIMIT 1 /*first lower boundary*/
SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `test`.`sbtest1` FORCE INDEX (`PRIMARY`) WHERE `id` IS NOT NULL ORDER BY `id` LIMIT 1 /*key_len*/
EXPLAIN SELECT /*!40001 SQL_NO_CACHE */ * FROM `test`.`sbtest1` FORCE INDEX (`PRIMARY`) WHERE `id` >= '1' /*key_len*/
EXPLAIN SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `test`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1')) ORDER BY `id` LIMIT 999, 2 /*next chunk boundary*/
SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `test`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1')) ORDER BY `id` LIMIT 999, 2 /*next chunk boundary*/
EXPLAIN SELECT `id`, `k`, `c`, `pad` FROM `test`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1')) AND ((`id` <= '1000')) LOCK IN SHARE MODE /*explain pt-online-schema-change 16157 copy nibble*/
INSERT LOW_PRIORITY IGNORE INTO `test`.`_sbtest1_new` (`id`, `k`, `c`, `pad`) SELECT `id`, `k`, `c`, `pad` FROM `test`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1')) AND ((`id` <= '1000')) LOCK IN SHARE MODE /*pt-online-schema-change 16157 copy nibble*/
SHOW WARNINGS
SHOW GLOBAL STATUS LIKE 'Threads_running'
EXPLAIN SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `test`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1001')) ORDER BY `id` LIMIT 3787, 2 /*next chunk boundary*/
SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `test`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1001')) ORDER BY `id` LIMIT 3787, 2 /*next chunk boundary*/
EXPLAIN SELECT `id`, `k`, `c`, `pad` FROM `test`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1001')) AND ((`id` <= '4788')) LOCK IN SHARE MODE /*explain pt-online-schema-change 16157 copy nibble*/
INSERT LOW_PRIORITY IGNORE INTO `test`.`_sbtest1_new` (`id`, `k`, `c`, `pad`) SELECT `id`, `k`, `c`, `pad` FROM `test`.`sbtest1` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '1001')) AND ((`id` <= '4788')) LOCK IN SHARE MODE /*pt-online-schema-change 16157 copy nibble*/
SHOW WARNINGS
SHOW GLOBAL STATUS LIKE 'Threads_running'
Notes:
1. There are too many chunks, so only two are shown here
2. Data is copied chunk by chunk (chunks are split on the pk or a uk); dedicated parameters control the chunking
3. During the copy, the rows being copied are read with lock in share mode to keep the data consistent; client DML on those rows is blocked in the meantime
4. LOW_PRIORITY lowers the priority of the insert so that it waits until no other operation is using the table (this modifier only has an effect on engines with table-level locking, such as MyISAM)
5. SHOW GLOBAL STATUS LIKE 'Threads_running' checks the number of currently running threads. The default is Threads_running=25; if no maximum is given, 120% of the current value is used as the maximum. When the threshold is exceeded, the copy is paused
6. Why so many explain statements? No chunk size was specified, so the first chunk defaults to 1000 rows; later chunk sizes are decided from the cost in the execution plan, and the insert runs only if it will not disturb normal operation
-----------------------------------------
RENAME TABLE `test`.`sbtest1` TO `test`.`_sbtest1_old`, `test`.`_sbtest1_new` TO `test`.`sbtest1`
DROP TABLE IF EXISTS `test`.`_sbtest1_old`
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_sbtest1_del`
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_sbtest1_upd`
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_sbtest1_ins`
SHOW TABLES FROM `test` LIKE '\_sbtest1\_new'
Notes:
1. ANALYZE updates the new table's statistics
2. The old and new tables are renamed
3. The original table is dropped
4. The triggers are dropped
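The "next chunk boundary" queries in the log can be tried in isolation. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL, so FORCE INDEX and the SQL_NO_CACHE hint are left out, and the table is a cut-down, hypothetical sbtest1. It shows how `LIMIT n-1, 2` returns both the last id of the current chunk and the first id of the next one in a single query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sbtest1 (id INTEGER PRIMARY KEY, k INTEGER)")
conn.executemany(
    "INSERT INTO sbtest1 (id, k) VALUES (?, ?)",
    ((i, i % 7) for i in range(1, 2501)),
)


def chunk_bounds(conn, lower, chunk_size):
    """Return (upper id of this chunk, lower id of the next chunk)."""
    rows = conn.execute(
        "SELECT id FROM sbtest1 WHERE id >= ? ORDER BY id LIMIT ?, 2",
        (lower, chunk_size - 1),
    ).fetchall()
    upper = rows[0][0] if rows else None
    next_lower = rows[1][0] if len(rows) > 1 else None
    return upper, next_lower


def iter_chunks(conn, chunk_size):
    """Yield (lower, upper) id ranges; upper is None for an open last chunk."""
    first = conn.execute("SELECT id FROM sbtest1 ORDER BY id LIMIT 1").fetchone()
    lower = first[0] if first else None
    while lower is not None:
        upper, next_lower = chunk_bounds(conn, lower, chunk_size)
        yield lower, upper
        lower = next_lower


for bounds in iter_chunks(conn, 1000):
    print(bounds)
```

Each range would then drive one INSERT ... SELECT ... WHERE id >= lower AND id <= upper, like the two copy statements shown in the log.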
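The general log is already fairly detailed, but results can differ quite a bit with different parameters, so it is not analyzed further here. The key steps of osc boil down to: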
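Step | Action |
---|---|
step1 | Check whether the original table has a primary key and triggers |
step2 | Create the tmp-S table (the new table built from original table S) |
step3 | Create insert, update and delete triggers on table S |
step4 | Copy the full data set over |
step5 | While the full copy runs, newly produced incremental changes are propagated into tmp-S by the triggers |
step6 | Swap the old and new table names (metadata lock, brief table lock) |
step7 | Drop the old table and the triggers |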
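The steps in the table can be replayed end to end on a small scale. This is a sketch only, using Python's sqlite3 in place of MySQL: `INSERT OR IGNORE` and `INSERT OR REPLACE` stand in for MySQL's `INSERT IGNORE` and `REPLACE`, the table is a cut-down, hypothetical sbtest1, and the triggers are dropped before the rename (SQLite has no atomic multi-table RENAME TABLE), whereas the log above shows pt-osc dropping them last.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit

# Original table S with 100 rows.
conn.execute("CREATE TABLE sbtest1 (id INTEGER PRIMARY KEY, k INTEGER NOT NULL, c TEXT NOT NULL)")
conn.executemany("INSERT INTO sbtest1 VALUES (?, ?, ?)",
                 [(i, i % 10, f"row-{i}") for i in range(1, 101)])

conn.executescript("""
-- step2: new table carrying the schema change (here: an extra index)
CREATE TABLE _sbtest1_new (id INTEGER PRIMARY KEY, k INTEGER NOT NULL, c TEXT NOT NULL);
CREATE INDEX index_c ON _sbtest1_new (c);
-- step3: triggers on the original table mirror every change into the new one
CREATE TRIGGER pt_osc_ins AFTER INSERT ON sbtest1 FOR EACH ROW BEGIN
  INSERT OR REPLACE INTO _sbtest1_new (id, k, c) VALUES (NEW.id, NEW.k, NEW.c);
END;
CREATE TRIGGER pt_osc_upd AFTER UPDATE ON sbtest1 FOR EACH ROW BEGIN
  DELETE FROM _sbtest1_new WHERE OLD.id <> NEW.id AND id = OLD.id;
  INSERT OR REPLACE INTO _sbtest1_new (id, k, c) VALUES (NEW.id, NEW.k, NEW.c);
END;
CREATE TRIGGER pt_osc_del AFTER DELETE ON sbtest1 FOR EACH ROW BEGIN
  DELETE FROM _sbtest1_new WHERE id = OLD.id;
END;
""")


def copy_chunk(lo, hi):
    """step4: copy one chunk, skipping rows the triggers already delivered."""
    conn.execute(
        "INSERT OR IGNORE INTO _sbtest1_new (id, k, c) "
        "SELECT id, k, c FROM sbtest1 WHERE id BETWEEN ? AND ?", (lo, hi))


copy_chunk(1, 50)
# step5: application DML arriving between two chunks
conn.execute("UPDATE sbtest1 SET c = 'changed-10' WHERE id = 10")  # already copied
conn.execute("UPDATE sbtest1 SET c = 'changed-80' WHERE id = 80")  # not copied yet
conn.execute("DELETE FROM sbtest1 WHERE id = 90")                  # not copied yet
conn.execute("INSERT INTO sbtest1 VALUES (101, 1, 'row-101')")     # brand new row
copy_chunk(51, 101)

old_rows = conn.execute("SELECT id, k, c FROM sbtest1 ORDER BY id").fetchall()
new_rows = conn.execute("SELECT id, k, c FROM _sbtest1_new ORDER BY id").fetchall()
tables_match = old_rows == new_rows

# step6 and step7: swap the names, then remove the old table.
conn.executescript("""
DROP TRIGGER pt_osc_ins; DROP TRIGGER pt_osc_upd; DROP TRIGGER pt_osc_del;
ALTER TABLE sbtest1 RENAME TO _sbtest1_old;
ALTER TABLE _sbtest1_new RENAME TO sbtest1;
DROP TABLE _sbtest1_old;
""")
final = dict(conn.execute("SELECT id, c FROM sbtest1").fetchall())
indexes = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'sbtest1'")]
print(tables_match, len(final), indexes)
```

Rows 10 and 80 exercise both orderings: one is updated after its chunk was copied, the other before, and both end up with the latest value.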
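Key points:
1. How is ordering handled between the full copy and the incremental changes?
The insert and update triggers both use REPLACE, which solves the problem of incremental changes arriving before the full copy and corrupting the data.
For example, a row is updated before the full copy has brought it over. What would you update? The row is not there yet, so a plain update would do nothing. That is fine: the trigger inserts the latest data first, and when the full copy arrives and conflicts, it is ignored. Without a primary key this is where data errors would occur.
ignore means that when an error occurs, no error is returned and the row is simply skipped.
ignore in the update case: the incremental change ran first, so copying the original row in afterwards may hit a primary key conflict.
ignore in the delete case: a row is deleted from the original table during the copy, so later chunks no longer contain it, and the delete on the new table finds no such row.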
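Why the primary key matters for this REPLACE plus IGNORE scheme can be checked with a few lines. The sketch again uses sqlite3 as a stand-in (`INSERT OR REPLACE` and `INSERT OR IGNORE` in place of MySQL's REPLACE and INSERT IGNORE); the table `new_t` and the helper `replay` are invented for the illustration.

```python
import sqlite3


def replay(with_key, trigger_first):
    """Apply one trigger write and one chunk-copy write to a fresh table."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    key = "PRIMARY KEY" if with_key else ""
    conn.execute(f"CREATE TABLE new_t (id INTEGER {key}, c TEXT)")
    if trigger_first:
        # The update trigger fires before the chunk copy reaches the row;
        # the copy then reads the already updated row from the source.
        conn.execute("INSERT OR REPLACE INTO new_t (id, c) VALUES (1, 'v2')")
        conn.execute("INSERT OR IGNORE INTO new_t (id, c) VALUES (1, 'v2')")
    else:
        # The chunk copy brings the old row first, the trigger fires later.
        conn.execute("INSERT OR IGNORE INTO new_t (id, c) VALUES (1, 'v1')")
        conn.execute("INSERT OR REPLACE INTO new_t (id, c) VALUES (1, 'v2')")
    return conn.execute("SELECT id, c FROM new_t ORDER BY id, c").fetchall()


for with_key in (True, False):
    for trigger_first in (True, False):
        print(with_key, trigger_first, replay(with_key, trigger_first))
```

With the key, both orderings converge on a single up-to-date row. Without it, nothing ever conflicts, so neither REPLACE nor IGNORE can do its job and the new table collects duplicate or stale rows.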
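2. How is consistency guaranteed for each limit 1000 chunk?
Consistency is guaranteed by lock in share mode.
3. Why not simply do the full copy first and the incremental changes afterwards?
That is hard to control. The triggers are created first, so some incremental changes inevitably arrive before the full copy. If the triggers were created afterwards, the changes produced during the full copy could not be captured.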
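4. Some limitations of osc
① The table must have a primary key (operations on the primary key itself are risky, so take care). The primary key is used for chunking, and it also lowers the risk of data errors.
② What if the table has foreign keys? --alter-foreign-keys-method=rebuild_constraints
③ The original table must not have triggers. A table cannot have two triggers of the same type (this restriction is gone in 5.7).
④ Only InnoDB is supported, and the instance needs free space larger than the size of the original table.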