返回ProxySQL系列文章:http://www.cnblogs.com/f-ck-need-u/p/7586194.htmlhtml
ProxySQL在收到前端发送来的SQL语句后,能够根据已定制的规则去匹配它,匹配到了还能够去重写这个语句,而后再路由到后端去。前端
何时须要重写SQL语句?mysql
对于下面这种简单的读、写分离,固然用不上重写SQL语句。正则表达式
这样的读写分离,实现起来很是简单。以下:sql
mysql_replication_hostgroups: +------------------+------------------+----------+ | writer_hostgroup | reader_hostgroup | comment | +------------------+------------------+----------+ | 10 | 20 | cluster1 | +------------------+------------------+----------+ mysql_servers: +--------------+----------+------+--------+--------+ | hostgroup_id | hostname | port | status | weight | +--------------+----------+------+--------+--------+ | 10 | master | 3306 | ONLINE | 1 | | 20 | slave1 | 3306 | ONLINE | 1 | | 20 | slave2 | 3306 | ONLINE | 1 | +--------------+----------+------+--------+--------+ mysql_query_rules: +---------+-----------------------+----------------------+ | rule_id | destination_hostgroup | match_digest | +---------+-----------------------+----------------------+ | 1 | 10 | ^SELECT.*FOR UPDATE$ | | 2 | 20 | ^SELECT | +---------+-----------------------+----------------------+
可是,复杂一点的,例如ProxySQL实现sharding功能。对db1库的select_1语句路由给hg=10的组,将db2库的select_2语句路由给hg=20的组,将db3库的select_3语句路由给hg=30的组。数据库
在ProxySQL实现sharding时,基本上都须要将SQL语句进行重写。这里用一个简单的例子来讲明分库是如何进行的。后端
假如,计算机学院it_db占用一个数据库,里面有一张学生表stu,stu表中有表明专业的字段zhuanye(例子只是随便举的,请无视合理性)。app
it_db库: stu表 +---------+----------+---------+ | stu_id | stu_name | zhuanye | +---------+----------+---------+ | 1-99 | ... | Linux | +---------+----------+---------+ | 100-150 | ... | MySQL | +---------+----------+---------+ | 151-250 | ... | JAVA | +---------+----------+---------+ | 251-550 | ... | Python | +---------+----------+---------+
分库时,能够为各个专业建立库。因而,建立4个库,每一个库中仍保留stu表,但只保留和库名对应的学生数据:less
Linux库:stu表 +---------+----------+---------+ | stu_id | stu_name | zhuanye | +---------+----------+---------+ | 1-99 | ... | Linux | +---------+----------+---------+ MySQL库:stu表 +---------+----------+---------+ | stu_id | stu_name | zhuanye | +---------+----------+---------+ | 100-150 | ... | MySQL | +---------+----------+---------+ JAVA库:stu表 +---------+----------+---------+ | stu_id | stu_name | zhuanye | +---------+----------+---------+ | 151-250 | ... | JAVA | +---------+----------+---------+ Python库:stu表 +---------+----------+---------+ | stu_id | stu_name | zhuanye | +---------+----------+---------+ | 251-550 | ... | Python | +---------+----------+---------+
因而,原来查询MySQL专业学生的SQL语句:测试
select * from it_db.stu where zhuanye='MySQL' and xxx;
分库后,该SQL语句须要重写为:
select * from MySQL.stu where 1=1 and xxx;
至于如何达到上述目标,本文结尾给出了一个参考规则。
sharding而重写只是一种状况,在不少使用复杂ProxySQL路由规则时可能都须要重写SQL语句。下面将简单介绍ProxySQL的语句重写,为后文作个铺垫,在以后介绍ProxySQL + sharding的文章中有更多具体的用法。
在mysql_query_rules表中有match_pattern字段和replace_pattern字段,前者是匹配SQL语句的正则表达式,后者是匹配成功后(命中规则),将原SQL语句改写,改写后再路由给后端。
须要注意几点:
本文的替换规则出于入门的目的,很简单,只需掌握最基本的正则知识便可。但想要灵活运用,须要掌握PCRE的正则,若是您已有正则的基础,可参考个人一篇总结性文章:pcre和正则表达式的误点。
例如,将下面的语句1重写为语句2。
select * from test1.t1; select * from test1.t2;
插入以下规则:
delete from mysql_query_rules; select * from stats_mysql_query_digest_reset where 1=0; insert into mysql_query_rules(rule_id,active,match_pattern,replace_pattern,destination_hostgroup,apply) values (1,1,"^(select.*from )test1.t1(.*)","\1test1.t2\2",20,1); load mysql query rules to runtime; save mysql query rules to disk; select rule_id,destination_hostgroup,match_pattern,replace_pattern from mysql_query_rules; +---------+-----------------------+------------------------------+-----------------+ | rule_id | destination_hostgroup | match_pattern | replace_pattern | +---------+-----------------------+------------------------------+-----------------+ | 1 | 20 | ^(select.*from )test1.t1(.*) | \1test1.t2\2 | +---------+-----------------------+------------------------------+-----------------+
而后执行:
$ proc="mysql -uroot -pP@ssword1! -h127.0.0.1 -P6033 -e" $ $proc "select * from test1.t1;" +------------------+ | name | +------------------+ | test1_t2_malong1 | | test1_t2_malong2 | | test1_t2_malong3 | +------------------+
可见语句成功重写。
再看看规则的状态。
Admin> select rule_id,hits from stats_mysql_query_rules; +---------+------+ | rule_id | hits | +---------+------+ | 1 | 1 | | 2 | 0 | +---------+------+ Admin> select hostgroup,count_star,digest_text from stats_mysql_query_digest; +-----------+------------+------------------------+ | hostgroup | count_star | digest_text | +-----------+------------+------------------------+ | 20 | 1 | select * from test1.t2 | <--已替换 +-----------+------------+------------------------+
更简单的,还能够直接替换单词。例如:
delete from mysql_query_rules; select * from stats_mysql_query_digest_reset where 1=0; insert into mysql_query_rules(rule_id,active,match_pattern,replace_pattern,destination_hostgroup,apply) values (1,1,"test1.t1","test1.t2",20,1); load mysql query rules to runtime; save mysql query rules to disk; select rule_id,destination_hostgroup,match_pattern,replace_pattern from mysql_query_rules; +---------+-----------------------+---------------+-----------------+ | rule_id | destination_hostgroup | match_pattern | replace_pattern | +---------+-----------------------+---------------+-----------------+ | 1 | 20 | test1.t1 | test1.t2 | +---------+-----------------------+---------------+-----------------+
以本文前面sharding示例中的语句为例,简单演示下sharding时的分库语句怎么改写。更完整的sharding实现方法,见后面的文章。
#原来查询MySQL专业学生的SQL语句: select * from it_db.stu where zhuanye='MySQL' and xxx; | | | \|/ #改写为查询分库MySQL的SQL语句: select * from MySQL.stu where 1=1 and xxx;
如下是完整语句:关于这个规则中的正则部分,稍后会解释。
delete from mysql_query_rules; select * from stats_mysql_query_digest_reset where 1=0; insert into mysql_query_rules(rule_id,active,apply,destination_hostgroup,match_pattern,replace_pattern) values (1,1,1,20,"^(select.*?from) it_db\.(.*?) where zhuanye=['""](.*?)['""] (.*)$","\1 \3.\2 where 1=1 \4"); load mysql query rules to runtime; save mysql query rules to disk; select rule_id,destination_hostgroup dest_hg,match_pattern,replace_pattern from mysql_query_rules; +---------+---------+-----------------------------------------------------------------+-----------------------+ | rule_id | dest_hg | match_pattern | replace_pattern | +---------+---------+-----------------------------------------------------------------+-----------------------+ | 1 | 20 | ^(select.*?from) it_db\.(.*?) where zhuanye=['"](.*?)['"] (.*)$ | \1 \3.\2 where 1=1 \4 | +---------+---------+-----------------------------------------------------------------+-----------------------+
而后执行分库查询语句:
proc="mysql -uroot -pP@ssword1! -h127.0.0.1 -P6033 -e" $proc "select * from it_db.stu where zhuanye='MySQL' and 1=1;"
看看是否命中规则,并成功改写SQL语句:
Admin> select rule_id,hits from stats_mysql_query_rules; +---------+------+ | rule_id | hits | +---------+------+ | 1 | 1 | +---------+------+ Admin> select hostgroup,count_star,digest_text from stats_mysql_query_digest; +-----------+------------+-------------------------------------------+ | hostgroup | count_star | digest_text | +-----------+------------+-------------------------------------------+ | 20 | 1 | select * from MySQL.stu where ?=? and ?=? | | 10 | 1 | select @@version_comment limit ? | +-----------+------------+-------------------------------------------+
解释下前面的规则:
match_pattern:
- "^(select.*?from) it_db\.(.*?) where zhuanye=['""](.*?)['""] (.*)$"
replace_pattern:
- "\1 \3.\2 where 1=1 \4"
^(select.*?from)
:表示不贪婪匹配到from字符。之因此不贪婪匹配,是为了不子查询或join子句出现多个from的状况。
it_db\.(.*?)
:这里的it_db是稍后要替换掉为"MySQL"字符的部分,而it_db后面的表稍后要附加在"MySQL"字符后,因此对其分组捕获。
zhuanye=['""](.*?)['""]
:
- 这里的zhuanye字段稍后是要删除的,但后面的字段值"MySQL"须要保留做为稍后的分库,所以对字段值分组捕获。同时,字段值先后的引号多是单引号、双引号,因此两种状况都要考虑到。
- ['""]
:要把引号保留下来,须要对额外的引号进行转义:双引号转义后成单个双引号。因此,真正插入到表中的结果是['"]
。
- 这里的语句并不健壮,由于若是是zhuanye='MySQL"
这样单双引号混用也能被匹配。若是要避免这种问题,须要使用PCRE的反向引用。例如,改写为:zhuanye=(['""])(.*?)\g[N]
,这里的[N]
要替换为(['""])
对应的分组号码,例如\g3
。
(.*)$
:匹配到结束。由于这里的测试语句简单,没有join和子查询什么的,因此直接匹配。
"\1 \3.\2 where 1=1 \4"
:这里加了1=1
,是为了防止出现and/or等运算符时前面缺乏表达式。例如(.*)$
捕获到的内容为and xxx=1
,不加上1=1的话,将替换为where and xxx=1
,这是错误的语句,因此1=1是个占位表达式。
可见,要想实现一些复杂的匹配目标,正则表达式是很是繁琐的。因此,颇有必要去掌握PCRE正则表达式。