Part 17: Index Design (Primary Key Design)

Date: 2020-12-02


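A table's primary key is one or more columns whose values uniquely identify every row in the table. InnoDB tables are index-organized tables, so the primary key is both the data and an index.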

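Primary key design principles

Keep the storage footprint small

The previous part described how InnoDB stores the primary key. The smaller the primary key, the more key values fit in each index page, and the more data can be loaded into memory at once.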
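The effect of key width can be estimated with simple arithmetic. The sketch below assumes InnoDB's default 16 KB page and deliberately ignores page headers, record headers, child pointers, and fill factor, so the numbers are upper bounds for comparison only, not real page capacities. The type list and the helper name keys_per_page are illustrative choices.

```python
PAGE_SIZE = 16 * 1024  # InnoDB's default page size in bytes

# Key widths in bytes for a few candidate primary-key types.
KEY_BYTES = {
    "INT": 4,
    "BIGINT": 8,
    "BINARY(16)": 16,
    "CHAR(36), single-byte charset": 36,
}


def keys_per_page(key_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Upper bound on how many bare key values fit in one page."""
    return page_size // key_bytes


for name, width in KEY_BYTES.items():
    print(f"{name:32s} {keys_per_page(width):5d} keys per page (upper bound)")
```

Even under these simplifications, a 4-byte key fits roughly nine times as many values into a page as a 36-byte key.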

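Preferably has some ordering property

Take an INT32 primary key as an example. Its values are strictly ordered, so a new row only needs to be appended after the existing data in the last data page, or placed in a new empty page added after it. A strictly ordered primary key therefore also gives very fast writes.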
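A toy model makes the difference visible. The sketch below is not how InnoDB works internally; it only keeps keys in a sorted Python list and counts how many inserts land at the very end (the cheap "append" case) versus somewhere in the middle. The function name and the sample size are assumptions made for illustration.

```python
import bisect
import random


def count_tail_inserts(keys):
    """Return how many keys were appended at the end of the sorted sequence."""
    ordered = []
    tail = 0
    for k in keys:
        pos = bisect.bisect_left(ordered, k)
        if pos == len(ordered):
            tail += 1  # the key is larger than everything stored so far
        ordered.insert(pos, k)
    return tail


n = 1000
sequential = list(range(1, n + 1))
shuffled = sequential[:]
random.Random(42).shuffle(shuffled)

print("sequential keys appended at tail:", count_tail_inserts(sequential))
print("random keys appended at tail:    ", count_tail_inserts(shuffled))
```

With sequential keys every insert is an append, while with shuffled keys almost every insert falls in the middle of the existing data, which in a real B+ tree is where page splits and random I/O come from.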

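Use an integer data type

Data types were covered earlier in this series. Given the two requirements above, the ideal choice is an integer type, such as int32 unsigned. The values increase sequentially and are generated either by the database itself or automatically by the application.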

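1. Using an attribute unrelated to the business as the primary key

1.1 Auto-increment column as the primary key

This is the approach most recommended for MySQL. An INT32 usually covers most scenarios: as an unsigned value, a single table in a single database can hold up to about 4.2 billion rows. New rows with an auto-increment key are appended in order after the current position in the index node until the data page is full, and then a new page is started. This greatly reduces random I/O on data pages.
Two issues need attention when using an auto-increment column as the primary key:
Issue 1: splitting MySQL's native auto-increment key
If the data is expected to grow and be split across databases or tables later, consider INT64. MySQL natively supports auto-increment primary keys for split setups, controlled by the auto-increment step and the starting value. At least 2 MySQL nodes are needed; with two nodes, each node uses a step of 2, and if the server_id values are 1 and 2, the starting values can also be 1 and 2. Suppose the session below runs on the first MySQL node. After the step and starting value are set, three rows are inserted into table tmp, and each id follows the configured sequence exactly.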

mysql> set @@auto_increment_increment=2;
Query OK, 0 rows affected (0.00 sec)

mysql> set @@auto_increment_offset=1;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into tmp values(null),(null),(null);
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select * from tmp;
+----+
| id |
+----+
|  1 |
|  3 |
|  5 |
+----+
3 rows in set (0.00 sec)

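However, MySQL does not guarantee that other values will not collide. For example, a value that belongs to node 2 can also be inserted successfully on this node. MySQL places no constraint on this by default, so it is best to validate the data before it is written to the database.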

mysql> insert into tmp values(2);
Query OK, 1 row affected (0.02 sec)

mysql> select * from tmp;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
|  5 |
+----+
4 rows in set (0.00 sec)
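One way to do that validation on the application side is to check that an explicit id lies on the node's own sequence. The following is a minimal sketch, assuming the offset is not larger than the increment; node_ids and belongs_to_node are hypothetical helper names, not MySQL functions.

```python
from itertools import islice


def node_ids(offset: int, increment: int):
    """Yield the ids one node generates: offset, offset + increment, ..."""
    value = offset
    while True:
        yield value
        value += increment


def belongs_to_node(value: int, offset: int, increment: int) -> bool:
    """Check that an id lies on this node's sequence before inserting it."""
    return value >= offset and (value - offset) % increment == 0


node1 = list(islice(node_ids(offset=1, increment=2), 3))
node2 = list(islice(node_ids(offset=2, increment=2), 3))
print(node1, node2)                                # [1, 3, 5] [2, 4, 6]
print(belongs_to_node(2, offset=1, increment=2))   # False
```

An application writing to node 1 could reject the explicit value 2 with this check, because that value belongs to node 2's sequence.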

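Issue 2: merging MySQL auto-increment keys
This issue usually comes up when legacy systems are upgraded, for example when data from several branch systems must be merged into a new system. The auto-increment primary keys of the branches cannot simply be merged, because they may conflict. As an example, suppose every district of Wuhan has its own medical insurance data, each district designed its database independently, and medical insurance is now being unified citywide with a new database model designed at the city level.
The data for Wuchang is as follows, in table n1: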

mysql> select  * from n1;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
+----+
3 rows in set (0.00 sec)

The data for Hanyang is as follows, in table n2:

mysql> select * from n2;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
+----+
3 rows in set (0.00 sec)

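Because the people who designed the two district databases never considered a later merge, each district's table has its own independent auto-increment primary key.
Consider creating a summary table n3 that has a new auto-increment ID and also imports the ID from the old system.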

mysql> create table n3 (id int auto_increment primary key, old_id int);
Query OK, 0 rows affected (0.07 sec)
mysql> insert into n3 (old_id) select * from n1 union all select * from n2;
Query OK, 6 rows affected (0.01 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> select * from n3;
+----+--------+
| id | old_id |
+----+--------+
|  1 |      1 |
|  2 |      2 |
|  3 |      3 |
|  4 |      1 |
|  5 |      2 |
|  6 |      3 |
+----+--------+
6 rows in set (0.00 sec)

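Merged this way, application code may not know how to link back to the old data, because the table lacks a mapping from old_id to the original table name.
So build a composite key on the mapping between the original table ID and the original table name, as in the following example: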

mysql> create table n4(old_id int, old_name varchar(64),primary key(old_id,old_name));
Query OK, 0 rows affected (0.05 sec)

mysql> insert into n4 select id ,'n1' from n1 union all select id,'n2' from n2;
Query OK, 6 rows affected (0.02 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> select * from n4;
+--------+----------+
| old_id | old_name |
+--------+----------+
|      1 | n1       |
|      1 | n2       |
|      2 | n1       |
|      2 | n2       |
|      3 | n1       |
|      3 | n2       |
+--------+----------+
6 rows in set (0.00 sec)

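The final table structure combines tables n3 and n4 into a new table that contains a new auto-increment primary key, the original table ID, and the original table name: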

create table n5(
id int unsigned auto_increment primary key,
old_id int,
old_name varchar(64),
unique key udx_old_id_old_name (old_id,old_name)
);

Of course, data consolidation and migration is too large a topic to cover here and is outside the scope of this section.
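The shape of table n5 can be illustrated outside the database. The sketch below is a plain Python stand-in, not a migration tool: merge_sources is a hypothetical helper that assigns new sequential ids and keeps a dictionary that plays the role of the unique key on (old_id, old_name).

```python
def merge_sources(sources):
    """Assign new sequential ids and remember (old_id, old_name) for each row."""
    merged = []   # rows of (id, old_id, old_name), like table n5
    lookup = {}   # plays the role of the unique key (old_id, old_name)
    next_id = 1
    for old_name, old_ids in sources.items():
        for old_id in old_ids:
            key = (old_id, old_name)
            if key in lookup:
                raise ValueError(f"duplicate source row {key}")
            merged.append((next_id, old_id, old_name))
            lookup[key] = next_id
            next_id += 1
    return merged, lookup


rows, lookup = merge_sources({"n1": [1, 2, 3], "n2": [1, 2, 3]})
print(rows[:2])            # [(1, 1, 'n1'), (2, 2, 'n1')]
print(lookup[(2, "n2")])   # 5
```

Application code that only knows an old district id and its source table can resolve the new id through the lookup, which is what the unique index on (old_id, old_name) provides in the database.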

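1.2 UUID as the primary key

Like an auto-increment key, a UUID guarantees primary key uniqueness. But it is inherently unordered, randomly generated, and large. MySQL has no dedicated UUID data type, so a UUID is stored as char(36), a string like '7985847c-7d59-11ea-8add-080027c52750'. Because of how InnoDB tables work, avoid using a raw UUID stored as char(36) as the primary key.
Although a UUID is unordered and wastes space, can its inherent randomness still be put to use?
MySQL provides the following optimizations so that a UUID can be used as a table's primary key:
Function uuid_to_bin
MySQL provides the function uuid_to_bin, which converts a UUID string into a 16-byte binary string, similar to the UUID type in some databases (such as PostgreSQL). The return type of uuid_to_bin is varbinary(16).
For example, table t_binary: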

mysql> create table t_binary(id varbinary(16) primary key,r1 int, key idx_r1(r1));
Query OK, 0 rows affected (0.07 sec)

mysql> insert into t_binary values (uuid_to_bin(uuid()),1),(uuid_to_bin(uuid()),2);
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select * from t_binary;
+------------------------------------+------+
| id                                 | r1   |
+------------------------------------+------+
| 0x412234A77DEF11EA9AF9080027C52750 |    1 |
| 0x412236E27DEF11EA9AF9080027C52750 |    2 |
+------------------------------------+------+
2 rows in set (0.00 sec)
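The space saving is easy to see outside MySQL as well. The sketch below uses Python's standard uuid module, not MySQL's uuid_to_bin, to show the same idea: the 36-character text form and the 16 raw bytes carry the same information. The sample value is the UUID string quoted earlier.

```python
import uuid

text = "7985847c-7d59-11ea-8add-080027c52750"  # sample UUID from the article
raw = uuid.UUID(text).bytes                    # the same value as 16 raw bytes

print(len(text), len(raw))                     # 36 16
print("0x" + raw.hex().upper())                # 0x7985847C7D5911EA8ADD080027C52750
print(str(uuid.UUID(bytes=raw)) == text)       # True
```

The conversion is lossless, so the binary form can always be turned back into the readable string when needed.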

Function uuid_short
varbinary(16) is still unordered, so MySQL also provides the function uuid_short, which generates a UUID-like global ID as an INT64. It is computed as follows:
(server_id & 255) << 56 + (server_startup_time_in_seconds << 24) + incremented_variable++;

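server_id & 255: 1 byte;
server_startup_time_in_seconds: 4 bytes;
incremented_variable: 3 bytes.

If the following conditions hold, the value is guaranteed to be unique:

server_id is unique, and uuid_short() is called no more than 16777216 times per second, that is, 2^24. So under normal circumstances uuid_short yields unique results.
IDs generated by uuid_short need only a lightweight mutex for protection. This is cheaper than the table-level AUTO-INC lock that auto-increment IDs can require (depending on innodb_autoinc_lock_mode), so generation should generally be faster.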
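The byte layout described above can be sketched as bit arithmetic. This is an illustration of the formula, not MySQL's implementation; compose and decompose are hypothetical helper names, and the server id and startup time used below are made-up sample values.

```python
def compose(server_id: int, startup_seconds: int, counter: int) -> int:
    """Pack the three fields the way the formula describes."""
    return ((server_id & 255) << 56) + (startup_seconds << 24) + counter


def decompose(value: int):
    """Split a 64-bit value back into (server_id byte, startup seconds, counter)."""
    return value >> 56, (value >> 24) & 0xFFFFFFFF, value & 0xFFFFFF


first = compose(server_id=1, startup_seconds=1_600_000_000, counter=1)
second = compose(server_id=1, startup_seconds=1_600_000_000, counter=2)

print(first < second)       # True
print(decompose(first))     # (1, 1600000000, 1)
```

Because the counter sits in the lowest bits, consecutive calls on the same server produce increasing values, which is why these IDs behave like an ordered integer key.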

Table t_uuid_short below shows how to use this function.

mysql> create table t_uuid_short  (id bigint unsigned primary key,r1 int, key idx_r1(r1));
Query OK, 0 rows affected (0.06 sec)

mysql> insert into t_uuid_short values(uuid_short(),1),(uuid_short(),2);
Query OK, 2 rows affected (0.02 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select * from t_uuid_short;
+----------------------+------+
| id                   | r1   |
+----------------------+------+
| 16743984358464946177 |    1 |
| 16743984358464946178 |    2 |
+----------------------+------+
2 rows in set (0.00 sec)

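As shown, the values generated by uuid_short are ordered INT64 values, so this function can be seen as a complementary optimization to auto-increment IDs. If there are fewer than 16777216 calls per second, uuid_short is recommended over an auto-increment ID.
After all that, let's run a small experiment to verify the conclusions above.
The experiment involves four tables:

New table t_uuid: uuid as the primary key
Table t_binary: varbinary(16) as the primary key
Table t_uuid_short: bigint as the primary key
New table t_id: auto-increment ID as the primary key

As expected so far, write performance from worst to best should be: t_uuid, t_binary, t_id, t_uuid_short. Let's test whether the results match this expectation.
The structure of the two new tables: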

mysql> create table t_uuid(id char(36) primary key, r1 int, key idx_r1(r1));
Query OK, 0 rows affected (0.06 sec)

mysql> create table t_id (id bigint auto_increment primary key, r1 int, key idx_r1(r1));
Query OK, 0 rows affected (0.08 sec)

A simple stored procedure inserts 300,000 rows into each of these tables.

DELIMITER $$

CREATE PROCEDURE `ytt`.`sp_insert_data`(
  f_tbname VARCHAR(64),   -- target table: t_uuid, t_binary, t_uuid_short or t_id
  f_number INT UNSIGNED   -- number of rows to insert
)
BEGIN
  DECLARE i INT UNSIGNED DEFAULT 0;
  SET @@autocommit=0;

  -- Pick the insert statement that matches the target table.
  IF f_tbname = 't_uuid' THEN
    SET @stmt = 'insert into t_uuid values (uuid(),ceil(rand()*100));';
  ELSEIF f_tbname = 't_binary' THEN
    SET @stmt = 'insert into t_binary values(uuid_to_bin(uuid()),ceil(rand()*100));';
  ELSEIF f_tbname = 't_uuid_short' THEN
    SET @stmt = 'insert into t_uuid_short values(uuid_short(),ceil(rand()*100));';
  ELSEIF f_tbname = 't_id' THEN
    SET @stmt = 'insert into t_id(r1) values(ceil(rand()*100));';
  END IF;

  -- Insert one row per iteration and commit every 50 rows.
  WHILE i < f_number DO
    PREPARE s1 FROM @stmt;
    EXECUTE s1;
    SET i = i + 1;
    IF MOD(i,50) = 0 THEN
      COMMIT;
    END IF;
  END WHILE;

  -- Commit any remaining rows, then restore autocommit.
  COMMIT;
  DROP PREPARE s1;
  SET @@autocommit=1;
END$$

DELIMITER ;

Next, call the stored procedure for each table. The results match the expectation: t_uuid takes the longest and t_uuid_short the shortest.

mysql> call sp_insert_data('t_uuid',300000);
Query OK, 0 rows affected (5 min 23.33 sec)

mysql> call sp_insert_data('t_binary',300000);
Query OK, 0 rows affected (4 min 48.92 sec)

mysql> call sp_insert_data('t_id',300000);
Query OK, 0 rows affected (3 min 40.38 sec)

mysql> call sp_insert_data('t_uuid_short',300000);
Query OK, 0 rows affected (3 min 9.94 sec)
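To compare the four runs on one scale, the reported elapsed times can be converted to seconds and rows per second. The sketch below only re-expresses the timings printed above; the row count and timings are copied from the article, and the helper name to_seconds is an illustrative choice.

```python
ROWS = 300_000

# Elapsed times reported above, as (minutes, seconds).
timings = {
    "t_uuid": (5, 23.33),
    "t_binary": (4, 48.92),
    "t_id": (3, 40.38),
    "t_uuid_short": (3, 9.94),
}


def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a 'M min S sec' reading to seconds."""
    return minutes * 60 + seconds


elapsed = {name: to_seconds(*t) for name, t in timings.items()}
for name, secs in sorted(elapsed.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:14s} {secs:8.2f} s  {ROWS / secs:8.1f} rows/s")
```

These are single runs on one machine, so the ranking is more meaningful than the exact ratios.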

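2. Using a business-related attribute as the primary key

Such a primary key is designed to be highly readable, like a student number (enrollment year + department + major) or a shopping order code. Using business fields with real-world meaning as the primary key is strongly discouraged. Instead, add an auto-increment primary key or a column filled by the uuid_short() function, and make the business field a regular unique index rather than the primary key. For example, table n5: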

mysql> create table n5(
        id int unsigned auto_increment primary key, 
        userno int unsigned ,
        unique key udx_userno(userno)
        );
Query OK, 0 rows affected (0.08 sec)

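If userno (the user code) were the primary key and the data turned out to be wrong on the business side, for example because a teacher entered it incorrectly or a bug in the business system produced bad data, you would have to change the primary key of the table itself (which is the clustered index) and also every child table that depends on it. That is a major undertaking. With a primary key that is unrelated to the business, only the business field (a secondary index) needs to change, and the child tables that depend on this table are left untouched.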
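The point can be shown with a tiny in-memory model. This is a sketch with made-up data, not database code: users stands in for the parent table keyed by a surrogate id, orders stands in for a hypothetical child table, and correct_userno is an illustrative helper.

```python
# Parent rows keyed by surrogate id; userno is just a unique attribute.
users = {1: {"userno": 20200001}, 2: {"userno": 20200002}}
# Child rows reference the surrogate id, never the business value.
orders = [{"order_id": 10, "user_id": 1}, {"order_id": 11, "user_id": 1}]


def correct_userno(user_id: int, new_userno: int) -> None:
    """Fix a mistyped userno; only the parent row is modified."""
    if any(u["userno"] == new_userno for u in users.values()):
        raise ValueError("userno must stay unique")
    users[user_id]["userno"] = new_userno


before = [dict(o) for o in orders]
correct_userno(1, 20200099)
print(users[1]["userno"], orders == before)   # 20200099 True
```

Had orders stored userno instead of the surrogate id, every matching child row would have needed the same correction.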
That concludes this overview of MySQL primary key design. Questions and corrections are welcome in the comments.