How Much Do You Know About MySQL count?

Date: 2020-04-05

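Counting the rows in a table is a very common requirement. Depending on the table design and on how the query is written, however, the performance of the count can vary considerably. Below, a few simple experiments test this. When you run the tests yourself, watch out for caching, or it will skew the results.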

1. Preparation

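To carry out the tests that follow, first prepare a few tables with test data. For the results to be meaningful, the test tables should hold a fair amount of data so that query times are not too short. So we again use the consecutive-number generation trick from before: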

/* Create the consecutive-number table */
CREATE TABLE nums(id INT primary key);

/* Stored procedure that generates consecutive numbers (optimized version) */
DELIMITER $$
CREATE  PROCEDURE `sp_createNum`(cnt INT )
BEGIN
    DECLARE i INT  DEFAULT 1;
    TRUNCATE TABLE nums;
    INSERT INTO nums SELECT i;
    WHILE i < cnt DO
      BEGIN
        INSERT INTO nums SELECT id + i FROM nums WHERE id + i<=cnt;
        SET i = i*2;
      END;
    END WHILE;
END$$

DELIMITER ;

Generate the data. This time we generate 10 million rows:

/* Call the stored procedure */
mysql> call sp_createNum(10000000);
Query OK, 1611392 rows affected (32.07 sec)

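Inserting the rows one at a time in a loop would take a very long time; feel free to test it yourself. For reference, see the earlier article on a consecutive-integer generation method that is 16,800 times faster.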
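The doubling trick in sp_createNum can be replayed in a few lines of Python. This shows why it is fast and where the "1611392 rows affected" comes from. The sketch below is only an illustration, not part of the original procedure:

- The function name `simulate_sp_create_num` is hypothetical.
- It tracks only how many rows each INSERT adds.
- It relies on the loop invariant that, at the top of every iteration, nums holds exactly the ids 1..i.

```python
def simulate_sp_create_num(cnt):
    """Replay the loop in sp_createNum and return the rows added by each INSERT.

    Assumes cnt >= 1. Relies on the loop invariant: at the top of each
    iteration nums holds exactly the ids 1..i, so
    "SELECT id + i ... WHERE id + i <= cnt" yields the ids i+1 .. min(2*i, cnt).
    """
    inserted = [1]                 # INSERT INTO nums SELECT i;  (i = 1)
    i = 1
    while i < cnt:                 # WHILE i < cnt DO
        inserted.append(min(2 * i, cnt) - i)
        i *= 2                     # SET i = i*2;
    return inserted


batches = simulate_sp_create_num(10_000_000)
print(len(batches))   # 25 INSERT statements in total
print(batches[-1])    # 1611392 rows added by the final INSERT
print(sum(batches))   # 10000000
```

A row-by-row loop would need 10,000,000 single-row INSERTs. The doubling loop needs only 25, each inserting up to twice as many rows as the one before. The 1611392 shown by the client equals the row count of the final INSERT (10,000,000 - 8,388,608).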

1.1 Create the InnoDB tables

Create three InnoDB tables, as follows:

Table nums_1 has only a string primary key column:

/* Create table nums_1, whose only column is a string primary key */
mysql> create table  nums_1 (p1 varchar(32) primary key ) engine=innodb;
Query OK, 0 rows affected (0.01 sec)

/* Load the data, converting id to a string with md5() */
mysql> insert into  nums_1 select md5(id) from nums;
Query OK, 10000000 rows affected (1 min 12.63 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

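Table nums_2 has five columns: the string primary key p1, the integer id, the NOT NULL column c1, and the nullable columns c2 and c3.

c1 and c2 have exactly the same contents and differ only in their constraints (c1 is NOT NULL, c2 is nullable). c3 differs from c1 and c2 in that values starting with aa in c1 are NULL in c3; all other values are the same.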

/* Create table nums_2 */
mysql> create table nums_2(p1 varchar(32) primary key ,id int ,c1 varchar(10) not null, c2 varchar(10),c3 varchar(10)) engine=innodb;
Query OK, 0 rows affected (1.03 sec)

/* Load the data */
mysql> insert into nums_2(id,p1,c1,c2,c3) select id,md5(id),left(md5(id),10),left(md5(id),10),if(left(md5(id),10) like 'aa%',null,left(md5(id),10)) from nums;
Query OK, 10000000 rows affected (5 min 6.68 sec)
Records: 10000000  Duplicates: 0  Warnings: 0
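To make the column rules concrete, here is a Python sketch of how one nums_2 row is derived from id. It assumes that MySQL's MD5() of an integer hashes the integer's decimal string and returns lowercase hex, which is what `hashlib.md5(str(n).encode()).hexdigest()` produces. The function name `nums_2_row` is hypothetical. The sketch also estimates roughly how many c3 values end up NULL.

```python
import hashlib


def nums_2_row(n):
    """Build the nums_2 row for id = n, mirroring the INSERT ... SELECT above."""
    p1 = hashlib.md5(str(n).encode()).hexdigest()    # md5(id): 32 lowercase hex chars
    prefix = p1[:10]                                  # left(md5(id), 10)
    c3 = None if prefix.startswith("aa") else prefix  # NULL when the value starts with 'aa'
    return {"p1": p1, "id": n, "c1": prefix, "c2": prefix, "c3": c3}


print(nums_2_row(1))

# Assuming md5 hex digits are roughly uniform, a value starts with "aa"
# with probability 1/16 * 1/16 = 1/256, so about this many c3 values are NULL:
expected_nulls = 10_000_000 / 256
print(expected_nulls)   # 39062.5
```

The count measured later in the article (39,208 rows whose p1 starts with aa) is close to this estimate.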

The contents of nums_3 are exactly the same as nums_2. The only difference is the primary key, which in nums_3 is the integer id:

/* Create table nums_3 */
mysql> create table nums_3(p1 varchar(32) ,id int primary key  ,c1 varchar(10) not null, c2 varchar(10),c3 varchar(10)) engine=innodb;
Query OK, 0 rows affected (0.01 sec)

/* The contents are identical, so load directly from nums_2 */
mysql> insert into nums_3 select  * from nums_2;
Query OK, 10000000 rows affected (3 min 18.81 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

1.2 Create a MyISAM table

Next, create a MyISAM table whose structure and contents are the same as nums_2; only the engine is MyISAM.

/* Create table nums_4 with the MyISAM engine */
mysql> create table nums_4(p1 varchar(32) not null  primary key ,id int  ,c1 varchar(10) not null, c2 varchar(10),c3 varchar(10)) engine=MyISAM;
Query OK, 0 rows affected (0.00 sec)

/* Load the data directly from nums_2 */
mysql> insert into nums_4 select  * from nums_2;
Query OK, 10000000 rows affected (3 min 16.78 sec)
Records: 10000000  Duplicates: 0  Warnings: 0

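2. Ways to get a table's row count

There are several ways to get the row count of a table:

For an approximate count, query the statistics; section 2.1 shows how.

For an exact count, use count(primary key), count(*), or count(1) [the 1 can be replaced with any non-NULL constant].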

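2.1 Approximate counts

Sometimes you only need a rough idea of how many rows a table holds. This is especially common with a large table, where you just want its order of magnitude (millions, tens of millions, or hundreds of millions of rows). In that case you can query the statistics directly, in any of the following ways:

Check the index information: Cardinality gives the approximate row count. Look at the value on the PRIMARY row; for a composite primary key, look at the Cardinality of its last column.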

mysql> show index from nums_2;
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| nums_2 |          0 | PRIMARY  |            1 | p1          | A         |     9936693 |     NULL | NULL   |      | BTREE      |         |               |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.00 sec)

Check the table status: Rows gives the approximate row count.

mysql> show table status like  'nums_2';
+--------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
| Name   | Engine | Version | Row_format | Rows    | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation       | Checksum | Create_options | Comment |
+--------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
| nums_2 | InnoDB |      10 | Dynamic    | 9936693 |            111 |  1105182720 |               0 |   2250178560 |   4194304 |           NULL | 2020-04-04 19:31:34 | NULL        | NULL       | utf8_general_ci |     NULL |                |         |
+--------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
1 row in set (0.00 sec)

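Query the STATISTICS or TABLES table in information_schema directly. The contents are similar to the index information and table status, and TABLE_ROWS holds the approximate row count.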

mysql> select   * from  information_schema.tables where table_schema='testdb' and table_name like  'nums_2';
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | ENGINE | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME         | UPDATE_TIME | CHECK_TIME | TABLE_COLLATION | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| def           | testdb       | nums_2     | BASE TABLE | InnoDB |      10 | Dynamic    |    9936693 |            111 |  1105182720 |               0 |   2250178560 |   4194304 |           NULL | 2020-04-04 19:31:34 | NULL        | NULL       | utf8_general_ci |     NULL |                |               |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
1 row in set (0.00 sec)
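How far off is the estimate? A small worked calculation uses the figures above: 9,936,693 from the statistics, and 10,000,000 from the exact count in section 2.2. It shows the error is well under 1%, which is good enough for judging the order of magnitude. The calculation also shows that Avg_row_length is consistent with Data_length divided by the estimated row count. The helper name `estimate_error` is hypothetical.

```python
def estimate_error(estimated_rows, exact_rows):
    """Return the absolute and relative error of a statistics-based row estimate."""
    diff = exact_rows - estimated_rows
    return diff, diff / exact_rows


diff, rel = estimate_error(9_936_693, 10_000_000)   # TABLE_ROWS vs. exact count for nums_2
print(diff)              # 63307
print(f"{rel:.2%}")      # 0.63%

# Avg_row_length is consistent with Data_length / Rows (integer division):
print(1_105_182_720 // 9_936_693)   # 111
```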

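Notes:

For InnoDB tables, all three methods above return the approximate row count. The results are identical because they all come from the same statistics.
For MyISAM tables, the value is exact. This applies to the row count only, not to the other columns: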

mysql> select   * from  information_schema.tables where table_schema='testdb' and table_name like  'nums_4';
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+----------------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE | ENGINE | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME         | UPDATE_TIME         | CHECK_TIME          | TABLE_COLLATION | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+----------------+---------------+
| def           | testdb       | nums_4     | BASE TABLE | MyISAM |      10 | Dynamic    |   10000000 |             75 |   759686336 | 281474976710655 |    854995968 |         0 |           NULL | 2020-04-04 19:20:23 | 2020-04-04 19:21:45 | 2020-04-04 19:23:45 | utf8_general_ci |     NULL |                |               |
+---------------+--------------+------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+----------------+---------------+
1 row in set (0.00 sec)

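2.2 Exact counts

For InnoDB tables, the results in 2.1 are statistical estimates rather than exact values, and most real work needs the exact figure. The following methods return an exact count and work for tables of every engine.

count(primary key)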

mysql> select count(p1) from nums_2;
+-----------+
| count(p1) |
+-----------+
|  10000000 |
+-----------+
1 row in set (1.60 sec)

count(1)

The 1 can be any non-NULL constant, e.g. count(2), count('a'), and so on.

mysql> select count(1) from nums_2;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (1.45 sec)
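The "any constant" rule has one exception worth knowing. COUNT(expr) counts only the rows where expr is not NULL, so count(NULL) is always 0. The snippet below demonstrates this using Python's built-in sqlite3 module as a stand-in for MySQL. This is an assumption for illustration only: SQLite follows the same standard NULL semantics for COUNT, but says nothing about MySQL performance. The table t is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c VARCHAR(10))")
conn.executemany("INSERT INTO t (c) VALUES (?)", [("x",), (None,), ("y",)])

results = {}
for expr in ("count(*)", "count(1)", "count(2)", "count('a')", "count(c)", "count(NULL)"):
    results[expr] = conn.execute(f"SELECT {expr} FROM t").fetchone()[0]

print(results)
# {'count(*)': 3, 'count(1)': 3, 'count(2)': 3, "count('a')": 3, 'count(c)': 2, 'count(NULL)': 0}
```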

count(*)

mysql> select count(*) from nums_2;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (1.52 sec)

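3. count performance comparison

Compare the performance of count(primary key), count(1), count(*), count(NOT NULL column), and count(nullable column).

3.1 MyISAM tables

3.1.1 Counting the whole table

To get the exact row count of a whole MyISAM table, count(primary key), count(1), and count(*) are equally efficient. Each returns the exact result directly, in practically 0 s.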

mysql> select count(p1) from nums_4;
+-----------+
| count(p1) |
+-----------+
|  10000000 |
+-----------+
1 row in set (0.00 sec)

mysql> select count(1) from nums_4;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from nums_4;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (0.00 sec)

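The execution plans are identical too. They show that the count is not computed by scanning the primary key or any other index. "Select tables optimized away" means the optimizer answered the query without reading the table, which MyISAM allows because it stores an exact row count per table: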

mysql> explain select count(*) from nums_4;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(p1) from nums_4;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> explain select count(1) from nums_4;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)

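Summary:

For counting a whole MyISAM table, efficiency is count(primary key) = count(1) = count(*).

3.1.2 Counting part of the data

When counting only part of the data, the count cannot be taken directly from the stored statistics, so the timings are roughly as follows: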

mysql> select count(p1) from nums_4 where  p1 like 'aa%';
+-----------+
| count(p1) |
+-----------+
|     39208 |
+-----------+
1 row in set (0.14 sec)

mysql> select count(1) from nums_4 where  p1 like 'aa%';
+----------+
| count(1) |
+----------+
|    39208 |
+----------+
1 row in set (0.13 sec)

mysql> select count(*) from nums_4 where p1 like 'aa%';
+----------+
| count(*) |
+----------+
|    39208 |
+----------+
1 row in set (0.13 sec)

The execution plans are in fact all the same, for example:

mysql> explain select count(1) from nums_4 where  p1 like 'aa%';
+----+-------------+--------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------+
| id | select_type | table  | partitions | type  | possible_keys | key     | key_len | ref  | rows  | filtered | Extra                    |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------+
|  1 | SIMPLE      | nums_4 | NULL       | range | PRIMARY       | PRIMARY | 98      | NULL | 42603 |   100.00 | Using where; Using index |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

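Summary: when counting part of the data, a MyISAM table cannot return the count directly. It also has to scan data (here, a range scan on the primary key), and the three forms perform about the same.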
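The difference between 3.1.1 and 3.1.2 comes down to MyISAM keeping an exact row count for each table. The toy class below is a hypothetical model of that idea only and does not reflect MyISAM's storage format:

- The counter is updated on every write.
- An unfiltered count reads the counter directly.
- Any filter forces a scan.

```python
class CountingTable:
    """Toy table that maintains an exact row counter, in the spirit of MyISAM."""

    def __init__(self):
        self.rows = []
        self.row_count = 0                 # kept exact on every insert/delete

    def insert(self, row):
        self.rows.append(row)
        self.row_count += 1

    def delete(self, predicate):
        kept = [r for r in self.rows if not predicate(r)]
        self.row_count -= len(self.rows) - len(kept)
        self.rows = kept

    def count_all(self):
        # COUNT(*) with no WHERE clause: answered from the counter, no rows read
        return self.row_count

    def count_where(self, predicate):
        # With a WHERE clause the counter cannot help; rows must be examined
        return sum(1 for r in self.rows if predicate(r))


t = CountingTable()
for p1 in ("aa01", "ab02", "aa03", "cd04"):
    t.insert({"p1": p1})
t.delete(lambda r: r["p1"] == "cd04")
print(t.count_all())                                       # 3
print(t.count_where(lambda r: r["p1"].startswith("aa")))   # 2
```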

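3.2 InnoDB tables

Because InnoDB supports MVCC, it cannot persist a table's row count, so every query has to traverse the data to count it. However, different forms of the query differ in efficiency, which the following sections compare along several dimensions.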
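A toy model shows why MVCC rules out a single stored counter. Each row records the transaction that created it and, if any, the one that deleted it. Each reader counts only the rows visible to its snapshot. The visibility rule is deliberately simplified (a snapshot sees every transaction with id <= its own) and is not InnoDB's actual read-view logic. The row layout and the function name `visible_count` are hypothetical.

```python
# Each row: (id, created_by_txn, deleted_by_txn or None)
rows = [
    (1, 1, 3),      # inserted by txn 1, deleted by txn 3
    (2, 1, None),
    (3, 1, None),
    (4, 2, None),   # inserted by txn 2
]


def visible_count(rows, snapshot):
    """Count rows visible to a reader whose snapshot includes txns <= snapshot."""
    return sum(
        1
        for _row_id, created, deleted in rows
        if created <= snapshot and (deleted is None or deleted > snapshot)
    )


# Three readers running at the same moment, each with a different snapshot:
for snapshot in (1, 2, 3):
    print(snapshot, visible_count(rows, snapshot))
# 1 3
# 2 4
# 3 3
```

Since the correct answer depends on who is asking, one number per table cannot serve every reader. The visible rows have to be counted at query time.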

3.2.1 Performance of different forms

Compare the query efficiency of count(primary key), count(1), and count(*):

mysql> select count(p1) from nums_2  ;
+-----------+
| count(p1) |
+-----------+
|  10000000 |
+-----------+
1 row in set (1.68 sec)

mysql> select count(1) from nums_2  ;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (1.37 sec)

mysql> select count(*) from nums_2  ;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (1.38 sec)

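A simple comparison shows that, in terms of performance, count(primary key) < count(1) ≈ count(*). In other words, count(primary key) is the slowest.

Yet the execution plans all look like this: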

mysql> explain select count(p1) from nums_2;
+----+-------------+--------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
| id | select_type | table  | partitions | type  | possible_keys | key     | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | nums_2 | NULL       | index | NULL          | PRIMARY | 98      | NULL | 9936693 |   100.00 | Using index |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

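The query efficiency still differs, because the counting is done differently:

count(primary key): InnoDB traverses the whole table through the corresponding index, reads the primary key value of every row, and returns it to the server layer. The server layer checks that each value is not NULL and adds 1 per row. This check could actually be optimized away, since a primary key can never be NULL.
count(1): also traverses the whole table. The value for every row is 1 (never NULL), so the rows can be counted directly without checking for NULL.
count(*): InnoDB optimizes this form specifically; like count(1), it simply counts rows.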
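The counting paths described above can be modeled in a few lines. This is a simplified sketch of that description, not InnoDB or server source code. The function names are hypothetical, and each element of `values` stands for one row's value as handed to the server layer.

```python
def count_column(values):
    """count(col): each row's value is returned; 1 is added only if it is not NULL."""
    n = 0
    for v in values:
        if v is not None:      # per-row NULL check (redundant for a primary key)
            n += 1
    return n


def count_rows(values):
    """count(1) / count(*): every row counts, so no value needs to be inspected."""
    n = 0
    for _ in values:
        n += 1
    return n


pk_values = ["c4ca4238", "c81e728d", "eccbc87e"]        # made-up primary key values
print(count_column(pk_values), count_rows(pk_values))   # 3 3

nullable = ["a", None, "b"]
print(count_column(nullable), count_rows(nullable))     # 2 3
```

Both functions return the same number for a primary key. However, count_column does extra work per row, the kind of overhead behind the slower count(primary key) timings above. In the real engine, passing every primary key value up to the server layer adds to that cost.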

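3.2.2 Performance with different primary key types

nums_2 and nums_3 have the same contents; the difference is that the primary key of nums_3 is the integer id column. Now compare query performance with different primary key columns. Note that the /* SQL_NO_CACHE */ below is an ordinary comment and has no effect as a hint. To act as a modifier, SQL_NO_CACHE must be written without comment delimiters, and the query cache it refers to was removed in MySQL 8.0.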

mysql> select /* SQL_NO_CACHE */count(1) from nums_2;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (2.02 sec)

mysql> select /* SQL_NO_CACHE */count(1) from nums_3;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (1.69 sec)

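The test shows that tables with the same data but different primary keys perform differently. The count is faster when the primary key (index) column type is smaller.

Note: if you add an index on the id column of nums_2, you will find that the query uses that id index. The reason is index size. The primary key index (the clustered index) is keyed on varchar(32), and its leaf pages hold entire rows. The index on id is keyed on an int and holds only id plus the primary key value, so scanning the integer index costs less I/O.

Therefore, it is advisable to use an auto-increment id as the primary key in MySQL. The benefits go beyond counting; I'll cover them another time.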
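The size gap between the two key types is visible in EXPLAIN's key_len column, which shows 98 for p1 in the plans above. The sketch below reproduces that arithmetic under these assumptions:

- A utf8 (utf8mb3) column takes up to 3 bytes per character.
- A VARCHAR key adds a 2-byte length prefix.
- An INT takes 4 bytes.
- A nullable column adds 1 byte.

`key_len` is a hypothetical helper covering only these two types. The value is a maximum per key, not the average stored size.

```python
def key_len(col_type, length=0, bytes_per_char=3, nullable=False):
    """Estimate EXPLAIN key_len for a single VARCHAR or INT index column."""
    if col_type == "varchar":
        size = length * bytes_per_char + 2   # max chars * bytes/char + 2-byte length prefix
    elif col_type == "int":
        size = 4
    else:
        raise ValueError(f"unsupported column type: {col_type}")
    return size + (1 if nullable else 0)      # 1 extra byte for the NULL flag


print(key_len("varchar", 32))            # 98 -> p1 varchar(32) NOT NULL, as in EXPLAIN
print(key_len("int"))                    # 4  -> id as the primary key of nums_3
print(key_len("int", nullable=True))     # 5  -> id in nums_2, declared without NOT NULL
```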

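3.2.3 Comparing tables of different sizes

Among the prepared tables, nums_1 has a single column (its varchar(32) primary key p1). nums_3 has five columns and the integer primary key id, so it is the larger table. The query efficiency compares as follows: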

mysql> select /* SQL_NO_CACHE */count(1) from nums_1;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (1.61 sec)

mysql> select /* SQL_NO_CACHE */count(1) from nums_3;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (1.67 sec)

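Query times are for reference only; they depend on the machine.

So tables of different sizes differ in query efficiency: the smaller the table, the faster the count.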
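Single timings like these swing with machine load and cache state. A fairer comparison runs each query a few times after warming the cache and takes the median. The helper below is a minimal sketch of that approach:

- `time_query` is a hypothetical name.
- The demo uses Python's built-in sqlite3 purely so the snippet runs anywhere.
- With a MySQL driver, you would instead pass a function that calls `execute(...)` and `fetchone()` on a DB-API cursor.

```python
import sqlite3
import statistics
import time


def time_query(run_query, warmups=2, repeats=5):
    """Return the median wall-clock time (seconds) of run_query over several runs.

    Warm-up runs are discarded so that every measured run sees a warm cache.
    """
    for _ in range(warmups):
        run_query()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t (id) VALUES (?)", ((i,) for i in range(100_000)))

elapsed = time_query(lambda: conn.execute("SELECT count(*) FROM t").fetchone())
print(f"median: {elapsed * 1000:.2f} ms")
```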

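3.2.4 count(regular column)

In nums_3, column c2 is nullable but contains no NULLs, while column c3 is nullable and does contain NULLs. Now count c2 and c3 separately and look at the results (add indexes on these columns first to speed up the queries):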

mysql> select  count(c2) from  nums_3 ;
+-----------+
| count(c2) |
+-----------+
|  10000000 |
+-----------+
1 row in set (1.69 sec)

mysql> select  count(c3) from  nums_3 ;
+-----------+
| count(c3) |
+-----------+
|   9960792 |
+-----------+
1 row in set (1.73 sec)

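Because c3 contains NULL values, counting c3 skips the rows where it is NULL. The difference, 10000000 - 9960792 = 39208, matches the number of values starting with aa counted in 3.1.2.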
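The NULL-skipping behavior can be reproduced on a small scale. The sketch below builds a scaled-down copy of nums_3 with the same column rules. It uses 100,000 rows instead of 10 million, an arbitrary choice. Python's built-in sqlite3 serves as a stand-in for MySQL; SQLite applies the same COUNT semantics, so this demonstrates the results, not the performance.

```python
import hashlib
import sqlite3

N = 100_000  # scaled down from 10 million so the demo runs in seconds

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nums_3 (p1 VARCHAR(32), id INT PRIMARY KEY, "
    "c1 VARCHAR(10) NOT NULL, c2 VARCHAR(10), c3 VARCHAR(10))"
)

rows = []
for i in range(1, N + 1):
    p1 = hashlib.md5(str(i).encode()).hexdigest()
    prefix = p1[:10]
    # c1 and c2 always hold the prefix; c3 is NULL when it starts with 'aa'
    rows.append((p1, i, prefix, prefix, None if prefix.startswith("aa") else prefix))
conn.executemany("INSERT INTO nums_3 VALUES (?, ?, ?, ?, ?)", rows)

counts = {
    expr: conn.execute(f"SELECT {expr} FROM nums_3").fetchone()[0]
    for expr in ("count(*)", "count(c2)", "count(c3)")
}
aa_rows = conn.execute("SELECT count(*) FROM nums_3 WHERE c1 LIKE 'aa%'").fetchone()[0]

print(counts)
print(aa_rows, counts["count(*)"] - counts["count(c3)"])   # the two numbers are equal
```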

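4. Summary

Comparing the counting efficiency of different query forms on MyISAM and InnoDB tables leads to the following conclusions:

MyISAM tables can return the whole-table row count directly, which is the most efficient, but MyISAM does not support transactions.
For InnoDB tables, counting efficiency is count(primary key) < count(1) ≈ count(*).
In MySQL, an auto-increment integer primary key is recommended.
The smaller the table, the more efficient the count.

The tables from the preparation section support many more tests, and interested readers can try them on their own (one more reminder: mind the cache!). You are also welcome to follow the WeChat official account 【数据库干货铺】 to join the technical discussion group. Thanks.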