First, the table:
CREATE TABLE `page_test` (
    `id` INT(11) NOT NULL AUTO_INCREMENT,
    `name` VARCHAR(20) NOT NULL,
    `email` VARCHAR(40) NOT NULL,
    `solved_number` INT(11) NOT NULL,
    PRIMARY KEY (`id`)
) COLLATE='utf8_general_ci' ENGINE=InnoDB AUTO_INCREMENT=1;
Stuff in 1,000,086 rows:
delimiter $$
CREATE PROCEDURE pre()
BEGIN
    DECLARE i INT;
    SET i = 1;
    WHILE i < 1000086 DO
        INSERT INTO page_test VALUES (i, substring(MD5(RAND()),1,20), substring(MD5(RAND()),1,20), i);
        SET i = i + 1;
    END WHILE;
END $$
delimiter ;
CALL pre();
On Win10 + i3 (storage: a cheap eMLC knock-off drive), this inserts at about 1.8 MB/s.
Switching to a direct SQL import
First, raise max_allowed_packet, otherwise the server reports "MySQL server has gone away":
set global max_allowed_packet=1068435456;
(an imprecise but big-enough number)
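As a quick sanity check (my addition, not from the original post), you can confirm the new value; note that SET GLOBAL only applies to connections opened afterwards:

SHOW GLOBAL VARIABLES LIKE 'max_allowed_packet';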
Generating the INSERT file with C++:
#include <bits/stdc++.h>
using namespace std;
const int MAXN = 5e6 + 11;
char rnd[13];   // random lowercase "name"; globals are zero-initialized, so [12] stays '\0'
char rnd2[13];  // random digit-string "email"
int main() {
    freopen("insert.txt", "w", stdout);
    int cur = 0;
    printf("INSERT INTO training.page_test\nVALUES");
    while (cur++ < MAXN) {
        for (int i = 0; i < 12; i++) {
            rnd[i] = (rand() % 26) + 'a';
            rnd2[i] = (rand() % 10) + '0';
        }
        printf("(%d,'%s','%s',%d)", cur, rnd, rnd2, cur);
        if (cur < MAXN) printf(",\n");  // no trailing comma after the last tuple
    }
    return 0;
}
mysql -uroot -p123456 < D:\Code\cpp\insert.txt
IO runs at roughly 60-130 MB/s (memory usage 2-3 GB).
SELECT COUNT(*) FROM page_test;
/* Affected rows: 0  Found rows: 1  Warnings: 0  Duration for 1 query: 1.016 sec. */

SELECT * FROM page_test LIMIT 5000002,1;
/* Affected rows: 0  Found rows: 1  Warnings: 0  Duration for 1 query: 3.031 sec. */

SELECT * FROM page_test LIMIT 5000002,3;
/* Affected rows: 0  Found rows: 3  Warnings: 0  Duration for 1 query: 3.110 sec. */
Going through the index
SELECT * FROM page_test
WHERE id = (SELECT id FROM page_test LIMIT 5000003,1);
/* Affected rows: 0  Found rows: 1  Warnings: 0  Duration for 1 query: 2.219 sec. */
Multiple ids, in order:
SELECT a.*
FROM page_test a
JOIN (SELECT id FROM page_test LIMIT 5000001,5) b ON a.id = b.id;
/* Affected rows: 0  Found rows: 5  Warnings: 0  Duration for 1 query: 2.219 sec. */
If you know the ids are guaranteed to fall within some range, you can do this:
SELECT * FROM page_test a WHERE a.id >= 5000002 AND a.id <= 5000006;
/* Affected rows: 0  Found rows: 5  Warnings: 0  Duration for 1 query: 0.000 sec. */
EXPLAIN analysis
LIMIT with an index
EXPLAIN SELECT a.id FROM page_test a LIMIT 5000001,5;
Takes about 2.2 s.
LIMIT without an index
EXPLAIN SELECT a.email FROM page_test a LIMIT 5000001,5;
About the same, 2.3 s. The ALL access type looks worse than index in the EXPLAIN output, but in practice there is no difference.
In fact LIMIT runs at almost the same speed with or without an index. My guess is that this is related to InnoDB's clustered design, where the index and the row data live in the same file: either way the server has to walk past every skipped entry one by one.
In other words, an index will not save you from the LIMIT nightmare.
As for the earlier trick of grabbing the ids first and then joining: its EXPLAIN looks less ugly and it copes more flexibly with changes, but in practice it runs... about the same.
A real index versus a fake one
EXPLAIN SELECT * FROM page_test a WHERE a.id >= 5000002 AND a.id <= 5000006;
Only 5 rows examined, 0 s.
So if you never need to delete and can guarantee consecutive ids, using WHERE instead of LIMIT for pagination is the best choice.
(The major online judges all use VOL-style pagination; apparently not without reason.)
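To make the WHERE-based pagination concrete, here is a minimal sketch (mine, not the author's; it assumes consecutive ids starting at 1 and a page size of 20) of how a page number maps onto an id range:

-- page p of size s covers ids [(p-1)*s + 1, p*s]
SET @p = 250001, @s = 20;
SELECT *
FROM page_test
WHERE id BETWEEN (@p - 1) * @s + 1 AND @p * @s;

In application code you would normally compute the two bounds and inline them as constants; either way it is the same 0-second primary-key range scan as above.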
Postscript
1. Tried turning on query_cache; not flexible enough, forget it.
2. Saw someone use an index plus ORDER BY to optimize LIMIT queries near the end of the table, which looked interesting.
Tried it:
SELECT * FROM page_test ORDER BY id DESC LIMIT 5;
No need to time it, the result is instant (DESC requires no actual sort: it is a Backward index scan).
With this optimization the worst case only shows up around offset n/2, which is very practical (most people look at either the first few or the last few pages); see the sketch below.
Of course, the ORDER BY column must be backed by an index.
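A hypothetical sketch of serving a page near the end from the tail instead (the literal offsets are mine; LIMIT cannot take user variables in plain SQL):

-- 3rd page from the END, page size 20: skip (3-1)*20 = 40 rows from the tail
SELECT *
FROM (
    SELECT * FROM page_test
    ORDER BY id DESC
    LIMIT 40, 20
) t
ORDER BY id;  -- flip back to ascending order for display

The application decides per request whether a page is cheaper from the front (ASC) or the back (DESC), which is what caps the worst-case offset at about n/2.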
3. Maintain page numbers with a derived table/column.
If you are willing to spend O(n) on every insert and delete to maintain it, it is actually a decent approach.
But it doubles the number of tables, and once the data reaches the millions or tens of millions, that O(n) maintenance costs seconds each time; a sketch of the idea follows.
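Here is how I read the side-table idea (not the author's code; page_rank and the variable-ranking trick are my choices, and the ranking pattern predates MySQL 8.0's ROW_NUMBER()):

CREATE TABLE page_rank (
    rk INT NOT NULL PRIMARY KEY,  -- 1-based position of the row
    id INT NOT NULL
);

-- the O(n) maintenance step: rebuild the rank -> id map
SET @r = 0;
INSERT INTO page_rank
SELECT (@r := @r + 1), id
FROM (SELECT id FROM page_test ORDER BY id) x;

-- any page then becomes a pure primary-key range scan:
SELECT t.*
FROM page_rank r
JOIN page_test t ON t.id = r.id
WHERE r.rk BETWEEN 5000001 AND 5000020;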
4. My MySQL version does not allow subqueries of the form IN (SELECT ... LIMIT ...); I suspect that even if it did, the performance would be about the same.
5. In theory LIMIT m,n could be answered in O(log m) by maintaining subtree sizes inside the B+ tree (order-statistics style: descend from the root, skipping whole subtrees whose sizes sum to less than m). Presumably because of the complexity of SQL, no implementation actually does this.
6. Consider the actual requirements.
If you really do need deletion, blocking access to the row instead of physically deleting it is also a viable strategy; that way pagination still gets the full performance of the consecutive-id scheme.
Modifications under the same premise can, in some situations, be implemented by swapping rows instead (id uniqueness is preserved), so pagination stays fast; a hypothetical sketch follows.
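A hypothetical sketch of the swap idea applied to deletion (variable names beyond page_test are mine): overwrite the doomed row with the last row's payload, then drop the tail row, so ids stay consecutive and the WHERE-range pagination keeps working:

SET @victim = 12345;
SELECT MAX(id) INTO @last FROM page_test;

START TRANSACTION;
-- copy the last row's payload into the victim's slot (self-join update)
UPDATE page_test dst
JOIN page_test src ON src.id = @last
SET dst.name = src.name,
    dst.email = src.email,
    dst.solved_number = src.solved_number
WHERE dst.id = @victim;
-- the tail row is now the duplicate; remove it
DELETE FROM page_test WHERE id = @last;
COMMIT;

The obvious cost is that ordering by id no longer reflects insertion order, which is why this only fits some situations.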