干货 | 基于KingbaseES的SQL性能调优（附案例）

随着互联网技术在商业应用中的普遍普及，数据库应用日益庞大复杂，随之数据库性能表现的重要性也愈加凸显。在数据库的众多性能问题中，SQL性能问题占有极大的比例，所以，SQL性能调优也就成为了数据库管理员进阶须要掌握的一项重要技能。凭借着金仓的多年来的海量应用数据，咱们总结出了一些SQL性能调优经验供读者使用。
web

常见问题分析算法

SQL性能问题通常表如今过长的响应时间，当时间长到应用没法忍受时，就会成为一个咱们须要解决的性能问题。结合众多现场的状况，能够得出如下常见问题缘由：数据库

一、缺乏SQL访问结构安全

缺乏索引、物化视图、分区之类的SQL访问结构是致使SQL性能欠佳的典型缘由。例如：微信

选择率较低的查询谓词在大表上使用全表扫描，而没有索引网络
排序、max/min等状况没有选用索引架构

TEST=# explain analyze select max(id) from t1;
oop

QUERY PLAN性能

-------------------------------------------------------------------------------------------------------------------flex

Aggregate (cost=17906.00..17906.01 rows=1 width=4) (actual time=180.486..180.486 rows=1 loops=1)

-> Seq Scan on T1 (cost=0.00..15406.00 rows=1000000 width=4) (actual time=0.014..89.855 rows=1000000 loops=1)

Planning time: 0.157 ms

Execution time: 180.531 ms

示例1-问题（解决方法见下文）

二、次优的执行计划

优化器在大部分时候都能给出较优的执行计划。可是，有时候，由于优化器自身的局限性，有可能选择不理想的计划。例如：

多表查询的链接顺序未能将能够过滤更多数据的两表链接最早执行
链接算法不够好
选择率的估算不许确

TEST=# explain analyze select * from t3,t1 where t3.c1 = t3.c2 and t3.c2 = t3.c3 and t1.id = t3.c1;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------

Nested Loop (cost=0.42..20617.31 rows=25 width=23) (actual time=0.029..1676.549 rows=1000000 loops=1)

-> Seq Scan on T3 (cost=0.00..20406.00 rows=25 width=12) (actual time=0.019..171.783 rows=1000000 loops=1)

Filter: ((C1 = C2) AND (C2 = C3))

-> Index Scan using T1_ID_IDX on T1 (cost=0.42..8.44 rows=1 width=11) (actual time=0.001..0.001 rows=1 loops=1000000)

Index Cond: (ID = T3.C1)

Planning time: 0.819 ms

Execution time: 1705.336 ms

(7 rows)

示例2-问题（解决方法见下文）

三、陈旧的统计信息

当统计信息维护操做（自动或手动）没法跟上DML致使的表数据更改时，收集到的统计信息可能会过期。因为表上的陈旧统计信息没法准确反映表数据，所以优化器能够基于错误信息作出决策并生成次优执行计划。

TEST=# explain analyze select * from student where sno > 2;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on STUDENT (cost=60.12..156.12 rows=2560 width=40) (actual time=1.381..3.588 rows=9998 loops=1)

Recheck Cond: (SNO > 2)

Heap Blocks: exact=64

-> Bitmap Index Scan on IDX_STU (cost=0.00..59.48 rows=2560 width=0) (actual time=1.339..1.339 rows=9998 loops=1)

Index Cond: (SNO > 2)

Planning time: 0.096 ms

Execution time: 4.254 ms

(7 rows)

TEST=# analyze student;

ANALYZE

TEST=# explain analyze select * from student where sno > 2;

QUERY PLAN

----------------------------------------------------------------------------------------------------------

Seq Scan on STUDENT (cost=0.00..189.00 rows=9999 width=19) (actual time=0.015..2.362 rows=9998 loops=1)

Filter: (SNO > 2)

Rows Removed by Filter: 2

Planning time: 0.215 ms

Execution time: 3.050 ms

(5 rows)

示例3-问题

四、低效的SQL语句设计

若是SQL语句执行了一些没必要要的工做，那么优化器将没法作不少事情来提升其性能。低效设计的例子包括：

忽略添加链接条件，致使笛卡尔积链接
过滤条件没有下推
指定UNION而不是UNION ALL
使用OR而不是UNION

TEST=# explain analyze select * from or_union o,or_union1 o1 where o.b=o1.b and (o.a=1 or o1.a=10 );

QUERY PLAN

-----------------------------------------------------------------------------------------

Hash Join (cost=15417.00..37867.00 rows=50200 width=16) (actual time=354.603..1325.621 rows=50000 loops=1)

Hash Cond: (O.B = O1.B)

Join Filter: ((O.A = 1) OR (O1.A = 10))

Rows Removed by Join Filter: 450000

-> Seq Scan on OR_UNION O (cost=0.00..7213.00 rows=500000 width=8) (actual time=0.018..133.690 rows=500000 loops=1)

-> Hash (cost=7213.00..7213.00 rows=500000 width=8) (actual time=351.181..351.181 rows=500000 loops=1)

Buckets: 131072 Batches: 8 Memory Usage: 3465kB

-> Seq Scan on OR_UNION1 O1 (cost=0.00..7213.00 rows=500000 width=8) (actual time=0.018..135.804 rows=500000 loops=1)

Planning time: 0.587 ms

Execution time: 1330.277 ms

示例4-问题（解决方法见下文）

五、硬件或架构问题

受限于当前硬件环境（CPU、内存、IO、网络等）、架构，全部的优化手段都不足以达到预期性能，则须要作硬件或者架构扩展。

优化手段

为了应对SQL性能问题，KingbaseES在不断升级优化器的同时也提供了诸多优化手段，常规优化手段以下。

一、索引

索引是一种有序的存储结构，也是一项极为重要的SQL优化手段，能够提升数据检索的速度。经过在表中的一个或多个列上建立索引，不少SQL语句的执行效率能够获得极大的提升。kingbase提供的索引类型包括BRTEE、HASH、GIST、GIN、BRIN、BLOOM、TRGM。

TEST=# create index on t1(id);

CREATE INDEX

TEST=# explain analyze select max(id) from t1;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------

Result (cost=0.46..0.47 rows=1 width=4) (actual time=0.069..0.070 rows=1 loops=1)

InitPlan 1 (returns $0)

-> Limit (cost=0.42..0.46 rows=1 width=4) (actual time=0.064..0.065 rows=1 loops=1)

-> Index Only Scan Backward using T1_ID_IDX on T1 (cost=0.42..33889.43 rows=1000000 width=4) (actual time=0.063..0.063 rows=1 loops=1)

Index Cond: (ID IS NOT NULL)

Heap Fetches: 1

Planning time: 0.283 ms

Execution time: 0.104 ms

(8 rows)

示例1-解决方法（问题描述见上文）

二、使用HINT

查询优化器在尽可能提供最优的执行计划，可是由于种种缘由，优化器以及它所依赖的统计信息存在着一些局限性。因此，某些状况下，优化器提供的统计信息不是最优，须要人工干预。HINT则是为了解决该问题而提供的一项功能，能够帮助用户人为的控制执行计划。HINT目前可控制的内容包括：扫描类型、链接类型、链接顺序、JOIN返回行数、GUC参数等。

TEST=# /*+HashJoin(T1 T3)*/explain analyze select * from t3,t1 where t3.c1 = t3.c2 and t3.c2 = t3.c3 and t1.id = t3.c1;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------

Hash Join (cost=20406.31..39562.56 rows=25 width=23) (actual time=323.815..699.533 rows=1000000 loops=1)

Hash Cond: (T1.ID = T3.C1)

-> Seq Scan on T1 (cost=0.00..15406.00 rows=1000000 width=11) (actual time=0.016..67.093 rows=1000000 loops=1)

-> Hash (cost=20406.00..20406.00 rows=25 width=12) (actual time=323.720..323.720 rows=1000000 loops=1)

Buckets: 1048576 (originally 1024) Batches: 1 (originally 1) Memory Usage: 51161kB

-> Seq Scan on T3 (cost=0.00..20406.00 rows=25 width=12) (actual time=0.017..188.784 rows=1000000 loops=1)

Filter: ((C1 = C2) AND (C2 = C3))

Planning time: 0.399 ms

Execution time: 730.199 ms

(9 rows)

示例2-解决方法（问题描述见上文）

三、使用并行

KingbaseES能设计出利用多 CPU 让查询更快的查询计划。这种特性被称为并行查询。对于分析型语句，并行查询带来的速度提高是显著的。不少查询在使用并行查询时查询速度比以前快了超过两倍，有些查询是之前的四倍甚至更多的倍数。那些访问大量数据但只返回其中少数行给用户的查询最能从并行查询中获益。

TEST=# explain analyze select avg(id) from t2;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------

Aggregate (cost=208335.00..208335.01 rows=1 width=32) (actual time=1572.051..1572.051 rows=1 loops=1)

-> Seq Scan on T2 (cost=0.00..183334.80 rows=10000080 width=4) (actual time=0.021..892.509 rows=10000000 loops=1)

Planning time: 0.132 ms

Execution time: 1572.108 ms

(4 rows)

TEST=# set max_parallel_workers_per_gather to 8;

SET

TEST=# explain analyze select avg(id) from t2;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------

Finalize Aggregate (cost=109334.72..109334.73 rows=1 width=32) (actual time=391.702..391.703 rows=1 loops=1)

-> Gather (cost=109334.20..109334.71 rows=5 width=32) (actual time=391.588..407.580 rows=6 loops=1)

Workers Planned: 5

Workers Launched: 5

-> Partial Aggregate (cost=108334.20..108334.21 rows=1 width=32) (actual time=383.329..383.330 rows=1 loops=6)

-> Parallel Seq Scan on T2 (cost=0.00..103334.16 rows=2000016 width=4) (actual time=0.028..250.937 rows=1666667 loops=6)

Planning time: 0.115 ms

Execution time: 407.654 ms

(8 rows)

示例3-解决方法

四、调整性能参数

KingbaseES容许对单个查询关闭特定的优化器特性。若是优化器为特定查询选择的执行计划并非最优的，能够经过设置这些参数强制优化器选择一个更好的执行计划来临时解决这个问题。常见修改参数包括：

节点开关参数，如enable_seqscan
内存参数，如work_mem

五、改写SQL语句

数据库的使用者在书写SQL语句的时候也常常会考虑到查询的性能，可是一个应用程序可能要写大量的SQL语句，并且有些SQL语句的逻辑极为复杂，数据库应用开发人员很难面面俱到地写出性能良好的语句。当须要改写SQL语句时能够考虑如下方式：

条件尽可能下推
子查询转为join
UNION转为UNION ALL
将OR改写为UNION

特别须要注意的是，SQL语句改写须要考虑语义等价的问题。

explain analyze select * from or_union o,or_union1 o1 where o.a=1 and o.b=o1.b

union

select * from or_union o,or_union1 o1 where o1.a=10 and o.b=o1.b

;

QUERY PLAN

-----------------------------------------------------------------------------------------

HashAggregate (cost=24099.91..24601.92 rows=50201 width=16) (actual time=435.391..462.814 rows=50000 loops=1)

Group Key: O.A, O.B, O1.A, O1.B

-> Append (cost=4409.47..23597.90 rows=50201 width=16) (actual time=63.081..387.360 rows=50000 loops=1)

-> Hash Join (cost=4409.47..13999.47 rows=50200 width=16) (actual time=63.081..380.380 rows=50000 loops=1)

Hash Cond: (O1.B = O.B)

-> Seq Scan on OR_UNION1 O1 (cost=0.00..7213.00 rows=500000 width=8) (actual time=0.022..107.463 rows=500000 loops=1)

-> Hash (cost=3781.97..3781.97 rows=50200 width=8) (actual time=62.946..62.946 rows=50000 loops=1)

Buckets: 65536 Batches: 1 Memory Usage: 2466kB

-> Bitmap Heap Scan on OR_UNION O (cost=941.47..3781.97 rows=50200 width=8) (actual time=13.723..41.791 rows=50000 loops=1)

Recheck Cond: (A = 1)

Heap Blocks: exact=2213

-> Bitmap Index Scan on OR_UNION_A_IDX (cost=0.00..928.92 rows=50200 width=0) (actual time=13.097..13.097 rows=50000 loops=1)

Index Cond: (A = 1)

-> Hash Join (cost=8.41..9096.42 rows=1 width=16) (actual time=0.169..0.169 rows=0 loops=1)

Hash Cond: (O_1.B = O1_1.B)

-> Seq Scan on OR_UNION O_1 (cost=0.00..7213.00 rows=500000 width=8) (actual time=0.024..0.024 rows=1 loops=1)

-> Hash (cost=8.40..8.40 rows=1 width=8) (actual time=0.106..0.106 rows=0 loops=1)

Buckets: 1024 Batches: 1 Memory Usage: 8kB

-> Index Scan using OR_UNION1_A_IDX on OR_UNION1 O1_1 (cost=0.42..8.40 rows=1 width=8) (actual time=0.106..0.106 rows=0 loops=1)

Index Cond: (A = 10)

Planning time: 0.442 ms

Execution time: 469.056 ms

示例4-解决方法（问题描述见上文）

以上总结基于金仓核心产品KES在20余个关键行业的关键核心系统10多年服务经验及海量应用数据。做为国产数据库领军企业，人大金仓始终坚持技术创新、产品升级、生态融合，秉承“更可靠、更安全、更智能、更融合”的发展理念，不断追求卓越产品性能与超一流服务体验，始终把好数据安全关口，积极夯实数据要素市场基石，助力国家新基建建设。

往期推荐

01	●战略签约，人大金仓携手同联公司签署战略协议共同开创信息化创新发展新篇章！
	► 点击阅读

02	●共建产业生态，神州数码数据库专题培训
	► 点击阅读

03	●人大金仓成功中标中国移动自主可控OLTP数据库联合创新项目
	► 点击阅读

你的好友金小仓拍了拍你

并请你帮ta“一键三连”

本文分享自微信公众号 - 金仓数据库（china-data）。
若有侵权，请联系 support@oschina.cn 删除。
本文参与“OSC源创计划”，欢迎正在阅读的你也加入，一块儿分享。