Mastering GaussDB(DWS) Performance Tuning with Every Tool in the Arsenal: Using Plan Hints

Date: 2021-05-27

Preface

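When writing SQL, database users do their best to write high-performance statements based on what they already know. But when a large number of statements has to be written, and some of them have extremely complex logic, writing efficient SQL by hand becomes very difficult.

Every database has a query optimizer module that plays a role comparable to the human brain. It receives the query tree from the parser, applies logically equivalent transformations to that tree, selects among the candidate physical execution paths, and passes the best path it finds to the executor. The query optimizer is one of the most important means of improving query efficiency.

For the classification of database query optimizers, see the related blog post.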

Introducing plan hints

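Because the optimizer generates plans from statistics and an estimation model, a plan can go wrong when the estimates are off, and the statement may then run extremely slowly.

Normally the optimizer's work is transparent to the database user. The previous post described how GaussDB(DWS) lets you intervene globally in plan path generation by setting GUC parameters. This post introduces another way to intervene manually in plan generation: plan hints. A hint is a directive passed to the optimizer through a comment in the SQL statement, and the optimizer uses it when choosing an execution plan for that statement. In test or development environments, hints are very useful for measuring the performance of a specific access path. For example, you may know that joining certain tables first substantially reduces the size of the intermediate result set; in that case you can use a hint to steer the optimizer toward the better plan.

Plan hints are statement-level controls: they take effect only at the current query level of the current statement. During tuning they let us intervene manually on a specific statement and select a more efficient execution plan.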
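
As a minimal sketch of what "a directive passed through a comment" looks like, the statement below places a hint in a /*+ ... */ comment immediately after the select keyword. The two-table query is a cut-down illustration built from the tpcds tables used later in this post, not a statement from the original benchmark.

```sql
-- Sketch only: the hint lives in a /*+ ... */ comment right after SELECT.
-- leading(s d) asks the optimizer to join store_sales and date_dim first.
explain
select /*+ leading(s d) */ count(*)
from store_sales s
    ,date_dim d
where s.ss_sold_date_sk = d.d_date_sk;
```

Because the hint is a comment, removing it leaves a statement with identical semantics; only the plan the optimizer picks may change.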

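GaussDB(DWS) supports the following kinds of plan hints:

Join order hints: adjust the join order

Scan/Join method hints: specify or avoid a scan/join method

Stream method hints: specify or avoid redistribute/broadcast

Row count hints: for a given result set, specify the row count, or apply an arithmetic adjustment to the original estimate

Skew hints: specify the particular values that need skew handling during skew optimization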

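The sections below describe each kind of plan hint and how it is used in practice. Except for skew hints, all of them use Q6 from tpcds as the example. To make the effect of hints on query optimization clearly visible, we delete the statistics of the store_sales table. The original statement and the initial plan it produces are as follows.

Example statement:

explain performance

select a.ca_state state, count(*) cnt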

from customer_address a

,customer c

,store_sales s

,date_dim d

,item i

where a.ca_address_sk = c.c_current_addr_sk

and c.c_customer_sk = s.ss_customer_sk

and s.ss_sold_date_sk = d.d_date_sk

and s.ss_item_sk = i.i_item_sk

and d.d_month_seq =

(select distinct (d_month_seq)

from date_dim

where d_year = 2000

and d_moy = 2 )

and i.i_current_price > 1.2 *

(select avg(j.i_current_price)

from item j

where j.i_category = i.i_category)

group by a.ca_state

having count(*) >= 10

order by cnt

limit 100;

Applying plan hints

Join order hints

Syntax:

Format 1:

leading(table_list)

Specifies only the join order, not which table is inner and which is outer.

Format 2:

leading((table_list))

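Specifies both the join order and the inner/outer table order; the inner/outer order takes effect only at the outermost level.

Description:

table_list is the list of table names whose join order is to be adjusted, separated by spaces. It can contain any number of tables (aliases) from the current query level and, where a subquery is pulled up, the hint alias of that subquery. Any tables in the list can be grouped with parentheses to specify join priority.

Notes:

A table can only be written as a single string, without a schema prefix.

If a table has an alias, use the alias to refer to it.

Each table in the list must be unique within the current level or the pulled-up subquery. If it is not, use different aliases to tell them apart.

A table can appear in the list only once.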
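
The difference between the two formats can be sketched on a cut-down three-table query. The query itself is a simplification made up for illustration; the aliases s, d and i follow the Q6 statement above.

```sql
-- Format 1: join s and d first; the optimizer still chooses
-- which of the two is the inner table.
explain
select /*+ leading(s d) */ count(*)
from store_sales s
    ,date_dim d
    ,item i
where s.ss_sold_date_sk = d.d_date_sk
  and s.ss_item_sk = i.i_item_sk;

-- Format 2: join s and d first, with s as the outer table
-- and d as the inner table.
explain
select /*+ leading((s d)) */ count(*)
from store_sales s
    ,date_dim d
    ,item i
where s.ss_sold_date_sk = d.d_date_sk
  and s.ss_item_sk = i.i_item_sk;
```

Note that both hints use the aliases s and d rather than the table names, in line with the rule that an alias takes precedence when one exists.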

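Example 1:

In the example plan, operators 17-22 join store_sales with item and build a hash table from the result. store_sales is very large and the join with item filters out no rows, so both the join and the hash build take a long time. From what we know of the data distribution in the tpcds tables, joining store_sales with date_dim filters out a large share of the rows. We can therefore use a hint to tell the optimizer to join store_sales and date_dim first, with store_sales as the outer table and date_dim as the inner table, which shrinks the intermediate result set. The statement is rewritten as follows:

explain performance

select /*+ leading((s d)) */ a.ca_state state, count(*) cnt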

from customer_address a

,customer c

,store_sales s

,date_dim d

,item i

where a.ca_address_sk = c.c_current_addr_sk

and c.c_customer_sk = s.ss_customer_sk

and s.ss_sold_date_sk = d.d_date_sk

and s.ss_item_sk = i.i_item_sk

and d.d_month_seq =

(select distinct (d_month_seq)

from date_dim

where d_year = 2000

and d_moy = 2 )

and i.i_current_price > 1.2 *

(select avg(j.i_current_price)

from item j

where j.i_category = i.i_category)

group by a.ca_state

having count(*) >= 10

order by cnt

limit 100;

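With the join order adjusted, the intermediate result set of every subsequent join shrinks sharply, and execution time drops from 34268.322 ms to 11095.046 ms.

Scan/Join method hints

These hints tell the optimizer which scan method or join method to use.

Syntax:

Format of join method hints: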

[no] nestloop|hashjoin|mergejoin(table_list)

Format of scan method hints:

[no] tablescan|indexscan|indexonlyscan(table [index])

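Description:

no tells the optimizer not to use the named method.

table is the table the hint applies to. Only one table can be specified, and if the table has an alias, the alias should be used in the hint.

index is the name of the index to use with an indexscan or indexonlyscan hint. Currently only one index can be specified.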
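
A short sketch of how the two formats above combine in one statement. The index name i_item_sk_idx is hypothetical and assumed to exist on item(i_item_sk); the query is a cut-down illustration rather than part of Q6.

```sql
-- Assumption: an index named i_item_sk_idx exists on item(i_item_sk).
-- indexscan(i i_item_sk_idx) asks for an index scan on item through that index;
-- no mergejoin(s i) asks the optimizer to avoid a merge join between s and i.
explain
select /*+ indexscan(i i_item_sk_idx) no mergejoin(s i) */ count(*)
from store_sales s
    ,item i
where s.ss_item_sk = i.i_item_sk
  and i.i_item_sk < 100;
```

Several hints can share one comment, separated by spaces, as the rewritten Q6 statement in Example 2-1 also does.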

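Example 2-1:

In the plan obtained in Example 1, the row count of store_sales is estimated inaccurately, so store_sales and date_dim are joined with an inefficient nestloop. We now use the hint from this section to tell the optimizer not to use nestloop for this join.

explain performance

select /*+ leading((s d)) no nestloop(s d) */ a.ca_state state, count(*) cnt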

from customer_address a

,customer c

,store_sales s

,date_dim d

,item i

where a.ca_address_sk = c.c_current_addr_sk

and c.c_customer_sk = s.ss_customer_sk

and s.ss_sold_date_sk = d.d_date_sk

and s.ss_item_sk = i.i_item_sk

and d.d_month_seq =

(select distinct (d_month_seq)

from date_dim

where d_year = 2000

and d_moy = 2 )

and i.i_current_price > 1.2 *

(select avg(j.i_current_price)

from item j

where j.i_category = i.i_category)

group by a.ca_state

having count(*) >= 10

order by cnt

limit 100;