Partition-wise join in PostgreSQL

Date: 2019-11-06



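Unlike inheritance-based partitioning, the declarative partitioning introduced in PostgreSQL 10 leaves nothing to infer about how the data is divided into partitions. The query optimizer in PostgreSQL 11 is gearing up to take advantage of this "no-inference" representation. The first such optimization to be committed was partition-wise join.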

What is partition-wise join

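If there is an equi-join condition between the partition keys of the joining tables, a join between two similarly partitioned tables can be broken down into joins between their matching partitions. An equi-join between the partition keys implies that all the join partners of a given row in a given partition of one partitioned table must be in the corresponding partition of the other partitioned table. Hence the join between the partitioned tables can be broken down into joins between matching partitions. This technique of breaking down a join between partitioned tables into joins between their partitions is called partition-wise join.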
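
The idea can be checked with a small in-memory model. The Python sketch below is not how PostgreSQL implements the technique. The BOUNDS list and the partition, hash_join and partitionwise_join helpers are hypothetical, chosen to mirror the prt1/prt2 example that follows. Both tables are range-partitioned with identical bounds on their join columns. Joining each matching partition pair separately and appending the results gives the same rows as joining the tables as a whole.

```python
import bisect

# Hypothetical range bounds shared by both tables, as in prt1/prt2:
# [0, 5000), [5000, 15000), [15000, 30000)
BOUNDS = [0, 5000, 15000, 30000]


def partition(rows, key):
    """Route each row (a tuple) to a range partition on column index `key`."""
    parts = [[] for _ in range(len(BOUNDS) - 1)]
    for row in rows:
        # Index of the last lower bound <= value; upper bounds are exclusive.
        idx = bisect.bisect_right(BOUNDS, row[key]) - 1
        if not 0 <= idx < len(parts):
            raise ValueError(f"no partition for value {row[key]}")
        parts[idx].append(row)
    return parts


def hash_join(outer, inner, okey, ikey):
    """Equi-join outer[okey] = inner[ikey] with a simple in-memory hash join."""
    table = {}
    for r in inner:
        table.setdefault(r[ikey], []).append(r)
    return [(o, r) for o in outer for r in table.get(o[okey], [])]


def partitionwise_join(outer_parts, inner_parts, okey, ikey):
    """Join each matching partition pair on its own, then append the results."""
    result = []
    for op, ip in zip(outer_parts, inner_parts):
        result.extend(hash_join(op, ip, okey, ikey))
    return result


# t1 is partitioned on its column 0 (a), t2 on its column 1 (b).
t1 = [(a, 0) for a in range(0, 30000, 7)]
t2 = [(0, b) for b in range(0, 30000, 11)]
whole = hash_join(t1, t2, 0, 1)
pw = partitionwise_join(partition(t1, 0), partition(t2, 1), 0, 1)
print(sorted(whole) == sorted(pw), len(pw))  # prints: True 390
```

The decomposition is only valid because both sides use the same bounds on the join keys. If t2 were partitioned on a different column, a row's partners could fall in any partition of t2 and the pairwise results would miss rows.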

Partition-wise join in PostgreSQL

Let's start with an example. Consider two tables partitioned as follows:

create table prt1 (a int, b int, c varchar) partition by range(a);
create table prt1_p1 partition of prt1 for values from (0) to (5000);
create table prt1_p2 partition of prt1 for values from (5000) to (15000);
create table prt1_p3 partition of prt1 for values from (15000) to (30000);

create table prt2 (a int, b int, c varchar) partition by range(b);
create table prt2_p1 partition of prt2 for values from (0) to (5000);
create table prt2_p2 partition of prt2 for values from (5000) to (15000);
create table prt2_p3 partition of prt2 for values from (15000) to (30000);

-- index on prt2_p2.b, referenced by the plans below
create index prt2_p2_b on prt2_p2 (b);

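For a join on t1.a = t2.b, all join partners of a row in prt1_p1 come from prt2_p1,
all join partners of a row in prt1_p2 come from prt2_p2,
and all join partners of a row in prt1_p3 come from prt2_p3.
These three form the matching partition pairs. Without partition-wise join, the plan for a join between these two tables looks like this: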

explain (costs off)
select * from prt1 t1, prt2 t2 where t1.a = t2.b and t1.b = 0 and t2.b between 0 and 10000;
                       QUERY PLAN             
-------------------------------------------------------
 Hash Join
   Hash Cond: (t2.b = t1.a)
   ->  Append
         ->  Seq Scan on prt2_p1 t2
               Filter: ((b >= 0) AND (b <= 10000))
         ->  Index Scan using prt2_p2_b on prt2_p2 t2_1
               Index Cond: ((b >= 0) AND (b <= 10000))
   ->  Hash
         ->  Append
               ->  Seq Scan on prt1_p1 t1
                     Filter: (b = 0)
               ->  Seq Scan on prt1_p2 t1_1
                     Filter: (b = 0)
               ->  Seq Scan on prt1_p3 t1_2
                     Filter: (b = 0)
(15 rows)

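With partition-wise join, the plan for the same query looks like this (in PostgreSQL 11 the feature is controlled by the enable_partitionwise_join setting, which is off by default):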

explain (costs off)
select * from prt1 t1, prt2 t2 where t1.a = t2.b and t1.b = 0 and t2.b between 0 and 10000;
                               QUERY PLAN
------------------------------------------------------------------------
 Append
   ->  Hash Join
         Hash Cond: (t2.b = t1.a)
         ->  Seq Scan on prt2_p1 t2
               Filter: ((b >= 0) AND (b <= 10000))
         ->  Hash
               ->  Seq Scan on prt1_p1 t1
                     Filter: (b = 0)
   ->  Nested Loop
         ->  Seq Scan on prt1_p2 t1_1
               Filter: (b = 0)
         ->  Index Scan using prt2_p2_b on prt2_p2 t2_1
               Index Cond: ((b = t1_1.a) AND (b >= 0) AND (b <= 10000))
(13 rows)

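There are a few things to note here:

1. There is an equi-join condition, t1.a = t2.b, that involves the partition keys of both tables.
2. Without partition-wise join, the join is performed after "appending" all the rows from every partition of each partitioned table. With partition-wise join, the matching partitions are joined with each other and the results are appended. This is advantageous when the size of the join result is significantly smaller than the result of the cross product. It is even more advantageous when the partitions themselves are foreign tables, i.e. when the data in the partitions resides on a foreign server.
3. Without partition-wise join, the plan uses a hash join. With partition-wise join, it uses a different strategy for each join between partitions, choosing the best one for each. For example, the join between prt1_p2 and prt2_p2 uses a nested loop join with an index scan using prt2_p2_b on prt2_p2 as the parameterized inner side, while the other join uses a hash join.
4. The condition t2.b between 0 and 10000 eliminates partition prt2_p3, so without partition-wise join that partition is not scanned. However, the planner does not notice that no row in prt1_p3 has a join partner, and it still scans that partition. With partition-wise join, it realizes that prt1_p3 has no matching partition left and eliminates the scan of prt1_p3. Eliminating an entire partition is a significant improvement, since sequential scans are expensive.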
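
Points 3 and 4 can be mimicked with a toy planner in Python. This is a minimal sketch, not PostgreSQL's planner. The Partition class, the NESTLOOP_MAX_OUTER threshold and the row estimates are all hypothetical, and the real planner compares cost estimates rather than applying a fixed rule. The sketch keeps only the matching pairs in which neither side has been pruned, which is why a pair like prt1_p3/prt2_p3 disappears. It then picks a strategy for each remaining pair independently.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    est_rows: int         # estimated rows left after this partition's own filters
    pruned: bool = False  # True if a WHERE clause ruled the whole partition out


# Hypothetical cut-off: at or below this many outer rows, prefer a nested loop
# that probes an index on the inner partition; otherwise build a hash table.
NESTLOOP_MAX_OUTER = 10


def plan_partitionwise(outer_parts, inner_parts):
    """Choose a join strategy per matching pair of an inner join.

    A pair whose outer or inner side is pruned cannot produce any rows,
    so it is dropped from the plan instead of being scanned.
    """
    plan = []
    for o, i in zip(outer_parts, inner_parts):
        if o.pruned or i.pruned:
            continue
        strategy = "Nested Loop" if o.est_rows <= NESTLOOP_MAX_OUTER else "Hash Join"
        plan.append((o.name, i.name, strategy))
    return plan


# Made-up estimates, loosely shaped like the example above.
t1 = [Partition("prt1_p1", 50), Partition("prt1_p2", 3), Partition("prt1_p3", 80)]
t2 = [Partition("prt2_p1", 40), Partition("prt2_p2", 100),
      Partition("prt2_p3", 0, pruned=True)]  # removed by t2.b between 0 and 10000
for step in plan_partitionwise(t1, t2):
    print(step)
# ('prt1_p1', 'prt2_p1', 'Hash Join')
# ('prt1_p2', 'prt2_p2', 'Nested Loop')
```

Because every pair is planned on its own, nothing forces the pairs to share a strategy. A cost-based planner would put its per-pair cost comparison where this sketch uses the fixed threshold.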


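Partition-wise join performs better than an unpartitioned join because it can exploit the properties of the partitions. It can use smaller hash tables that are more likely to fit completely in memory, faster in-memory sorts, join push-down when the partitions are foreign tables, and so on.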
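
Here is the hash-table point in rough numbers. In PostgreSQL the work_mem limit applies to each hash table separately, so a partition-wise hash join only needs each partition's build side to fit, not the whole table. The Python sketch below is a back-of-the-envelope model under stated assumptions. The row counts, the row-based memory budget and batches_needed are hypothetical. Real batch counts are powers of two and depend on row widths.

```python
import math


def batches_needed(build_rows: int, budget_rows: int) -> int:
    """Rough number of batches a hash join needs when only `budget_rows`
    rows of the hashed (build) side fit in the memory budget at once.
    Simplified: ignores power-of-two batch counts and row widths."""
    return max(1, math.ceil(build_rows / budget_rows))


partition_rows = [5000, 10000, 15000]  # hypothetical rows per build-side partition
budget = 16000                         # hypothetical per-hash-table budget, in rows

whole_table = batches_needed(sum(partition_rows), budget)
per_pair = [batches_needed(n, budget) for n in partition_rows]
print(whole_table, per_pair)  # prints: 2 [1, 1, 1]
```

In this model the unpartitioned hash join has to split its input into two batches, while every per-partition hash table fits in a single batch.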

Beyond basic partition-wise join

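In the basic version that was committed, the technique is applied only when the joining tables have exactly the same partition key data types and exactly matching partition bounds. But there are several possible enhancements: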

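1. Even when the partition bounds do not match exactly, the technique can be used as long as each partition of one partitioned table has at most one matching partition in the other. A patch for this is currently being developed.
2. A join between an unpartitioned table and a partitioned table can be performed with this technique by joining the unpartitioned table with each partition separately and combining the results of those joins. This may help when some of the tables in a query are unpartitioned while others are similarly partitioned, and the optimal plan interleaves the partitioned and unpartitioned tables.
3. The technique uses more memory and CPU even when partition-wise join is not the optimal strategy. Reducing its memory and CPU footprint is another possible improvement.
4. When joining two differently partitioned tables, one of them can be repartitioned to match the partitioning scheme of the other, and the two can then be joined using partition-wise join. This is a technique commonly used to join differently sharded tables by redistributing the data.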
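
A short Python sketch contrasts the basic matching rule with the relaxed rule in item 1. It is an illustration only, not the code that was committed or the patch mentioned above. The exact_pairs and overlap_pairs helpers, the half-open (lower, upper) bound tuples and the sample bounds other and coarse are hypothetical. Only inner joins are considered; outer joins would also have to account for partitions that have no partner.

```python
from typing import List, Optional, Tuple

Bound = Tuple[int, int]  # one range partition: [lower, upper)


def exact_pairs(type1: str, bounds1: List[Bound],
                type2: str, bounds2: List[Bound]) -> Optional[List[Tuple[int, int]]]:
    """Basic rule: same key type and identical bounds, so pair i with i."""
    if type1 != type2 or bounds1 != bounds2:
        return None
    return [(i, i) for i in range(len(bounds1))]


def overlap_pairs(bounds1: List[Bound],
                  bounds2: List[Bound]) -> Optional[List[Tuple[int, int]]]:
    """Relaxed rule (inner joins only): pair partitions whose ranges overlap,
    provided no partition overlaps more than one partition on the other side.
    Partitions with no overlap are left out, since they cannot produce rows."""
    pairs = [(i, j)
             for i, (lo1, hi1) in enumerate(bounds1)
             for j, (lo2, hi2) in enumerate(bounds2)
             if lo1 < hi2 and lo2 < hi1]  # half-open ranges intersect
    left = [i for i, _ in pairs]
    right = [j for _, j in pairs]
    if len(set(left)) != len(left) or len(set(right)) != len(right):
        return None  # some partition would need two partners
    return pairs


prt1 = [(0, 5000), (5000, 15000), (15000, 30000)]
prt2 = [(0, 5000), (5000, 15000), (15000, 30000)]
other = [(0, 4000), (5000, 12000), (20000, 25000)]  # hypothetical, not identical
coarse = [(0, 10000), (10000, 30000)]               # hypothetical, overlaps two

print(exact_pairs("int", prt1, "int", prt2))   # [(0, 0), (1, 1), (2, 2)]
print(exact_pairs("int", prt1, "int", other))  # None
print(overlap_pairs(prt1, other))              # [(0, 0), (1, 1), (2, 2)]
print(overlap_pairs(coarse, prt1))             # None
```

In the last case the partition [0, 10000) overlaps both [0, 5000) and [5000, 15000), so the join cannot be split into one-to-one pairs. Item 4's repartitioning would be needed instead.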