semi-join子查询优化 -- semi-join Materialization策略

semi-join Materialization 是用于semi-join的一种特殊的子查询物化技术。一般包含两种策略:
1.Materialization/lookup
2.Materialization/scansql

考虑一个查询欧洲有大城市的国家:函数

select * from Country 
where Country.code IN (select City.Country 
                       from City 
                       where City.Population > 7*1000*1000)
      and Country.continent='Europe'

子查询是非相关子查询。也便是咱们能够独立运行内查询。semi-materialization的思想是使用city.country中可能的值填充一个临时表,而后和欧洲的国家进行关联。优化

这个join能够从两个方向进行:
1.从物化表到国家表
2.从国家表到物化表code

第一个方向涉及一个全表扫描(在物化表上的全表扫描),所以被称为"Materialization-scan"
若是从第二个方向进行,最廉价的方式是使用主键从物化表中lookup出匹配的记录。这种方式被称为"Materialization-lookup"。blog

 

Materialization-scan
若是咱们寻找人口超过700万的城市,优化器将使用materialize-scan,EXPLAIN输出结果也会显示这一点:索引

MariaDB [world]> explain select * from Country where Country.code IN (select City.Country from City where  City.Population > 7*1000*1000);
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
| id | select_type  | table       | type   | possible_keys      | key        | key_len | ref                | rows | Extra                 |
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
|  1 | PRIMARY      | <subquery2> | ALL    | distinct_key       | NULL       | NULL    | NULL               |   15 |                       |
|  1 | PRIMARY      | Country     | eq_ref | PRIMARY            | PRIMARY    | 3       | world.City.Country |    1 |                       |
|  2 | MATERIALIZED | City        | range  | Population,Country | Population | 4       | NULL               |   15 | Using index condition |
+----+--------------+-------------+--------+--------------------+------------+---------+--------------------+------+-----------------------+
3 rows in set (0.01 sec)

从上能够看到:ci

1.仍然有两个select(id=1和id=2)get

2.第二个select(id=2)的select_type是MATERIALIZED。这表示会执行并将结果存储在一个在全部列上带有一个惟一性索引的临时表。这个惟一性索引能够避免有重复的记录it

3.第一个select中接收到一个名为subquery2的表,这是从第二个select(id=2)获取的物化的表io

优化器选择在物化的表上执行全表扫描。这就是Materialization-Scan策略的示例。

 

至于执行成本,咱们将从表City读取15行,将15行写入物化表,而后读取它们(优化器假设不会有任何重复),而后对表Country执行15次eq_ref访问。总共,咱们将进行45次读取和15次写入。

相比之下,若是你在MySQL中运行EXPLAIN,你会获得以下结果:

MySQL [world]> explain select * from Country where Country.code IN (select City.Country from City where  City.Population > 7*1000*1000);
+----+--------------------+---------+-------+--------------------+------------+---------+------+------+------------------------------------+
| id | select_type        | table   | type  | possible_keys      | key        | key_len | ref  | rows | Extra                              |
+----+--------------------+---------+-------+--------------------+------------+---------+------+------+------------------------------------+
|  1 | PRIMARY            | Country | ALL   | NULL               | NULL       | NULL    | NULL |  239 | Using where                        |
|  2 | DEPENDENT SUBQUERY | City    | range | Population,Country | Population | 4       | NULL |   15 | Using index condition; Using where |
+----+--------------------+---------+-------+--------------------+------------+---------+------+------+------------------------------------+

读的记录是(239 + 239*15) = 3824。

 

Materialization-Lookup

让咱们稍微修改一下查询,看看哪些国家的城市人口超过1百万(而不是7百万):

MariaDB [world]> explain select * from Country where Country.code IN (select City.Country from City where  City.Population > 1*1000*1000) ;
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
| id | select_type  | table       | type   | possible_keys      | key          | key_len | ref  | rows | Extra                 |
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
|  1 | PRIMARY      | Country     | ALL    | PRIMARY            | NULL         | NULL    | NULL |  239 |                       |
|  1 | PRIMARY      | <subquery2> | eq_ref | distinct_key       | distinct_key | 3       | func |    1 |                       |
|  2 | MATERIALIZED | City        | range  | Population,Country | Population   | 4       | NULL |  238 | Using index condition |
+----+--------------+-------------+--------+--------------------+--------------+---------+------+------+-----------------------+
3 rows in set (0.00 sec)

explain的输出结果和Materialization-scan相似,除了:
1.subquery2表是经过eq_ref访问的
2.access使用了索引distinct_key
这意味着优化器计划对物化表执行索引查找。换句话说,咱们将使用Materialization-lookup策略。

 

在MySQL中(或者使用optimizer_switch='semi-join=off,materialization=off'),会获得这样的执行计划:

MySQL [world]> explain select * from Country where Country.code IN (select City.Country from City where  City.Population > 1*1000*1000) ;
+----+--------------------+---------+----------------+--------------------+---------+---------+------+------+-------------+
| id | select_type        | table   | type           | possible_keys      | key     | key_len | ref  | rows | Extra       |
+----+--------------------+---------+----------------+--------------------+---------+---------+------+------+-------------+
|  1 | PRIMARY            | Country | ALL            | NULL               | NULL    | NULL    | NULL |  239 | Using where |
|  2 | DEPENDENT SUBQUERY | City    | index_subquery | Population,Country | Country | 3       | func |   18 | Using where |
+----+--------------------+---------+----------------+--------------------+---------+---------+------+------+-------------+

能够看出,这两个执行计划都将对国家表进行全面扫描。对于第二步,MariaDB将填充物化表(238行从表City读取并写入临时表),而后对表Country中的每一个记录执行唯一的键查找,结果是238个唯一的键查找。总的来讲,第二步将花费(239+238)= 477读取和238 temp.table的写入。

MySQL的第二步计划是使用City上的索引读取18行。它为表国家接收的每一个记录的国家。计算出来的成本为(18*239)= 4302读取。若是有更少的子查询调用,这个计划将比物化的计划更好。顺便说一下,MariaDB也能够选择使用这样的查询计划(请参阅FirstMatch策略),可是它没有选择。

 

带有group by的子查询

当子查询带有分组的时候,MariaDB可使用semi-join物化策略(这种场景下,其余semi-join策略不适用)

这容许高效地执行搜索某个组中最佳/最后一个元素的查询。

举个例子,咱们来看看每一个大陆上人口最多的城市:

explain 
select * from City 
where City.Population in (select max(City.Population) from City, Country 
                          where City.Country=Country.Code 
                          group by Continent)
+------+--------------+-------------+------+---------------+------------+---------+----------------------------------+------+-----------------+
| id   | select_type  | table       | type | possible_keys | key        | key_len | ref                              | rows | Extra           |
+------+--------------+-------------+------+---------------+------------+---------+----------------------------------+------+-----------------+
|    1 | PRIMARY      | <subquery2> | ALL  | distinct_key  | NULL       | NULL    | NULL                             |  239 |                 |
|    1 | PRIMARY      | City        | ref  | Population    | Population | 4       | <subquery2>.max(City.Population) |    1 |                 |
|    2 | MATERIALIZED | Country     | ALL  | PRIMARY       | NULL       | NULL    | NULL                             |  239 | Using temporary |
|    2 | MATERIALIZED | City        | ref  | Country       | Country    | 3       | world.Country.Code               |   18 |                 |
+------+--------------+-------------+------+---------------+------------+---------+----------------------------------+------+-----------------+
4 rows in set (0.00 sec)

 城市是:

+------+-------------------+---------+------------+
| ID   | Name              | Country | Population |
+------+-------------------+---------+------------+
| 1024 | Mumbai (Bombay)   | IND     |   10500000 |
| 3580 | Moscow            | RUS     |    8389200 |
| 2454 | Macao             | MAC     |     437500 |
|  608 | Cairo             | EGY     |    6789479 |
| 2515 | Ciudad de México | MEX     |    8591309 |
|  206 | São Paulo        | BRA     |    9968485 |
|  130 | Sydney            | AUS     |    3276207 |
+------+-------------------+---------+------------+

  

Semi-join materialization
1.能够用于非相关的in子查询。子查询能够含有分组、和/或聚合函数
2.在explain输出中,子查询会有type=Materialized;父表子查询中有table=<subqueryN>
3.开启须要将变量optimizer_switch中的materialization=on、semijoin=on
4.Non-semijoin materialization与materialization=on|off标记共享

 

https://mariadb.com/kb/en/library/semi-join-materialization-strategy/

相关文章
相关标签/搜索