Nested Loop Join - JavaShuo

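We all know how to use JOIN in SQL to combine tables, but this post is about the algorithms that implement a join. There are three common join algorithms: Nested Loop Join, Hash Join, and Sort Merge Join.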

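The official MySQL documentation states that MySQL supports only one join algorithm, Nested Loop Join. This held for versions before MySQL 8.0.18, which added hash join.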

MySQL resolves all joins using a nested-loop join method. This means that MySQL reads a row from the first table, and then finds a matching row in the second table, the third table, and so on. (MySQL Reference Manual, EXPLAIN Output Format)

So this post covers only Nested Loop Join.

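NLJ uses two nested loops. The first table drives the outer loop and the second table the inner loop. Each row from the outer loop is compared with the rows of the inner loop, and every pair that satisfies the join condition is output. NLJ comes in three variants: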

1. Simple Nested Loop Join (SNLJ)

// pseudocode
    for (r in R) {
        for (s in S) {
            if (r and s satisfy the join condition) {
                output <r, s>;
            }
        }
    }

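SNLJ scans both tables in full with two nested loops and outputs every pair of rows that satisfies the condition. In effect it evaluates the Cartesian product of the two tables, so it performs R * S comparisons. It is a brute-force algorithm and can be very slow.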
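The sketch below is a minimal runnable SNLJ in Python. It assumes both tables are in-memory lists of dicts and that the join condition is a Python function. The function and table names are hypothetical, and the counter confirms the R * S comparison count.

```python
def simple_nested_loop_join(outer, inner, cond):
    """SNLJ sketch: full scan of the inner table for every outer row."""
    result = []
    comparisons = 0
    for r in outer:          # outer loop: R is scanned once
        for s in inner:      # inner loop: S is scanned in full for every r
            comparisons += 1
            if cond(r, s):
                result.append((r, s))
    return result, comparisons


# Hypothetical tables: R.b_id references S.id
R = [{"id": i, "b_id": i % 3} for i in range(4)]
S = [{"id": j, "name": f"s{j}"} for j in range(3)]
rows, n = simple_nested_loop_join(R, S, lambda r, s: r["b_id"] == s["id"])
print(len(rows), n)  # 4 12  (4 matches, 4 * 3 comparisons)
```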

2. Index Nested Loop Join (INLJ)

// pseudocode
    for (r in R) {
        for (si in SIndex.lookup(r.joinKey)) {   // index seek on S, not a full scan
            s = row of S that si points to;     // back-to-table read if needed
            output <r, s>;
        }
    }

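INLJ improves on SNLJ by using the join condition to choose a usable index on the inner table. The inner loop searches the index instead of scanning the table data, which makes the inner loop much cheaper.

INLJ has a drawback, though. If the index is a non-clustered (secondary) index and the query needs columns that are not in the index, every match requires a lookup back into the table to read the full row. Each such lookup is an extra random I/O.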
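The sketch below shows INLJ under a simplifying assumption: a sorted list of `(key, row_position)` pairs stands in for a secondary B+tree index, and `bisect` plays the role of the index search. The helper names are hypothetical. The sketch always fetches the full row through `row_position`, which models the back-to-table read, and counts those reads.

```python
import bisect


def build_index(table, column):
    """Hypothetical secondary index: (key, row_position) pairs sorted by key."""
    return sorted((row[column], pos) for pos, row in enumerate(table))


def index_nested_loop_join(outer, inner, inner_index, outer_col):
    """INLJ sketch: binary-search the index for each outer row instead of scanning S."""
    keys = [k for k, _ in inner_index]
    result = []
    lookups = 0  # back-to-table row fetches (RS_Matches)
    for r in outer:                              # R is scanned once
        key = r[outer_col]
        i = bisect.bisect_left(keys, key)        # O(log S) index search
        while i < len(keys) and keys[i] == key:  # walk all index entries with this key
            _, pos = inner_index[i]
            lookups += 1                         # fetch the full row from the table
            result.append((r, inner[pos]))
            i += 1
    return result, lookups


S = [{"id": j, "cat": j % 2} for j in range(6)]
R = [{"id": 0, "cat": 1}, {"id": 1, "cat": 5}]
idx = build_index(S, "cat")
rows, lookups = index_nested_loop_join(R, S, idx, "cat")
print(len(rows), lookups)  # 3 3
```

A real covering index would skip the back-to-table fetch whenever all needed columns live in the index. This sketch does not model that case.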

3. Block Nested Loop Join (BNLJ)

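In general, the MySQL optimizer prefers INLJ whenever a usable index exists. When no index can be used, or when the optimizer estimates that a full scan may beat the index, it still avoids the brute-force SNLJ and uses BNLJ instead.

BNLJ adds a join buffer to SNLJ. Rows from the outer table are read into the buffer in blocks, keeping only the columns the query needs. The inner table is then scanned once per block, and each inner row is compared against every buffered outer row. This reduces the number of inner-table scans from R to the number of times the buffer is filled.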

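// pseudocode
    for (each block of R rows that fits in the join buffer) {
        load the block into joinBuffer;
        for (s in S) {                       // one full scan of S per block
            for (rb in joinBuffer) {
                if (rb and s satisfy the join condition) {
                    output <rb, s>;
                }
            }
        }
    }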

In MySQL, the size of the join buffer is controlled by the join_buffer_size system variable.

We only store the used columns in the join buffer, not the whole rows. (MySQL Reference Manual, join_buffer_size)

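According to the MySQL manual, the join buffer holds only the columns the query actually uses, not whole rows. The narrower those columns are, the more outer rows fit into each buffer fill, and the fewer times the inner table must be scanned.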
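Here is a minimal Python sketch of BNLJ under stated assumptions. `buffer_bytes` and `row_bytes` are hypothetical stand-ins for join_buffer_size and the per-row size of the used columns. The buffer keeps only the used columns of outer rows, and the inner table is rescanned once per buffer fill.

```python
def block_nested_loop_join(outer, inner, cond, used_cols, buffer_bytes, row_bytes):
    """BNLJ sketch: buffer blocks of outer rows, scan the inner table once per block."""
    rows_per_buffer = max(1, buffer_bytes // row_bytes)
    result, inner_scans, comparisons = [], 0, 0
    for start in range(0, len(outer), rows_per_buffer):
        # Fill the join buffer with the next block of outer rows, used columns only
        buffer = [{c: r[c] for c in used_cols} for r in outer[start:start + rows_per_buffer]]
        inner_scans += 1
        for s in inner:          # one full scan of S per buffer fill
            for rb in buffer:    # compare each inner row with every buffered outer row
                comparisons += 1
                if cond(rb, s):
                    result.append((rb, s))
    return result, inner_scans, comparisons


# Hypothetical data: "note" is never used by the join, so it is not buffered
R = [{"id": i, "b_id": i % 4, "note": "x" * 100} for i in range(10)]
S = [{"id": j} for j in range(4)]
rows, scans, n = block_nested_loop_join(
    R, S, lambda r, s: r["b_id"] == s["id"],
    used_cols=("b_id",), buffer_bytes=32, row_bytes=8)
print(len(rows), scans, n)  # 10 3 40
```

With room for 4 rows per fill, S is scanned 3 times instead of 10. The number of comparisons is still R * S = 40. The output order also differs from SNLJ, because results come out grouped by buffer block and then by inner row.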

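Algorithm comparison (outer table size R, inner table size S):

    Metric                   Simple NLJ   Index NLJ         Block NLJ
    Outer table scans        1            1                 1
    Inner table full scans   R            0                 B
    Rows read                R + R*S      R + RS_Matches    R + S*B
    Join comparisons         R*S          R*IndexHeight     R*S
    Back-to-table lookups    0            RS_Matches        0

Here RS_Matches is the number of matching inner rows, and IndexHeight is the height of the inner table's index B+tree. B = ⌈R * used_column_size / join_buffer_size⌉ is the number of join-buffer fills, where used_column_size is the size of the columns buffered per outer row.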
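To make the cost formulas concrete, the function below evaluates them for hypothetical table sizes. It assumes BNLJ rescans the inner table once per join-buffer fill, that is, B = ceil(R / buffer_rows) with buffer_rows = join_buffer_size // used_column_size. All inputs are illustrative, not measurements.

```python
import math


def join_costs(R, S, rs_matches, index_height, buffer_rows):
    """Evaluate the SNLJ / INLJ / BNLJ cost formulas for hypothetical sizes."""
    blocks = math.ceil(R / buffer_rows)  # number of join-buffer fills (B)
    return {
        "SNLJ": {"inner_scans": R, "rows_read": R + R * S,
                 "comparisons": R * S, "lookups": 0},
        "INLJ": {"inner_scans": 0, "rows_read": R + rs_matches,
                 "comparisons": R * index_height, "lookups": rs_matches},
        "BNLJ": {"inner_scans": blocks, "rows_read": R + S * blocks,
                 "comparisons": R * S, "lookups": 0},
    }


costs = join_costs(R=1_000, S=10_000, rs_matches=1_000, index_height=3, buffer_rows=100)
for algo, c in costs.items():
    print(algo, c)
# SNLJ: 10,000,000 comparisons; INLJ: 3,000; BNLJ: 10 inner scans instead of 1,000
```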

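MySQL 5.6 optimized the back-to-table reads of INLJ by adding Batched Key Access (BKA) join and Multi-Range Read (MRR). During the join, the rowids of the needed rows are buffered, sorted, and then fetched in a batch. This turns many scattered random I/Os into fewer, batched, more sequential reads, which improves efficiency.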
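The sketch below illustrates why sorting rowids helps. It is not MySQL's implementation. It assumes that a row's page is `key // rows_per_page` and uses the number of page switches as a crude proxy for random I/O. Both function names are hypothetical.

```python
def fetch_order_cost(keys, rows_per_page):
    """Count page switches when rows are fetched in the given key order."""
    switches, current = 0, None
    for k in keys:
        page = k // rows_per_page  # assumed row-to-page mapping
        if page != current:
            switches += 1
            current = page
    return switches


def batched_key_access(keys_from_index, batch_size, rows_per_page):
    """Collect keys in batches, sort each batch (MRR-style), then fetch in order."""
    total = 0
    for start in range(0, len(keys_from_index), batch_size):
        batch = sorted(keys_from_index[start:start + batch_size])
        total += fetch_order_cost(batch, rows_per_page)
    return total


keys = [95, 3, 57, 12, 91, 7, 60, 15]  # primary keys in the order the secondary index yields them
print(fetch_order_cost(keys, 10))                                # 8: one fetch per row, jumping between pages
print(batched_key_access(keys, batch_size=8, rows_per_page=10))  # 5: sorted fetch touches each page once
```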