In this PR, Jose Valim optimizes match_con/4, the function in v3_kernel that handles match clauses. With thousands of clauses, compilation becomes about 10% faster.
The following logic, which traversed Cs once, was removed:
%% old
match_con(Us, Cs0, Def, St) ->
    %% Expand literals at the top level.
    Cs = [expand_pat_lit_clause(C) || C <- Cs0],
    match_con_1(Us, Cs, Def, St).
The old match_con/4 first applied expand_pat_lit_clause/1 to every clause in Cs. The logic that followed, in the body of match_con_1/4, differs slightly from the new code:
%% old
match_con_1([U|_Us] = L, Cs, Def, St0) ->
    %% Extract clauses for different constructors (types).
    %%ok = io:format("match_con ~p~n", [Cs]),
    Ttcs0 = select_types([k_binary], Cs) ++ select_bin_con(Cs) ++
        select_types([k_cons,k_tuple,k_map,k_atom,k_float,
                      k_int,k_nil], Cs),
    Ttcs = opt_single_valued(Ttcs0),
    %% ... (rest of the function body not shown)

%% new
match_con([U|_Us] = L, Cs, Def, St0) ->
    Ttcs0 = select_types(Cs, [], [], [], [], [], [], [], [], []),
    Ttcs1 = [{T, Types} || {T, [_ | _] = Types} <- Ttcs0],
    Ttcs = opt_single_valued(Ttcs1),
    %% ... (rest of the function body not shown)
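The comprehension that builds Ttcs1 in the new match_con/4 uses a generator pattern to drop empty groups: an element that does not match the pattern is simply skipped. The snippet below is a minimal sketch of that idiom only. The module name nonempty_demo and the function drop_empty/1 are hypothetical, and the clauses are toy atoms rather than real kernel clauses.

```erlang
-module(nonempty_demo).
-export([drop_empty/1]).

%% Hypothetical helper, not part of v3_kernel.
%% Keep only the {Type, Clauses} pairs whose clause list is non-empty.
%% A generator pattern that fails to match skips the element,
%% so [_ | _] acts as a "non-empty list" filter.
drop_empty(Groups) ->
    [{T, Cs} || {T, [_ | _] = Cs} <- Groups].
```

For example, nonempty_demo:drop_empty([{k_binary, []}, {k_atom, [c1, c2]}, {k_nil, []}]) returns [{k_atom, [c1, c2]}].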
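Note that both versions run Cs through select_types before calling opt_single_valued. The old code traversed Cs once for the k_binary type, once more in select_bin_con, and then again for the remaining types: the old select_types/2 calls select/2 once per type it is given, so that second call scans Cs once for each of its seven types. The new code traverses Cs only once. select_types is the function this PR changes the most, so let's take a look: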
%% old
select_types(Types, Cs) ->
    [{T,Tcs} || T <- Types, begin Tcs = select(T, Cs), Tcs =/= [] end].

%% select(Con, [Clause]) -> [Clause].
select(T, Cs) -> [ C || C <- Cs, clause_con(C) =:= T ].

%% new
select_types([NoExpC | Cs], Bin, BinCon, Cons, Tuple, Map, Atom, Float, Int, Nil) ->
    C = expand_pat_lit_clause(NoExpC),
    case clause_con(C) of
        k_binary ->
            select_types(Cs, [C | Bin], BinCon, Cons, Tuple, Map, Atom, Float, Int, Nil);
        k_bin_seg ->
            select_types(Cs, Bin, [C | BinCon], Cons, Tuple, Map, Atom, Float, Int, Nil);
        k_bin_end ->
            select_types(Cs, Bin, [C | BinCon], Cons, Tuple, Map, Atom, Float, Int, Nil);
        k_cons ->
            select_types(Cs, Bin, BinCon, [C | Cons], Tuple, Map, Atom, Float, Int, Nil);
        k_tuple ->
            select_types(Cs, Bin, BinCon, Cons, [C | Tuple], Map, Atom, Float, Int, Nil);
        k_map ->
            select_types(Cs, Bin, BinCon, Cons, Tuple, [C | Map], Atom, Float, Int, Nil);
        k_atom ->
            select_types(Cs, Bin, BinCon, Cons, Tuple, Map, [C | Atom], Float, Int, Nil);
        k_float ->
            select_types(Cs, Bin, BinCon, Cons, Tuple, Map, Atom, [C | Float], Int, Nil);
        k_int ->
            select_types(Cs, Bin, BinCon, Cons, Tuple, Map, Atom, Float, [C | Int], Nil);
        k_nil ->
            select_types(Cs, Bin, BinCon, Cons, Tuple, Map, Atom, Float, Int, [C | Nil])
    end;
select_types([], Bin, BinCon, Cons, Tuple, Map, Atom, Float, Int, Nil) ->
    [{k_binary, reverse(Bin)}] ++ handle_bin_con(reverse(BinCon)) ++
        [
         {k_cons, reverse(Cons)},
         {k_tuple, reverse(Tuple)},
         {k_map, reverse(Map)},
         {k_atom, reverse(Atom)},
         {k_float, reverse(Float)},
         {k_int, reverse(Int)},
         {k_nil, reverse(Nil)}
        ].
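Note that although the new code traverses Cs only once, each of the small lists still has to be reversed at the end. Because the small lists together hold exactly the clauses of Cs, those reversals add up to roughly one more pass over the data. So the gain in this PR comes from collapsing the repeated traversals of one list (the expand_pat_lit_clause pass plus the separate selection passes) into a single traversal, and the longer the list, the more noticeable the improvement.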
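The difference between the two strategies can be sketched on toy data. This is an illustration under stated assumptions, not the compiler's code: the module partition_demo, the functions multi_pass/2 and single_pass/1, and the {Tag, Value} item shape are all hypothetical, and only three tags are handled to keep it short.

```erlang
-module(partition_demo).
-export([multi_pass/2, single_pass/1]).

%% Hypothetical illustration, not taken from v3_kernel.
%% Items are {Tag, Value} tuples; Tag is one of k_atom | k_int | k_nil.

%% Multi-pass grouping in the style of the old select_types/2:
%% the whole list is scanned once for every type in Types.
multi_pass(Types, Items) ->
    Groups = [{T, [I || {Tag, _} = I <- Items, Tag =:= T]} || T <- Types],
    [G || {_, [_ | _]} = G <- Groups].

%% Single-pass grouping in the style of the new select_types/10:
%% one traversal with one accumulator per type.
single_pass(Items) ->
    single_pass(Items, [], [], []).

single_pass([{Tag, _} = I | Rest], Atoms, Ints, Nils) ->
    case Tag of
        k_atom -> single_pass(Rest, [I | Atoms], Ints, Nils);
        k_int  -> single_pass(Rest, Atoms, [I | Ints], Nils);
        k_nil  -> single_pass(Rest, Atoms, Ints, [I | Nils])
    end;
single_pass([], Atoms, Ints, Nils) ->
    %% Accumulators were built by prepending, so reverse them
    %% to restore the original relative order of the items.
    Groups = [{k_atom, lists:reverse(Atoms)},
              {k_int, lists:reverse(Ints)},
              {k_nil, lists:reverse(Nils)}],
    [G || {_, [_ | _]} = G <- Groups].
```

With Items = [{k_int, 1}, {k_atom, a}, {k_int, 2}, {k_atom, b}], both partition_demo:multi_pass([k_atom, k_int, k_nil], Items) and partition_demo:single_pass(Items) return [{k_atom, [{k_atom, a}, {k_atom, b}]}, {k_int, [{k_int, 1}, {k_int, 2}]}]. The first scans Items three times, the second only once.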