Let's Talk About Mnesia (Part 1)

Date: 2019-11-12

What is Mnesia

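Mnesia is a distributed key-value storage engine with strong transactions, shipped as part of Erlang's OTP libraries. It stores any Erlang data type conveniently and efficiently. It also supports mixing persistent (disc) tables and in-memory tables, and it does not require every node to use the same storage type.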
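
As a minimal sketch of the mixed-storage point, the module below creates one table that is kept in RAM on some nodes and on disc on others, then writes and reads an arbitrary Erlang term inside transactions. The module name, the `kv` record and the function names are made up for this illustration; it assumes Mnesia is already running on every listed node and that any node passed in `DiscNodes` has a disc-based schema.

```erlang
-module(mnesia_mixed_demo).
-export([create/2, store/2, fetch/1]).

%% Hypothetical record, used only for this illustration.
-record(kv, {key, value}).

%% Create a table replicated as ram_copies on RamNodes and as
%% disc_copies on DiscNodes. The two lists may differ freely.
create(RamNodes, DiscNodes) ->
    mnesia:create_table(kv, [{attributes, record_info(fields, kv)},
                             {ram_copies, RamNodes},
                             {disc_copies, DiscNodes}]).

%% Value can be any Erlang term (tuple, list, binary, map, ...).
store(Key, Value) ->
    mnesia:transaction(fun() ->
        mnesia:write(#kv{key = Key, value = Value})
    end).

fetch(Key) ->
    mnesia:transaction(fun() -> mnesia:read(kv, Key) end).
```

On a single development node, `mnesia_mixed_demo:create([node()], [])` is enough to try it out with a RAM-only copy.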

How Mnesia implements distributed transactions

Mnesia's transaction model is two-phase commit (2PC), which can be broken down into roughly the following steps:

Ask every participating node to prepare to commit.
If any node answers no, tell all nodes that answered yes to roll back.
If every node answers yes, tell all nodes to commit.
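
The decision rule in these steps can be sketched as a pure function. This is a toy model and not Mnesia's code: the module name, the function name and the `{Node, yes | no}` vote format are assumptions made for the illustration.

```erlang
-module(two_pc_sketch).
-export([decide/1]).

%% Votes is the list of {Node, yes | no} answers collected in phase 1.
%% Returns the phase-2 instruction and the nodes it must be sent to.
decide(Votes) ->
    YesVoters = [Node || {Node, yes} <- Votes],
    case length(YesVoters) =:= length(Votes) of
        true  -> {commit, YesVoters};    % everybody voted yes
        false -> {rollback, YesVoters}   % someone voted no: undo the yes voters
    end.
```

For example, `two_pc_sketch:decide([{a, yes}, {b, no}])` yields `{rollback, [a]}`: only the node that already prepared has anything to undo.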

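In mnesia_tm.erl, t_commit does the preparation for committing a transaction. As part of that preparation, the arrange function converts the data operations buffered in ETS into prepare records and commit records.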

t_commit(Type) ->
    {_Mod, Tid, Ts} = get(mnesia_activity_state),
    %% Fetch the transaction's ETS store first
    Store = Ts#tidstore.store,
    if
	%% Top-level (non-nested) transaction
	Ts#tidstore.level == 1 ->
	    intercept_friends(Tid, Ts),
	    %% N is number of updates
	    case arrange(Tid, Store, Type) of
		{N, Prep} when N > 0 ->
		    multi_commit(Prep#prep.protocol,
				 majority_attr(Prep),
				 Tid, Prep#prep.records, Store);
		{0, Prep} ->
		    multi_commit(read_only,
				 majority_attr(Prep),
				 Tid, Prep#prep.records, Store)
	    end;
	true ->
	    %% nested commit
	    Level = Ts#tidstore.level,
	    [{OldMod,Obsolete} | Tail] = Ts#tidstore.up_stores,
	    req({del_store, Tid, Store, Obsolete, false}),
	    NewTs = Ts#tidstore{store = Store,
				up_stores = Tail,
				level = Level - 1},
	    NewTidTs = {OldMod, Tid, NewTs},
	    put(mnesia_activity_state, NewTidTs),
	    do_commit_nested
    end.
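
To see the two branches of t_commit from the user's side, here is a small sketch with one transaction nested inside another. The module and table names are hypothetical, and it assumes a single node where the `demo` table does not exist yet. The inner `mnesia:transaction/1` call runs at level 2, so its commit goes through the nested branch; only the outer commit reaches multi_commit.

```erlang
-module(nested_tx_demo).
-export([run/0]).

run() ->
    ok = mnesia:start(),
    %% RAM-only table on the local node; records look like {demo, Key, Val}.
    {atomic, ok} = mnesia:create_table(demo, [{ram_copies, [node()]},
                                               {attributes, [key, val]}]),
    mnesia:transaction(fun() ->
        ok = mnesia:write({demo, outer, 1}),
        %% Nested transaction: committing it only folds its store back
        %% into the enclosing transaction; nothing is sent to other nodes.
        {atomic, ok} = mnesia:transaction(fun() ->
            mnesia:write({demo, inner, 2})
        end),
        %% The inner write is already visible to the outer transaction.
        mnesia:read(demo, inner)
    end).
```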

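The multi_commit function carries out the 2PC part of the transaction. The clause shown below is the commit path that Erlang transactions use by default.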

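%% Uses plain 2PC:
%% 1.  Ask every participating node to prepare to commit
%% 2a. If any node answers no, tell all yes voters to roll back
%% 2b. If every node answers yes, tell all nodes to commit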
multi_commit(sym_trans, _Maj = [], Tid, CR, Store) ->
    %% This lightweight commit protocol is used when all
    %% the involved tables are replicated symetrically.
    %% Their storage types must match on each node.
    %%
    %% 1  Ask the other involved nodes if they want to commit
    %%    All involved nodes votes yes if they are up
    %% 2a Somebody has voted no
    %%    Tell all yes voters to do_abort
    %% 2b Everybody has voted yes
    %%    Tell everybody to do_commit. I.e. that they should
    %%    prepare the commit, log the commit record and
    %%    perform the updates.
    %%
    %%    The outcome is kept 3 minutes in the transient decision table.
    %%
    %% Recovery:
    %%    If somebody dies before the coordinator has
    %%    broadcasted do_commit, the transaction is aborted.
    %%
    %%    If a participant dies, the table load algorithm
    %%    ensures that the contents of the involved tables
    %%    are picked from another node.
    %%
    %%    If the coordinator dies, each participants checks
    %%    the outcome with all the others. If all are uncertain
    %%    about the outcome, the transaction is aborted. If
    %%    somebody knows the outcome the others will follow.
    %% Split the nodes involved in the commit into disc nodes and RAM nodes
    {DiscNs, RamNs} = commit_nodes(CR, [], []),
    %% Enter the pending state; the transaction is not actually committed yet
    Pending = mnesia_checkpoint:tm_enter_pending(Tid, DiscNs, RamNs),
    ?ets_insert(Store, Pending),
    %% Send the commit request to each participating node in turn
    {WaitFor, Local} = ask_commit(sym_trans, Tid, CR, DiscNs, RamNs),
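    %% This is a blocking wait, although in practice it does not block forever.
    %% When can it block indefinitely?
    %% When the peer node dies after ask_commit but is started again
    %% before the next ERTS heartbeat. That is the interesting case.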

    %% If every node voted yes, Outcome is do_commit
    {Outcome, []} = rec_all(WaitFor, Tid, do_commit, []),
    ?eval_debug_fun({?MODULE, multi_commit_sym},
		    [{tid, Tid}, {outcome, Outcome}]),
    %% Broadcast the outcome to all disc nodes
    rpc:abcast(DiscNs -- [node()], ?MODULE, {Tid, Outcome}),
    %% Broadcast the outcome to all RAM nodes
    rpc:abcast(RamNs -- [node()], ?MODULE, {Tid, Outcome}),
    case Outcome of
	do_commit ->
	    mnesia_recover:note_decision(Tid, committed),
	    do_dirty(Tid, Local),
	    mnesia_locker:release_tid(Tid),
	    ?MODULE ! {delete_transaction, Tid};
	{do_abort, _Reason} ->
	    mnesia_recover:note_decision(Tid, aborted)
    end,
    ?eval_debug_fun({?MODULE, multi_commit_sym, post},
		    [{tid, Tid}, {outcome, Outcome}]),
    Outcome;
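
The recovery rule quoted in the source comments for a dead coordinator (if everyone is uncertain the transaction is aborted, otherwise the others follow whoever knows the outcome) can be modelled as a pure function. This is an illustration only: the module name and the `committed | aborted | unclear` answer atoms are assumptions and are not taken from mnesia_recover.

```erlang
-module(outcome_sketch).
-export([resolve/1]).

%% Answers holds what each surviving participant knows about the
%% transaction: committed | aborted | unclear.
resolve(Answers) ->
    case [A || A <- Answers, A =/= unclear] of
        []          -> aborted;   % nobody knows the outcome: abort
        [Known | _] -> Known      % somebody knows: everyone follows
    end.
```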

Common Mnesia problems and how to solve them

Common problems

Split brain
The legendary transaction that waits forever

Causes

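Split brain is mainly caused by an unstable network: two nodes lose contact for a long time, and each concludes that the other has gone down. Meanwhile both nodes accept a large volume of writes. The problem surfaces when the two nodes automatically re-establish cluster communication and the data cannot be merged through transaction decisions.

The infinite wait happens when the peer node dies after ask_commit but is started again before the next ERTS heartbeat. Generally speaking, the probability of this is very small, unless it results from a design mistake or from misusing heart without being familiar with the Erlang system.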

Solutions

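There is not much special to say about split brain. First, have operations manage intranet communication properly; giving the Mnesia cluster dedicated internal switches with hot-standby redundancy is not excessive at all. Second, be prepared for split brain to happen and handle it at the application layer; the https://github.com/uwiger/unsplit project is a good reference.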
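
As a starting point for handling split brain at the application layer, the sketch below subscribes to Mnesia's system events and reacts to `inconsistent_database`, which Mnesia reports when it detects that it has been running partitioned from another node. The module name is hypothetical and the reaction here is only a log line; what to actually do (pick a winner, merge, restart a side) is application-specific and not covered. It assumes Mnesia is already running when `start/0` is called.

```erlang
-module(partition_watch).
-export([start/0]).

%% Spawn a process that listens for Mnesia system events.
start() ->
    spawn(fun() ->
        {ok, _Node} = mnesia:subscribe(system),
        loop()
    end).

loop() ->
    receive
        {mnesia_system_event, {inconsistent_database, Context, Node}} ->
            %% Context says whether the partition was noticed while
            %% running or while starting up.
            logger:warning("Mnesia partition detected (~p) with node ~p",
                           [Context, Node]),
            loop();
        {mnesia_system_event, _Other} ->
            loop()
    end.
```
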
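As for the legendary infinite wait, I have never run into it myself on the Mnesia clusters I use. To address it: first, understand how the nodes of an ERTS cluster detect that their peers are alive; see my earlier post http://my.oschina.net/u/236698/blog/389737. Second, set up an NTP server inside the cluster to keep the clocks of all nodes consistently in sync. Third, when using heart, do not set its timeout strictly equal to the heartbeat interval; set the keep-alive time to at least 2.5 times the inter-node heartbeat detection time.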
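
The 2.5x rule is simple arithmetic, sketched below. The mapping is an assumption on my part: I take the "inter-node heartbeat detection time" to be the kernel parameter net_ticktime (in seconds) and the heart keep-alive time to be the value given to heart through the HEART_BEAT_TIMEOUT environment variable. The module and function names are made up for the illustration.

```erlang
-module(heart_timeout_sketch).
-export([min_heart_timeout/1]).

%% NetTickTime: the net_ticktime value in seconds (assumed to be the
%% "heartbeat detection time" the rule refers to).
%% Returns the smallest heart timeout, in whole seconds, that satisfies
%% the 2.5x rule.
min_heart_timeout(NetTickTime) when is_integer(NetTickTime), NetTickTime > 0 ->
    ceil(2.5 * NetTickTime).
```

With a net_ticktime of 60 seconds this gives 150, so under these assumptions the heart timeout should be set to 150 seconds or more.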