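The OTP platform is highly fault tolerant because it provides mechanisms for monitoring the state of every process. When a process fails, the failure is detected promptly and the process can be restarted or otherwise handled.
Supervisors are what make this availability possible. A supervisor supervises one or more child processes, and a supervisor can also supervise other supervisors, forming a supervision tree that protects the whole system.
Note: a supervisor should only supervise and should contain no other business logic. The closer a supervisor sits to the root of the supervision tree, the simpler it should be, because a simple supervisor is unlikely to fail, and that is the key to keeping the system highly available.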
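The advice above can be sketched as a module-based supervisor whose only callback is init/1. This is a minimal sketch, not code from the examples in this post: the names Demo.ThinSupervisor and :demo_counter are hypothetical, and an Agent stands in for a real worker so that all state and logic live in the child rather than in the supervisor.

```elixir
defmodule Demo.ThinSupervisor do
  # Hypothetical module for illustration: it declares children and a strategy, nothing else.
  use Supervisor

  def start_link(opts \\ []) do
    Supervisor.start_link(__MODULE__, :ok, opts)
  end

  @impl true
  def init(:ok) do
    children = [
      # The Agent holds the state; the supervisor never touches it.
      %{id: :demo_counter, start: {Agent, :start_link, [fn -> 0 end, [name: :demo_counter]]}}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end

{:ok, sup} = Demo.ThinSupervisor.start_link()

# Business logic is addressed to the worker, never to the supervisor.
Agent.update(:demo_counter, &(&1 + 1))
counter = Agent.get(:demo_counter, & &1)
counts = Supervisor.count_children(sup)

IO.inspect(counter, label: "counter")
IO.inspect(counts, label: "children")
```

Keeping init/1 as the only callback leaves very little code in the supervisor that could fail.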
The rest of this post uses the Supervisor module provided by Elixir to build simple supervision examples that show how supervision improves availability.
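There are four supervision strategies:
:one_for_one - if a child process terminates, only that process is restarted
:one_for_all - if a child process terminates, all other children are terminated and then all of them are restarted
:rest_for_one - if a child process terminates, it and the children started after it are restarted
:simple_one_for_one - like :one_for_one, but for dynamically started children of a single type (newer Elixir versions use DynamicSupervisor for this)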
Switching between strategies is very simple. The examples below demonstrate two of them:
    defmodule PseudoServerA do
      use GenServer

      def start_link(state, opts \\ []) do
        GenServer.start_link(__MODULE__, state, opts)
      end

      # Explicit init/1 avoids the missing-callback warning in newer Elixir versions
      def init(state), do: {:ok, state}

      # Reply with this process's PID so that restarts can be observed
      def handle_call(:display, _from, []) do
        {:reply, 'ServerA PID: ' ++ :erlang.pid_to_list(self()), []}
      end

      # Simulate a failure: stopping with a non-normal reason triggers a restart
      def handle_cast(:err, []) do
        {:stop, "stop ServerA", []}
      end
    end

    defmodule PseudoServerB do
      use GenServer

      def start_link(state, opts \\ []) do
        GenServer.start_link(__MODULE__, state, opts)
      end

      def init(state), do: {:ok, state}

      def handle_call(:display, _from, []) do
        {:reply, 'ServerB PID: ' ++ :erlang.pid_to_list(self()), []}
      end

      def handle_cast(:err, []) do
        {:stop, "stop ServerB", []}
      end
    end

    defmodule PseudoServerC do
      use GenServer

      def start_link(state, opts \\ []) do
        GenServer.start_link(__MODULE__, state, opts)
      end

      def init(state), do: {:ok, state}

      def handle_call(:display, _from, []) do
        {:reply, 'ServerC PID: ' ++ :erlang.pid_to_list(self()), []}
      end

      def handle_cast(:err, []) do
        {:stop, "stop ServerC", []}
      end
    end

    defmodule SupervisorTest do
      # Note: Supervisor.Spec (worker/2, supervisor/2) is deprecated since Elixir 1.5
      import Supervisor.Spec

      def init() do
        children = [
          worker(PseudoServerA, [[], [name: :server_a]]),
          worker(PseudoServerB, [[], [name: :server_b]]),
          worker(PseudoServerC, [[], [name: :server_c]])
        ]

        # Start the supervisor with children
        Supervisor.start_link(children, strategy: :one_for_one)
      end
    end
Test it as follows:
    $ iex -S mix

    # Start the supervisor and the 3 processes it supervises
    iex(1)> SupervisorTest.init
    {:ok, #PID<0.145.0>}

    # After startup, the PIDs of the 3 processes are as follows
    iex(2)> GenServer.call(:server_a, :display)
    'ServerA PID: <0.146.0>'
    iex(3)> GenServer.call(:server_b, :display)
    'ServerB PID: <0.147.0>'
    iex(4)> GenServer.call(:server_c, :display)
    'ServerC PID: <0.148.0>'

    # Send the :err message to make serverA fail
    iex(5)> GenServer.cast(:server_a, :err)
    :ok
    iex(6)>
    14:47:53.119 [error] GenServer :server_a terminating
    ** (stop) "stop ServerA"
    Last message: {:"$gen_cast", :err}
    State: []
    nil

    # After serverA fails, check the 3 PIDs again: the supervisor restarted only serverA,
    # as expected for the :one_for_one strategy
    iex(7)> GenServer.call(:server_a, :display)
    'ServerA PID: <0.155.0>'
    iex(8)> GenServer.call(:server_b, :display)
    'ServerB PID: <0.147.0>'
    iex(9)> GenServer.call(:server_c, :display)
    'ServerC PID: <0.148.0>'
Now let's try a different strategy. The only change needed is to replace this part of the code above

    # Start the supervisor with children
    Supervisor.start_link(children, strategy: :one_for_one)

with

    # Start the supervisor with children
    Supervisor.start_link(children, strategy: :one_for_all)

The test steps are the same as for :one_for_one:
    $ iex -S mix

    # Start the supervisor and the 3 processes it supervises
    iex(1)> SupervisorTest.init
    {:ok, #PID<0.145.0>}

    # After startup, the PIDs of the 3 processes are as follows
    iex(2)> GenServer.call(:server_a, :display)
    'ServerA PID: <0.146.0>'
    iex(3)> GenServer.call(:server_b, :display)
    'ServerB PID: <0.147.0>'
    iex(4)> GenServer.call(:server_c, :display)
    'ServerC PID: <0.148.0>'

    # Send the :err message to make serverA fail
    iex(5)> GenServer.cast(:server_a, :err)
    :ok
    iex(6)>
    14:55:16.183 [error] GenServer :server_a terminating
    ** (stop) "stop ServerA"
    Last message: {:"$gen_cast", :err}
    State: []
    nil

    # After serverA fails, check the 3 PIDs again: the supervisor restarted all processes,
    # as expected for the :one_for_all strategy
    iex(7)> GenServer.call(:server_a, :display)
    'ServerA PID: <0.153.0>'
    iex(8)> GenServer.call(:server_b, :display)
    'ServerB PID: <0.154.0>'
    iex(9)> GenServer.call(:server_c, :display)
    'ServerC PID: <0.156.0>'
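The :rest_for_one strategy is not demonstrated above, so here is a minimal sketch of it. It is written against the child specification API of newer Elixir versions (Supervisor.child_spec/2) rather than Supervisor.Spec, and the module Demo.Worker and the names :demo_a, :demo_b and :demo_c are hypothetical, chosen only for this illustration.

```elixir
defmodule Demo.Worker do
  # Hypothetical worker for this sketch; it registers itself under the given name.
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, name, name: name)

  @impl true
  def init(name), do: {:ok, name}

  @impl true
  def handle_call(:pid, _from, state), do: {:reply, self(), state}

  # Stopping with a non-normal reason makes the supervisor restart this child.
  @impl true
  def handle_cast(:err, state), do: {:stop, :boom, state}
end

names = [:demo_a, :demo_b, :demo_c]

# Children are started in list order, which is what :rest_for_one relies on.
children = for name <- names, do: Supervisor.child_spec({Demo.Worker, name}, id: name)

{:ok, _sup} = Supervisor.start_link(children, strategy: :rest_for_one)

snapshot = fn -> Map.new(names, fn name -> {name, GenServer.call(name, :pid)} end) end

pids_before = snapshot.()

# Crash the middle child, then give the supervisor a moment to restart children.
GenServer.cast(:demo_b, :err)
Process.sleep(200)

pids_after = snapshot.()

IO.inspect(pids_before.demo_a == pids_after.demo_a, label: "demo_a kept its PID")
IO.inspect(pids_before.demo_b != pids_after.demo_b, label: "demo_b was restarted")
IO.inspect(pids_before.demo_c != pids_after.demo_c, label: "demo_c was restarted")
```

Crashing the middle child is what makes the strategy visible: the child started before it is left alone, while the crashed child and the one started after it both come back with new PIDs.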
Supervision is not flat: a supervisor can also supervise other supervisors, which produces a tree-shaped supervision hierarchy.
Modify the test code as follows. Only the supervisor part changes; the three PseudoServer modules stay exactly as before:

    # PseudoServerA, PseudoServerB and PseudoServerC are unchanged

    defmodule SupervisorBranch do
      import Supervisor.Spec

      # The argument is accepted only because SupervisorRoot passes one; it is not used
      def start_link(_opts) do
        children = [
          worker(PseudoServerA, [[], [name: :server_a]]),
          worker(PseudoServerB, [[], [name: :server_b]])
        ]

        Supervisor.start_link(children, strategy: :one_for_one)
      end
    end

    defmodule SupervisorRoot do
      import Supervisor.Spec

      def init() do
        children = [
          supervisor(SupervisorBranch, [[name: :supervisor_branch]]),
          worker(PseudoServerC, [[], [name: :server_c]])
        ]

        # Start the supervisor with children
        Supervisor.start_link(children, strategy: :one_for_all)
      end
    end
The test session looks like this:

    # Start the root supervisor
    iex(1)> SupervisorRoot.init
    {:ok, #PID<0.149.0>}

    # After startup, check the PIDs of the 3 processes
    iex(2)> GenServer.call(:server_a, :display)
    'ServerA PID: <0.151.0>'
    iex(3)> GenServer.call(:server_b, :display)
    'ServerB PID: <0.152.0>'
    iex(4)> GenServer.call(:server_c, :display)
    'ServerC PID: <0.153.0>'

    # Send the :err message to make serverA fail
    iex(5)> GenServer.cast(:server_a, :err)
    :ok
    iex(6)>
    15:31:15.846 [error] GenServer :server_a terminating
    ** (stop) "stop ServerA"
    Last message: {:"$gen_cast", :err}
    State: []
    nil

    # After serverA fails, only serverA is restarted, because its supervisor
    # SupervisorBranch uses the :one_for_one strategy
    iex(7)> GenServer.call(:server_a, :display)
    'ServerA PID: <0.158.0>'
    iex(8)> GenServer.call(:server_b, :display)
    'ServerB PID: <0.152.0>'
    iex(9)> GenServer.call(:server_c, :display)
    'ServerC PID: <0.153.0>'

    # Send the :err message to make serverC fail
    iex(10)> GenServer.cast(:server_c, :err)
    :ok
    15:31:35.264 [error] GenServer :server_c terminating
    ** (stop) "stop ServerC"
    Last message: {:"$gen_cast", :err}
    State: []

    # After serverC fails, all processes are restarted, because its supervisor
    # SupervisorRoot uses the :one_for_all strategy
    iex(11)> GenServer.call(:server_a, :display)
    'ServerA PID: <0.166.0>'
    iex(12)> GenServer.call(:server_c, :display)
    'ServerC PID: <0.168.0>'
    iex(13)> GenServer.call(:server_b, :display)
    'ServerB PID: <0.167.0>'
With a supervision tree, processes can be divided into groups, and each group can have its own supervision strategy.
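The Supervisor.Spec helpers are deprecated since Elixir 1.5, so the same grouping idea is sketched here with plain child specification maps. This is a sketch under assumptions, not a drop-in replacement for the code in this post: Demo.Echo, :tree_a, :tree_b, :tree_c and :tree_branch are hypothetical names, and the failure is simulated with Process.exit/2.

```elixir
defmodule Demo.Echo do
  # Hypothetical worker for this sketch; it only reports its own PID.
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, name, name: name)

  @impl true
  def init(name), do: {:ok, name}

  @impl true
  def handle_call(:pid, _from, state), do: {:reply, self(), state}
end

# Group 1: two workers under a :one_for_one branch supervisor.
branch_children = [
  Supervisor.child_spec({Demo.Echo, :tree_a}, id: :tree_a),
  Supervisor.child_spec({Demo.Echo, :tree_b}, id: :tree_b)
]

branch_spec = %{
  id: :tree_branch,
  start:
    {Supervisor, :start_link,
     [branch_children, [strategy: :one_for_one, name: :tree_branch]]},
  type: :supervisor
}

# Group 2: the branch plus one more worker under a :one_for_all root.
root_children = [
  branch_spec,
  Supervisor.child_spec({Demo.Echo, :tree_c}, id: :tree_c)
]

{:ok, root} = Supervisor.start_link(root_children, strategy: :one_for_all)

root_counts = Supervisor.count_children(root)
branch_counts = Supervisor.count_children(:tree_branch)

# Killing the root-level worker should also restart the branch and its workers.
a_before = GenServer.call(:tree_a, :pid)
Process.exit(Process.whereis(:tree_c), :kill)
Process.sleep(200)
a_after = GenServer.call(:tree_a, :pid)

IO.inspect(root_counts, label: "root")
IO.inspect(branch_counts, label: "branch")
IO.inspect(a_before != a_after, label: "tree_a restarted together with the root's children")
```

Setting type: :supervisor on the branch entry tells the root that this child is itself a supervisor, which is why it is counted separately from the workers.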
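The supervision mechanism keeps track of the state of every process, and a supervision tree makes it possible to apply different recovery mechanisms to different parts of the system. Used well, the Supervisor module can therefore greatly improve system availability.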
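Besides the strategy, the restart value of each child is another recovery setting. The sketch below assumes a newer Elixir version in which `use GenServer` accepts a :restart option; Demo.Stopper and :demo_stopper are hypothetical names. A :transient child is restarted only when it terminates abnormally.

```elixir
defmodule Demo.Stopper do
  # Hypothetical worker: :transient means "restart only after an abnormal exit".
  use GenServer, restart: :transient

  def start_link(name), do: GenServer.start_link(__MODULE__, name, name: name)

  @impl true
  def init(name), do: {:ok, name}

  # Stop with whatever reason the caller supplies.
  @impl true
  def handle_cast({:stop, reason}, state), do: {:stop, reason, state}
end

{:ok, _sup} = Supervisor.start_link([{Demo.Stopper, :demo_stopper}], strategy: :one_for_one)

# Abnormal exit: the supervisor restarts the child under the same name.
GenServer.cast(:demo_stopper, {:stop, :crashed})
Process.sleep(200)
after_crash = Process.whereis(:demo_stopper)

# Normal exit: a :transient child is left stopped.
GenServer.cast(:demo_stopper, {:stop, :normal})
Process.sleep(200)
after_normal = Process.whereis(:demo_stopper)

IO.inspect(is_pid(after_crash), label: "running again after abnormal exit")
IO.inspect(after_normal, label: "registered process after normal exit")
```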
For details on the Supervisor module, see: http://elixir-lang.org/docs/stable/elixir/Supervisor.html