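Handle faults that might take a variable amount of time to recover from when connecting to a remote service or resource. This can improve the stability and resiliency of an application.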
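In a distributed environment, calls to remote resources and services can fail due to transient faults such as slow network connections, timeouts, or resources being overcommitted or temporarily unavailable. These faults typically correct themselves after a short period of time, and a robust cloud application should be prepared to handle them by using a strategy such as the Retry pattern.
However, there can also be situations where faults are due to unanticipated events and might take much longer to fix. These faults can range in severity from a partial loss of connectivity to the complete failure of a service. In these situations it might be pointless for an application to blindly retry an operation that is unlikely to succeed. Instead, the application should quickly accept that the operation has failed and handle this failure accordingly.
Additionally, if a service is very busy, failure in one part of the system might lead to cascading failures. For example, an operation that invokes a service could be configured with a timeout, and return an error if the service fails to respond within this period. However, this strategy could cause many concurrent requests to the same operation to be blocked until the timeout period expires. These blocked requests might hold critical system resources such as memory, threads, and database connections. Consequently, these resources could become exhausted, causing failure of other, possibly unrelated, parts of the system that need to use the same resources. In these situations, it would be preferable for the operation to fail immediately, and only attempt to invoke the service when it's likely to succeed. Setting a shorter timeout might help to resolve this problem, but the timeout shouldn't be so short that it cuts off requests that would eventually have succeeded.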
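The Circuit Breaker pattern, popularized by Michael Nygard in [his book](https://pragprog.com/book/mnee/release-it), can prevent an application from repeatedly trying to execute an operation that's likely to fail. This allows the application to continue without waiting for the fault to be fixed and without wasting CPU cycles, because it has already determined that the fault is long lasting. The Circuit Breaker pattern also enables an application to detect whether the fault has been resolved. If the problem appears to have been fixed, the application can try to invoke the operation again.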
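The purpose of the Circuit Breaker pattern is different from that of the Retry pattern. The Retry pattern enables an application to retry an operation in the expectation that it'll succeed. The Circuit Breaker pattern prevents an application from performing an operation that is likely to fail. An application can combine the two patterns. However, the retry logic should be sensitive to any exceptions raised by the circuit breaker and abandon retry attempts if the circuit breaker indicates that a fault is not transient.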
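As a rough sketch of how the two patterns might be combined, the helper below retries failures that could be transient but gives up at once when the breaker reports that it's open. `CircuitOpenException` and `RetryHelper` are hypothetical names chosen for illustration; they aren't taken from any library or from the sample later in this article.

```csharp
using System;
using System.Threading;

// Hypothetical exception that a circuit breaker might throw while it is open.
public class CircuitOpenException : Exception
{
    public CircuitOpenException(string message) : base(message) { }
}

public static class RetryHelper
{
    // Runs 'operation' up to maxAttempts times, waiting 'delay' between tries.
    // A CircuitOpenException is never retried: the breaker has already decided
    // that the fault is not transient, so it goes straight back to the caller.
    public static T Execute<T>(Func<T> operation, int maxAttempts, TimeSpan delay)
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        }

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (CircuitOpenException)
            {
                throw; // Long-lasting fault: stop retrying.
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                Thread.Sleep(delay); // Possibly transient: wait, then try again.
            }
        }
    }
}
```

On the last attempt the exception filter no longer matches, so the original exception reaches the caller unchanged.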
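A circuit breaker acts as a proxy for operations that might fail. The proxy monitors the number of recent failures, and uses this information to decide whether to allow the operation to proceed or to return an exception immediately.
The proxy can be implemented as a state machine with the following states, which mimic the functionality of an electrical circuit breaker: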
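Closed: The request from the application is routed to the operation. The proxy maintains a count of the number of recent failures, and if a call to the operation is unsuccessful the proxy increments this count. If the number of recent failures exceeds a specified threshold within a given time period, the proxy is placed into the Open state. At this point the proxy starts a timeout timer, and when this timer expires the proxy is placed into the Half-Open state.
The purpose of the timeout timer is to give the system time to fix the problem that caused the failure before the application is allowed to try the operation again.
Open: The request from the application fails immediately and an exception is returned to the application.
Half-Open: A limited number of requests from the application are allowed to pass through and invoke the operation. If these requests are successful, it's assumed that the fault that was previously causing the failure has been fixed, and the circuit breaker switches to the Closed state (the failure counter is reset). If any request fails, the circuit breaker assumes that the fault is still present, so it reverts to the Open state and restarts the timeout timer to give the system a further period of time to recover from the failure.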
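The Half-Open state is useful to prevent a recovering service from suddenly being flooded with requests. As a service recovers, it might be able to support only a limited volume of requests until the recovery is complete. A flood of work while recovery is in progress can cause the service to time out or fail again.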
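In the figure, the failure counter used by the Closed state is time based. It's automatically reset at periodic intervals. This helps to prevent the circuit breaker from entering the Open state if it experiences only occasional failures. The failure threshold that trips the circuit breaker into the Open state is reached only when a specified number of failures have occurred during a specified interval. The counter used by the Half-Open state records the number of successful attempts to invoke the operation. The circuit breaker reverts to the Closed state after a specified number of consecutive invocations have succeeded. If any invocation fails, the circuit breaker enters the Open state immediately, and the success counter is reset the next time it enters the Half-Open state.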
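The transitions described above can be sketched as a small state machine. This is a minimal, single-threaded illustration under stated assumptions: the class and parameter names (`BreakerStateMachine`, `failureThreshold`, `samplingWindow`, `openDuration`, `successThreshold`) are hypothetical, a sliding window of failure timestamps stands in for the periodically reset counter, the breaker trips when the count reaches the threshold, and limiting how many requests run while Half-Open is left out.

```csharp
using System;
using System.Collections.Generic;

public enum BreakerState { Closed, Open, HalfOpen }

// Hypothetical sketch of the state transitions only; not thread-safe.
public class BreakerStateMachine
{
    private readonly int failureThreshold;     // recent failures that trip the breaker
    private readonly TimeSpan samplingWindow;  // how long a failure counts as "recent"
    private readonly TimeSpan openDuration;    // time spent Open before probing again
    private readonly int successThreshold;     // consecutive Half-Open successes needed to close
    private readonly Func<DateTime> clock;     // injectable so tests can control time
    private readonly Queue<DateTime> recentFailures = new Queue<DateTime>();
    private DateTime openedAtUtc;
    private int halfOpenSuccesses;

    public BreakerState State { get; private set; } = BreakerState.Closed;

    public BreakerStateMachine(int failureThreshold, TimeSpan samplingWindow,
        TimeSpan openDuration, int successThreshold, Func<DateTime> clock = null)
    {
        this.failureThreshold = failureThreshold;
        this.samplingWindow = samplingWindow;
        this.openDuration = openDuration;
        this.successThreshold = successThreshold;
        this.clock = clock ?? (() => DateTime.UtcNow);
    }

    // Returns true if a call may be attempted now. Moves Open -> Half-Open
    // once the timeout has expired.
    public bool AllowRequest()
    {
        if (State == BreakerState.Open && clock() - openedAtUtc >= openDuration)
        {
            State = BreakerState.HalfOpen;
            halfOpenSuccesses = 0;
        }
        return State != BreakerState.Open;
    }

    // Half-Open -> Closed after enough consecutive successes.
    public void RecordSuccess()
    {
        if (State == BreakerState.HalfOpen && ++halfOpenSuccesses >= successThreshold)
        {
            recentFailures.Clear();
            State = BreakerState.Closed;
        }
    }

    // Half-Open -> Open on any failure; Closed -> Open when the threshold is reached.
    public void RecordFailure()
    {
        DateTime now = clock();
        if (State == BreakerState.HalfOpen) { Trip(now); return; }
        if (State == BreakerState.Open) { return; }

        recentFailures.Enqueue(now);
        while (recentFailures.Count > 0 && now - recentFailures.Peek() > samplingWindow)
        {
            recentFailures.Dequeue(); // forget failures older than the window
        }
        if (recentFailures.Count >= failureThreshold) { Trip(now); }
    }

    private void Trip(DateTime now)
    {
        State = BreakerState.Open;
        openedAtUtc = now; // restarts the timeout timer
    }
}
```

A caller would check `AllowRequest()` before invoking the protected operation and report the outcome with `RecordSuccess()` or `RecordFailure()`.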
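How the system recovers is outside the scope of this pattern. It might involve reloading data, restarting a failed component, or repairing a network connection.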
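The Circuit Breaker pattern provides stability while the system recovers from a failure and minimizes the impact on performance. It helps to maintain the response time of the system by quickly rejecting requests that are likely to fail. If the circuit breaker raises an event each time it changes state, this information can be used to monitor the health of the part of the system protected by the circuit breaker, or to alert an administrator when a circuit breaker trips to the Open state.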
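One way to surface state changes for monitoring is a plain .NET event. The sketch below is illustrative only; `BreakerStatus`, `BreakerStateChangedEventArgs`, and `ObservableBreakerState` are hypothetical names, and the class tracks nothing but the current state.

```csharp
using System;

public enum BreakerStatus { Closed, Open, HalfOpen }

// Hypothetical event payload describing a single transition.
public class BreakerStateChangedEventArgs : EventArgs
{
    public BreakerStateChangedEventArgs(BreakerStatus oldState, BreakerStatus newState)
    {
        OldState = oldState;
        NewState = newState;
    }

    public BreakerStatus OldState { get; }
    public BreakerStatus NewState { get; }
}

// Holds the current state and raises StateChanged on every real transition.
public class ObservableBreakerState
{
    private BreakerStatus current = BreakerStatus.Closed;

    public event EventHandler<BreakerStateChangedEventArgs> StateChanged;

    public BreakerStatus Current => current;

    public void TransitionTo(BreakerStatus next)
    {
        if (next == current)
        {
            return; // No change, so no event.
        }

        BreakerStatus previous = current;
        current = next;
        StateChanged?.Invoke(this, new BreakerStateChangedEventArgs(previous, next));
    }
}
```

A monitoring component could subscribe to `StateChanged` and alert an administrator whenever `NewState` is `BreakerStatus.Open`.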
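The pattern is customizable and can be adapted to the type of failure. For example, you can apply an increasing timeout timer to a circuit breaker: place the circuit breaker in the Open state for a few seconds initially, and if the failure hasn't been resolved when the timer expires, increase the timeout to a few minutes, and so on. In some cases, rather than having the Open state return a failure and raise an exception, it could be useful to return a default value that is meaningful to the application.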
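Both customizations can be sketched in a few lines. The doubling schedule, the cap, and the names `BreakerTuning`, `NextOpenDuration`, and `ExecuteOrDefault` are assumptions made for this example; the pattern itself only says that the timeout can grow and that a default value can be returned.

```csharp
using System;

public static class BreakerTuning
{
    // Doubles the Open-state duration for each consecutive trip, capped at 'max'.
    // consecutiveTrips = 1 yields 'initial', 2 yields twice that, and so on.
    public static TimeSpan NextOpenDuration(TimeSpan initial, TimeSpan max, int consecutiveTrips)
    {
        if (consecutiveTrips < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(consecutiveTrips));
        }

        // Clamp the exponent so the multiplication cannot overflow a double.
        double factor = Math.Pow(2, Math.Min(consecutiveTrips - 1, 30));
        double ticks = Math.Min(initial.Ticks * factor, max.Ticks);
        return TimeSpan.FromTicks((long)ticks);
    }

    // Returns 'fallback' while the breaker is open instead of raising an exception.
    public static T ExecuteOrDefault<T>(bool breakerIsOpen, Func<T> operation, T fallback)
    {
        return breakerIsOpen ? fallback : operation();
    }
}
```

Whether a fallback value is appropriate depends on the application; a cached or empty result can be reasonable for a recommendations list but not for an account balance.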
Consider the following points when deciding how to implement this pattern:
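Exception handling. An application invoking an operation through a circuit breaker must be prepared to handle the exceptions raised if the operation is unavailable. The way exceptions are handled is application specific. For example, an application could temporarily degrade its functionality, invoke an alternative operation that can obtain the same data, or report the error to the user and ask them to try again later.
Types of exceptions. A request might fail for many reasons, some of which indicate a more severe type of failure than others. For example, a request might fail because a remote service has crashed and will take several minutes to recover, or because of a timeout caused by the service being temporarily overloaded. A circuit breaker might be able to examine the types of exceptions that occur and adjust its strategy accordingly. For example, it might require a much larger number of timeout exceptions to trip the circuit breaker to the Open state than failures caused by the service being completely unavailable.
Logging. A circuit breaker should log all failed requests (and possibly successful requests) to enable an administrator to monitor the health of the operation.
Recoverability. You should configure the circuit breaker to match the likely recovery pattern of the operation it's protecting. For example, if the circuit breaker remains in the Open state for a long period, it could keep raising exceptions even after the cause of the failure has been resolved. Similarly, if the circuit breaker switches from the Open state to the Half-Open state too quickly, it could oscillate between states and make application response times worse.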
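Testing failed operations. In the Open state, rather than using a timer to determine when to switch to the Half-Open state, a circuit breaker can instead periodically ping the remote service or resource to determine whether it has become available again. This ping could take the form of an attempt to invoke an operation that had previously failed, or it could use a special operation provided by the remote service specifically for testing its health, as described by the Health Endpoint Monitoring pattern.
Manual override. In a system where the recovery time for a failing operation is extremely variable, it's beneficial to provide a manual reset option that enables an administrator to close a circuit breaker (and reset the failure counter). Similarly, an administrator could force a circuit breaker into the Open state (and restart the timeout timer) if the operation protected by the circuit breaker is temporarily unavailable.
Concurrency. The same circuit breaker could be accessed by a large number of concurrent clients. The implementation shouldn't block concurrent requests or add excessive overhead to each call to an operation.
Resource differentiation. Be careful when using a single circuit breaker for one type of resource if there might be multiple underlying independent providers. For example, in a data store that contains multiple shards, one shard might be fully accessible while another is experiencing a temporary issue. If the error responses in these scenarios are merged, an application might try to access some shards even when failure is highly likely, while access to other shards might be blocked even though it's likely to succeed.
Accelerated circuit breaking. Sometimes a failure response contains enough information for the circuit breaker to trip immediately. For example, an error response indicating that a shared resource is overloaded can trip the circuit breaker at once, so that the application doesn't retry right away.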
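> [!NOTE]
> A service can return HTTP 429 (Too Many Requests) if it's throttling the client, or HTTP 503 (Service Unavailable) if the service isn't currently available. The response can include additional information, such as how long to wait before the next retry.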
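Replaying failed requests. In the Open state, rather than simply failing quickly, a circuit breaker could also record the details of each request to a journal and arrange for these requests to be replayed when the remote resource or service becomes available.
Inappropriate timeouts on external services. A circuit breaker might not be able to fully protect applications from failures in external services that are configured with a lengthy timeout period. If the timeout is too long, a thread running a circuit breaker might be blocked for an extended period before the circuit breaker indicates that the operation has failed. In this time, many other application instances might also try to invoke the service through the circuit breaker and tie up a significant number of threads before they all fail.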
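For the accelerated circuit breaking consideration above, a breaker that wraps HTTP calls could inspect the status code and the `Retry-After` header to decide how long to stay open. The following is a hedged sketch: `AcceleratedTrip` and `SuggestedOpenDuration` are hypothetical names, and the choice to fall back to a caller-supplied duration when the header is missing is an assumption. It relies only on `HttpResponseMessage.Headers.RetryAfter` from `System.Net.Http`.

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class AcceleratedTrip
{
    // Returns how long the breaker could stay open when the response signals
    // throttling (429) or unavailability (503); returns null for any other status.
    public static TimeSpan? SuggestedOpenDuration(
        HttpResponseMessage response, TimeSpan fallback, DateTimeOffset nowUtc)
    {
        int status = (int)response.StatusCode;
        if (status != 429 && status != (int)HttpStatusCode.ServiceUnavailable)
        {
            return null; // Not a signal to trip immediately.
        }

        var retryAfter = response.Headers.RetryAfter;

        // Header given as a number of seconds, for example "Retry-After: 120".
        if (retryAfter?.Delta != null)
        {
            return retryAfter.Delta;
        }

        // Header given as an absolute HTTP date.
        if (retryAfter?.Date != null)
        {
            TimeSpan wait = retryAfter.Date.Value - nowUtc;
            return wait > TimeSpan.Zero ? wait : TimeSpan.Zero;
        }

        return fallback; // The service gave no hint; use the caller's default.
    }
}
```

A non-null result would be passed to whatever mechanism sets the breaker's Open-state timer; a null result means the response should go through the normal failure counting.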
Use this pattern in the following scenarios:
This pattern isn't recommended in the following scenarios:
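In a web application, several of the pages are populated with data retrieved from an external service. If the system implements minimal caching, most hits to these pages will cause a round trip to the service. Connections from the web application to the service can be configured with a timeout period (typically 60 seconds), and if the service doesn't respond in this time, the logic in each web page will assume that the service is unavailable and throw an exception.
However, if the service fails and the system is very busy, users could be forced to wait for up to 60 seconds before an exception occurs. Eventually resources such as memory, connections, and threads could be exhausted, preventing other users from connecting to the system, even if they aren't accessing the failing service.
Scaling the system by adding further web servers and implementing load balancing might delay when resources become exhausted, but it won't resolve the issue, because user requests will still be unresponsive and all web servers could still eventually run out of resources.
Wrapping the logic that connects to the service and retrieves the data in a circuit breaker could help to solve this problem and handle the service failure more elegantly. User requests will still fail, but they'll fail more quickly and the resources won't be blocked.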
The `CircuitBreaker` class maintains state information about a circuit breaker in an object that implements the `ICircuitBreakerStateStore` interface shown in the following code.
```csharp
interface ICircuitBreakerStateStore
{
  CircuitBreakerStateEnum State { get; }

  Exception LastException { get; }

  DateTime LastStateChangedDateUtc { get; }

  void Trip(Exception ex);

  void Reset();

  void HalfOpen();

  bool IsClosed { get; }
}
```
The `State` property indicates the current state of the circuit breaker, and will be either Open, HalfOpen, or Closed as defined by the `CircuitBreakerStateEnum` enumeration. The `IsClosed` property should be true if the circuit breaker is closed, but false if it's open or half open. The `Trip` method switches the state of the circuit breaker to the open state and records the exception that caused the change in state, together with the date and time that the exception occurred. The `LastException` and the `LastStateChangedDateUtc` properties return this information. The `Reset` method closes the circuit breaker, and the `HalfOpen` method sets the circuit breaker to half open.
The `InMemoryCircuitBreakerStateStore` class in the example contains an implementation of the `ICircuitBreakerStateStore` interface. The `CircuitBreaker` class creates an instance of this class to hold the state of the circuit breaker.
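The article doesn't list the state store itself, so the following is only a guess at what a minimal in-memory version might look like. The interface is repeated so that the block compiles on its own, the members of `CircuitBreakerStateEnum` are inferred from the description above, and the single-lock design and the decision to keep the last exception after `Reset` are assumptions.

```csharp
using System;

// Values inferred from the text: Open, HalfOpen, or Closed.
public enum CircuitBreakerStateEnum { Closed, Open, HalfOpen }

// Repeated from the article so that this block is self-contained.
public interface ICircuitBreakerStateStore
{
    CircuitBreakerStateEnum State { get; }
    Exception LastException { get; }
    DateTime LastStateChangedDateUtc { get; }
    void Trip(Exception ex);
    void Reset();
    void HalfOpen();
    bool IsClosed { get; }
}

// Assumed implementation: one lock guards every field.
public class InMemoryCircuitBreakerStateStore : ICircuitBreakerStateStore
{
    private readonly object sync = new object();
    private CircuitBreakerStateEnum state = CircuitBreakerStateEnum.Closed;
    private Exception lastException;
    private DateTime lastStateChangedDateUtc = DateTime.UtcNow;

    public CircuitBreakerStateEnum State { get { lock (sync) { return state; } } }

    public Exception LastException { get { lock (sync) { return lastException; } } }

    public DateTime LastStateChangedDateUtc { get { lock (sync) { return lastStateChangedDateUtc; } } }

    public bool IsClosed { get { lock (sync) { return state == CircuitBreakerStateEnum.Closed; } } }

    // Opens the breaker and remembers what caused it.
    public void Trip(Exception ex)
    {
        lock (sync)
        {
            lastException = ex;
            ChangeState(CircuitBreakerStateEnum.Open);
        }
    }

    // Closes the breaker; the last exception is kept for diagnostics.
    public void Reset()
    {
        lock (sync) { ChangeState(CircuitBreakerStateEnum.Closed); }
    }

    public void HalfOpen()
    {
        lock (sync) { ChangeState(CircuitBreakerStateEnum.HalfOpen); }
    }

    // Callers must hold 'sync'.
    private void ChangeState(CircuitBreakerStateEnum newState)
    {
        state = newState;
        lastStateChangedDateUtc = DateTime.UtcNow;
    }
}
```

Because this store lives in process memory, each application instance would track its own breaker state; sharing state across instances would need a different store.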
The `ExecuteAction` method in the `CircuitBreaker` class wraps an operation, specified as an `Action` delegate. If the circuit breaker is closed, `ExecuteAction` invokes the `Action` delegate. If the operation fails, an exception handler calls `TrackException`, which sets the circuit breaker state to open. The following code example highlights this flow.
```csharp
public class CircuitBreaker
{
  private readonly ICircuitBreakerStateStore stateStore =
    CircuitBreakerStateStoreFactory.GetCircuitBreakerStateStore();

  private readonly object halfOpenSyncObject = new object();
  ...
  public bool IsClosed { get { return stateStore.IsClosed; } }

  public bool IsOpen { get { return !IsClosed; } }

  public void ExecuteAction(Action action)
  {
    ...
    if (IsOpen)
    {
      // The circuit breaker is Open.
      ... (see code sample below for details)
    }

    // The circuit breaker is Closed, execute the action.
    try
    {
      action();
    }
    catch (Exception ex)
    {
      // If an exception still occurs here, simply
      // retrip the breaker immediately.
      this.TrackException(ex);

      // Throw the exception so that the caller can tell
      // the type of exception that was thrown.
      throw;
    }
  }

  private void TrackException(Exception ex)
  {
    // For simplicity in this example, open the circuit breaker on the first exception.
    // In reality this would be more complex. A certain type of exception, such as one
    // that indicates a service is offline, might trip the circuit breaker immediately.
    // Alternatively it might count exceptions locally or across multiple instances and
    // use this value over time, or the exception/success ratio based on the exception
    // types, to open the circuit breaker.
    this.stateStore.Trip(ex);
  }
}
```
The following example shows the code (omitted from the previous example) that is executed if the circuit breaker isn't closed. It first checks if the circuit breaker has been open for a period longer than the time specified by the local `OpenToHalfOpenWaitTime` field in the `CircuitBreaker` class. If this is the case, the `ExecuteAction` method sets the circuit breaker to half open, then tries to perform the operation specified by the `Action` delegate.
If the operation is successful, the circuit breaker is reset to the closed state. If the operation fails, it is tripped back to the open state and the time the exception occurred is updated so that the circuit breaker will wait for a further period before trying to perform the operation again.
If the circuit breaker has only been open for a short time, less than the `OpenToHalfOpenWaitTime` value, the `ExecuteAction` method simply throws a `CircuitBreakerOpenException` exception and returns the error that caused the circuit breaker to transition to the open state.
Additionally, it uses a lock to prevent the circuit breaker from trying to perform concurrent calls to the operation while it's half open. A concurrent attempt to invoke the operation will be handled as if the circuit breaker was open, and it'll fail with an exception as described later.
```csharp
    ...
    if (IsOpen)
    {
      // The circuit breaker is Open. Check if the Open timeout has expired.
      // If it has, set the state to HalfOpen. Another approach might be to
      // check for the HalfOpen state that had been set by some other operation.
      if (stateStore.LastStateChangedDateUtc + OpenToHalfOpenWaitTime < DateTime.UtcNow)
      {
        // The Open timeout has expired. Allow one operation to execute. Note that, in
        // this example, the circuit breaker is set to HalfOpen after being
        // in the Open state for some period of time. An alternative would be to set
        // this using some other approach such as a timer, test method, manually, and
        // so on, and check the state here to determine how to handle execution
        // of the action.
        // Limit the number of threads to be executed when the breaker is HalfOpen.
        // An alternative would be to use a more complex approach to determine which
        // threads or how many are allowed to execute, or to execute a simple test
        // method instead.
        bool lockTaken = false;
        try
        {
          Monitor.TryEnter(halfOpenSyncObject, ref lockTaken);
          if (lockTaken)
          {
            // Set the circuit breaker state to HalfOpen.
            stateStore.HalfOpen();

            // Attempt the operation.
            action();

            // If this action succeeds, reset the state and allow other operations.
            // In reality, instead of immediately returning to the Closed state, a counter
            // here would record the number of successful operations and return the
            // circuit breaker to the Closed state only after a specified number succeed.
            this.stateStore.Reset();
            return;
          }
        }
        catch (Exception ex)
        {
          // If there's still an exception, trip the breaker again immediately.
          this.stateStore.Trip(ex);

          // Throw the exception so that the caller knows which exception occurred.
          throw;
        }
        finally
        {
          if (lockTaken)
          {
            Monitor.Exit(halfOpenSyncObject);
          }
        }
      }

      // The Open timeout hasn't yet expired, or another thread is already probing.
      // Throw a CircuitBreakerOpen exception to inform the caller that the call
      // was not actually attempted, and return the most recent exception received.
      throw new CircuitBreakerOpenException(stateStore.LastException);
    }
    ...
```
To use a `CircuitBreaker` object to protect an operation, an application creates an instance of the `CircuitBreaker` class and invokes the `ExecuteAction` method, specifying the operation to be performed as the parameter. The application should be prepared to catch the `CircuitBreakerOpenException` exception if the operation fails because the circuit breaker is open. The following code shows an example:
```csharp
var breaker = new CircuitBreaker();

try
{
  breaker.ExecuteAction(() =>
  {
    // Operation protected by the circuit breaker.
    ...
  });
}
catch (CircuitBreakerOpenException ex)
{
  // Perform some different action when the breaker is open.
  // Last exception details are in the inner exception.
  ...
}
catch (Exception ex)
{
  ...
}
```
The following patterns might also be useful when implementing this pattern: