dubbo源码解析五 --- 集群容错架构设计与原理分析

这里的 Invoker 是 Provider 的一个可调用 Service 的抽象，Invoker 封装了 Provider 地址及 Service 接口信息
Directory 表明多个 Invoker，能够把它当作 List<Invoker> ，但与 List 不一样的是，它的值多是动态变化的，好比注册中心推送变动
Cluster 将 Directory 中的多个 Invoker 假装成一个 Invoker，对上层透明，假装过程包含了容错逻辑，调用失败后，重试另外一个
Router 负责从多个 Invoker 中按路由规则选出子集，好比读写分离，应用隔离等
LoadBalance 负责从多个 Invoker 中选出具体的一个用于本次调用，选的过程包含了负载均衡算法，调用失败后，须要重选

集群容错模式

能够自行扩展集群容错策略，参见：集群扩展api

Failover Cluster

失败自动切换，当出现失败，重试其它服务器 [1]。一般用于读操做，但重试会带来更长延迟。可经过 retries="2" 来设置重试次数(不含第一次)。缓存

重试次数配置以下：

<dubbo:service retries="2" />

或

<dubbo:reference retries="2" />

或

<dubbo:reference>
    <dubbo:method name="findFoo" retries="2" />
</dubbo:reference>

Failfast Cluster

快速失败，只发起一次调用，失败当即报错。一般用于非幂等性的写操做，好比新增记录。

Failsafe Cluster

失败安全，出现异常时，直接忽略。一般用于写入审计日志等操做。

Failback Cluster

失败自动恢复，后台记录失败请求，定时重发。一般用于消息通知操做。

Forking Cluster

并行调用多个服务器，只要一个成功即返回。一般用于实时性要求较高的读操做，但须要浪费更多服务资源。可经过 forks="2" 来设置最大并行数。

Broadcast Cluster

广播调用全部提供者，逐个调用，任意一台报错则报错 [2]。一般用于通知全部提供者更新缓存或日志等本地资源信息。

集群模式配置

按照如下示例在服务提供方和消费方配置集群模式

<dubbo:service cluster="failsafe" />

或

<dubbo:reference cluster="failsafe" />

Dubbo集群容错的架构分析

经过官网上这张图咱们能大体的了解到一个请求过来，在集群中的调用过程。那么咱们就根据这个调用过程来进行分析吧。

整个在调用的过程当中这三个关键词接下来会贯穿全文,他们就是Directory,Router,LoadBalance

咱们只要紧紧的抓住这几个关键字就能贯穿整个调用链

先看下时序图，来看下调用的过程

最初咱们一个方法调用

咱们使用的是官方的dubbo-demo的dubbo-demo-consumer

public static void main(String[] args) {
    DemoService demoService = (DemoService) context.getBean("demoService"); // get remote service proxy
    String hello = demoService.sayHello("world"); // call remote method
    System.out.println(hello); // get result
   
}

调用 InvokerInvocationHandler#invoker方法代理类的调用

public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
    String methodName = method.getName();
    Class<?>[] parameterTypes = method.getParameterTypes();
    if (method.getDeclaringClass() == Object.class) {
        return method.invoke(invoker, args);
    }
    if ("toString".equals(methodName) && parameterTypes.length == 0) {
        return invoker.toString();
    }
    if ("hashCode".equals(methodName) && parameterTypes.length == 0) {
        return invoker.hashCode();
    }
    if ("equals".equals(methodName) && parameterTypes.length == 1) {
        return invoker.equals(args[0]);
    }

    RpcInvocation invocation;
    if (RpcUtils.hasGeneratedFuture(method)) {
        Class<?> clazz = method.getDeclaringClass();
        String syncMethodName = methodName.substring(0, methodName.length() - Constants.ASYNC_SUFFIX.length());
        Method syncMethod = clazz.getMethod(syncMethodName, method.getParameterTypes());
        invocation = new RpcInvocation(syncMethod, args);
        invocation.setAttachment(Constants.FUTURE_GENERATED_KEY, "true");
        invocation.setAttachment(Constants.ASYNC_KEY, "true");
    } else {
        invocation = new RpcInvocation(method, args);
        if (RpcUtils.hasFutureReturnType(method)) {
            invocation.setAttachment(Constants.FUTURE_RETURNTYPE_KEY, "true");
            invocation.setAttachment(Constants.ASYNC_KEY, "true");
        }
    }
    //这里使用的是动态代理的方式获取到指定的代理类
   // <1>
    return invoker.invoke(invocation).recreate();
}

1 执行invoke就要开始进入MockClusterInvoker#invoker

public Result invoke(Invocation invocation) throws RpcException {
    Result result = null;
    // 得到 “mock” 配置项，有多种配置方式
    String value = directory.getUrl().getMethodParameter(invocation.getMethodName(), Constants.MOCK_KEY, Boolean.FALSE.toString()).trim();
    //【第一种】无 mock
    if (value.length() == 0 || value.equalsIgnoreCase("false")) {
        //no mock
        // 调用原 Invoker ，发起 RPC 调用
        // 调用 invoker方法，进入到集群也就是CLuster类中
        //<2>
        result = this.invoker.invoke(invocation);
    //【第二种】强制服务降级
    } else if (value.startsWith("force")) {
        if (logger.isWarnEnabled()) {
            logger.warn("force-mock: " + invocation.getMethodName() + " force-mock enabled , url : " + directory.getUrl());
        }
        //force:direct mock
        // 直接调用 Mock Invoker ，执行本地 Mock 逻辑
        result = doMockInvoke(invocation, null);
    } else {
        //fail-mock
        try {
            // 【第三种】失败服务降级
            result = this.invoker.invoke(invocation);
        } catch (RpcException e) {
            // 业务性异常，直接抛出
            if (e.isBiz()) {
                throw e;
            } else {
                if (logger.isWarnEnabled()) {
                    logger.warn("fail-mock: " + invocation.getMethodName() + " fail-mock enabled , url : " + directory.getUrl(), e);
                }
                // 失败后，调用 Mock Invoker ，执行本地 Mock 逻辑
                result = doMockInvoke(invocation, e);
            }
        }
    }
    return result;
}

2 进入到 invoke就要开始进入到集群,也就是Cluster

/**
 * 调用服务提供者
 * @param invocation
 * @return
 * @throws RpcException
 */
@Override
public Result invoke(final Invocation invocation) throws RpcException {
    // 校验是否销毁
    checkWhetherDestroyed();

    //TODO
    // binding attachments into invocation.
    Map<String, String> contextAttachments = RpcContext.getContext().getAttachments();
    if (contextAttachments != null && contextAttachments.size() != 0) {
        ((RpcInvocation) invocation).addAttachments(contextAttachments);
    }

    // 得到全部服务提供者 Invoker 集合
    // <下面的list方法>
    List<Invoker<T>> invokers = list(invocation);
    // 得到 LoadBalance 对象
    LoadBalance loadbalance = initLoadBalance(invokers, invocation);
    // 设置调用编号，如果异步调用
    RpcUtils.attachInvocationIdIfAsync(getUrl(), invocation);
    // 执行调用
    return doInvoke(invocation, invokers, loadbalance);
}

/**
 *  得到全部服务提供者 Invoker 集合
 * @param invocation
 * @return
 * @throws RpcException
 */
protected List<Invoker<T>> list(Invocation invocation) throws RpcException {
    // 经过directory 进入到 AbstractDirectory 中选择 directory
    //<3 进入3 里面>
    return directory.list(invocation);
}

3 进入到 AbstractDirectory 进行 directory的选择

/**
 * 得到全部服务 Invoker 集合
 * @param invocation
 * @return Invoker 集合
 * @throws RpcException
 */
@Override
public List<Invoker<T>> list(Invocation invocation) throws RpcException {
    //当销毁时抛出异常
    if (destroyed) {
        throw new RpcException("Directory already destroyed .url: " + getUrl());
    }
    // 得到全部 Invoker 集合
    // <4 RegistryDirectory 选择 invoker>
    List<Invoker<T>> invokers = doList(invocation);
    //根据路由规则，筛选Invoker集合
    List<Router> localRouters = this.routers; // local reference 本地引用，避免并发问题
    if (localRouters != null && !localRouters.isEmpty()) {
        for (Router router : localRouters) {
            try {
                if (router.getUrl() == null || router.getUrl().getParameter(Constants.RUNTIME_KEY, false)) {
                  
                    invokers = router.route(invokers, getConsumerUrl(), invocation);
                }
            } catch (Throwable t) {
                logger.error("Failed to execute router: " + getUrl() + ", cause: " + t.getMessage(), t);
            }
        }
    }
      //< 6 获取 router 即将进入 MockInvokersSelector 类中>
    return invokers;
}

调用list方法会进入到 RegistryDirectory#doList

/**
 *  得到对应的 Invoker 集合。
 * @param invocation
 * @return
 */
@Override
public List<Invoker<T>> doList(Invocation invocation) {
    if (forbidden) {
        // 1. No service provider 2. Service providers are disabled
        throw new RpcException(RpcException.FORBIDDEN_EXCEPTION,
                "No provider available from registry " + getUrl().getAddress() + " for service " + getConsumerUrl().getServiceKey() + " on consumer " + NetUtils.getLocalHost()
                        + " use dubbo version " + Version.getVersion() + ", please check status of providers(disabled, not registered or in blacklist).");
    }
    List<Invoker<T>> invokers = null;
    //从methodInvokerMap中取出invokers
    Map<String, List<Invoker<T>>> localMethodInvokerMap = this.methodInvokerMap; // local reference
    // 得到 Invoker 集合
    if (localMethodInvokerMap != null && localMethodInvokerMap.size() > 0) {
        // 得到方法名、方法参数
        String methodName = RpcUtils.getMethodName(invocation);
        Object[] args = RpcUtils.getArguments(invocation);
        // 【第一】可根据第一个参数枚举路由
        if (args != null && args.length > 0 && args[0] != null
                && (args[0] instanceof String || args[0].getClass().isEnum())) {
            invokers = localMethodInvokerMap.get(methodName + "." + args[0]); // The routing can be enumerated according to the first parameter
        }
        // 【第二】根据方法名得到 Invoker 集合
        if (invokers == null) {
            invokers = localMethodInvokerMap.get(methodName);
        }
        // 【第三】使用全量 Invoker 集合。例如，`#$echo(name)` ，回声方法
        if (invokers == null) {
            invokers = localMethodInvokerMap.get(Constants.ANY_VALUE);
        }
        // 【第四】使用 `methodInvokerMap` 第一个 Invoker 集合。防护性编程。
        if (invokers == null) {
            Iterator<List<Invoker<T>>> iterator = localMethodInvokerMap.values().iterator();
            if (iterator.hasNext()) {
                invokers = iterator.next();
            }
        }
    }
    return invokers == null ? new ArrayList<Invoker<T>>(0) : invokers;
}

6 进入 MockInvokersSelector 类中根据路由规则拿到正常执行的invokers

/**
 * ，根据 "invocation.need.mock" 路由匹配对应类型的 Invoker 集合：
 * @param invokers Invoker 集合
 * @param url        refer url
 * @param invocation
 * @param <T>
 * @return
 * @throws RpcException
 */
@Override
public <T> List<Invoker<T>> route(final List<Invoker<T>> invokers,
                                  URL url, final Invocation invocation) throws RpcException {
    // 得到普通 Invoker 集合
    if (invocation.getAttachments() == null) {
        //<7> 拿到能正常执行的invokers,并将其返回.也就是序号7
        return getNormalInvokers(invokers);
    } else {
        // 得到 "invocation.need.mock" 配置项
        String value = invocation.getAttachments().get(Constants.INVOCATION_NEED_MOCK);
        // 得到普通 Invoker 集合
        if (value == null)
            return getNormalInvokers(invokers);
        // 得到 MockInvoker 集合
        else if (Boolean.TRUE.toString().equalsIgnoreCase(value)) {
            return getMockedInvokers(invokers);
        }
    }
    // 其它，不匹配，直接返回 `invokers` 集合
    return invokers;
}

<7> 拿到能正常执行的invokers,并将其返回

/**
 * 得到普通 Invoker 集合
 * @param invokers
 * @param <T>
 * @return
 */
private <T> List<Invoker<T>> getNormalInvokers(final List<Invoker<T>> invokers) {
    // 不包含 MockInvoker 的状况下，直接返回 `invokers` 集合
    if (!hasMockProviders(invokers)) {
        return invokers;
    } else {
        // 若包含 MockInvoker 的状况下，过滤掉 MockInvoker ，建立普通 Invoker 集合
        List<Invoker<T>> sInvokers = new ArrayList<Invoker<T>>(invokers.size());
        for (Invoker<T> invoker : invokers) {
            if (!invoker.getUrl().getProtocol().equals(Constants.MOCK_PROTOCOL)) {
                sInvokers.add(invoker);
            }
        }
        return sInvokers;
    }
}

8 拿到 invoker 返回到 AbstractClusterInvoker这个类

对于上面的这些步骤，主要用于作两件事

在Directory中找出本次集群中的所有invokers

在Router中,将上一步的所有invokers挑选出能正常执行的invokers

在时序图的序号5和序号7处，作了上诉的处理。

在有多个集群的状况下，并且两个集群都是正常的，那么到底须要执行哪一个？

AbstractClusterInvoker#invoke

/**
 * 调用服务提供者
 * @param invocation
 * @return
 * @throws RpcException
 */
@Override
public Result invoke(final Invocation invocation) throws RpcException {
    // 校验是否销毁
    checkWhetherDestroyed();

    //TODO
    // binding attachments into invocation.
    Map<String, String> contextAttachments = RpcContext.getContext().getAttachments();
    if (contextAttachments != null && contextAttachments.size() != 0) {
        ((RpcInvocation) invocation).addAttachments(contextAttachments);
    }

    // 得到全部服务提供者 Invoker 集合
    List<Invoker<T>> invokers = list(invocation);
    // 得到 LoadBalance 对象
    LoadBalance loadbalance = initLoadBalance(invokers, invocation);
    // 设置调用编号，如果异步调用
    RpcUtils.attachInvocationIdIfAsync(getUrl(), invocation);
    // 执行调用
    return doInvoke(invocation, invokers, loadbalance);
}

/**
 *  实现子 Cluster 的 Invoker 实现类的服务调用的差别逻辑，
 * @param invocation
 * @param invokers
 * @param loadbalance
 * @return
 * @throws RpcException
 */
 // 抽象方法，子类自行的实现  由于咱们使用的默认配置，因此 咱们将会是FailoverClusterInvoker 这个类
 //< 8 doInvoker 方法>
protected abstract Result doInvoke(Invocation invocation, List<Invoker<T>> invokers,
                                   LoadBalance loadbalance) throws RpcException;

9 进入到相应的集群容错方案类中由于咱们使用的默认配置，因此咱们将会是FailoverClusterInvoker 这个类

/**
 * 实际逻辑很简单：循环，查找一个 Invoker 对象，进行调用，直到成功
 * @param invocation
 * @param invokers
 * @param loadbalance
 * @return
 * @throws RpcException
 */
@Override
@SuppressWarnings({"unchecked", "rawtypes"})
public Result doInvoke(Invocation invocation, final List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
    List<Invoker<T>> copyinvokers = invokers;
    // 检查copyinvokers便可用Invoker集合是否为空，若是为空，那么抛出异常
    checkInvokers(copyinvokers, invocation);
    String methodName = RpcUtils.getMethodName(invocation);
    // 获得最大可调用次数：最大可重试次数+1，默认最大可重试次数Constants.DEFAULT_RETRIES=2
    int len = getUrl().getMethodParameter(methodName, Constants.RETRIES_KEY, Constants.DEFAULT_RETRIES) + 1;
    if (len <= 0) {
        len = 1;
    }

    // retry loop.
    // 保存最后一次调用的异常
    RpcException le = null; // last exception.
    List<Invoker<T>> invoked = new ArrayList<Invoker<T>>(copyinvokers.size()); // invoked invokers.
    Set<String> providers = new HashSet<String>(len);
    // failover机制核心实现：若是出现调用失败，那么重试其余服务器
    for (int i = 0; i < len; i++) {
        //Reselect before retry to avoid a change of candidate `invokers`.
        //NOTE: if `invokers` changed, then `invoked` also lose accuracy.
        // 重试时，进行从新选择，避免重试时invoker列表已发生变化.
        // 注意：若是列表发生了变化，那么invoked判断会失效，由于invoker示例已经改变
        if (i > 0) {
            checkWhetherDestroyed();
            // 根据Invocation调用信息从Directory中获取全部可用Invoker
            copyinvokers = list(invocation);
            // check again
            // 从新检查一下
            checkInvokers(copyinvokers, invocation);
        }
        // 根据负载均衡机制从copyinvokers中选择一个Invoker
         //< 9 select ------------------------- 下面将进行 invoker选择----------------------------------->
        Invoker<T> invoker = select(loadbalance, invocation, copyinvokers, invoked);
        // 保存每次调用的Invoker
        invoked.add(invoker);
        // 设置已经调用的 Invoker 集合，到 Context 中
        RpcContext.getContext().setInvokers((List) invoked);
        try {
            // RPC 调用获得 Result
            Result result = invoker.invoke(invocation);
            // 重试过程当中，将最后一次调用的异常信息以 warn 级别日志输出
            if (le != null && logger.isWarnEnabled()) {
                logger.warn("Although retry the method " + methodName
                        + " in the service " + getInterface().getName()
                        + " was successful by the provider " + invoker.getUrl().getAddress()
                        + ", but there have been failed providers " + providers
                        + " (" + providers.size() + "/" + copyinvokers.size()
                        + ") from the registry " + directory.getUrl().getAddress()
                        + " on the consumer " + NetUtils.getLocalHost()
                        + " using the dubbo version " + Version.getVersion() + ". Last error is: "
                        + le.getMessage(), le);
            }
            return result;
        } catch (RpcException e) {
            // 若是是业务性质的异常，再也不重试，直接抛出
            if (e.isBiz()) { // biz exception.
                throw e;
            }
            // 其余性质的异常统一封装成RpcException
            le = e;
        } catch (Throwable e) {
            le = new RpcException(e.getMessage(), e);
        } finally {
            providers.add(invoker.getUrl().getAddress());
        }
    }
    // 最大可调用次数用完还获得Result的话，抛出RpcException异常：重试了N次仍是失败，并输出最后一次异常信息
    throw new RpcException(le.getCode(), "Failed to invoke the method "
            + methodName + " in the service " + getInterface().getName()
            + ". Tried " + len + " times of the providers " + providers
            + " (" + providers.size() + "/" + copyinvokers.size()
            + ") from the registry " + directory.getUrl().getAddress()
            + " on the consumer " + NetUtils.getLocalHost() + " using the dubbo version "
            + Version.getVersion() + ". Last error is: "
            + le.getMessage(), le.getCause() != null ? le.getCause() : le);
}

9 select 使用 loadbalance 选择 invoker

/**
 * Select a invoker using loadbalance policy.</br>
 * a) Firstly, select an invoker using loadbalance. If this invoker is in previously selected list, or,
 * if this invoker is unavailable, then continue step b (reselect), otherwise return the first selected invoker</br>
 * <p>
 * b) Reselection, the validation rule for reselection: selected > available. This rule guarantees that
 * the selected invoker has the minimum chance to be one in the previously selected list, and also
 * guarantees this invoker is available.
 *
 * @param loadbalance load balance policy
 * @param invocation  invocation
 * @param invokers    invoker candidates
 * @param selected    exclude selected invokers or not
 * @return the invoker which will final to do invoke.
 * @throws RpcException
 */
/**
 * 使用 loadbalance 选择 invoker.
 * a)
 * @param loadbalance  对象，提供负责均衡策略
 * @param invocation 对象
 * @param invokers 候选的 Invoker 集合
 * @param selected 已选过的 Invoker 集合. 注意：输入保证不重复
 * @return 最终的 Invoker 对象
 * @throws RpcException
 */
protected Invoker<T> select(LoadBalance loadbalance, Invocation invocation, List<Invoker<T>> invokers, List<Invoker<T>> selected) throws RpcException {
    if (invokers == null || invokers.isEmpty())
        return null;
    // 得到 sticky 配置项，方法级
    String methodName = invocation == null ? "" : invocation.getMethodName();

    boolean sticky = invokers.get(0).getUrl().getMethodParameter(methodName, Constants.CLUSTER_STICKY_KEY, Constants.DEFAULT_CLUSTER_STICKY);
    {
        //ignore overloaded method
        // 若 stickyInvoker 不存在于 invokers 中，说明不在候选中，须要置空，从新选择
        if (stickyInvoker != null && !invokers.contains(stickyInvoker)) {
            stickyInvoker = null;
        }
        //ignore concurrency problem
        // 若开启粘滞链接的特性，且 stickyInvoker 不存在于 selected 中，则返回 stickyInvoker 这个 Invoker 对象
        if (sticky && stickyInvoker != null && (selected == null || !selected.contains(stickyInvoker))) {
            // 若开启排除非可用的 Invoker 的特性，则校验 stickyInvoker 是否可用。若可用，则进行返回
            if (availablecheck && stickyInvoker.isAvailable()) {
                return stickyInvoker;
            }
        }
    }
    // 执行选择
    //<  10 -----------------------------进行选择------------------------------------->
    Invoker<T> invoker = doSelect(loadbalance, invocation, invokers, selected);

    // 若开启粘滞链接的特性，记录最终选择的 Invoker 到 stickyInvoker
    if (sticky) {
        stickyInvoker = invoker;
    }
    return invoker;
}

10 doSelect 从候选的 Invoker 集合，选择一个最终调用的 Invoker 对象

/**
 *  从候选的 Invoker 集合，选择一个最终调用的 Invoker 对象
 * @param loadbalance
 * @param invocation
 * @param invokers
 * @param selected
 * @return
 * @throws RpcException
 */
private Invoker<T> doSelect(LoadBalance loadbalance, Invocation invocation, List<Invoker<T>> invokers, List<Invoker<T>> selected) throws RpcException {
    if (invokers == null || invokers.isEmpty())
        return null;
    // 若是只有一个 Invoker ，直接选择
    if (invokers.size() == 1)
        return invokers.get(0);
    // 使用 Loadbalance ，选择一个 Invoker 对象。
    //<11 ------------------ 根据LoadBalance(负载均衡) 选择一个合适的invoke>
    Invoker<T> invoker = loadbalance.select(invokers, getUrl(), invocation);

    //If the `invoker` is in the  `selected` or invoker is unavailable && availablecheck is true, reselect.
    // 若是 selected中包含（优先判断） 或者 不可用&&availablecheck=true 则重试.
    if ((selected != null && selected.contains(invoker))
            || (!invoker.isAvailable() && getUrl() != null && availablecheck)) {
        try {
            //重选一个 Invoker 对象
            Invoker<T> rinvoker = reselect(loadbalance, invocation, invokers, selected, availablecheck);
            if (rinvoker != null) {
                invoker = rinvoker;
            } else {
                //Check the index of current selected invoker, if it's not the last one, choose the one at index+1.
                //看下第一次选的位置，若是不是最后，选+1位置.
                int index = invokers.indexOf(invoker);
                try {
                    //Avoid collision
                    // 最后在避免碰撞
                    invoker = index < invokers.size() - 1 ? invokers.get(index + 1) : invokers.get(0);
                } catch (Exception e) {
                    logger.warn(e.getMessage() + " may because invokers list dynamic change, ignore.", e);
                }
            }
        } catch (Throwable t) {
            logger.error("cluster reselect fail reason is :" + t.getMessage() + " if can not solve, you can set cluster.availablecheck=false in url", t);
        }
    }
    return invoker;
}

11 AbstractLoadBalance#select

此方法是抽象方法，须要各类子类去实现

/**
 *  抽象方法，下面的实现类来实现这个选择invoker 的方法
 *  各个负载均衡的类自行实现提供自定义的负载均衡策略。
 * @param invokers
 * @param url
 * @param invocation
 * @param <T>
 * @return
 */
protected abstract <T> Invoker<T> doSelect(List<Invoker<T>> invokers, URL url, Invocation invocation);

12 RoundRobinLoadBalance 实现父类的抽象方法

@Override
protected <T> Invoker<T> doSelect(List<Invoker<T>> invokers, URL url, Invocation invocation) {
    String key = invokers.get(0).getUrl().getServiceKey() + "." + invocation.getMethodName();
    int length = invokers.size(); // Number of invokers
    int maxWeight = 0; // The maximum weight
    int minWeight = Integer.MAX_VALUE; // The minimum weight
    final LinkedHashMap<Invoker<T>, IntegerWrapper> invokerToWeightMap = new LinkedHashMap<Invoker<T>, IntegerWrapper>();
    int weightSum = 0;
    // 计算最小、最大权重，总的权重和。
    for (int i = 0; i < length; i++) {
        int weight = getWeight(invokers.get(i), invocation);
        maxWeight = Math.max(maxWeight, weight); // Choose the maximum weight
        minWeight = Math.min(minWeight, weight); // Choose the minimum weight
        if (weight > 0) {
            invokerToWeightMap.put(invokers.get(i), new IntegerWrapper(weight));
            weightSum += weight;
        }
    }
    // 计算最小、最大权重，总的权重和。
    AtomicPositiveInteger sequence = sequences.get(key);
    if (sequence == null) {
        sequences.putIfAbsent(key, new AtomicPositiveInteger());
        sequence = sequences.get(key);
    }
    // 得到当前顺序号，并递增 + 1
    int currentSequence = sequence.getAndIncrement();
    // 权重不相等，顺序根据权重分配
    if (maxWeight > 0 && minWeight < maxWeight) {
        int mod = currentSequence % weightSum;// 剩余权重
        for (int i = 0; i < maxWeight; i++) {// 循环最大权重
            for (Map.Entry<Invoker<T>, IntegerWrapper> each : invokerToWeightMap.entrySet()) {
                final Invoker<T> k = each.getKey();
                final IntegerWrapper v = each.getValue();
                // 剩余权重归 0 ，当前 Invoker 还有剩余权重，返回该 Invoker 对象
                if (mod == 0 && v.getValue() > 0) {
                    return k;
                }
                // 若 Invoker 还有权重值，扣除它( value )和剩余权重( mod )。
                if (v.getValue() > 0) {
                    v.decrement();
                    mod--;
                }
            }
        }
    }
    // 权重相等，平均顺序得到
    // Round robin
    //<13 --------------------------->
    return invokers.get(currentSequence % length);
}

14 FailoverClusterInvoker # doInvoke 方法

/**
 * 实际逻辑很简单：循环，查找一个 Invoker 对象，进行调用，直到成功
 * @param invocation
 * @param invokers
 * @param loadbalance
 * @return
 * @throws RpcException
 */
@Override
@SuppressWarnings({"unchecked", "rawtypes"})
public Result doInvoke(Invocation invocation, final List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
    List<Invoker<T>> copyinvokers = invokers;
    // 检查copyinvokers便可用Invoker集合是否为空，若是为空，那么抛出异常
    checkInvokers(copyinvokers, invocation);
    String methodName = RpcUtils.getMethodName(invocation);
    // 获得最大可调用次数：最大可重试次数+1，默认最大可重试次数Constants.DEFAULT_RETRIES=2
    int len = getUrl().getMethodParameter(methodName, Constants.RETRIES_KEY, Constants.DEFAULT_RETRIES) + 1;
    if (len <= 0) {
        len = 1;
    }

    // retry loop.
    // 保存最后一次调用的异常
    RpcException le = null; // last exception.
    List<Invoker<T>> invoked = new ArrayList<Invoker<T>>(copyinvokers.size()); // invoked invokers.
    Set<String> providers = new HashSet<String>(len);
    // failover机制核心实现：若是出现调用失败，那么重试其余服务器
    for (int i = 0; i < len; i++) {
        //Reselect before retry to avoid a change of candidate `invokers`.
        //NOTE: if `invokers` changed, then `invoked` also lose accuracy.
        // 重试时，进行从新选择，避免重试时invoker列表已发生变化.
        // 注意：若是列表发生了变化，那么invoked判断会失效，由于invoker示例已经改变
        if (i > 0) {
            checkWhetherDestroyed();
            // 根据Invocation调用信息从Directory中获取全部可用Invoker
            copyinvokers = list(invocation);
            // check again
            // 从新检查一下
            checkInvokers(copyinvokers, invocation);
        }
        // 根据负载均衡机制从copyinvokers中选择一个Invoker
        //< ------------------------- 下面将进行 invoker选择----------------------------------->
        Invoker<T> invoker = select(loadbalance, invocation, copyinvokers, invoked);
        // 保存每次调用的Invoker
        invoked.add(invoker);
        // 设置已经调用的 Invoker 集合，到 Context 中
        RpcContext.getContext().setInvokers((List) invoked);
        try {
            // RPC 调用获得 Result
            Result result = invoker.invoke(invocation);
            // 重试过程当中，将最后一次调用的异常信息以 warn 级别日志输出
            if (le != null && logger.isWarnEnabled()) {
                logger.warn("Although retry the method " + methodName
                        + " in the service " + getInterface().getName()
                        + " was successful by the provider " + invoker.getUrl().getAddress()
                        + ", but there have been failed providers " + providers
                        + " (" + providers.size() + "/" + copyinvokers.size()
                        + ") from the registry " + directory.getUrl().getAddress()
                        + " on the consumer " + NetUtils.getLocalHost()
                        + " using the dubbo version " + Version.getVersion() + ". Last error is: "
                        + le.getMessage(), le);
            }
            return result;
        } catch (RpcException e) {
            // 若是是业务性质的异常，再也不重试，直接抛出
            if (e.isBiz()) { // biz exception.
                throw e;
            }
            // 其余性质的异常统一封装成RpcException
            le = e;
        } catch (Throwable e) {
            le = new RpcException(e.getMessage(), e);
        } finally {
            providers.add(invoker.getUrl().getAddress());
        }
    }
    // 最大可调用次数用完还获得Result的话，抛出RpcException异常：重试了N次仍是失败，并输出最后一次异常信息
    throw new RpcException(le.getCode(), "Failed to invoke the method "
            + methodName + " in the service " + getInterface().getName()
            + ". Tried " + len + " times of the providers " + providers
            + " (" + providers.size() + "/" + copyinvokers.size()
            + ") from the registry " + directory.getUrl().getAddress()
            + " on the consumer " + NetUtils.getLocalHost() + " using the dubbo version "
            + Version.getVersion() + ". Last error is: "
            + le.getMessage(), le.getCause() != null ? le.getCause() : le);
}

以上是一个完成调用过程的源码分析以及架构分析

Dubbo集群容错源码解析

** 咱们后面的分析重点就是Cluster的调用过程**

Cluster的做用

Cluster 将 Directory 中的多个 Invoker 假装成一个 Invoker，对上层透明，假装过程包含了容错逻辑，调用失败后，重试另外一个

应对出错状况采起的策略,在某次出现错误后将会采用何种方式进行下次重试

集群模式归纳

下面对这些一个一个分析

Failover Cluster

失败自动切换，当出现失败，重试其它服务器 [1]。一般用于读操做，但重试会带来更长延迟。可经过 retries="2" 来设置重试次数(不含第一次)。

重试次数配置以下：

<dubbo:service retries="2" />

或

<dubbo:reference retries="2" />

或

<dubbo:reference>
    <dubbo:method name="findFoo" retries="2" />
</dubbo:reference>

核心调用类

/**
 * 实际逻辑很简单：循环，查找一个 Invoker 对象，进行调用，直到成功
 * @param invocation
 * @param invokers
 * @param loadbalance
 * @return
 * @throws RpcException
 */
@Override
@SuppressWarnings({"unchecked", "rawtypes"})
public Result doInvoke(Invocation invocation, final List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
    List<Invoker<T>> copyinvokers = invokers;
    // 检查copyinvokers便可用Invoker集合是否为空，若是为空，那么抛出异常
    checkInvokers(copyinvokers, invocation);
    String methodName = RpcUtils.getMethodName(invocation);
    // 获得最大可调用次数：最大可重试次数+1，默认最大可重试次数Constants.DEFAULT_RETRIES=2
    int len = getUrl().getMethodParameter(methodName, Constants.RETRIES_KEY, Constants.DEFAULT_RETRIES) + 1;
    if (len <= 0) {
        len = 1;
    }

    // retry loop.
    // 保存最后一次调用的异常
    RpcException le = null; // last exception.
    List<Invoker<T>> invoked = new ArrayList<Invoker<T>>(copyinvokers.size()); // invoked invokers.
    Set<String> providers = new HashSet<String>(len);
    // failover机制核心实现：若是出现调用失败，那么重试其余服务器
    for (int i = 0; i < len; i++) {
        //Reselect before retry to avoid a change of candidate `invokers`.
        //NOTE: if `invokers` changed, then `invoked` also lose accuracy.
        // 重试时，进行从新选择，避免重试时invoker列表已发生变化.
        // 注意：若是列表发生了变化，那么invoked判断会失效，由于invoker示例已经改变
        if (i > 0) {
            checkWhetherDestroyed();
            // 根据Invocation调用信息从Directory中获取全部可用Invoker
            copyinvokers = list(invocation);
            // check again
            // 从新检查一下
            checkInvokers(copyinvokers, invocation);
        }
        // 根据负载均衡机制从copyinvokers中选择一个Invoker
        //< ------------------------- 下面将进行 invoker选择----------------------------------->
        Invoker<T> invoker = select(loadbalance, invocation, copyinvokers, invoked);
        // 保存每次调用的Invoker
        invoked.add(invoker);
        // 设置已经调用的 Invoker 集合，到 Context 中
        RpcContext.getContext().setInvokers((List) invoked);
        try {
            // RPC 调用获得 Result
            Result result = invoker.invoke(invocation);
            // 重试过程当中，将最后一次调用的异常信息以 warn 级别日志输出
            if (le != null && logger.isWarnEnabled()) {
                logger.warn("Although retry the method " + methodName
                        + " in the service " + getInterface().getName()
                        + " was successful by the provider " + invoker.getUrl().getAddress()
                        + ", but there have been failed providers " + providers
                        + " (" + providers.size() + "/" + copyinvokers.size()
                        + ") from the registry " + directory.getUrl().getAddress()
                        + " on the consumer " + NetUtils.getLocalHost()
                        + " using the dubbo version " + Version.getVersion() + ". Last error is: "
                        + le.getMessage(), le);
            }
            return result;
        } catch (RpcException e) {
            // 若是是业务性质的异常，再也不重试，直接抛出
            if (e.isBiz()) { // biz exception.
                throw e;
            }
            // 其余性质的异常统一封装成RpcException
            le = e;
        } catch (Throwable e) {
            le = new RpcException(e.getMessage(), e);
        } finally {
            providers.add(invoker.getUrl().getAddress());
        }
    }
    // 最大可调用次数用完还获得Result的话，抛出RpcException异常：重试了N次仍是失败，并输出最后一次异常信息
    throw new RpcException(le.getCode(), "Failed to invoke the method "
            + methodName + " in the service " + getInterface().getName()
            + ". Tried " + len + " times of the providers " + providers
            + " (" + providers.size() + "/" + copyinvokers.size()
            + ") from the registry " + directory.getUrl().getAddress()
            + " on the consumer " + NetUtils.getLocalHost() + " using the dubbo version "
            + Version.getVersion() + ". Last error is: "
            + le.getMessage(), le.getCause() != null ? le.getCause() : le);
}

Failfast Cluster

快速失败，只发起一次调用，失败当即报错。一般用于非幂等性的写操做，好比新增记录。

public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
    // 检查 invokers集合
    checkInvokers(invokers, invocation);
    // 根据负载均衡机制从 invokers 中选择一个Invoker
    Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
    try {
        // RPC 调用获得 Result
        return invoker.invoke(invocation);
    } catch (Throwable e) {
        // 如果业务性质的异常，直接抛出
        if (e instanceof RpcException && ((RpcException) e).isBiz()) { // biz exception.
            throw (RpcException) e;
        }
        // 封装 RpcException 异常，并抛出
        throw new RpcException(e instanceof RpcException ? ((RpcException) e).getCode() : 0,
                "Failfast invoke providers " + invoker.getUrl() + " " + loadbalance.getClass().getSimpleName()
                        + " select from all providers " + invokers + " for service " + getInterface().getName()
                        + " method " + invocation.getMethodName() + " on consumer " + NetUtils.getLocalHost()
                        + " use dubbo version " + Version.getVersion()
                        + ", but no luck to perform the invocation. Last error is: " + e.getMessage(),
                e.getCause() != null ? e.getCause() : e);
    }
}

Failsafe Cluster

失败安全，出现异常时，直接忽略。一般用于写入审计日志等操做。

public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
    try {
        // 检查 invokers 是否为空
        checkInvokers(invokers, invocation);
        // 根据负载均衡机制从 invokers 中选择一个Invoker
        Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
        // RPC 调用获得 Result
        return invoker.invoke(invocation);
    } catch (Throwable e) {
        // 打印异常日志
        logger.error("Failsafe ignore exception: " + e.getMessage(), e);
        // 忽略异常
        return new RpcResult(); // ignore
    }
}

Failback Cluster

失败自动恢复，后台记录失败请求，定时重发。一般用于消息通知操做。

/**
 * 当失败的时候会将invocation 添加到失败集合中
 * @param invocation 失败的invocation
 * @param router 对象自身
 */
private void addFailed(Invocation invocation, AbstractClusterInvoker<?> router) {
    //若是定时任务未初始化，进行建立
    if (retryFuture == null) {
        synchronized (this) {
            if (retryFuture == null) {
                retryFuture = scheduledExecutorService.scheduleWithFixedDelay(new Runnable() {

                    @Override
                    public void run() {
                        // collect retry statistics
                        //建立的定时任务，会调用 #retryFailed() 方法，重试任务，发起 RCP 调用。
                        try {
                            retryFailed();
                        } catch (Throwable t) { // Defensive fault tolerance
                            logger.error("Unexpected error occur at collect statistic", t);
                        }
                    }
                }, RETRY_FAILED_PERIOD, RETRY_FAILED_PERIOD, TimeUnit.MILLISECONDS);
            }
        }
    }
    // 添加到失败任务
    failed.put(invocation, router);
}

/**
 * 重试任务，发起 RCP 调用
 */
void retryFailed() {
    if (failed.size() == 0) {
        return;
    }
    // 循环重试任务，逐个调用
    for (Map.Entry<Invocation, AbstractClusterInvoker<?>> entry : new HashMap<>(failed).entrySet()) {
        Invocation invocation = entry.getKey();
        Invoker<?> invoker = entry.getValue();
        try {
            // RPC 调用获得 Result
            invoker.invoke(invocation);
            // 移除失败任务
            failed.remove(invocation);
        } catch (Throwable e) {
            logger.error("Failed retry to invoke method " + invocation.getMethodName() + ", waiting again.", e);
        }
    }
}

@Override
protected Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
    try {
        checkInvokers(invokers, invocation);
        // 根据负载均衡机制从 invokers 中选择一个Invoker
        Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
        // RPC 调用获得 Result
        return invoker.invoke(invocation);
    } catch (Throwable e) {
        logger.error("Failback to invoke method " + invocation.getMethodName() + ", wait for retry in background. Ignored exception: "
                + e.getMessage() + ", ", e);
        // 添加到失败任务
        addFailed(invocation, this);
        return new RpcResult(); // ignore
    }
}

Forking Cluster

并行调用多个服务器，只要一个成功即返回。一般用于实时性要求较高的读操做，但须要浪费更多服务资源。可经过 forks="2" 来设置最大并行数。

public Result doInvoke(final Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
    try {
        // 检查 invokers 是否为空
        checkInvokers(invokers, invocation);
        // 保存选择的 Invoker 集合
        final List<Invoker<T>> selected;
        // 获得最大并行数，默认为 Constants.DEFAULT_FORKS = 2
        final int forks = getUrl().getParameter(Constants.FORKS_KEY, Constants.DEFAULT_FORKS);
        // 得到调用超时时间，默认为 DEFAULT_TIMEOUT = 1000 毫秒
        final int timeout = getUrl().getParameter(Constants.TIMEOUT_KEY, Constants.DEFAULT_TIMEOUT);
        //若是最大并行数小于 0 或者大于invokers的数量，直接调用invokers
        if (forks <= 0 || forks >= invokers.size()) {
            selected = invokers;
        } else {
            // 循环，根据负载均衡机制从 invokers，中选择一个个Invoker ，从而组成 Invoker 集合。
            // 注意，由于增长了排重逻辑，因此不能保证得到的 Invoker 集合的大小，小于最大并行数
            selected = new ArrayList<>();
            for (int i = 0; i < forks; i++) {
                // TODO. Add some comment here, refer chinese version for more details.
                // 在invoker列表(排除selected)后,若是没有选够,则存在重复循环问题.见select实现.
                Invoker<T> invoker = select(loadbalance, invocation, invokers, selected);
                if (!selected.contains(invoker)) {
                    //Avoid add the same invoker several times.
                    selected.add(invoker);
                }
            }
        }
        // 设置已经调用的 Invoker 集合，到 Context 中
        RpcContext.getContext().setInvokers((List) selected);
        // 异常计数器
        final AtomicInteger count = new AtomicInteger();
        // 建立阻塞队列
        final BlockingQueue<Object> ref = new LinkedBlockingQueue<>();
        // 循环 selected 集合，提交线程池，发起 RPC 调用
        for (final Invoker<T> invoker : selected) {
            executor.execute(new Runnable() {
                @Override
                public void run() {
                    try {
                        // RPC 调用，得到 Result 结果
                        Result result = invoker.invoke(invocation);
                        // 添加 Result 到 `ref` 阻塞队列
                        ref.offer(result);
                    } catch (Throwable e) {
                        // 异常计数器 + 1
                        int value = count.incrementAndGet();
                        // 若 RPC 调用结果都是异常，则添加异常到 `ref` 阻塞队列
                        if (value >= selected.size()) {
                            ref.offer(e);
                        }
                    }
                }
            });
        }
        try {
            // 从 `ref` 队列中，阻塞等待结果
            Object ret = ref.poll(timeout, TimeUnit.MILLISECONDS);
            // 如果异常结果，抛出 RpcException 异常
            if (ret instanceof Throwable) {
                Throwable e = (Throwable) ret;
                throw new RpcException(e instanceof RpcException ? ((RpcException) e).getCode() : 0, "Failed to forking invoke provider " + selected + ", but no luck to perform the invocation. Last error is: " + e.getMessage(), e.getCause() != null ? e.getCause() : e);
            }
            // 如果正常结果，直接返回
            return (Result) ret;
        } catch (InterruptedException e) {
            throw new RpcException("Failed to forking invoke provider " + selected + ", but no luck to perform the invocation. Last error is: " + e.getMessage(), e);
        }
    } finally {
        // clear attachments which is binding to current thread.
        RpcContext.getContext().clearAttachments();
    }
}

Broadcast Cluster

广播调用全部提供者，逐个调用，任意一台报错则报错 [2]。一般用于通知全部提供者更新缓存或日志等本地资源信息

public Result doInvoke(final Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
    // 检查 invokers 便可用Invoker集合是否为空，若是为空，那么抛出异常
    checkInvokers(invokers, invocation);
    // 设置已经调用的 Invoker 集合，到 Context 中
    RpcContext.getContext().setInvokers((List) invokers);
    // 保存最后一次调用的异常
    RpcException exception = null;
    // 保存最后一次调用的结果
    Result result = null;
    // 循环候选的 Invoker 集合，调用全部 Invoker 对象。
    for (Invoker<T> invoker : invokers) {
        try {
            // 发起 RPC 调用
            result = invoker.invoke(invocation);
        } catch (RpcException e) {
            exception = e;
            logger.warn(e.getMessage(), e);
        } catch (Throwable e) {
            // 封装成 RpcException 异常
            exception = new RpcException(e.getMessage(), e);
            logger.warn(e.getMessage(), e);
        }
    }
    // 若存在一个异常，抛出该异常
    if (exception != null) {
        throw exception;
    }
    return result;
}

Available Cluster

遍历全部从Directory中list出来的Invoker集合，调用第一个isAvailable()的Invoker，只发起一次调用，失败当即报错。
isAvailable()判断逻辑以下--Client处理链接状态，且不是READONLY：

@Override
public boolean isAvailable() {
    if (!super.isAvailable())
        return false;
    for (ExchangeClient client : clients){
        if (client.isConnected() && !client.hasAttribute(Constants.CHANNEL_ATTRIBUTE_READONLY_KEY)){
            //cannot write == not Available ?
            return true ;
        }
    }
    return false;
}

参考文章

dubbo源码解析-集群容错架构设计

Dubbo 官方文档集群容错

dubbo源码解析-cluster

13.dubbo源码-集群容错

dubbo源码解析五 --- 集群容错架构设计与原理分析

欢迎来个人 Star Followers 后期后继续更新Dubbo别的文章

目录

面试中集群容错的常常的问题

Dubbo 官方文档关于集群容错的介绍

集群容错模式

集群模式配置

Dubbo集群容错的架构分析

Dubbo集群容错源码解析