dubbo集群容错策略的代码分析2

时间 2019-11-15

标签 dubbo 集群容错策略代码分析栏目 Dubbo 繁體版

原文原文链接

接上篇http://www.javashuo.com/article/p-dwqnopaq-hc.htmljava

dubbo版本2.5.3缓存

dubbo自己集群容错策略有7种。都实现了Cluster接口（spi扩展点）
从类结构上看，Cluster接口有9个实现类（其中MockClusterWrapper是服务降级处理用的，MergeableCluster是分组合并结果用的）安全

Cluster接口只有一个方法服务器

/**
     * Merge the directory invokers to a virtual invoker.
     *
     * @param <T>
     * @param directory
     * @return cluster invoker
     * @throws RpcException
     */
    @Adaptive
    <T> Invoker<T> join(Directory<T> directory) throws RpcException;

方法实现逻辑是，把directory目录服务中多个提供者，通过容错和负载均衡机制包装，以一个虚拟的Invoker返给上层传调用。
每一个虚拟的Invoker类型，就是一种集群策略。并发

好比dubbo默认的集群策略failover类的实现app

public class FailoverCluster implements Cluster {

    public final static String NAME = "failover";

    public <T> Invoker<T> join(Directory<T> directory) throws RpcException {
        return new FailoverClusterInvoker<T>(directory);//包装成的虚拟Invoker类型是FailoverClusterInvoker，就是个集群容错策略
    }

}

FailoverClusterInvoker 类扩展了抽象类AbstractClusterInvoker，实现了AbstractClusterInvoker的
抽象方法doInvoke()用于实现具体集群策略，以下图负载均衡

AbstractClusterInvoker实现了Invoker接口惟一方法invoke,对外层调用，以下ide

public Result invoke(final Invocation invocation) throws RpcException {

        checkWhetherDestroyed();

        LoadBalance loadbalance;
        //从目录中获取全部的服务提供者
        List<Invoker<T>> invokers = list(invocation);
	//获取负载均衡策略
        if (invokers != null && invokers.size() > 0) {
            loadbalance = ExtensionLoader.getExtensionLoader(LoadBalance.class).getExtension(invokers.get(0).getUrl()
                    .getMethodParameter(invocation.getMethodName(), Constants.LOADBALANCE_KEY, Constants.DEFAULT_LOADBALANCE));
        } else {
            loadbalance = ExtensionLoader.getExtensionLoader(LoadBalance.class).getExtension(Constants.DEFAULT_LOADBALANCE);
        }
        RpcUtils.attachInvocationIdIfAsync(getUrl(), invocation);
        return doInvoke(invocation, invokers, loadbalance);//调用子类实现具体的容错策略。
    }

能够看到其余几种集群策略都是这种方式。其实就是模板方法模式。this

因此，经过看每种集群容错类的doInvoke方法的具体实现，就能够理解每种的容错策略。
前一篇，看了failover和available集群策略,下面再看看其余五种集群策略。.net

broadcast策略：

public Result doInvoke(final Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        checkInvokers(invokers, invocation);
        RpcContext.getContext().setInvokers((List) invokers);
        RpcException exception = null;
        Result result = null;
        //遍历调用全部的服务列表，并把结果覆盖之前的。
        for (Invoker<T> invoker : invokers) {
            try {
                result = invoker.invoke(invocation);
            } catch (RpcException e) {
                exception = e;
                logger.warn(e.getMessage(), e);
            } catch (Throwable e) {
                exception = new RpcException(e.getMessage(), e);
                logger.warn(e.getMessage(), e);
            }
        }
        //其中有一个失败，直接抛异常
        if (exception != null) {
            throw exception;
        }
        return result;
    }

这个策略一般用于通知全部提供者更新缓存或日志等本地资源信息

forking 策略：

public Result doInvoke(final Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        checkInvokers(invokers, invocation);
        final List<Invoker<T>> selected;
        //获取并行调用个数
        final int forks = getUrl().getParameter(Constants.FORKS_KEY, Constants.DEFAULT_FORKS);
        //超时时间
        final int timeout = getUrl().getParameter(Constants.TIMEOUT_KEY, Constants.DEFAULT_TIMEOUT);
        if (forks <= 0 || forks >= invokers.size()) {
            selected = invokers;
        } else {
            selected = new ArrayList<Invoker<T>>();
            //经过负载均衡策略，选出要并行调用的invokers，放入selected列表
            for (int i = 0; i < forks; i++) {
                //在invoker列表(排除selected)后,若是没有选够,则存在重复循环问题.见select实现.
                Invoker<T> invoker = select(loadbalance, invocation, invokers, selected);
                if (!selected.contains(invoker)) {//防止重复添加invoker
                    selected.add(invoker);
                }
            }
        }
        RpcContext.getContext().setInvokers((List) selected);
        final AtomicInteger count = new AtomicInteger();
        final BlockingQueue<Object> ref = new LinkedBlockingQueue<Object>();
        //遍历selected列表，经过线程池并发调用
        for (final Invoker<T> invoker : selected) {
            executor.execute(new Runnable() {
                public void run() {
                    try {
                        Result result = invoker.invoke(invocation);
                        //把结果放入队列
                        ref.offer(result);
                    } catch (Throwable e) {
                        int value = count.incrementAndGet();
                        //全部的都异常了，才把异常加入到对了尾部
                        //这就保证了，只要有一个成功，ref.poll()方法从队列头部就能取获得结果返回。
                        if (value >= selected.size()) {
                            ref.offer(e);
                        }
                    }
                }
            });
        }
        try {
            //从队列头部就能取获得结果,返回，若是是异常，就抛出。
            Object ret = ref.poll(timeout, TimeUnit.MILLISECONDS);
            if (ret instanceof Throwable) {
                Throwable e = (Throwable) ret;
                throw new RpcException(e instanceof RpcException ? ((RpcException) e).getCode() : 0, "Failed to forking invoke provider " + selected + ", but no luck to perform the invocation. Last error is: " + e.getMessage(), e.getCause() != null ? e.getCause() : e);
            }
            return (Result) ret;
        } catch (InterruptedException e) {
            throw new RpcException("Failed to forking invoke provider " + selected + ", but no luck to perform the invocation. Last error is: " + e.getMessage(), e);
        }
    }

并行调用多个服务器，只要一个成功即返回。一般用于实时性要求较高的读操做，但须要浪
费更多服务资源。可经过 forks="2" 来设置最大并行数。

failback策略：

protected Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        try {
            checkInvokers(invokers, invocation);
            //经过负载均衡策略选择一个invoker
            Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
            return invoker.invoke(invocation);
        } catch (Throwable e) {
            //失败后，记录异常日志，不抛出异常，
            logger.error("Failback to invoke method " + invocation.getMethodName() + ", wait for retry in background. Ignored exception: "
                    + e.getMessage() + ", ", e);
            //把异常调用记录入异常hashmap,key是调用的方法信息，value是invoker自己
            addFailed(invocation, this);
            return new RpcResult(); // ignore
        }
    }


    /***
     * 建立调度器，放入重试对象
     */
     private void addFailed(Invocation invocation, AbstractClusterInvoker<?> router) {
        if (retryFuture == null) {
            synchronized (this) {
                if (retryFuture == null) {
                    //调度线程池，周期性（5秒一次）的调用retryFailed方法
                    retryFuture = scheduledExecutorService.scheduleWithFixedDelay(new Runnable() {

                        public void run() {
                            // 收集统计信息
                            try {
                                //执行以前异常方法的调用
                                retryFailed();
                            } catch (Throwable t) { // 防护性容错
                                logger.error("Unexpected error occur at collect statistic", t);
                            }
                        }
                    }, RETRY_FAILED_PERIOD, RETRY_FAILED_PERIOD, TimeUnit.MILLISECONDS);
                }
            }
        }
	//放入map
        failed.put(invocation, router);
    }

    /***
     * 遍历失败hashmap failed 取出调用环境栈，执行调用
     */
    void retryFailed() {
        if (failed.size() == 0) {
            return;
        }
        for (Map.Entry<Invocation, AbstractClusterInvoker<?>> entry : new HashMap<Invocation, AbstractClusterInvoker<?>>(
                failed).entrySet()) {
            Invocation invocation = entry.getKey();
            Invoker<?> invoker = entry.getValue();
            try {
	    //执行调用
                invoker.invoke(invocation);
                failed.remove(invocation);
            } catch (Throwable e) {
                logger.error("Failed retry to invoke method " + invocation.getMethodName() + ", waiting again.", e);
            }
        }
    }

此策略失败自动恢复，后台记录失败请求，定时重发。一般用于消息通知操做。

failsafe策略：

public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        try {
            checkInvokers(invokers, invocation);
	    //利用负载均衡选择一个调用者
            Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
            return invoker.invoke(invocation);
        } catch (Throwable e) {
	   //若是有异常，记录异常信息，返回空值，不抛出异常
            logger.error("Failsafe ignore exception: " + e.getMessage(), e);
            return new RpcResult(); // ignore
        }
    }

失败安全，出现异常时，直接忽略。一般用于写入审计日志等操做。

failfast策略

public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers, LoadBalance loadbalance) throws RpcException {
        checkInvokers(invokers, invocation);
        Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
        try {
            return invoker.invoke(invocation);
        } catch (Throwable e) {
	  //若是有一次异常，当即抛出异常
            if (e instanceof RpcException && ((RpcException) e).isBiz()) { // biz exception.
                throw (RpcException) e;
            }
            throw new RpcException(e instanceof RpcException ? ((RpcException) e).getCode() : 0, "Failfast invoke providers " + invoker.getUrl() + " " + loadbalance.getClass().getSimpleName() + " select from all providers " + invokers + " for service " + getInterface().getName() + " method " + invocation.getMethodName() + " on consumer " + NetUtils.getLocalHost() + " use dubbo version " + Version.getVersion() + ", but no luck to perform the invocation. Last error is: " + e.getMessage(), e.getCause() != null ? e.getCause() : e);
        }
    }

快速失败，只发起一次调用，失败当即报错。一般用于非幂等性的写操做，好比新增记录。最后有个终结篇http://www.javashuo.com/article/p-poudvoae-u.html