3. Sentinel源码分析— QPS流量控制是如何实现的？

时间 2019-11-09

标签 sentinel 源码分析 qps 流量控制如何实现繁體版

原文原文链接

终于在这周内写了一篇源码解析，每周一篇即便再忙也不能打破html

Sentinel源码解析系列：java

1.Sentinel源码分析—FlowRuleManager加载规则作了什么？node

2. Sentinel源码分析—Sentinel是如何进行流量统计的？api

上回咱们用基于并发数来说了一下Sentinel的整个流程，这篇文章咱们来说一下Sentinel的QPS流量控制是如何实现的。bash

先上一个极简的demo，咱们的代码就从这个demo入手：并发

public static void main(String[] args) {
    List<FlowRule> rules = new ArrayList<FlowRule>();
    FlowRule rule1 = new FlowRule();
    rule1.setResource("abc"); 
    rule1.setCount(20);
    rule1.setGrade(RuleConstant.FLOW_GRADE_QPS);
    rule1.setLimitApp("default");
    rules.add(rule1);
    FlowRuleManager.loadRules(rules);

    Entry entry = null;

    try {
        entry = SphU.entry("abc");
        //dosomething 
    } catch (BlockException e1) {

    } catch (Exception e2) {
        // biz exception
    } finally {
        if (entry != null) {
            entry.exit();
        }
    }
}
复制代码

在这个例子中咱们首先新建了一个FlowRule实例，而后调用了loadRules方法加载规则，这部分的代码都和基于并发数的流量控制的代码是同样的，想要了解的朋友能够去看看个人这一篇文章1.Sentinel源码分析—FlowRuleManager加载规则作了什么？，下面咱们说说不同的地方。app

在调用FlowRuleManager的loadRules方法的时候会建立一个rater实例：less

FlowRuleUtil#buildFlowRuleMapide

//设置拒绝策略：直接拒绝、Warm Up、匀速排队，默认是DefaultController
TrafficShapingController rater = generateRater(rule);
rule.setRater(rater);
复制代码

咱们进入到generateRater看一下是怎么建立实例的：源码分析

FlowRuleUtil#generateRater

private static TrafficShapingController generateRater(/*@Valid*/ FlowRule rule) {
    if (rule.getGrade() == RuleConstant.FLOW_GRADE_QPS) {
        switch (rule.getControlBehavior()) {
            case RuleConstant.CONTROL_BEHAVIOR_WARM_UP:
                //warmUpPeriodSec默认是10 
                return new WarmUpController(rule.getCount(), rule.getWarmUpPeriodSec(),
                    ColdFactorProperty.coldFactor);
            case RuleConstant.CONTROL_BEHAVIOR_RATE_LIMITER:
                //rule.getMaxQueueingTimeMs()默认是500
                return new RateLimiterController(rule.getMaxQueueingTimeMs(), rule.getCount());
            case RuleConstant.CONTROL_BEHAVIOR_WARM_UP_RATE_LIMITER:
                return new WarmUpRateLimiterController(rule.getCount(), rule.getWarmUpPeriodSec(),
                    rule.getMaxQueueingTimeMs(), ColdFactorProperty.coldFactor);
            case RuleConstant.CONTROL_BEHAVIOR_DEFAULT:
            default:
                // Default mode or unknown mode: default traffic shaping controller (fast-reject).
        }
    }
    return new DefaultController(rule.getCount(), rule.getGrade());
}
复制代码

这个方法里面若是设置的是按QPS的方式来限流的话，能够设置一个ControlBehavior属性，用来作流量控制分别是：直接拒绝、Warm Up、匀速排队。

接下来的全部的限流操做所有在FlowSlot中进行，不熟悉Sentinel流程的朋友能够去看看个人这一篇文章：2. Sentinel源码分析—Sentinel是如何进行流量统计的？，这篇文章介绍了Sentinel的全流程分析，本文的其余流程基本都在这篇文章上讲了，只有FlowSlot部分代码不一样。

接下来咱们来说一下FlowSlot里面是怎么实现QPS限流的

FlowSlot#entry

public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node, int count,
                  boolean prioritized, Object... args) throws Throwable {
    checkFlow(resourceWrapper, context, node, count, prioritized);

    fireEntry(context, resourceWrapper, node, count, prioritized, args);
}

void checkFlow(ResourceWrapper resource, Context context, DefaultNode node, int count, boolean prioritized)
    throws BlockException {
    checker.checkFlow(ruleProvider, resource, context, node, count, prioritized);
}
复制代码

FlowSlot在实例化的时候会实例化一个FlowRuleChecker实例做为checker。在checkFlow方法里面会继续调用FlowRuleChecker的checkFlow方法，其中ruleProvider实例是用来根据根据resource来从flowRules中获取相应的FlowRule。

咱们进入到FlowRuleChecker的checkFlow方法中

FlowRuleChecker#checkFlow

public void checkFlow(Function<String, Collection<FlowRule>> ruleProvider, ResourceWrapper resource, Context context, DefaultNode node, int count, boolean prioritized) throws BlockException {
    if (ruleProvider == null || resource == null) {
        return;
    }
    //返回FlowRuleManager里面注册的全部规则
    Collection<FlowRule> rules = ruleProvider.apply(resource.getName());
    if (rules != null) {
        for (FlowRule rule : rules) {
            //若是当前的请求不能经过，那么就抛出FlowException异常
            if (!canPassCheck(rule, context, node, count, prioritized)) {
                throw new FlowException(rule.getLimitApp(), rule);
            }
        }
    }
}
复制代码

这里是调用ruleProvider来获取全部FlowRule，而后遍历rule集合经过canPassCheck方法来进行过滤，若是不符合条件则会抛出FlowException异常。

咱们跟进去直接来到passLocalCheck方法：

private static boolean passLocalCheck(FlowRule rule, Context context, DefaultNode node, int acquireCount, boolean prioritized) {
    //节点选择
    Node selectedNode = selectNodeByRequesterAndStrategy(rule, context, node);
    if (selectedNode == null) {
        return true;
    }
    //根据设置的规则来拦截
    return rule.getRater().canPass(selectedNode, acquireCount, prioritized);
}
复制代码

这个方法里面会选择好相应的节点后调用rater的canPass方法来判断是否须要阻塞。

Rater有四个，分别是：DefaultController、RateLimiterController、WarmUpController、WarmUpRateLimiterController，咱们挨个分析一下。

其中DefaultController是直接拒绝策略，咱们在上一篇文章中已经分析过了，此次咱们来看看其余三个。

RateLimiterController匀速排队

它的中心思想是，以固定的间隔时间让请求经过。当请求到来的时候，若是当前请求距离上个经过的请求经过的时间间隔不小于预设值，则让当前请求经过；不然，计算当前请求的预期经过时间，若是该请求的预期经过时间小于规则预设的 timeout 时间，则该请求会等待直到预设时间到来经过（排队等待处理）；若预期的经过时间超出最大排队时长，则直接拒接这个请求。

这种方式适合用于请求以突刺状来到，这个时候咱们不但愿一会儿把全部的请求都经过，这样可能会把系统压垮；同时咱们也期待系统以稳定的速度，逐步处理这些请求，以起到“削峰填谷”的效果，而不是拒绝全部请求。

要想使用这个策略须要在实例化FlowRule的时候设置rule1.setControlBehavior(RuleConstant.CONTROL_BEHAVIOR_RATE_LIMITER)这样的一句代码。

在实例化Rater的时候会调用FlowRuleUtil#generateRateri建立一个实例：

new RateLimiterController(rule.getMaxQueueingTimeMs(), rule.getCount());
复制代码

MaxQueueingTimeMs默认是500 ，Count在咱们这个例子中传入的是20。

咱们看一下具体的canPass方法是怎么实现限流的：

public boolean canPass(Node node, int acquireCount, boolean prioritized) {
    // Pass when acquire count is less or equal than 0.
    if (acquireCount <= 0) {
        return true;
    }
    // Reject when count is less or equal than 0.
    // Otherwise,the costTime will be max of long and waitTime will overflow in some cases.
    if (count <= 0) {
        return false;
    }

    long currentTime = TimeUtil.currentTimeMillis();
    //两个请求预期经过的时间,也就是说把请求平均分配到1秒上
    // Calculate the interval between every two requests.
    long costTime = Math.round(1.0 * (acquireCount) / count * 1000);

    //latestPassedTime表明的是上一次调用请求的时间
    // Expected pass time of this request.
    long expectedTime = costTime + latestPassedTime.get();
    //若是预期经过的时间加上上次的请求时间小于当前时间，则经过
    if (expectedTime <= currentTime) {
        // Contention may exist here, but it's okay.
        latestPassedTime.set(currentTime);
        return true;
    } else {
        //默认是maxQueueingTimeMs
        // Calculate the time to wait.
        long waitTime = costTime + latestPassedTime.get() - TimeUtil.currentTimeMillis();

        //若是预提时间比当前时间大maxQueueingTimeMs那么多，那么就阻塞
        if (waitTime > maxQueueingTimeMs) {
            return false;
        } else {
            //将上次时间加上此次请求要耗费的时间
            long oldTime = latestPassedTime.addAndGet(costTime);
            try {
                waitTime = oldTime - TimeUtil.currentTimeMillis();
                //再次判断一下是否超过maxQueueingTimeMs设置的时间
                if (waitTime > maxQueueingTimeMs) {
                    //若是是的话就阻塞，并重置上次经过时间
                    latestPassedTime.addAndGet(-costTime);
                    return false;
                }
                //若是须要等待的时间大于零，那么就sleep
                // in race condition waitTime may <= 0
                if (waitTime > 0) {
                    Thread.sleep(waitTime);
                }
                return true;
            } catch (InterruptedException e) {
            }
        }
    }
    return false;
}
复制代码

这个方法一开始会计算一下costTime这个值，将请求平均分配到一秒中。例如：当 count 设为 10 的时候，则表明一秒匀速的经过 10 个请求，也就是每一个请求平均间隔恒定为 1000 / 10 = 100 ms。

可是这里有个小bug，若是count设置的比较大，好比设置成10000，那么costTime永远都会等于0，整个QPS限流将会失效。

而后会将costTime和上次的请求时间相加，若是大于当前时间就代表请求的太频繁了，会将latestPassedTime这个属性加上此次请求的costTime，并调用sleep方法让这个线程先睡眠一会再请求。

这里有个细节，若是多个请求同时一块儿过来，那么每一个请求在设置oldTime的时候都会经过addAndGet这个原子性的方法将latestPassedTime依次相加，并赋值给oldTime，这样每一个线程的sleep的时间都不会相同，线程也不会同时醒来。

WarmUpController限流冷启动

当系统长期处于低水位的状况下，当流量忽然增长时，直接把系统拉升到高水位可能瞬间把系统压垮。经过"冷启动"，让经过的流量缓慢增长，在必定时间内逐渐增长到阈值上限，给冷系统一个预热的时间，避免冷系统被压垮。

//默认为3
private int coldFactor;
//转折点的令牌数
protected int warningToken = 0;
//最大的令牌数
private int maxToken;
//斜线斜率
protected double slope;
//累积的令牌数
protected AtomicLong storedTokens = new AtomicLong(0);
//最后更新令牌的时间
protected AtomicLong lastFilledTime = new AtomicLong(0);

public WarmUpController(double count, int warmUpPeriodInSec, int coldFactor) {
    construct(count, warmUpPeriodInSec, coldFactor);
}

private void construct(double count, int warmUpPeriodInSec, int coldFactor) {

    if (coldFactor <= 1) {
        throw new IllegalArgumentException("Cold factor should be larger than 1");
    }

    this.count = count;
    //默认是3
    this.coldFactor = coldFactor;

    // thresholdPermits = 0.5 * warmupPeriod / stableInterval.
    // 10*20/2 = 100
    // warningToken = 100;
    warningToken = (int) (warmUpPeriodInSec * count) / (coldFactor - 1);
    // / maxPermits = thresholdPermits + 2 * warmupPeriod /
    // (stableInterval + coldInterval)
    // maxToken = 200
    maxToken = warningToken + (int) (2 * warmUpPeriodInSec * count / (1.0 + coldFactor));

    // slope
    // slope = (coldIntervalMicros - stableIntervalMicros) / (maxPermits
    // - thresholdPermits);
    slope = (coldFactor - 1.0) / count / (maxToken - warningToken);
}
复制代码

这里我拿一张图来讲明一下：

X 轴表明 storedPermits 的数量，Y 轴表明获取一个 permits 须要的时间。

假设指定 permitsPerSecond 为 10，那么 stableInterval 为 100ms，而 coldInterval 是 3 倍，也就是 300ms（coldFactor，3 倍）。也就是说，当达到 maxPermits 时，此时处于系统最冷的时候，获取一个 permit 须要 300ms，而若是 storedPermits 小于 thresholdPermits 的时候，只须要 100ms。

利用 “获取冷的 permits ” 须要等待更多时间，来限制突发请求经过，达到系统预热的目的。

因此在咱们的代码中，maxToken表明的就是图中的maxPermits，warningToken表明的就是thresholdPermits，slope就是表明每次获取permit减小的程度。

咱们接下来看看WarmUpController的canpass方法：

WarmUpController#canpass

public boolean canPass(Node node, int acquireCount, boolean prioritized) {
    //获取当前时间窗口的流量大小
    long passQps = (long) node.passQps();
    //获取上一个窗口的流量大小
    long previousQps = (long) node.previousPassQps();
    //设置 storedTokens 和 lastFilledTime 到正确的值
    syncToken(previousQps);

    // 开始计算它的斜率
    // 若是进入了警惕线，开始调整他的qps
    long restToken = storedTokens.get();
    if (restToken >= warningToken) {
        //经过计算当前的restToken和警惕线的距离来计算当前的QPS
        //离警惕线越接近，表明这个程序越“热”，从而逐步释放QPS
        long aboveToken = restToken - warningToken;
        //当前状态下能达到的最高 QPS
        // current interval = restToken*slope+1/count
        double warningQps = Math.nextUp(1.0 / (aboveToken * slope + 1.0 / count));

        // 若是不会超过，那么经过，不然不经过
        if (passQps + acquireCount <= warningQps) {
            return true;
        }
    } else {
        // count 是最高能达到的 QPS
        if (passQps + acquireCount <= count) {
            return true;
        }
    }
    return false;
}
复制代码

这个方法里经过syncToken(previousQps)设置storedTokens的值后，与警惕值作判断，若是没有达到警惕值，那么经过计算和警惕值的距离再加上slope计算出一个当前的QPS值，storedTokens越大当前的QPS越小。

若是当前的storedTokens已经小于警惕值了，说明已经预热完毕了，直接用count判断就行了。

WarmUpController#syncToken

protected void syncToken(long passQps) {
    long currentTime = TimeUtil.currentTimeMillis();
    //去掉毫秒的时间
    currentTime = currentTime - currentTime % 1000;
    long oldLastFillTime = lastFilledTime.get();
    if (currentTime <= oldLastFillTime) {
        return;
    }

    // 令牌数量的旧值
    long oldValue = storedTokens.get();
    // 计算新的令牌数量，往下看
    long newValue = coolDownTokens(currentTime, passQps);

    if (storedTokens.compareAndSet(oldValue, newValue)) {
        // 令牌数量上，减去上一分钟的 QPS，而后设置新值
        long currentValue = storedTokens.addAndGet(0 - passQps);
        if (currentValue < 0) {
            storedTokens.set(0L);
        }
        lastFilledTime.set(currentTime);
    } 
}
复制代码

这个方法经过coolDownTokens方法来获取一个新的value，而后经过CAS设置到storedTokens中，而后将storedTokens减去上一个窗口的QPS值，并为lastFilledTime设置一个新的值。

其实我这里有个疑惑，在用storedTokens减去上一个窗口的QPS的时候并无作控制，假如处理的速度很是的快，在一个窗口内就减了不少次，直接把当前的storedTokens减到了小于warningToken，那么是否是就没有在必定的时间范围内启动冷启动的效果？

private long coolDownTokens(long currentTime, long passQps) {
    long oldValue = storedTokens.get();
    long newValue = oldValue;

    // 添加令牌的判断前提条件:
    // 当令牌的消耗程度远远低于警惕线的时候
    if (oldValue < warningToken) {
        // 根据count数每秒加上令牌
        newValue = (long) (oldValue + (currentTime - lastFilledTime.get()) * count / 1000);
    } else if (oldValue > warningToken) {
        //若是还在冷启动阶段
        // 若是当前经过的 QPS 大于 count/coldFactor，说明系统消耗令牌的速度，大于冷却速度
        // 那么不须要添加令牌，不然须要添加令牌
        if (passQps < (int) count / coldFactor) {
            newValue = (long) (oldValue + (currentTime - lastFilledTime.get()) * count / 1000);
        }
    }
    return Math.min(newValue, maxToken);
}
复制代码

这个方法主要是用来作添加令牌的操做，若是是流量比较小或者是已经预热完毕了，那么就须要根据count数每秒加上令牌，若是是在预热阶段那么就不进行令牌添加。

WarmUpRateLimiterController就是结合了冷启动和匀速排队，代码很是的简单，有了上面的分析，相信你们也能看得懂，因此也就不讲解了。

3. Sentinel源码分析— QPS流量控制是如何实现的？

RateLimiterController匀速排队

WarmUpController限流 冷启动

WarmUpController限流冷启动