Caching

Date: 2019-12-05

1. Caching techniques

1.1 Guava Cache

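Guava Cache is a purely in-memory local cache implementation, and it is thread-safe.

A Guava Cache can be created in two ways:
- CacheLoader
- Callable callback

Unlike the common practice of caching in a plain Map, a cache built either way implements the same logic: when you ask for the value of key X, the cached value is returned if there is one; if there is none, the value is obtained through a method you supply and then cached. The two differ in scope. A CacheLoader is defined once for the whole cache and acts as the single, uniform way to load a value from its key. The Callable approach is more flexible and lets you specify the loading logic on each get call.

CacheLoader example: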

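    // Imports needed by the enclosing test class:
    // import com.google.common.cache.CacheBuilder;
    // import com.google.common.cache.CacheLoader;
    // import com.google.common.cache.LoadingCache;
    // import org.junit.Test;

    @Test
    public void testLoadingCache() throws Exception {
        LoadingCache<String, String> cache = CacheBuilder
                .newBuilder()
                .build(new CacheLoader<String, String>() {
                    // Called on a cache miss; the returned value is cached under the key.
                    @Override
                    public String load(String key) throws Exception {
                        return "hello " + key + "!";
                    }
                });

        // getUnchecked() replaces the deprecated apply(); it is appropriate here
        // because load() does not actually throw a checked exception.
        System.out.println("jerry value:" + cache.getUnchecked("jerry"));
        System.out.println("jerry value:" + cache.get("jerry"));
        System.out.println("peida value:" + cache.get("peida"));
        System.out.println("peida value:" + cache.getUnchecked("peida"));
        System.out.println("lisa value:" + cache.getUnchecked("lisa"));
        // An explicit put stores the value directly and bypasses the loader.
        cache.put("harry", "ssdded");
        System.out.println("harry value:" + cache.get("harry"));
    }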

Callable callback example:

    // Imports needed by the enclosing test class:
    // import com.google.common.cache.Cache;
    // import com.google.common.cache.CacheBuilder;
    // import java.util.concurrent.Callable;

    @Test
    public void testCallableCache() throws Exception {
        Cache<String, String> cache = CacheBuilder.newBuilder().maximumSize(1000).build();
        // The Callable runs only if "jerry" is not already cached.
        String resultVal = cache.get("jerry", new Callable<String>() {
            @Override
            public String call() {
                return "hello " + "jerry" + "!";
            }
        });
        System.out.println("jerry value : " + resultVal);

        // A different loading rule can be supplied on every get call.
        resultVal = cache.get("peida", new Callable<String>() {
            @Override
            public String call() {
                return "hello " + "peida" + "!";
            }
        });
        System.out.println("peida value : " + resultVal);
    }

Output:
jerry value : hello jerry!
peida value : hello peida!

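Removing data from a Guava Cache:

When Guava is used as a cache, data is removed in one of two ways: passive eviction or explicit removal.
Guava provides three passive eviction strategies out of the box:
1. Size-based eviction: as the name suggests, entries are evicted according to the size of the cache. When the cache is about to reach the configured size, entries that have not been used recently or often are evicted.
The usual way to configure this is CacheBuilder.maximumSize(long). There is also a weight-based variant (CacheBuilder.maximumWeight(long) together with a Weigher), which I personally find is rarely needed in practice. A few points to note about maximumSize:
    First, the size is the number of entries in the cache, not the memory footprint or anything else.
    Second, eviction does not wait until the cache is exactly at the configured size; it may begin as the cache approaches that size.
    Third, if an entry has been evicted and you request it again from a cache built with a CacheLoader, the value is loaded through the CacheLoader once more. If the loader still cannot produce a value (it returns null or throws), the get call throws an exception.
2. Time-based eviction: Guava provides two time-based methods.
expireAfterAccess(long, TimeUnit): removes an entry once the given time has passed since it was last accessed.
expireAfterWrite(long, TimeUnit): removes an entry once the given time has passed since it was created or its value was last replaced.
3. Reference-based eviction:
This relies on Java garbage collection: entries are removed depending on how their keys or values are referenced (CacheBuilder.weakKeys(), weakValues() and softValues()).
There are three ways to remove data explicitly:
1. Remove a single entry with Cache.invalidate(key)
2. Remove a batch of entries with Cache.invalidateAll(keys)
3. Remove everything with Cache.invalidateAll()
If you need to take some action when data is removed, you can register a RemovalListener. Note that by default the listener runs synchronously, as part of the removal itself. To make it asynchronous, consider wrapping it with RemovalListeners.asynchronous(RemovalListener, Executor).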
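
To make the "size means number of entries" point concrete, here is a minimal sketch of entry-count-based eviction written with the JDK's LinkedHashMap rather than Guava. The class name LruSketch is hypothetical, and this is not how Guava is implemented: it is not thread-safe, and it evicts strictly in least-recently-used order only once the limit is exceeded, whereas Guava may start evicting earlier.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; it only mimics the idea behind maximumSize(long).
class LruSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruSketch(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true drops the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // the limit counts entries, not bytes
    }
}
```

With a limit of 2, putting "a" and "b", reading "a", and then putting "c" leaves "a" and "c" in the map, because "b" is the least recently used entry at that point.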

2. Multi-level caches

2.1 Cache policies

Multi-level caches introduce new design decisions. For instance, in some processors, all data in the L1 cache must also be somewhere in the L2 cache. These caches are called strictly inclusive. Other processors (like the AMD Athlon) have exclusive caches: data is guaranteed to be in at most one of the L1 and L2 caches, never in both. Still other processors (like the Intel Pentium II, III, and 4), do not require that data in the L1 cache also reside in the L2 cache, although it may often do so. There is no universally accepted name for this intermediate policy.

The advantage of exclusive caches is that they store more data. This advantage is larger when the exclusive L1 cache is comparable to the L2 cache, and diminishes if the L2 cache is many times larger than the L1 cache. When the L1 misses and the L2 hits on an access, the hitting cache line in the L2 is exchanged with a line in the L1. This exchange is quite a bit more work than just copying a line from L2 to L1, which is what an inclusive cache does.[33]

One advantage of strictly inclusive caches is that when external devices or other processors in a multiprocessor system wish to remove a cache line from the processor, they need only have the processor check the L2 cache. In cache hierarchies which do not enforce inclusion, the L1 cache must be checked as well. As a drawback, there is a correlation between the associativities of L1 and L2 caches: if the L2 cache does not have at least as many ways as all L1 caches together, the effective associativity of the L1 caches is restricted. Another disadvantage of inclusive cache is that whenever there is an eviction in L2 cache, the (possibly) corresponding lines in L1 also have to get evicted in order to maintain inclusiveness. This is quite a bit of work, and would result in a higher L1 miss rate.[33]

Another advantage of inclusive caches is that the larger cache can use larger cache lines, which reduces the size of the secondary cache tags. (Exclusive caches require both caches to have the same size cache lines, so that cache lines can be swapped on a L1 miss, L2 hit.) If the secondary cache is an order of magnitude larger than the primary, and the cache data is an order of magnitude larger than the cache tags, this tag area saved can be comparable to the incremental area needed to store the L1 cache data in the L2.[34]

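In summary:
There are three designs for a multi-level cache:

exclusive: a line held in the L1 cache is never also held in L2.
strictly inclusive: everything in the L1 cache is guaranteed to also be in L2.
a third policy (no established name): data in L1 is not required to be in L2, although it often is.

Trade-offs
An exclusive design can store more data. If L2 is far larger than L1, though, this advantage becomes small. An exclusive design also requires that on an L1 miss with an L2 hit, the hitting line in L2 is exchanged with a line in L1. That is more work than the inclusive approach of simply copying the hit line from L2 into L1.

One advantage of a strictly inclusive design is that when an external device or another processor wants to remove a cache line from this processor, the processor only needs to check its L2 cache. With the first and third designs, L1 has to be checked as well. One drawback of strict inclusion is that when a line is evicted from L2, any copy of it in L1 must be evicted too, which can raise the L1 miss rate.

Another advantage of a strictly inclusive design is that the larger cache can use larger cache lines, which can shrink the second-level cache tags. An exclusive design needs L1 and L2 to use the same line size so that lines can be swapped. If the second-level cache is an order of magnitude larger than the first-level cache, and the cache data is an order of magnitude larger than the tags, the tag area saved can be comparable to the extra area needed to keep a copy of the L1 data in L2.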
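
The capacity difference and the swap on an L1 miss with an L2 hit can be seen in a toy model. The sketch below is a deliberately simplified illustration: fully associative levels with LRU replacement, line addresses as plain integers, and hypothetical names throughout. It does not describe any particular processor.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Toy two-level cache model; the head of each deque is the most recently used line.
class TwoLevelSketch {
    private final Deque<Integer> l1 = new ArrayDeque<>();
    private final Deque<Integer> l2 = new ArrayDeque<>();
    private final int l1Capacity;
    private final int l2Capacity;
    private final boolean exclusive;

    TwoLevelSketch(int l1Capacity, int l2Capacity, boolean exclusive) {
        this.l1Capacity = l1Capacity;
        this.l2Capacity = l2Capacity;
        this.exclusive = exclusive;
    }

    void access(Integer line) {
        if (l1.remove(line)) {            // L1 hit: only refresh recency
            l1.addFirst(line);
            return;
        }
        if (exclusive) {
            l2.remove(line);              // on an L2 hit the line leaves L2 ...
            l1.addFirst(line);
            if (l1.size() > l1Capacity) {
                l2.addFirst(l1.removeLast());   // ... and the L1 victim moves into L2
                if (l2.size() > l2Capacity) {
                    l2.removeLast();
                }
            }
        } else {
            l2.remove(line);              // refresh recency in L2, or insert on an L2 miss
            l2.addFirst(line);
            if (l2.size() > l2Capacity) {
                l1.remove(l2.removeLast());     // back-invalidate L1 to preserve inclusion
            }
            l1.addFirst(line);            // plain copy from L2 into L1
            if (l1.size() > l1Capacity) {
                l1.removeLast();          // the line is still present in L2
            }
        }
    }

    // Number of distinct lines held across both levels.
    int distinctLines() {
        Set<Integer> all = new HashSet<>(l1);
        all.addAll(l2);
        return all.size();
    }

    boolean inL1(int line) { return l1.contains(line); }
    boolean inL2(int line) { return l2.contains(line); }
}
```

With an L1 of 2 lines and an L2 of 4 lines, touching lines 1 through 6 leaves 6 distinct lines cached in the exclusive model and 4 in the inclusive one. That gap is the capacity advantage described above, and it shrinks in relative terms as L2 grows compared with L1.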

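3. Common problems

3.1 Cache penetration

In a typical project we check the cache first. If the data is there, we return it directly. If it is not, we query the database, cache the result, and return it. If the data being requested never exists in the cache, however, every request ends up querying the DB. The cache then serves no purpose, and under heavy traffic the DB may go down.

How can this be solved?

If someone repeatedly attacks the application with keys that do not exist, this behaviour becomes a vulnerability. A fairly neat approach is to store a placeholder value for the nonexistent key, for example ("key", "&&").

When the cache returns this && value, the application knows that the key does not exist and can decide whether to wait and retry or to give up on the operation. If it waits, it requests the key again after a polling interval; if the value is no longer &&, the key now has a real value. Either way the request is never passed through to the database, so a large number of such requests are stopped at the cache.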
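
The placeholder idea can be sketched as follows. This is a minimal, in-process illustration: the class name, the db function (assumed to return null for a missing row) and the dbCalls counter are all hypothetical, and the sketch assumes that no real value is ever equal to "&&".

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: cache a sentinel for keys that do not exist in the database.
class NullCachingSketch {
    static final String MISSING = "&&"; // placeholder; assumed never to be a real value

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> db; // assumed to return null when the row is absent
    final AtomicInteger dbCalls = new AtomicInteger(); // for demonstration only

    NullCachingSketch(Function<String, String> db) {
        this.db = db;
    }

    Optional<String> get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            // A cached sentinel means "known to be absent": answer without touching the DB.
            return MISSING.equals(cached) ? Optional.empty() : Optional.of(cached);
        }
        dbCalls.incrementAndGet();
        String loaded = db.apply(key);
        cache.put(key, loaded == null ? MISSING : loaded);
        return Optional.ofNullable(loaded);
    }
}
```

In a real deployment the placeholder would normally be given a short expiry, so that a key created later becomes visible; the map above never expires anything.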

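3.2 Cache concurrency

When a site has high concurrent traffic and a cache entry expires, several processes may query the DB and set the cache at the same time. If concurrency is really high, this can put too much pressure on the DB and also cause the cache to be updated over and over.
My current idea is to lock the cache lookup: if the key does not exist, take a lock, query the DB, store the result in the cache, and release the lock. Other processes that find the lock held wait, and once it is released they either read the cached data or query the DB themselves.

This is similar to the placeholder approach described above, except that with locking some requests have to wait.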
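
The locking idea can be sketched inside a single JVM with one lock object per key and a second cache check after the lock is acquired. The names below are hypothetical, the loader is assumed never to return null, and lock objects are never cleaned up. Coordinating several processes, as the text describes, would need a distributed lock instead, which this sketch does not attempt.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: only one thread per key queries the DB; the others wait and reuse the result.
class SingleLoadSketch {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Object> locks = new ConcurrentHashMap<>();
    private final Function<String, String> db; // assumed to return a non-null value
    final AtomicInteger dbCalls = new AtomicInteger(); // for demonstration only

    SingleLoadSketch(Function<String, String> db) {
        this.db = db;
    }

    String get(String key) {
        String value = cache.get(key);
        if (value != null) {
            return value;                    // fast path: no locking on a hit
        }
        Object lock = locks.computeIfAbsent(key, k -> new Object());
        synchronized (lock) {
            value = cache.get(key);          // re-check: another thread may have loaded it
            if (value == null) {
                dbCalls.incrementAndGet();
                value = db.apply(key);
                cache.put(key, value);
            }
        }
        return value;
    }

    // Fires `threads` concurrent requests for one key and reports how many reached the DB.
    static int demo(int threads) {
        SingleLoadSketch c = new SingleLoadSketch(key -> {
            try {
                Thread.sleep(50);            // simulate a slow query
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "value-of-" + key;
        });
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> c.get("hot"));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return c.dbCalls.get();
    }
}
```

Calling SingleLoadSketch.demo(8) sends eight concurrent requests for the same key, and only one of them performs the simulated query.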

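3.3 Cache expiration

The main cause of this problem is again high concurrency. We usually give cache entries an expiration time such as 1 minute or 5 minutes. Under high concurrency, a large number of entries may be created at the same moment with the same expiration time. When that time is up, they all expire together, every request is forwarded to the DB, and the DB may come under too much pressure.

How can this be solved?

One simple approach is to spread the expiration times out. For example, add a random value to the original expiration time, say a random 1 to 5 minutes. Entries are then much less likely to share an expiration time, and a mass expiration becomes hard to trigger.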
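
The randomised expiration can be sketched in a few lines. The helper name jitteredTtl and its parameters are hypothetical; the 1 to 5 minute range is the one from the text.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper: spread expirations by adding a random extra to the base TTL.
class TtlJitterSketch {
    static Duration jitteredTtl(Duration base, Duration minJitter, Duration maxJitter) {
        // nextLong(origin, bound) excludes the bound, hence the +1 to make maxJitter reachable.
        long extraMillis = ThreadLocalRandom.current()
                .nextLong(minJitter.toMillis(), maxJitter.toMillis() + 1);
        return base.plusMillis(extraMillis);
    }

    public static void main(String[] args) {
        // A 10-minute base TTL becomes something between 11 and 15 minutes.
        Duration ttl = jitteredTtl(Duration.ofMinutes(10), Duration.ofMinutes(1), Duration.ofMinutes(5));
        System.out.println("ttl = " + ttl);
    }
}
```

The resulting duration is what would be passed as the expiration when the entry is written to the cache.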

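3.4 Cache avalanche

Scenario: between the moment a cached key expires and the moment the new value is cached, every request for that key goes to the database. This raises DB load and causes unnecessary DB work.

Solutions:
- Rebuild the entry under a lock so that requests queue up and are serialized, instead of all of them querying the database.
- Suppose the key's expiration time is A. Create a companion key_sign whose expiration time is shorter than A. When reading key, check whether key_sign has expired. If it has, take a lock and start a background thread that refreshes the value of key asynchronously, while the actual cached value, which has not yet expired, continues to be served. (If the actual value has also expired, fall back to rebuilding under a lock.) The cost is that two cache entries are stored per key.
- Store an expiration time B inside the value itself, with B shorter than A. On read, use B to decide whether the value is stale. If it is, handle it the same way as above.
- Sacrifice some user experience: when the data is not in the cache, return a failure immediately, put the required key on a distributed queue, and let background threads asynchronously refresh the cache entries listed in the queue.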
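
The third option above (an expiration time B stored inside the value) can be sketched like this. Everything here is an illustrative assumption: the class and field names, the injected clock and executor (which make the behaviour easy to test), and the use of an in-process map that has no real expiration A of its own. A hard miss simply loads synchronously, without the locked rebuild the text recommends.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch: serve the stale value while one background task refreshes it.
class SoftExpirySketch {
    private static final class Entry {
        final String value;
        final long softExpireAt; // "B": earlier than the real cache expiration "A"

        Entry(String value, long softExpireAt) {
            this.value = value;
            this.softExpireAt = softExpireAt;
        }
    }

    private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Boolean> refreshing = new ConcurrentHashMap<>();
    private final Function<String, String> db;
    private final long softTtlMillis;
    private final LongSupplier clock;   // current time in milliseconds
    private final Executor executor;    // runs the asynchronous refresh

    SoftExpirySketch(Function<String, String> db, long softTtlMillis,
                     LongSupplier clock, Executor executor) {
        this.db = db;
        this.softTtlMillis = softTtlMillis;
        this.clock = clock;
        this.executor = executor;
    }

    String get(String key) {
        Entry entry = cache.get(key);
        if (entry == null) {
            return load(key).value;     // hard miss: simplified synchronous load
        }
        boolean stale = clock.getAsLong() >= entry.softExpireAt;
        // putIfAbsent acts as a per-key "lock": only one refresh is scheduled at a time.
        if (stale && refreshing.putIfAbsent(key, Boolean.TRUE) == null) {
            executor.execute(() -> {
                try {
                    load(key);
                } finally {
                    refreshing.remove(key);
                }
            });
        }
        return entry.value;             // possibly stale, but no request waits on the DB
    }

    private Entry load(String key) {
        Entry fresh = new Entry(db.apply(key), clock.getAsLong() + softTtlMillis);
        cache.put(key, fresh);
        return fresh;
    }
}
```

In production the clock would typically be System::currentTimeMillis and the executor a thread pool. The trade-off is that the request which triggers the refresh still sees the old value.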

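3.5 Cache pollution

Scenario: unusual operations (an Excel export, occasional access by the operations team) leave a lot of cold data in memory.
Solution: choose a suitable cache eviction algorithm (for example LRU-N).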

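3.6 First launch

Scenario: when the cache goes live for the first time it is empty, so if site traffic is high, every request goes through to the database. (If traffic is low, user requests can populate the cache on their own.)
Solution: cache warm-up. Load all cache entries before the system goes live, by adding a cache-refresh program that is run manually after launch or invoked automatically during deployment.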
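
A warm-up routine can be as simple as loading a known list of hot keys before traffic is switched on. The sketch below is an assumption-laden illustration: the class name, the hotKeys list and the db function (assumed to return null for missing rows) are all hypothetical, and how the hot keys are chosen is outside its scope.

```java
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: populate the cache from the database before serving requests.
class WarmUpSketch {
    static Map<String, String> warmUp(Collection<String> hotKeys, Function<String, String> db) {
        Map<String, String> cache = new ConcurrentHashMap<>();
        for (String key : hotKeys) {
            String value = db.apply(key);
            if (value != null) {        // skip keys that have no row
                cache.put(key, value);
            }
        }
        return cache;
    }
}
```

The same routine could be triggered from a deployment hook or run by hand, matching the two options mentioned above.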

4. Cache architecture

http://jinnianshilongnian.iteye.com/blog/2283670
