Exploring iOS Internals - Dissecting the Class Structure (cache_t)

Date: 2020-03-05


Introduction:

In the previous article we explored the underlying structure of an iOS class. Let's first review its definition:

// This definition is found in objc-runtime-new.h
struct objc_class : objc_object {
    // Class ISA;
    Class superclass;           // 8
    cache_t cache;             // formerly cache pointer and vtable 16
    class_data_bits_t bits;    // class_rw_t * plus custom rr/alloc flags 8
    // Many more methods follow; we ignore them here for now
};

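We have already covered several important members of the class, focusing on the internal structure of class_data_bits_t bits. The class also holds a cache_t, so let's take a look at it. As the name suggests, it is a cache. So what does it cache?
The answer: methods.
Underneath, it stores and reads entries through a hash table. It caches methods that have been called before, so a repeated call can read the implementation directly from the cache, which speeds up method lookup. Let's look at it in detail.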

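1: Definition of `cache_t` in the source code

Start from the class structure definition above. ISA and superclass each take 8 bytes, so cache_t sits 16 bytes past the start address of the class. Here is the definition of cache_t: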

struct cache_t {
    struct bucket_t *_buckets; // 8 bytes: * marks a pointer, and a pointer takes 8 bytes
    mask_t _mask;  // 4 bytes: mask_t is uint32_t, an int type of 4 bytes
    mask_t _occupied; // 4 bytes, same as above
};

Where:

_mask: hash table length - 1
_occupied: number of methods currently cached

_buckets is an array in which every element is a bucket_t. Here is the definition of bucket_t in the source:

struct bucket_t {  
private:  
    // IMP-first is better for arm64e ptrauth and no worse for arm64. 
    // SEL-first is better for armv7* and i386 and x86_64. 
#if __arm64__ 
    MethodCacheIMP _imp;  
    cache_key_t _key;  
#else 
    cache_key_t _key;  
    MethodCacheIMP _imp;  
#endif 
public:  
    inline cache_key_t key() const { return _key; }  
    inline IMP imp() const { return (IMP)_imp; }  
    inline void setKey(cache_key_t newKey) { _key = newKey; }  
    inline void setImp(IMP newImp) { _imp = newImp; }  

    void set(cache_key_t newKey, IMP newImp);  
};  

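From the source we can see that bucket_t contains two members, _imp and _key.

_key: the method's SEL, used as the key
_imp: the memory address of the function implementation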
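The byte counts quoted above can be checked with a small mirror of the three structures. This is only a sketch: BucketSketch, CacheSketch and ClassSketch are hypothetical stand-ins, not the runtime's types, and the layout is assumed to follow the field order shown in the article (non-arm64 ordering for the bucket).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins; only field order and sizes matter here.
using mask_t = std::uint32_t;
using cache_key_t = std::uintptr_t;
using imp_t = void (*)();

struct BucketSketch {      // mirrors bucket_t (SEL-first ordering)
    cache_key_t key;
    imp_t imp;
};

struct CacheSketch {       // mirrors cache_t
    BucketSketch *buckets; // pointer
    mask_t mask;           // table length - 1
    mask_t occupied;       // number of cached methods
};

struct ClassSketch {       // mirrors objc_class
    void *isa;
    void *superclass;
    CacheSketch cache;
    std::uintptr_t bits;
};

// The cache follows two pointers, and is itself one pointer plus two mask_t.
inline void checkLayout() {
    assert(offsetof(ClassSketch, cache) == 2 * sizeof(void *));
    assert(sizeof(CacheSketch) == sizeof(void *) + 2 * sizeof(mask_t));
}
```

On a 64-bit target both expressions come to 16 bytes, which matches the offsets given in the text.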

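2: The role of `cache_t`

The introduction said that cache_t caches methods. Why cache them at all? Can't they simply be called directly? To answer that, let's first review how method lookup works:
Normally a method call goes through the NORMAL path, that is, ordinary lookup. Suppose the instance method eat of a person class is called as [person eat]. The system looks it up as follows: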

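obj -> isa -> obj's Class object -> method_array_t methods -> search this list; call the method if found, otherwise continue
obj's Class object -> superclass -> method_array_t methods -> search the superclass's method list; call the method if found, otherwise repeat this step
Call if found, otherwise repeat the process ...
Until the root class NSObject -> isa -> NSObject's Class object -> method_array_t methods
Only if nothing is found at the end does the runtime go through its remaining checks, raise an exception, and so on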
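The uncached lookup described in the steps above can be sketched as a walk up the superclass chain. ToyClass, Imp and slowLookup are hypothetical names for illustration, and a plain vector stands in for method_array_t; the real runtime is considerably more involved.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Imp = int (*)();   // hypothetical implementation pointer

struct ToyClass {
    const ToyClass *superclass;                        // nullptr at the root class
    std::vector<std::pair<std::string, Imp>> methods;  // stands in for method_array_t
};

// Search cls first, then every superclass in turn, scanning each list linearly.
inline Imp slowLookup(const ToyClass *cls, const std::string &sel) {
    for (const ToyClass *c = cls; c != nullptr; c = c->superclass) {
        for (const auto &m : c->methods) {
            if (m.first == sel) return m.second;
        }
    }
    return nullptr;  // not found anywhere: the article's final "checks, exception" step
}

inline int eatImp() { return 1; }
inline int descriptionImp() { return 2; }
```

Every call pays for the whole walk again, which is the cost the cache is meant to remove.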

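Look how complex and tedious that is. Apple's engineers were smart about it: they put a cache box into every class. Whenever a method is called, its SEL and IMP are saved, so the next call can get the implementation address from the cache quickly using just the SEL, which improves efficiency considerably.

3: The caching flow of `cache_t`

The source contains the following comment about this flow: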

* Cache readers (PC-checked by collecting_in_critical())
 * objc_msgSend*
 * cache_getImp
 *
 * Cache writers (hold cacheUpdateLock while reading or writing; not PC-checked)
 * cache_fill         (acquires lock)
 * cache_expand       (only called from cache_fill)
 * cache_create       (only called from cache_expand)
 * bcopy               (only called from instrumented cache_expand)
 * flush_caches        (acquires lock)
 * cache_flush        (only called from cache_fill and flush_caches)
 * cache_collect_free (only called from cache_expand and cache_flush)

Reading the cache is simple: objc_msgSend and cache_getImp read the function address from it. So we concentrate on the write path. There are many writer functions, but the entry point is cache_fill:

void cache_fill(Class cls, SEL sel, IMP imp, id receiver) {
#if !DEBUG_TASK_THREADS
   mutex_locker_t lock(cacheUpdateLock);
   cache_fill_nolock(cls, sel, imp, receiver);
#else
   _collecting_in_critical();
   return;
#endif
}

Inside cache_fill, the function cache_fill_nolock is called:

static void cache_fill_nolock(Class cls, SEL sel, IMP imp, id receiver) {
    cacheUpdateLock.assertLocked();

    // Never cache before +initialize is done
    if (!cls->isInitialized()) return;

    // Make sure the entry wasn't added to the cache by some other thread 
    // before we grabbed the cacheUpdateLock.
    if (cache_getImp(cls, sel)) return;

    cache_t *cache = getCache(cls);
    cache_key_t key = getKey(sel);

    // Use the cache as-is if it is less than 3/4 full
    mask_t newOccupied = cache->occupied() + 1;
    mask_t capacity = cache->capacity();
    if (cache->isConstantEmptyCache()) {
        // Cache is read-only. Replace it.
        cache->reallocate(capacity, capacity ?: INIT_CACHE_SIZE);
    }
    else if (newOccupied <= capacity / 4 * 3) {
        // Cache is less than 3/4 full. Use it as-is.
    }
    else {
        // Cache is too full. Expand it.
        cache->expand();
    }

    // Scan for the first unused slot and insert there.
    // There is guaranteed to be an empty slot because the 
    // minimum size is 4 and we resized at 3/4 full.
    bucket_t *bucket = cache->find(key, receiver);
    if (bucket->key() == 0) cache->incrementOccupied();
    bucket->set(key, imp);
}

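With this much code, this is clearly a core function that does a lot of work. Let's go through it line by line.

First it checks whether cls, the class, has been initialized, and returns immediately if it has not. Then it checks whether cache_getImp(cls, sel) returns a value. This guards against the multithreaded case: another thread may have called the same method and written the entry before this thread acquired cacheUpdateLock. If the entry is already there, the function returns.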

// Never cache before +initialize is done
    if (!cls->isInitialized()) return;

    // Make sure the entry wasn't added to the cache by some other thread 
    // before we grabbed the cacheUpdateLock.
    if (cache_getImp(cls, sel)) return;
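The "check again once the lock is held" idea can be illustrated outside the runtime. The sketch below is an assumption-laden simplification: LockedCache is a hypothetical type, std::mutex plays the role of cacheUpdateLock, and a std::map replaces the bucket array.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

struct LockedCache {
    std::mutex updateLock;               // plays the role of cacheUpdateLock
    std::map<std::string, int> entries;  // hypothetical storage: selector -> imp id
    int writes = 0;                      // counts insertions that really happened

    // Returns true only if this call performed the insertion.
    bool fill(const std::string &sel, int imp) {
        std::lock_guard<std::mutex> guard(updateLock);
        // Another thread may have added the entry before we got the lock.
        if (entries.count(sel) != 0) return false;
        entries[sel] = imp;
        ++writes;
        return true;
    }
};
```

Calling fill twice with the same selector writes once; the second call sees the entry and returns early, just as the second check in cache_fill_nolock does.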

Next, getCache fetches the cache stored inside the class by offsetting from the class address, and getKey generates a key from sel:

cache_t *cache = getCache(cls);
cache_key_t key = getKey(sel);

Then newOccupied is defined as the old occupied count plus 1, and capacity, the capacity of the cache, is read from cache_t:

mask_t newOccupied = cache->occupied() + 1;
mask_t capacity = cache->capacity();

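Next come the comparisons:

1: If the cache is empty, call cache->reallocate().
2: If the new occupied count is less than or equal to 3/4 of the current capacity, do nothing.
3: Otherwise, the new occupied count exceeds 3/4 of the current capacity, so expand with cache->expand().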

if (cache->isConstantEmptyCache()) {
     // Cache is read-only. Replace it.
     cache->reallocate(capacity, capacity ?: INIT_CACHE_SIZE);
 }
 else if (newOccupied <= capacity / 4 * 3) {
     // Cache is less than 3/4 full. Use it as-is.
 }
 else {
     // Cache is too full. Expand it.
     cache->expand();
 }
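The three-way branch is easy to test in isolation. In this sketch, FillAction and decide are hypothetical names, and the isEmpty flag stands in for isConstantEmptyCache(); the arithmetic is copied from the code above, including the integer division in capacity / 4 * 3.

```cpp
#include <cassert>
#include <cstdint>

enum class FillAction { Allocate, UseAsIs, Expand };

// Mirrors the branch in cache_fill_nolock; isEmpty stands in for isConstantEmptyCache().
inline FillAction decide(bool isEmpty, std::uint32_t occupied, std::uint32_t capacity) {
    std::uint32_t newOccupied = occupied + 1;
    if (isEmpty) return FillAction::Allocate;                         // read-only empty cache
    if (newOccupied <= capacity / 4 * 3) return FillAction::UseAsIs;  // at most 3/4 full
    return FillAction::Expand;                                        // too full
}
```

With a capacity of 4 the threshold is 3, so a cache that already holds 3 methods expands when a fourth is about to be stored.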

Here cache->reallocate(capacity, capacity ?: INIT_CACHE_SIZE) regenerates the buckets. Let's look at its implementation:

void cache_t::reallocate(mask_t oldCapacity, mask_t newCapacity)
{
    bool freeOld = canBeFreed();

    bucket_t *oldBuckets = buckets();
    bucket_t *newBuckets = allocateBuckets(newCapacity);

    // Cache's old contents are not propagated. 
    // This is thought to save cache memory at the cost of extra cache fills.
    // fixme re-measure this

    assert(newCapacity > 0);
    assert((uintptr_t)(mask_t)(newCapacity-1) == newCapacity-1);

    setBucketsAndMask(newBuckets, newCapacity - 1);
    
    // Discard the old bucket_t array and free its memory
    if (freeOld) {
        cache_collect_free(oldBuckets, oldCapacity);
        cache_collect(false);
    }
}

The function allocates new buckets for newCapacity, replaces the old buckets with them, and finally frees the memory the old buckets occupied.

Next, let's look at the cache->expand() function:

void cache_t::expand()
{
    cacheUpdateLock.assertLocked();
    
    uint32_t oldCapacity = capacity();
    uint32_t newCapacity = oldCapacity ? oldCapacity*2 : INIT_CACHE_SIZE;
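    /*
     When execution reaches expansion, _mask already has a value, and oldCapacity is _mask + 1.
     The new capacity requested is oldCapacity * 2. Doubling keeps the capacity a power of two,
     so _mask = capacity - 1 remains an all-ones bit mask.
     */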

    if ((uint32_t)(mask_t)newCapacity != newCapacity) {
        // mask overflow - can't grow further
        // fixme this wastes one bit of mask
        newCapacity = oldCapacity;
     }
    reallocate(oldCapacity, newCapacity);
}

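To summarize, expanding the cache means allocating a new table with twice the original capacity.

One question arises here: when capacity runs out, why destroy the table and rebuild it? Doesn't that throw away the existing cache? Why not keep the previously cached methods?

Apple's engineers probably considered that keeping the earlier entries would mean moving the old cache into the newly allocated space. A cache exists to save time, and that copying would cost more time than simply discarding the old table and rebuilding. The comment in reallocate points the same way: not propagating the old contents is thought to save cache memory at the cost of extra cache fills.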
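A toy version of the discard-and-rebuild behaviour makes the trade-off concrete. ToyCache is a hypothetical type; the initial size of 4 is taken from the "minimum size is 4" comment in cache_fill_nolock, and resetting the occupied count on reallocation is an assumption that follows from the old entries being dropped.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct ToyCache {
    std::vector<std::uintptr_t> keys;  // 0 means "empty slot"
    std::uint32_t occupied = 0;

    std::uint32_t capacity() const { return static_cast<std::uint32_t>(keys.size()); }

    // Like reallocate(): a fresh zeroed table; old contents are not copied over.
    void reallocate(std::uint32_t newCapacity) {
        assert(newCapacity > 0);
        keys.assign(newCapacity, 0);
        occupied = 0;  // assumed: entries are gone, so the count restarts
    }

    // Like expand(): double the capacity, or start at the assumed initial size of 4.
    void expand() {
        std::uint32_t oldCapacity = capacity();
        reallocate(oldCapacity ? oldCapacity * 2 : 4);
    }
};
```

Keeping the old entries would not be a plain memory copy either: a slot index depends on the mask, so every surviving entry would have to be rehashed into the larger table.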

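Now that we understand the expand and reallocate functions, let's return to the main line. At this point the buckets are ready, and what remains is the store itself. First, cache->find(key, receiver) looks for a suitable bucket. Here is how it searches: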

bucket_t * cache_t::find(cache_key_t k, id receiver)
{
    assert(k != 0);

    bucket_t *b = buckets();
    mask_t m = mask();
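    // cache_hash [begin = k & m] computes the index begin for key k; begin records where the search starts
    mask_t begin = cache_hash(k, m);
    
    // Copy begin into i, which steps through the indices
    mask_t i = begin;
    do {
        if (b[i].key() == 0  ||  b[i].key() == k) {
            // Read the bucket at index i. If its key equals k, the search succeeded; return this bucket_t.
            // If key == 0, no method has been cached at index i yet; return this bucket_t as well, which ends the search.
            return &b[i];
        }
    } while ((i = cache_next(i, m)) != begin);
    // The loop above finds the bucket we need among the buckets of cache_t.
    // hack
    // If neither a bucket_t for the key nor an empty bucket_t was found, the loop ends, the lookup has failed, and bad_cache below is called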
    Class cls = (Class)((uintptr_t)this - offsetof(objc_class, cache));
    cache_t::bad_cache(receiver, (SEL)k, cls);
}

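Buckets is an array that is used as a hash table, and the algorithm that computes an index from the key is the hash function: index = @selector(XXXX) & mask. By the nature of the & operation, index <= mask, and since mask = hash table length - 1, it follows that 0 <= index <= hash table length - 1, which exactly covers the index range of the table.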
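The index computation and the probing loop can be reproduced on a plain array of keys. This is a sketch under stated assumptions: cacheHash follows the k & m formula from the comment in find, while cacheNext uses an increment-and-wrap step, which is an assumption because the article does not show cache_next. findSlot is a hypothetical helper that returns -1 where the runtime would call bad_cache.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using mask_t = std::uint32_t;
using cache_key_t = std::uintptr_t;

inline mask_t cacheHash(cache_key_t key, mask_t mask) {
    return static_cast<mask_t>(key & mask);  // always within [0, mask]
}

// Assumed probe step: advance one slot and wrap around at the end.
inline mask_t cacheNext(mask_t i, mask_t mask) {
    return (i + 1) & mask;
}

// Index of the slot holding key, or of the first empty slot (key 0) on the probe path;
// -1 if the whole table is occupied by other keys.
inline int findSlot(const std::vector<cache_key_t> &table, cache_key_t key) {
    assert(key != 0 && !table.empty());
    mask_t mask = static_cast<mask_t>(table.size() - 1);  // size must be a power of two
    mask_t begin = cacheHash(key, mask);
    mask_t i = begin;
    do {
        if (table[i] == 0 || table[i] == key) return static_cast<int>(i);
    } while ((i = cacheNext(i, mask)) != begin);
    return -1;
}
```

Because the table length is a power of two, key & mask keeps only the low bits of the key, so two selectors whose low bits agree collide and the second one lands in the next free slot.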

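After this call we have a suitable bucket. Next comes the check if (bucket->key() == 0) cache->incrementOccupied(): if it is true, the bucket has never been occupied, so the occupied count is incremented by one.

Finally, set(key, imp) is called to fill the bucket: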

bucket->set(key, imp);

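Let's summarize the overall flow of cache_t:

1: When an object receives a message through objc_msgSend, the runtime first follows obj's isa pointer to its class object cls.
2: Inside obj's cls, it first looks in the cache_t for the implementation of the message. If it is found, that function is called directly.
3: If the previous step did not find it, the method list of cls is searched, by binary search or by linear scan.
4: If the function is found, the cache_t is filled:

(1) Perform the sanity checks and prepare some temporary variables.
(2) Before every cache write, check the cache capacity. If the number of cached methods would exceed the threshold (3/4 of the capacity), the cache is first expanded to twice its size, all previously cached methods are discarded, and the current method is then stored in the new, larger cache.
(3) Use the hash function to find a suitable bucket in the Buckets array.
(4) Once found, check whether the bucket was occupied before; if not, increment Occupied by one.
(5) Store the method in the bucket.

5: Call the method.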
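The whole flow can be condensed into a toy model. Everything here is hypothetical: ToyClass and lookUp are illustration names, and std::unordered_map replaces both the method list and the bucket array, so the hashing, the 3/4 threshold and the locking are deliberately left out. The slowLookups counter only exists to show how often the cache is bypassed.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using Imp = int (*)();   // hypothetical implementation pointer

struct ToyClass {
    const ToyClass *superclass = nullptr;
    std::unordered_map<std::string, Imp> methods;  // stands in for the method list
    std::unordered_map<std::string, Imp> cache;    // stands in for cache_t
    int slowLookups = 0;                           // how often the slow path ran
};

inline Imp lookUp(ToyClass &cls, const std::string &sel) {
    auto hit = cls.cache.find(sel);                 // step 2: try the cache first
    if (hit != cls.cache.end()) return hit->second;
    ++cls.slowLookups;
    for (const ToyClass *c = &cls; c != nullptr; c = c->superclass) {  // step 3
        auto m = c->methods.find(sel);
        if (m != c->methods.end()) {
            cls.cache[sel] = m->second;             // step 4: fill the receiver's class cache
            return m->second;
        }
    }
    return nullptr;                                 // nothing to call
}

inline int eatImp() { return 7; }
```

Sending the same selector twice runs the slow path once; the second send is answered from the cache.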

This concludes the analysis of the class structure (cache_t).
