Source Code Analysis of Objective-C Method Caching on iOS

Date: 2020-06-06


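Preface

I have put together a series of articles on Objective-C internals, and I hope they are helpful. This article walks through the source code behind method caching.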

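1. How alloc creates Objective-C objects on iOS

2. Memory alignment of Objective-C objects on iOS

3. The underlying principles of isa in Objective-C on iOS

4. Objective-C source analysis on iOS: the structure of a class

In day-to-day development, have you ever wondered whether Apple caches methods that have already been called, so that frequent calls stay fast? If it does, how exactly is the caching done? To answer this, this article analyzes the source code of cache_t.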

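1. cache_t

The previous article, "Objective-C source analysis on iOS: the structure of a class", showed that cache_t lives inside the objc_class struct and occupies 16 bytes. Its source is as follows: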

struct cache_t {
    struct bucket_t *_buckets;
    mask_t _mask;
    mask_t _occupied;
    ...
};

struct bucket_t {
private:
    // IMP-first is better for arm64e ptrauth and no worse for arm64.
    // SEL-first is better for armv7* and i386 and x86_64.
#if __arm64__
    MethodCacheIMP _imp;
    cache_key_t _key;
#else
    cache_key_t _key;
    MethodCacheIMP _imp;
#endif
};

using MethodListIMP = IMP;
typedef uintptr_t cache_key_t;

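The source shows that a method's selector (SEL) and function address (IMP) are cached in a bucket_t (also called a hash bucket). For the rest of the article we define a TestObject class as follows: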

#import <Foundation/Foundation.h>

NS_ASSUME_NONNULL_BEGIN

@interface TestObject : NSObject{
    NSString *nickName;
}

@property(nonatomic,copy) NSString *name;

-(void)sayName;
-(void)sayHello;
-(void)sayTest;
+(void)sayNickName;

@end

NS_ASSUME_NONNULL_END

#import "TestObject.h"

@implementation TestObject

-(void)sayName{
    NSLog(@"%s",__func__);
}

-(void)sayHello{
    NSLog(@"%s",__func__);
}

-(void)sayTest{
    NSLog(@"%s",__func__);
}

+(void)sayNickName{
    NSLog(@"%s",__func__);
}

@end

// Calling code
TestObject *testObject = [[TestObject alloc] init];
Class tClass = object_getClass(testObject);
[testObject sayName];
[testObject sayHello];
NSLog(@"%@",testObject);

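Because an instance's methods are looked up on its class, we can check whether instance methods end up in cache_t by using lldb commands to locate the class's cache_t and drill into it, as shown in the figure below.

The figure shows that after calling three methods on TestObject (including init), both _mask and _occupied are 3. Now let's call one more method: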

TestObject *testObject = [[TestObject alloc] init];
Class tClass = object_getClass(testObject);
[testObject sayName];
[testObject sayHello];
[testObject sayTest];
NSLog(@"%@",testObject);
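
Inspecting with lldb again, _mask is now 7 but _occupied is 1, and the buckets array contains only sayTest. The other methods are gone, and entries are not stored in call order. So methods are not simply appended to the cache one at a time; the cache applies some processing of its own.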

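1.1 The fields of cache_t

_buckets: an array of bucket_t structs; each bucket_t stores a method's SEL (as its memory address) and its IMP.
_mask: the array capacity minus 1, used as a bit mask. The array size is always a power of two, so _mask takes binary values such as 000011, 000111, 001111. ANDing a key with it is equivalent to taking the key modulo the capacity, which guarantees the resulting index never exceeds the cache size.
_occupied: the number of methods currently cached, that is, how many slots of the array are in use.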
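
To make the role of _mask concrete, here is a minimal C++ sketch of the relationship between capacity, mask, and bucket index. The type aliases follow the runtime's 64-bit definitions quoted later in this article; mask_for_capacity and bucket_index are hypothetical helper names used only for illustration, not runtime functions.

```cpp
#include <cstdint>

using mask_t = uint32_t;        // width the runtime uses on 64-bit targets
using cache_key_t = uintptr_t;  // a SEL reinterpreted as an integer

// Hypothetical helper: the mask that goes with a power-of-two capacity.
constexpr mask_t mask_for_capacity(mask_t capacity) {
    return capacity - 1;        // 4 -> 0b011, 8 -> 0b111, 16 -> 0b1111
}

// Same shape as the runtime's cache_hash: keep only the low bits of the key.
constexpr mask_t bucket_index(cache_key_t key, mask_t mask) {
    return static_cast<mask_t>(key & mask);
}

// For power-of-two capacities, masking and modulo give the same index.
static_assert(mask_for_capacity(4) == 3u, "capacity 4 uses mask 0b011");
static_assert(mask_for_capacity(8) == 7u, "capacity 8 uses mask 0b111");
static_assert(bucket_index(0x1f3du, 7u) == 0x1f3du % 8u, "mask equals modulo");
```

Because the index is always at most mask, it can never run past the end of a capacity-sized array.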

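2. How method caching works

An Objective-C method call is essentially a message send (objc_msgSend), which looks up the IMP from the method's SEL. Reads from cache_t happen during the objc_msgSend lookup, and writes to cache_t start in cache_fill, as this comment from the source indicates: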

* Cache readers (PC-checked by collecting_in_critical())
 * objc_msgSend*
 * cache_getImp
 *
 * Cache writers (hold cacheUpdateLock while reading or writing; not PC-checked)
 * cache_fill         (acquires lock)
 * cache_expand       (only called from cache_fill)
 * cache_create       (only called from cache_expand)
 * bcopy               (only called from instrumented cache_expand)
 * flush_caches        (acquires lock)
 * cache_flush        (only called from cache_fill and flush_caches)
 * cache_collect_free (only called from cache_expand and cache_flush)

2.1 cache_fill

Caching a method starts with cache_fill, whose source is:

void cache_fill(Class cls, SEL sel, IMP imp, id receiver)
{
#if !DEBUG_TASK_THREADS
    mutex_locker_t lock(cacheUpdateLock);
    cache_fill_nolock(cls, sel, imp, receiver);
#else
    _collecting_in_critical();
    return;
#endif
}
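
cache_fill takes the class (cls), the method's SEL and IMP, and the receiver. It acquires cacheUpdateLock and then forwards to cache_fill_nolock.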

2.2 cache_fill_nolock

static void cache_fill_nolock(Class cls, SEL sel, IMP imp, id receiver)
{
    cacheUpdateLock.assertLocked();

    // Never cache before +initialize is done
    if (!cls->isInitialized()) return;

    // Make sure the entry wasn't added to the cache by some other thread 
    // before we grabbed the cacheUpdateLock.
    if (cache_getImp(cls, sel)) return;

    cache_t *cache = getCache(cls);
    cache_key_t key = getKey(sel);

    // Use the cache as-is if it is less than 3/4 full
    mask_t newOccupied = cache->occupied() + 1;
    mask_t capacity = cache->capacity();
    if (cache->isConstantEmptyCache()) {
        // Cache is read-only. Replace it.
        cache->reallocate(capacity, capacity ?: INIT_CACHE_SIZE);
    }
    else if (newOccupied <= capacity / 4 * 3) {
        // Cache is less than 3/4 full. Use it as-is.
    }
    else {
        // Cache is too full. Expand it.
        cache->expand();
    }

    // Scan for the first unused slot and insert there.
    // There is guaranteed to be an empty slot because the 
    // minimum size is 4 and we resized at 3/4 full.
    bucket_t *bucket = cache->find(key, receiver);
    if (bucket->key() == 0) cache->incrementOccupied();
    bucket->set(key, imp);
}

cache_t *getCache(Class cls) 
{
    assert(cls);
    return &cls->cache;
}

cache_key_t getKey(SEL sel) 
{
    assert(sel);
    return (cache_key_t)sel;
}

/* Initial cache bucket count. INIT_CACHE_SIZE must be a power of two. */
enum {
    INIT_CACHE_SIZE_LOG2 = 2,
    INIT_CACHE_SIZE      = (1 << INIT_CACHE_SIZE_LOG2)
};

#if __LP64__
typedef uint32_t mask_t;  // x86_64 & arm64 asm are less efficient with 16-bits
#else
typedef uint16_t mask_t;
#endif
typedef uintptr_t cache_key_t;

Let's go through the functions involved. getCache(cls) returns the class's cache_t, and getKey(sel) converts the SEL to a cache_key_t. The sources of cache->occupied() and cache->capacity() are:

mask_t cache_t::occupied() 
{
    return _occupied;
}

mask_t cache_t::capacity() 
{
    return mask() ? mask()+1 : 0; 
}

mask_t cache_t::mask() 
{
    return _mask; 
}

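
_occupied is the number of cached methods and starts at 0, so on the first call newOccupied is 1, meaning one slot is about to be used. capacity() returns the number of buckets in the cache; it also starts at 0, and once mask() is nonzero it is mask() + 1. Three checks follow. The first, isConstantEmptyCache(), tests whether the cache is still the read-only empty cache, that is, no buckets have been allocated yet; if so, reallocate is called. INIT_CACHE_SIZE is 4, so the first call to reallocate receives 0 and 4. The second check tests whether newOccupied is at most 3/4 of the capacity; if it is, nothing needs to be done. Otherwise the cache is too full and expand is called.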
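
The three-way branch can be isolated into a small pure function. The following is only a sketch: FillAction and action_for are hypothetical names, and a capacity of 0 stands in for isConstantEmptyCache(), which is a simplification of the real check.

```cpp
#include <cstdint>

using mask_t = uint32_t;

// Hypothetical labels for the three branches of cache_fill_nolock.
enum class FillAction { Allocate, UseAsIs, Expand };

// Decide what happens before the new entry is inserted.
// Assumption: capacity == 0 models the read-only empty cache.
constexpr FillAction action_for(mask_t occupied, mask_t capacity) {
    // Count the entry that is about to be cached.
    mask_t newOccupied = occupied + 1;
    if (capacity == 0) return FillAction::Allocate;
    // Integer arithmetic, as in the runtime: 4 / 4 * 3 == 3, 8 / 4 * 3 == 6.
    if (newOccupied <= capacity / 4 * 3) return FillAction::UseAsIs;
    return FillAction::Expand;
}

static_assert(action_for(0, 0) == FillAction::Allocate, "first fill allocates");
static_assert(action_for(2, 4) == FillAction::UseAsIs, "third entry still fits");
static_assert(action_for(3, 4) == FillAction::Expand, "fourth entry expands");
```

With a capacity of 4 the threshold is 3, which matches the lldb observation earlier: three cached methods fit, and the fourth call triggers the expansion.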

2.2.1 reallocate

As the name suggests, this function re-creates the cache's storage. Its source is:

void cache_t::reallocate(mask_t oldCapacity, mask_t newCapacity)
{
    // Whether the old buckets may be freed
    bool freeOld = canBeFreed();
    // Get the old buckets
    bucket_t *oldBuckets = buckets();
    // Create the new buckets
    bucket_t *newBuckets = allocateBuckets(newCapacity);

    // Cache's old contents are not propagated. 
    // This is thought to save cache memory at the cost of extra cache fills.
    // fixme re-measure this

    assert(newCapacity > 0);
    assert((uintptr_t)(mask_t)(newCapacity-1) == newCapacity-1);

    // Install the new buckets and set the mask
    setBucketsAndMask(newBuckets, newCapacity - 1);
    
    if (freeOld) {
        // Free the old buckets
        cache_collect_free(oldBuckets, oldCapacity);
        cache_collect(false);
    }
}

bool cache_t::canBeFreed()
{
    return !isConstantEmptyCache();
}

bucket_t *allocateBuckets(mask_t newCapacity)
{
    // Allocate one extra bucket to mark the end of the list.
    // This can't overflow mask_t because newCapacity is a power of 2.
    // fixme instead put the end mark inline when +1 is malloc-inefficient
    bucket_t *newBuckets = (bucket_t *)
        calloc(cache_t::bytesForCapacity(newCapacity), 1);

    bucket_t *end = cache_t::endMarker(newBuckets, newCapacity);

#if __arm__
    // End marker's key is 1 and imp points BEFORE the first bucket.
    // This saves an instruction in objc_msgSend.
    end->setKey((cache_key_t)(uintptr_t)1);
    end->setImp((IMP)(newBuckets - 1));
#else
    // End marker's key is 1 and imp points to the first bucket.
    end->setKey((cache_key_t)(uintptr_t)1);
    end->setImp((IMP)newBuckets);
#endif
    
    if (PrintCaches) recordNewCache(newCapacity);

    return newBuckets;
}


void cache_t::setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask)
{
    // objc_msgSend uses mask and buckets with no locks.
    // It is safe for objc_msgSend to see new buckets but old mask.
    // (It will get a cache miss but not overrun the buckets' bounds).
    // It is unsafe for objc_msgSend to see old buckets and new mask.
    // Therefore we write new buckets, wait a lot, then write new mask.
    // objc_msgSend reads mask first, then buckets.

    // ensure other threads see buckets contents before buckets pointer
    mega_barrier();

    _buckets = newBuckets;
    
    // ensure other threads see new buckets before new mask
    mega_barrier();
    
    _mask = newMask;
    _occupied = 0;
}

reallocate grabs the old buckets and creates new ones; the old buckets are released when canBeFreed() says they can be. allocateBuckets uses calloc to obtain zeroed memory for the bucket_t array, including one extra end-marker bucket, and sets the end marker's key and imp. setBucketsAndMask then stores the new buckets and _mask. On the first call newMask is 3, and _occupied is reset to 0 because nothing has been cached yet; the values are only being initialized. This explains why the first lldb inspection showed a _mask of 3.

2.2.2 expand

When newOccupied exceeds 3/4 of capacity, the cache has to grow, which is done by expand():

void cache_t::expand()
{
    cacheUpdateLock.assertLocked();
    
    uint32_t oldCapacity = capacity();
    uint32_t newCapacity = oldCapacity ? oldCapacity*2 : INIT_CACHE_SIZE;

    if ((uint32_t)(mask_t)newCapacity != newCapacity) {
        // mask overflow - can't grow further
        // fixme this wastes one bit of mask
        newCapacity = oldCapacity;
    }

    reallocate(oldCapacity, newCapacity);
}

mask_t cache_t::capacity() 
{
    return mask() ? mask()+1 : 0; 
}
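
The capacity calculation in expand(), including its overflow guard, can be tried out in isolation. This is a sketch under assumptions: next_capacity is a hypothetical name, and it is templated on the mask type only so that the guard can be exercised with the 16-bit mask_t used on non-LP64 targets.

```cpp
#include <cstdint>

constexpr uint32_t INIT_CACHE_SIZE = 4;  // 1 << INIT_CACHE_SIZE_LOG2

// Sketch of the capacity step in cache_t::expand().
// MaskT plays the role of mask_t (uint32_t on LP64, uint16_t otherwise).
template <typename MaskT>
constexpr uint32_t next_capacity(uint32_t oldCapacity) {
    uint32_t newCapacity = oldCapacity ? oldCapacity * 2 : INIT_CACHE_SIZE;
    if (static_cast<uint32_t>(static_cast<MaskT>(newCapacity)) != newCapacity) {
        // The doubled capacity does not survive a round trip through MaskT,
        // so the cache cannot grow further and keeps its old capacity.
        newCapacity = oldCapacity;
    }
    return newCapacity;
}

static_assert(next_capacity<uint32_t>(4) == 8u, "capacity doubles from 4 to 8");
static_assert(next_capacity<uint16_t>(32768) == 32768u, "16-bit mask stops growing");
```

Note that even when the capacity cannot grow, expand() still calls reallocate, so the old entries are dropped either way.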

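At the point where expansion is needed, capacity() is 4, so oldCapacity is 4 and newCapacity is 8, and reallocate is called with 4 and 8. Following the reallocate flow above, the old buckets are discarded, mask becomes 7, and occupied becomes 0. So why did lldb show occupied as 1? Because once the branch finishes, execution continues with: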

    // Scan for the first unused slot and insert there.
    // There is guaranteed to be an empty slot because the 
    // minimum size is 4 and we resized at 3/4 full.
    bucket_t *bucket = cache->find(key, receiver);
    if (bucket->key() == 0) cache->incrementOccupied();
    bucket->set(key, imp);

void cache_t::incrementOccupied() 
{
    _occupied++;
}

find uses the key and receiver to locate a bucket_t. If the returned bucket's key() is 0, the slot was empty, so _occupied is incremented. The bucket's key and imp are then filled in.

2.2.3 The find function

bucket_t * cache_t::find(cache_key_t k, id receiver)
{
    assert(k != 0);

    bucket_t *b = buckets();
    mask_t m = mask();
    // cache_hash (begin = k & m) maps the key k to its index, begin, which records where the scan starts
    mask_t begin = cache_hash(k, m);
    // Copy begin into i, the index that moves during the scan
    mask_t i = begin;
    do {
        if (b[i].key() == 0  ||  b[i].key() == k) {
            // Read the bucket at index i. If its key equals k, the lookup succeeded: return that bucket_t.
            // If its key is 0, nothing has been cached at index i yet; return that bucket_t as well, which ends the search.
            return &b[i];
        }
    } while ((i = cache_next(i, m)) != begin);
    
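    // cache_next is effectively i = i - 1 (this is the arm64 variant; other architectures step
    // forward with (i+1) & mask). Each pass of the do loop moves to the previous slot and compares its key with k again.
    // When i is 0, that is, it points at the first slot, it is reset to mask so that it points at the last slot and the backward scan continues.
    // In effect the table is treated as a ring: starting from begin and decrementing the index, a full lap always comes back to begin.
    // If by then neither the bucket_t for the key nor an empty bucket_t has been found, the loop ends, the lookup has failed, and bad_cache is called.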
 
    // hack
    Class cls = (Class)((uintptr_t)this - offsetof(objc_class, cache));
    cache_t::bad_cache(receiver, (SEL)k, cls);
}

static inline mask_t cache_hash(cache_key_t key, mask_t mask) 
{
    return (mask_t)(key & mask);
}
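
find hashes the key with the mask to get the starting index begin, then returns the address of the matching bucket_t. Because the index comes from a hash of the selector's address rather than from call order, entries in buckets are not stored in order.

Class methods are not found in the class's cache_t, because they are cached in the metaclass. To look for a class method with lldb, first follow isa to the metaclass, then repeat the steps above to check that the class method is cached there.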
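
The probing behaviour of find can be reproduced on a plain array. The sketch below is an assumption-laden model, not the runtime code: Bucket and find_slot are hypothetical names, the table is a std::vector whose size is assumed to be a power of two, and a return value of -1 replaces the call to bad_cache. The probe steps backwards and wraps around, as in the arm64 variant of cache_next.

```cpp
#include <cstdint>
#include <vector>

using mask_t = uint32_t;
using cache_key_t = uintptr_t;

// Simplified stand-in for bucket_t.
struct Bucket {
    cache_key_t key = 0;  // 0 marks an empty slot
    uintptr_t imp = 0;
};

// Returns the index of the slot holding k, or of the first empty slot on
// k's probe path. Returns -1 if a full lap finds neither.
inline long find_slot(const std::vector<Bucket>& buckets, cache_key_t k) {
    mask_t m = static_cast<mask_t>(buckets.size() - 1);  // size is a power of two
    mask_t begin = static_cast<mask_t>(k & m);           // same as cache_hash
    mask_t i = begin;
    do {
        if (buckets[i].key == 0 || buckets[i].key == k) return i;
        i = i ? i - 1 : m;  // step back one slot, wrapping from 0 to the last slot
    } while (i != begin);
    return -1;
}
```

For example, with four buckets the keys 0x1002 and 0x2002 both hash to index 2; once 0x1002 occupies that slot, a lookup for 0x2002 lands on index 1, which is why neighbouring slots do not reflect call order.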

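3. Summary

An Objective-C method call is essentially a message send (objc_msgSend), which looks up the IMP from the method's SEL.

1. Methods are cached in cache_t: buckets points to the array of cached methods, mask stores the array capacity minus 1, and occupied stores the number of slots currently in use.
2. When newOccupied, the occupied count including the method about to be cached, exceeds 3/4 of capacity(), the cache is expanded.
3. During expansion, mask is set to capacity() * 2 - 1, that is, twice the old capacity minus 1: 3 after the first allocation, then 7. The old buckets are always discarded, and the method that triggered the expansion is then added to the new buckets.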
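
The whole flow can be replayed with a toy model that reproduces the numbers seen in lldb. Everything here is hypothetical scaffolding (ToyBucket, ToyCache, fill, and the integer keys standing in for selectors); locking, the +initialize check, the end-marker bucket, and memory reclamation are all left out.

```cpp
#include <cstdint>
#include <vector>

using mask_t = uint32_t;
using cache_key_t = uintptr_t;

struct ToyBucket { cache_key_t key = 0; uintptr_t imp = 0; };

// Hypothetical model of cache_t; not the runtime's real layout.
struct ToyCache {
    std::vector<ToyBucket> buckets;  // an empty vector models the constant empty cache
    mask_t mask = 0;
    mask_t occupied = 0;

    mask_t capacity() const { return mask ? mask + 1 : 0; }

    // Like reallocate(): fresh buckets, old entries are not copied over.
    void reallocate(mask_t newCapacity) {
        buckets.assign(newCapacity, ToyBucket{});
        mask = newCapacity - 1;
        occupied = 0;
    }

    // Like cache_fill_nolock(): maybe resize, then insert into a free slot.
    void fill(cache_key_t key, uintptr_t imp) {
        for (const ToyBucket& b : buckets) {
            if (b.key == key) return;  // already cached
        }
        mask_t newOccupied = occupied + 1;
        mask_t cap = capacity();
        if (buckets.empty()) {
            reallocate(cap ? cap : 4);           // first use: INIT_CACHE_SIZE
        } else if (newOccupied <= cap / 4 * 3) {
            // At most 3/4 full: use the cache as-is.
        } else {
            reallocate(cap * 2);                 // too full: double and start over
        }
        mask_t i = static_cast<mask_t>(key & mask);
        while (buckets[i].key != 0) i = i ? i - 1 : mask;  // probe for a free slot
        buckets[i].key = key;
        buckets[i].imp = imp;
        ++occupied;
    }
};
```

Filling three distinct keys leaves mask at 3 and occupied at 3; the fourth fill doubles the capacity, so mask becomes 7 and occupied drops to 1, with only the last key present.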