iOS Internals Summary - Exploring the Nature of Runtime (Part 2)

Date: 2019-11-30



The Structure of Class

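In the previous chapter we gained a new understanding of the structure of isa. Today we revisit the structure of Class and take a fresh look at its internals.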

First, let's look at the code for Class's internal structure as a brief review of our earlier exploration of what a Class really is.

struct objc_class : objc_object {
    // Class ISA;
    Class superclass;
    cache_t cache;             // formerly cache pointer and vtable
    class_data_bits_t bits;    // class_rw_t * plus custom rr/alloc flags

    class_rw_t *data() { 
        return bits.data();
    }
    void setData(class_rw_t *newData) {
        bits.setData(newData);
    }
};

// Member function of class_data_bits_t
class_rw_t* data() {
    return (class_rw_t *)(bits & FAST_DATA_MASK);
}
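To make the masking step in data() concrete, here is a small C++ sketch. It packs an 8-byte-aligned address together with flag bits into one 64-bit word and then masks the flags away again. The mask value 0x00007ffffffffff8 is an assumption taken from the 64-bit objc4 sources of this period, the flag names are hypothetical, and the address is a made-up number rather than a real class_rw_t pointer.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the 64-bit FAST_DATA_MASK value from objc4 sources of this period.
constexpr uint64_t kFastDataMask = 0x00007ffffffffff8ULL;

// Hypothetical flag bits stored in the low 3 bits (illustrative names only).
constexpr uint64_t kFlagBit0 = 1ULL << 0;
constexpr uint64_t kFlagBit1 = 1ULL << 1;

// Pack an 8-byte-aligned address and some flags into one word,
// the way class_data_bits_t::bits shares space between a pointer and flags.
uint64_t packBits(uint64_t rwAddress, uint64_t flags) {
    return (rwAddress & kFastDataMask) | (flags & ~kFastDataMask);
}

// Equivalent of data(): clear the flag bits to recover the address.
uint64_t dataAddress(uint64_t bits) { return bits & kFastDataMask; }

// Read the flag bits back out.
uint64_t flagBits(uint64_t bits) { return bits & ~kFastDataMask; }
```

Because the pointer is at least 8-byte aligned, its low 3 bits are always zero, which is what frees them up for flags.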

class_rw_t

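From the source above, we know that the bitwise operation bits & FAST_DATA_MASK yields a class_rw_t pointer, and class_rw_t stores the method list, property list, and protocol list. Here is part of the class_rw_t code: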

struct class_rw_t {
    // Be warned that Symbolication knows the layout of this structure.
    uint32_t flags;
    uint32_t version;

    const class_ro_t *ro;

    method_array_t methods; // method list
    property_array_t properties; // property list
    protocol_array_t protocols; // protocol list

    Class firstSubclass;
    Class nextSiblingClass;

    char *demangledName;
};

In the source above, method_array_t, property_array_t, and protocol_array_t are all effectively two-dimensional arrays. Let's look inside them, using method_array_t as an example: method_array_t is itself an array whose elements are method_list_t arrays, and each method_list_t in turn stores method_t entries.

class method_array_t : 
    public list_array_tt<method_t, method_list_t> 
{
    typedef list_array_tt<method_t, method_list_t> Super;

 public:
    method_list_t **beginCategoryMethodLists() {
        return beginLists();
    }
    
    method_list_t **endCategoryMethodLists(Class cls);

    method_array_t duplicate() {
        return Super::duplicate<method_array_t>();
    }
};


class property_array_t : 
    public list_array_tt<property_t, property_list_t> 
{
    typedef list_array_tt<property_t, property_list_t> Super;

 public:
    property_array_t duplicate() {
        return Super::duplicate<property_array_t>();
    }
};


class protocol_array_t : 
    public list_array_tt<protocol_ref_t, protocol_list_t> 
{
    typedef list_array_tt<protocol_ref_t, protocol_list_t> Super;

 public:
    protocol_array_t duplicate() {
        return Super::duplicate<protocol_array_t>();
    }
};

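The methods, properties, and protocols in class_rw_t are two-dimensional arrays and are readable and writable; they contain both the class's original content and the content added by categories.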

Taking method_array_t as an example, the figure illustrates its structure.

class_ro_t

We mentioned earlier that class_ro_t also stores method, property, and protocol lists, plus the instance variable (ivar) list.

Next, let's look at part of the class_ro_t code:

struct class_ro_t {
    uint32_t flags;
    uint32_t instanceStart;
    uint32_t instanceSize;
#ifdef __LP64__
    uint32_t reserved;
#endif

    const uint8_t * ivarLayout;
    
    const char * name;
    method_list_t * baseMethodList;
    protocol_list_t * baseProtocols;
    const ivar_list_t * ivars;

    const uint8_t * weakIvarLayout;
    property_list_t *baseProperties;

    method_list_t *baseMethods() const {
        return baseMethodList;
    }
};

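From the source above we can see that class_ro_t *ro is read-only. It directly stores one-dimensional arrays of type method_list_t, protocol_list_t, and property_list_t, which hold the class's original information. Taking method_list_t as an example, it directly stores method_t entries, but it is read-only: entries cannot be added, removed, or modified.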

Summary

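Taking the method list as an example, methods in class_rw_t has a two-dimensional structure and is readable and writable, so methods can be added dynamically and category methods are easy to attach. As mentioned in the article on the nature of Category, the attachLists function merges a category's method lists into the class's method lists with two operations, memmove and memcpy, so the category's methods and the class's own methods end up combined in one place.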
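The memmove + memcpy step can be sketched in C++ as follows. This is a simplified model, not the runtime's list_array_tt: MethodList and ListArray are hypothetical stand-ins, and the single-list special case of the real attachLists is left out. It shows that newly attached lists are placed in front of the existing ones, which is why a category's methods are found before the class's original methods.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

struct MethodList { const char *owner; };   // hypothetical stand-in for method_list_t

struct ListArray {                          // hypothetical stand-in for the outer array
    unsigned count = 0;
    MethodList **lists = nullptr;
};

// Prepend addedCount lists using the memmove + memcpy pattern.
void attachLists(ListArray &arr, MethodList *const *addedLists, unsigned addedCount) {
    if (addedCount == 0) return;
    unsigned oldCount = arr.count;
    unsigned newCount = oldCount + addedCount;
    void *grown = std::realloc(arr.lists, newCount * sizeof(MethodList *));
    assert(grown != nullptr);
    arr.lists = static_cast<MethodList **>(grown);
    arr.count = newCount;
    // Shift the existing lists back to make room at the front.
    std::memmove(arr.lists + addedCount, arr.lists, oldCount * sizeof(MethodList *));
    // Copy the new (category) lists into the front slots.
    std::memcpy(arr.lists, addedLists, addedCount * sizeof(MethodList *));
}

void freeListArray(ListArray &arr) {
    std::free(arr.lists);
    arr.lists = nullptr;
    arr.count = 0;
}
```

After attaching the class's own list and then two category lists, the category lists occupy the first slots and the original list ends up last.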

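In fact, a class's methods, properties, instance variables, protocols, and so on are initially all stored in class_ro_t. When the program runs and the category lists need to be merged with the class's original lists, the lists in class_ro_t are combined with the category lists and stored in class_rw_t. In other words, some of the lists in class_rw_t are taken from class_ro_t and then merged with the category methods. The source code shows this.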

Part of the realizeClass source:

static Class realizeClass(Class cls)
{
    runtimeLock.assertWriting();

    const class_ro_t *ro;
    class_rw_t *rw;
    Class supercls;
    Class metacls;
    bool isMeta;

    if (!cls) return nil;
    if (cls->isRealized()) return cls;
    assert(cls == remapClass(cls));

    // Initially, cls->data() points to ro
    ro = (const class_ro_t *)cls->data();

    if (ro->flags & RO_FUTURE) { 
        // rw has already been allocated and initialized
        rw = cls->data();  // cls->data() points to rw
        ro = cls->data()->ro;  // cls->data()->ro points to ro
        cls->changeInfo(RW_REALIZED|RW_REALIZING, RW_FUTURE);
    } else { 
        // rw does not exist yet, so allocate memory for it
        rw = (class_rw_t *)calloc(sizeof(class_rw_t), 1); // allocate zeroed memory
        rw->ro = ro;  // rw->ro points to the original ro
        rw->flags = RW_REALIZED|RW_REALIZING;
        // Pass rw to setData, so cls->data() now points to rw
        cls->setData(rw); 
    }

    // ... (the rest of realizeClass is omitted)
}

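From the source above we can see that the class's original information is actually stored in class_ro_t, and cls->data() initially points to ro; that is, bits.data() initially yields ro. During realization, however, the runtime creates a class_rw_t, assigns the original ro to rw's ro field, and then calls setData(rw), so cls->data() now points to rw and bits.data() yields rw from then on. After that, the runtime checks for categories and merges their method, property, and protocol lists into the corresponding lists in class_rw_t.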
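The ro-to-rw switch can be modeled in a few lines of C++. The types ro_model, rw_model, and class_model and the flag value are hypothetical simplifications; only the order of the steps follows the non-RO_FUTURE branch of realizeClass shown above.

```cpp
#include <cassert>
#include <cstdlib>

struct ro_model { const char *name; };                    // stand-in for class_ro_t (read-only data)
struct rw_model { unsigned flags; const ro_model *ro; };  // stand-in for class_rw_t

constexpr unsigned kRealizedFlag = 1u;   // hypothetical flag value

struct class_model {
    const void *data;   // plays the role of bits.data(): ro before realization, rw after
    bool realized;
};

// Follows the order of the non-RO_FUTURE branch of realizeClass.
const rw_model *realize(class_model &cls) {
    if (cls.realized) return static_cast<const rw_model *>(cls.data);
    const auto *ro = static_cast<const ro_model *>(cls.data);  // data() still points to ro
    auto *rw = static_cast<rw_model *>(std::calloc(1, sizeof(rw_model)));
    assert(rw != nullptr);
    rw->ro = ro;                 // keep the original read-only data reachable from rw
    rw->flags = kRealizedFlag;
    cls.data = rw;               // like setData(rw): data() now yields rw
    cls.realized = true;
    return rw;
}
```

Realizing twice returns the same rw, and the original ro stays reachable through rw->ro.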

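This source analysis gives us a clearer picture of how class_rw_t comes to store its method, property, and protocol lists. Next, let's explore how methods are stored in class_rw_t.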

How class_rw_t stores methods

method_t

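Among method_array_t, property_array_t, and protocol_array_t, take method_array_t as an example: what it ultimately stores is method_t. method_t wraps a method or function, and each method object is one method_t. Let's look at the method_t struct in the source: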

struct method_t {
    SEL name;  // method/function name (selector)
    const char *types;  // type encoding (return type, parameter types)
    IMP imp; // pointer to the function (function address)
};

The method_t struct has three member variables. Let's look at what each of them represents.

SEL

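SEL represents a method/function name and is usually called a selector. It is declared as typedef struct objc_selector *SEL; and its underlying representation is similar to char *, so you can think of a SEL as the method name string.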

A SEL can be obtained with @selector() or sel_registerName():

SEL sel1 = @selector(test);
SEL sel2 = sel_registerName("test");

A SEL can also be converted to a string with sel_getName() or NSStringFromSelector():

const char *string = sel_getName(sel1); // sel_getName returns const char *
NSString *string2 = NSStringFromSelector(sel2);

Methods with the same name in different classes correspond to the same selector.

NSLog(@"%p,%p", sel1,sel2);
// Output
Runtime-test[23738:8888825] 0x1017718a3,0x1017718a3

A SEL represents only the method name, and the SEL for a given method name is globally unique, even across different classes.
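Why two lookups of the same name print the same address can be illustrated with a string-interning sketch in C++. This is an assumption-level model, not how the runtime's selector table is implemented: the hypothetical registerName keeps one stored copy of each name and hands out a pointer to it, so equal names always yield equal pointers.

```cpp
#include <cassert>
#include <cstring>
#include <set>
#include <string>

using SELModel = const char *;   // hypothetical: a selector is a pointer to a uniqued name

// Return the one stored copy of `name`, creating it on first use.
SELModel registerName(const char *name) {
    static std::set<std::string> table;   // node-based, so stored strings never move
    return table.insert(std::string(name)).first->c_str();
}

// In this model the selector already is the name, so "getName" just returns it.
const char *getName(SELModel sel) { return sel; }
```

Comparing selectors then reduces to comparing pointers, which is cheaper than comparing strings.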

types

types is a string that encodes the function's return value and parameters: the return type and parameter types are concatenated into a single string.

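Let's check in code how types represents the return value and parameters. As before, we mimic the internal implementation of Class and use a cast to inspect its internal data; the relevant code was covered in the earlier article on the nature of Class, so we won't repeat it here.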

Person *person = [[Person alloc] init];
xx_objc_class *cls = (__bridge xx_objc_class *)[Person class];
class_rw_t *data = cls->data();

Using a breakpoint, we can find the value of types inside data.

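As the figure above shows, the value of types is v16@0:8. What does this value mean? To represent a method and its return value clearly as a string, Apple defined a set of encoding rules; the table below shows the correspondence.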

Compare the value v16@0:8 against the table, character by character, to see what it means:

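- (void)test;

  v    16    @    0    :    8
void         id       SEL
// 16 is the total size of the arguments in bytes. The 0 after @ means id starts at byte offset 0; id occupies 8 bytes.
// The 8 after : means SEL starts at byte offset 8; SEL also occupies 8 bytes.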

We know that every method has two default parameters, self of type id and _cmd of type SEL, and the analysis of types above confirms this.

To make this clearer, let's give test a return value and parameters, then look at the value of types again.

Again, use the table above to map each character and see which method the types value represents:

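- (int)testWithAge:(int)age Height:(float)height
{
    return 0;
}

  i    24    @    0    :    8    i    16    f    20
 int         id       SEL       int       float
// Total argument size: 8 + 8 + 4 + 4 = 24 bytes
// id starts at byte offset 0 and occupies 8 bytes
// SEL starts at byte offset 8 and occupies 8 bytes
// int starts at byte offset 16 and occupies 4 bytes
// float starts at byte offset 20 and occupies 4 bytes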
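The offset arithmetic in these two examples can be reproduced with a small C++ helper. It assumes a 64-bit platform (id and SEL take 8 bytes, int and float take 4) and, as a simplification, lays arguments out back to back without alignment padding, which holds for these examples but is not guaranteed in general. methodTypes is a hypothetical helper, not a runtime API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One return value or argument: its @encode-style code and its size in bytes.
struct EncodedType {
    std::string code;
    std::size_t size;
};

// Build a string like "i24@0:8i16f20": return code, total argument bytes,
// then each argument's code followed by its byte offset (no padding assumed).
std::string methodTypes(const EncodedType &ret, const std::vector<EncodedType> &args) {
    std::string body;
    std::size_t offset = 0;
    for (const auto &a : args) {
        body += a.code + std::to_string(offset);
        offset += a.size;
    }
    return ret.code + std::to_string(offset) + body;
}
```

Passing the return type and the argument list (self, _cmd, then the declared parameters) yields the same strings observed in the debugger.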

Objective-C provides the @encode compiler directive, which converts a specific type into its string encoding.

NSLog(@"%s",@encode(int));
NSLog(@"%s",@encode(float));
NSLog(@"%s",@encode(id));
NSLog(@"%s",@encode(SEL));

// Output
Runtime-test[25275:9144176] i
Runtime-test[25275:9144176] f
Runtime-test[25275:9144176] @
Runtime-test[25275:9144176] :

The output above confirms that the correspondence matches the table.

IMP

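IMP represents the concrete implementation of a function; it stores the function's address. In other words, once the imp is found, the function implementation can be found and called.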

Print the value of IMP in the code above:

Printing description of data->methods->first.imp:
(IMP) imp = 0x000000010c66a4a0 (Runtime-test`-[Person testWithAge:Height:] at Person.m:13)

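If we then set a breakpoint inside the method and step into it, we can see that the address stored in imp is exactly the address of the method's implementation.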
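The relationship between the three method_t fields can be modeled in C++: the stored function address is called directly, with the receiver and the selector passed as the first two arguments. This is a simplified sketch; method_model, ObjectModel, and addToAge are hypothetical, and the real IMP is a C function pointer whose first two parameters are self and _cmd.

```cpp
#include <cassert>

struct ObjectModel { int age; };    // hypothetical stand-in for an instance
using SELModel = const char *;      // hypothetical stand-in for SEL

// Hypothetical IMP shape: receiver and selector come first, like self and _cmd.
using IMPModel = int (*)(ObjectModel *self, SELModel _cmd, int arg);

struct method_model {               // mirrors the three fields of method_t
    SELModel name;
    const char *types;
    IMPModel imp;
};

// A plain function acting as the method body.
int addToAge(ObjectModel *self, SELModel /*_cmd*/, int delta) { return self->age + delta; }

// Jump straight to the stored function address.
int invoke(const method_model &m, ObjectModel *self, int arg) {
    return m.imp(self, m.name, arg);
}
```

Here the types string "i20@0:8i16" follows the same rules as above: an int return, then id at offset 0, SEL at 8, and an int argument at 16, for 20 bytes in total.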

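From the above, we now know how method lists are stored in a class object. However, when a subclass several levels deep calls a method of a base class, the runtime has to follow the superclass pointers level by level to reach the base class and then find the method in the base class's method list. If the base class method is called many times, each call traverses the method lists of every ancestor again, which clearly hurts performance badly.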

Apple solves this problem with a method cache. Next, let's explore how a class object caches methods.

Method cache: cache_t

Back to the class object struct: the member variable cache is what caches methods.

struct objc_class : objc_object {
    // Class ISA;
    Class superclass;
    cache_t cache;             // formerly cache pointer and vtable
    class_data_bits_t bits;    // class_rw_t * plus custom rr/alloc flags

    class_rw_t *data() { 
        return bits.data();
    }
    void setData(class_rw_t *newData) {
        bits.setData(newData);
    }
};

cache_t cache; caches methods that have been called before, which speeds up method lookup.

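Recall the method call process: when a method is called, the runtime searches the method list. If the method is not in the list, it follows superclass to the parent class object and searches the parent's method list.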

If a method is called many times, every call would mean traversing the method lists several times. To look methods up quickly, Apple designed cache_t as a method cache.

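Whenever a method is called, the runtime first checks the cache. If the method is not cached, the runtime searches the class object's method list, and so on up the hierarchy. Once the method is found, it is stored in the cache, so the next call to this method finds it in the class object's cache and calls it directly.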
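The lookup order described above can be sketched in C++, with hash maps standing in for both the method lists and cache_t. ClassModel and the listSearches counter are hypothetical; the counter only exists to show that the second call is served from the receiver's cache without scanning any method list.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using IMPModel = int (*)();   // hypothetical stand-in for IMP

struct ClassModel {
    const ClassModel *superclass = nullptr;
    std::unordered_map<std::string, IMPModel> methods;        // stands in for the method lists
    mutable std::unordered_map<std::string, IMPModel> cache;  // stands in for cache_t
    mutable int listSearches = 0;                             // method-list scans, for illustration
};

// Check each class's cache, then its method list, walking up the superclass chain.
// Whatever is found gets cached in the receiver's class (cls).
IMPModel lookup(const ClassModel &cls, const std::string &sel) {
    for (const ClassModel *c = &cls; c != nullptr; c = c->superclass) {
        auto hit = c->cache.find(sel);
        if (hit != c->cache.end()) {
            cls.cache[sel] = hit->second;
            return hit->second;
        }
        ++cls.listSearches;
        auto m = c->methods.find(sel);
        if (m != c->methods.end()) {
            cls.cache[sel] = m->second;
            return m->second;
        }
    }
    return nullptr;   // the real runtime would continue with method resolution / forwarding
}
```

With a three-level hierarchy, the first call scans three method lists; the second call is answered from the subclass's cache alone.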

How cache_t caches methods

So how does cache_t cache methods? First, let's look at the internal structure of cache_t.

struct cache_t {
    struct bucket_t *_buckets; // hash table (array of buckets)
    mask_t _mask; // hash table length - 1
    mask_t _occupied; // number of methods already cached
};

_buckets stores the cached methods as an array of bucket_t. Here is the internal structure of bucket_t:

struct bucket_t {
private:
    cache_key_t _key; // SEL used as the key
    IMP _imp; // memory address of the function
};

The source shows that bucket_t stores a SEL and an _imp as a key-value pair: the SEL is the key, and _imp, the memory address of the function implementation, is the value.

The figure below shows the structure of cache_t.

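The bucket_t list above is called a hash table. A hash table is a data structure accessed directly by key: it maps a key to a position in the table to access the record, which speeds up lookup. The mapping function is called the hash function, and the array that stores the records is called the hash table.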

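So how does Apple find the right key and function implementation in the hash table quickly and accurately? To answer that, let's look at the source to see how Apple designed its hash function.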

Hash function and hash table principles

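First, let's look at the source for storing entries. We focus on a few functions; the key code is commented, so we won't elaborate further.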

The cache_fill and cache_fill_nolock functions

void cache_fill(Class cls, SEL sel, IMP imp, id receiver)
{
#if !DEBUG_TASK_THREADS
    mutex_locker_t lock(cacheUpdateLock);
    cache_fill_nolock(cls, sel, imp, receiver);
#else
    _collecting_in_critical();
    return;
#endif
}

static void cache_fill_nolock(Class cls, SEL sel, IMP imp, id receiver)
{
    cacheUpdateLock.assertLocked();
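    // If the class has not been initialized yet, return directly
    if (!cls->isInitialized()) return;
    // Make sure no other thread has already cached this method in the meantime
    if (cache_getImp(cls, sel)) return;
    // Get the cache from the class object
    cache_t *cache = getCache(cls);
    // Wrap the SEL into a key
    cache_key_t key = getKey(sel);
    // Occupied count + 1
    mask_t newOccupied = cache->occupied() + 1;
    // Get the cache capacity, i.e. how many key-value pairs it can hold
    mask_t capacity = cache->capacity();
    if (cache->isConstantEmptyCache()) {
        // If the cache is empty, allocate storage; a brand-new cache gets INIT_CACHE_SIZE (4) slots
        cache->reallocate(capacity, capacity ?: INIT_CACHE_SIZE);
    }
    else if (newOccupied <= capacity / 4 * 3) {
        // If occupancy stays at or below 3/4 of the capacity, keep using the current table
    }
    else {
        // If occupancy would exceed 3/4, expand the table
        cache->expand();
    }
    // Use the key to find a suitable bucket
    bucket_t *bucket = cache->find(key, receiver);
    // A key of 0 means this bucket was empty, so increment the occupied count
    if (bucket->key() == 0) cache->incrementOccupied();
    // Store the key and imp
    bucket->set(key, imp);
}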

The reallocate function

The source above shows that reallocate is responsible for allocating the hash table's storage. Let's look inside reallocate:

void cache_t::reallocate(mask_t oldCapacity, mask_t newCapacity)
{
    // Whether the old hash table can be freed
    bool freeOld = canBeFreed();
    // Get the old hash table
    bucket_t *oldBuckets = buckets();
    // Create a new hash table with the requested capacity
    bucket_t *newBuckets = allocateBuckets(newCapacity);

    assert(newCapacity > 0);
    assert((uintptr_t)(mask_t)(newCapacity-1) == newCapacity-1);
    // Set buckets and mask (mask = table length - 1); this also resets _occupied to 0
    setBucketsAndMask(newBuckets, newCapacity - 1);
    // Free the old hash table; its entries are not copied into the new one
    if (freeOld) {
        cache_collect_free(oldBuckets, oldCapacity);
        cache_collect(false);
    }
}

In the source above, the newCapacity first passed into reallocate is INIT_CACHE_SIZE, an enum value equal to 4. So the hash table is initially created with 4 slots.

enum {
    INIT_CACHE_SIZE_LOG2 = 2,
    INIT_CACHE_SIZE      = (1 << INIT_CACHE_SIZE_LOG2)
};

The expand() function

When more than 3/4 of the hash table's slots would be occupied, the cache calls expand() to grow it. Let's see how expand() grows the table.

void cache_t::expand()
{
    cacheUpdateLock.assertLocked();
    // Get the capacity of the old hash table
    uint32_t oldCapacity = capacity();
    // Double the old capacity (or start at INIT_CACHE_SIZE if it was 0)
    uint32_t newCapacity = oldCapacity ? oldCapacity*2 : INIT_CACHE_SIZE;
    // If the doubled capacity no longer fits in mask_t, keep the old capacity
    if ((uint32_t)(mask_t)newCapacity != newCapacity) {
        newCapacity = oldCapacity;
    }
    // Call reallocate to create the new storage
    reallocate(oldCapacity, newCapacity);
}

The source shows that when the hash table grows, its capacity doubles.

The find function

Finally, how does the hash table quickly find the bucket for a given key? Let's look inside find:

bucket_t * cache_t::find(cache_key_t k, id receiver)
{
    assert(k != 0);
    // Get the hash table
    bucket_t *b = buckets();
    // Get the mask
    mask_t m = mask();
    // Compute the index where this key belongs in the hash table
    mask_t begin = cache_hash(k, m);
    // Start probing from that index
    mask_t i = begin;
    // If b[i].key() == 0, the slot is empty: return b[i] so the entry can be stored there
    // If b[i].key() == k, this key is already stored here: return b[i]
    do {
        if (b[i].key() == 0  ||  b[i].key() == k) {
            // Condition met, return right away
            return &b[i];
        }
    // Otherwise move to the slot given by cache_next and check again,
    // until a suitable slot is found or the probe wraps back to begin
    } while ((i = cache_next(i, m)) != begin);

    // hack
    Class cls = (Class)((uintptr_t)this - offsetof(objc_class, cache));
    cache_t::bad_cache(receiver, (SEL)k, cls);
}
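The order in which find visits slots depends on cache_next. The C++ sketch below lists the probe sequence for the two variants that, as far as we understand the objc4 sources of this period, are used on different architectures: arm64 steps to i - 1 and wraps from 0 to mask, while x86_64 steps to (i + 1) & mask. probeOrder and the two function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using mask_t = uint32_t;

// arm64-style cache_next: step to the previous slot, wrapping from 0 to mask.
mask_t cacheNextArm64(mask_t i, mask_t mask) { return i ? i - 1 : mask; }

// x86_64-style cache_next: step to the next slot, wrapping via the mask.
mask_t cacheNextX86(mask_t i, mask_t mask) { return (i + 1) & mask; }

// List the slots find() would visit, starting at begin, until it wraps around.
std::vector<mask_t> probeOrder(mask_t begin, mask_t mask, mask_t (*next)(mask_t, mask_t)) {
    std::vector<mask_t> order;
    mask_t i = begin;
    do {
        order.push_back(i);
    } while ((i = next(i, mask)) != begin);
    return order;
}
```

Either way, every slot is visited exactly once before the loop returns to begin, which is why find only falls through to bad_cache when the table is completely full.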

cache_hash(k, m) computes the index at which the method is stored in the hash table. Let's look inside cache_hash(k, m):

static inline mask_t cache_hash(cache_key_t key, mask_t mask) 
{
    return (mask_t)(key & mask);
}

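As you can see, cache_hash(k, m) simply computes key & mask with a bitwise AND, and the result is the index where the entry is stored. Bitwise AND was explained in detail in the previous article, so we won't repeat it here.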

_mask

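From the analysis above, _mask equals the hash table length minus one. Because the capacity starts at 4 and only ever doubles, it is always a power of two, so _mask has all of its low bits set. ANDing any number with _mask therefore yields a value less than or equal to _mask, and the index can never go out of bounds.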

For example, if the hash table length is 8, the value of mask is 7:

  0101 1011  // arbitrary value
& 0000 0111  // mask = 7
------------
  0000 0011  // the result is always less than or equal to mask
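The same masking can be checked in C++. cacheHash mirrors the cache_hash function shown earlier; indexInRange is a hypothetical helper that confirms every key lands inside a power-of-two table. The 32-bit mask_t is an assumption matching 64-bit platforms.

```cpp
#include <cassert>
#include <cstdint>

using mask_t = uint32_t;          // assumption: 32-bit mask_t, as on 64-bit platforms
using cache_key_t = uintptr_t;    // the SEL address used as the key

// Same computation as cache_hash: keep only the low bits selected by mask.
mask_t cacheHash(cache_key_t key, mask_t mask) { return static_cast<mask_t>(key & mask); }

// With a power-of-two capacity, mask = capacity - 1 has every low bit set,
// so key & mask always falls in [0, capacity - 1].
bool indexInRange(cache_key_t key, mask_t capacity) {
    return cacheHash(key, capacity - 1) < capacity;
}
```

For example, cacheHash(0x5B, 7) gives 3, matching the binary example above.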

Summary

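When a method is used for the first time, the messaging mechanism finds it through isa and then caches it in the cache's _buckets, with SEL as the key and IMP as the value. On the first store, a hash table with 4 slots is created and _mask is set to the table length minus one. The index is then computed as SEL & mask, and the method is stored at that index. For example, if the computed index is 3, the method is stored directly in slot 3 and the slots before it are left empty.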

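When the methods stored in the hash table would occupy more than 3/4 of its length, the table is expanded: a new hash table with twice the capacity is created, _mask is reset, and the old table is freed. The old entries are not copied into the new table. Any method cached after that has its index recomputed as SEL & mask with the new mask before being stored.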

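If a class has many methods, several SELs may well produce the same index from SEL & mask. In that case cache_next is called to try slot index - 1 (this is the arm64 behavior). If that slot already holds a method whose key differs from the key being stored, it moves one more slot back, and so on, until it finds an empty slot or a slot whose key equals the key being stored. If it reaches index 0, it wraps around to index _mask, the last slot.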

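Lookup does not need to traverse the hash table either: compute the index as SEL & mask and read that slot directly. As above, if the key stored there differs from the key being looked up, check the previous slot. This uses a little extra space but saves a great deal of time; in other words, Apple trades space for lookup time.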
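Putting the pieces together, here is a simplified C++ model of cache_t covering the first allocation of 4 slots, the 3/4 expansion rule, the fact that expansion starts from an empty table, and collision probing. It is a sketch, not the runtime's code: keys are made-up nonzero integers standing in for SEL addresses, 0 marks an empty bucket, probing follows the arm64-style cache_next, and CacheModel is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Key = uintptr_t;   // stands in for cache_key_t (a SEL address)
using Imp = uintptr_t;   // stands in for IMP (a function address)

struct Bucket {
    Key key = 0;   // 0 means the bucket is empty
    Imp imp = 0;
};

class CacheModel {
public:
    std::vector<Bucket> buckets;   // the hash table; empty until the first fill
    uint32_t occupied = 0;         // like _occupied

    uint32_t capacity() const { return static_cast<uint32_t>(buckets.size()); }
    uint32_t mask() const { return capacity() - 1; }   // like _mask

    // Like cache_fill_nolock: grow if needed, then store key -> imp.
    void fill(Key key, Imp imp) {
        if (lookup(key) != 0) return;                   // already cached
        uint32_t newOccupied = occupied + 1;
        if (buckets.empty()) {
            reallocate(4);                              // INIT_CACHE_SIZE
        } else if (newOccupied > capacity() / 4 * 3) {
            reallocate(capacity() * 2);                 // expand(): double the capacity
        }
        Bucket &b = find(key);
        if (b.key == 0) ++occupied;
        b.key = key;
        b.imp = imp;
    }

    // Lookup: hash, then probe until a match or an empty bucket.
    Imp lookup(Key key) const {
        if (buckets.empty()) return 0;
        uint32_t begin = hash(key);
        uint32_t i = begin;
        do {
            if (buckets[i].key == key) return buckets[i].imp;
            if (buckets[i].key == 0) return 0;          // empty slot: cache miss
        } while ((i = next(i)) != begin);
        return 0;
    }

private:
    uint32_t hash(Key key) const { return static_cast<uint32_t>(key & mask()); }  // like cache_hash
    uint32_t next(uint32_t i) const { return i ? i - 1 : mask(); }               // arm64-style cache_next

    // Like reallocate: a fresh, empty table; old entries are not copied over.
    void reallocate(uint32_t newCapacity) {
        buckets.assign(newCapacity, Bucket{});
        occupied = 0;
    }

    // Like cache_t::find: first bucket that is empty or already holds key.
    Bucket &find(Key key) {
        uint32_t begin = hash(key);
        uint32_t i = begin;
        do {
            if (buckets[i].key == 0 || buckets[i].key == key) return buckets[i];
        } while ((i = next(i)) != begin);
        assert(false && "cache is full");               // the runtime calls bad_cache here
        return buckets[begin];
    }
};
```

Filling three keys keeps the 4-slot table; the fourth makes newOccupied 4, which exceeds 4 / 4 * 3 = 3, so the table grows to 8 slots and holds only the new key. Two keys that both map to slot 0 in the 8-slot table end up in slots 0 and 7.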

The figure below shows the process more clearly.

Verifying the process

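Let's demonstrate with some code. As before, we define a custom struct that mimics objc_class and cast to it to inspect the internal data; this custom struct has been used many times in previous articles, so we won't repeat it here.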

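We create a Person class that inherits from NSObject, a Student class that inherits from Person, and a CollegeStudent class that inherits from Student. The three classes have the methods personTest, studentTest, and colleaeStudentTest respectively.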

We use breakpoints and logging to watch the method caching process:

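#import <Foundation/Foundation.h>
// xx_objc_class, cache_t and bucket_t are the custom mirror structs from the earlier articles,
// and CollegeStudent is the class described above; their headers are not shown here.

int main(int argc, const char * argv[]) {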
    @autoreleasepool {
        
        CollegeStudent *collegeStudent = [[CollegeStudent alloc] init];
        xx_objc_class *collegeStudentClass = (__bridge xx_objc_class *)[CollegeStudent class];
        
        cache_t cache = collegeStudentClass->cache;
        bucket_t *buckets = cache._buckets;
        
        [collegeStudent personTest];
        [collegeStudent studentTest];
        
        NSLog(@"----------------------------");
        for (int i = 0; i <= cache._mask; i++) {
            bucket_t bucket = buckets[i];
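            // _key holds the SEL; in Apple's runtime a SEL points to the selector's C-string name
            NSLog(@"%s %p", (const char *)bucket._key, bucket._imp);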
        }
        NSLog(@"----------------------------");
        
        [collegeStudent colleaeStudentTest];

        cache = collegeStudentClass->cache;
        buckets = cache._buckets;
        NSLog(@"----------------------------");
        for (int i = 0; i <= cache._mask; i++) {
            bucket_t bucket = buckets[i];
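            // _key holds the SEL; in Apple's runtime a SEL points to the selector's C-string name
            NSLog(@"%s %p", (const char *)bucket._key, bucket._imp);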
        }
        NSLog(@"----------------------------");
        
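        NSLog(@"%p",@selector(colleaeStudentTest));
        // Index computed as SEL & _mask, to compare with the slot printed above
        NSLog(@"%lu", (unsigned long)((uintptr_t)@selector(colleaeStudentTest) & cache._mask));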
        NSLog(@"----------------------------");
    }
    return 0;
}

We set breakpoints where the collegeStudent instance calls personTest, studentTest, and colleaeStudentTest, and watch how the cache changes.

Before personTest is called

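The figure above shows that before personTest is called, the cache holds only the init method, which happens to be stored at index 0. _mask is 3, which confirms that the hash table is allocated with 4 slots on the first store, as described in the source above, and _occupied is 1, showing that _buckets holds just one method at this point.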

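When collegeStudent calls personTest, the runtime first finds that personTest is not in the cache of the CollegeStudent class object, so it searches the CollegeStudent class object's method list. It is not there either, so the runtime follows the superclass pointer to the Student class object, whose cache and method list also lack it. It then follows superclass again to the Person class object, finally finds the method in Person's method list, calls it, and caches it in the CollegeStudent class object's cache.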

After personTest executes, let's check how the cache has changed:

The figure above shows that _occupied is now 2, meaning personTest has been cached in the CollegeStudent class object's cache.

Likewise, after studentTest executes, we print the information stored in the cache:

The figure above shows that the cache indeed holds three methods: init, personTest, and studentTest.

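After colleaeStudentTest executes, the cache should also cache colleaeStudentTest. As mentioned in the source above, when the number of stored methods would exceed 3/4 of the hash table's length, the system creates a new hash table with twice the capacity to replace the old one. Here the table has 4 slots and already holds 3 methods, so caching a fourth gives newOccupied = 4 > 4 / 4 * 3 = 3, which triggers expansion. Step over colleaeStudentTest and print the methods stored in the cache again.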

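The figure above shows that after expansion the _buckets hash table holds only the colleaeStudentTest method, and the index computed by SEL & _mask is indeed the position where colleaeStudentTest is stored in _buckets.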

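We now have a new understanding of the structure of Class and the method caching process: Apple caches methods in a hash table, trading a small amount of space for a large saving in method lookup time.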


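If there is anything wrong in this article, please point it out. I'm xx_cc, a guy who grew up long ago but still isn't goofy enough. Coders who want to discuss and learn together over video can add me on QQ: 2336684744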