Building an APM Monitoring System (Part 2)

Date: 2020-07-07

The full article is nearly 50,000 characters long, so it is split into Part 1 and Part 2. The complete article reads better on GitHub.

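6. Power Consumption

Battery life has always been a sensitive issue on mobile devices. If users notice that an app drains the battery or makes the phone run hot while they use it, they are very likely to uninstall it on the spot. Power consumption therefore deserves attention during development.

When power consumption is high, the usual suspects come to mind first: is the app using location services, issuing network requests too frequently, or running some task in a tight loop?

During development this is fairly easy to handle: the Energy Log tool in Instruments helps locate the problem. Problems in production, however, have to be monitored in code, which makes power monitoring a natural capability of an APM system.

1. How to get the battery level

On iOS, IOKit is a private framework used to obtain detailed information about hardware and devices; it is also the low-level framework through which hardware communicates with kernel services. We can therefore use IOKit to read hardware information, including the battery level. The steps are:

First, find IOPowerSources.h and IOPSKeys.h in Apple's open source code (opensource), and find IOKit.framework inside Xcode's Package Contents. The path is /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk/System/Library/Frameworks/IOKit.framework
Then import IOPowerSources.h, IOPSKeys.h, and IOKit.framework into the project.
Set batteryMonitoringEnabled on UIDevice to true.
The battery level obtained this way has a precision of 1%.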

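2. Locating the problem

Instruments' Energy Log usually resolves many issues before release, but once the app is live, power problems in production have to be tracked down through APM. The culprit may be a second-party (in-house) library, a third-party library, or a colleague's code.

The approach: once excessive power use is detected, find the offending thread, then dump its stack to reconstruct the scene.

The earlier section on thread information showed that thread_basic_info has a cpu_usage field that records the thread's CPU usage as a scaled percentage. We can therefore iterate over the current threads, find the ones with high CPU usage, and dump their stacks to locate the code responsible for the power drain. See section 3.2 for details.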
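
The thread selection step can be sketched in plain C. This is a minimal illustration, not the article's implementation: the thread_sample type, the find_hot_thread function, and the 80% threshold are hypothetical, and the samples are assumed to have been filled from thread_info() beforehand. USAGE_SCALE mirrors TH_USAGE_SCALE (1000) from <mach/thread_info.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: mirrors TH_USAGE_SCALE from <mach/thread_info.h>, where a
 * cpu_usage of 1000 means 100% of one core. */
#define USAGE_SCALE 1000

/* Hypothetical sample type: one entry per thread, filled from thread_info(). */
typedef struct {
    unsigned int thread_id;  /* thread port or any stable identifier */
    int cpu_usage;           /* scaled usage, 0..USAGE_SCALE */
} thread_sample;

/* Returns the index of the busiest thread whose usage is at or above
 * threshold_percent, or -1 if no thread reaches the threshold. */
static int find_hot_thread(const thread_sample *samples, size_t count,
                           double threshold_percent)
{
    assert(samples != NULL || count == 0);
    int hottest = -1;
    double hottest_percent = threshold_percent;
    for (size_t i = 0; i < count; i++) {
        double percent = (double)samples[i].cpu_usage / USAGE_SCALE * 100.0;
        if (percent >= hottest_percent) {
            hottest = (int)i;
            hottest_percent = percent;
        }
    }
    return hottest;
}
```

The returned index identifies the thread whose stack should be dumped; returning -1 lets the caller skip the dump when nothing looks suspicious.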

// Reads the remaining battery percentage through IOKit (see "How to get the battery level").
// Returns -1.0 when the value cannot be determined.
- (double)fetchBatteryCostUsage
{
    // Returns a blob of power source information in an opaque CFTypeRef.
    CFTypeRef blob = IOPSCopyPowerSourcesInfo();
    if (blob == NULL) {
        NSLog(@"Error in IOPSCopyPowerSourcesInfo");
        return -1.0;
    }
    // Returns a CFArray of power source handles, each of type CFTypeRef.
    CFArrayRef sources = IOPSCopyPowerSourcesList(blob);
    if (sources == NULL) {
        NSLog(@"Error in IOPSCopyPowerSourcesList");
        CFRelease(blob);
        return -1.0;
    }

    double percentage = -1.0;
    // CFArrayGetCount returns a CFIndex, not an int.
    CFIndex numOfSources = CFArrayGetCount(sources);
    if (numOfSources == 0) {
        NSLog(@"No power sources found");
    }

    // Use the first power source that reports a valid capacity.
    for (CFIndex i = 0; i < numOfSources; i++) {
        // Returns a CFDictionary with readable information about the specific power source.
        CFDictionaryRef pSource = IOPSGetPowerSourceDescription(blob, CFArrayGetValueAtIndex(sources, i));
        if (!pSource) {
            NSLog(@"Error in IOPSGetPowerSourceDescription");
            break;
        }

        int curCapacity = 0;
        int maxCapacity = 0;
        CFNumberRef value = (CFNumberRef)CFDictionaryGetValue(pSource, CFSTR(kIOPSCurrentCapacityKey));
        if (value) CFNumberGetValue(value, kCFNumberSInt32Type, &curCapacity);
        value = (CFNumberRef)CFDictionaryGetValue(pSource, CFSTR(kIOPSMaxCapacityKey));
        if (value) CFNumberGetValue(value, kCFNumberSInt32Type, &maxCapacity);

        // Guard against division by zero when the capacity is missing.
        if (maxCapacity > 0) {
            percentage = (double)curCapacity / (double)maxCapacity * 100.0;
            NSLog(@"curCapacity : %d / maxCapacity: %d , percentage: %.1f ", curCapacity, maxCapacity, percentage);
            break;
        }
    }

    // Both IOPSCopy* results follow the Copy rule and must be released.
    CFRelease(sources);
    CFRelease(blob);
    return percentage;
}

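3. What we can do about power consumption during development

CPU-intensive computation is the main cause of power drain, so CPU use should be budgeted carefully and the CPU should not be made to do useless work. Complex computation over large amounts of data can be offloaded to the server or to the GPU. If the design requires the computation to run on the CPU, use GCD: create the block with dispatch_block_create_with_qos_class(flags, qos_class, relative_priority, block) and specify QOS_CLASS_UTILITY as the QoS class. For work submitted at QOS_CLASS_UTILITY, the system applies power optimizations aimed at heavy data computation.

Besides heavy CPU computation, I/O is the other major cause of power drain. The common industry approach is to defer writing fragmented data to disk: aggregate the fragments in memory first, then write them to disk together. For aggregating fragmented data in memory, iOS provides NSCache.

NSCache is thread-safe and evicts objects once the preset cache limits are reached. Eviction triggers the delegate callback - (void)cache:(NSCache *)cache willEvictObject:(id)obj;, inside which the data can be written to disk, so that the I/O for the aggregated data is deferred. Fewer I/O operations mean less power consumed.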
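
The "aggregate in memory, write later" idea can also be illustrated without NSCache. The following is a minimal C sketch that swaps in a plain fixed-size buffer and the POSIX write() call; the write_buffer type, its functions, the 4096-byte capacity, and the demo function are all hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define BUFFER_CAPACITY 4096  /* hypothetical threshold; tune per workload */

typedef struct {
    char   data[BUFFER_CAPACITY];
    size_t used;         /* bytes gathered since the last flush */
    int    fd;           /* destination file descriptor */
    int    flush_count;  /* flushes that actually reached the file */
} write_buffer;

static void buffer_init(write_buffer *buf, int fd)
{
    assert(buf != NULL && fd >= 0);
    buf->used = 0;
    buf->fd = fd;
    buf->flush_count = 0;
}

/* Writes everything gathered so far, retrying on partial writes. */
static int buffer_flush(write_buffer *buf)
{
    size_t offset = 0;
    while (offset < buf->used) {
        ssize_t n = write(buf->fd, buf->data + offset, buf->used - offset);
        if (n < 0) {
            if (errno == EINTR) continue;  /* interrupted: retry */
            return -1;
        }
        offset += (size_t)n;
    }
    if (buf->used > 0) buf->flush_count++;
    buf->used = 0;
    return 0;
}

/* Appends one small record; touches the file only when the buffer is full. */
static int buffer_append(write_buffer *buf, const void *record, size_t len)
{
    if (len > BUFFER_CAPACITY) return -1;  /* too large to aggregate */
    if (buf->used + len > BUFFER_CAPACITY && buffer_flush(buf) != 0) return -1;
    memcpy(buf->data + buf->used, record, len);
    buf->used += len;
    return 0;
}

/* Usage sketch: 100 records of 100 bytes reach the file in 3 flushes
 * (two when the buffer fills up, one final flush). */
static int demo_aggregated_writes(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    write_buffer buf;
    buffer_init(&buf, fd);
    char record[100];
    memset(record, 'x', sizeof(record));
    for (int i = 0; i < 100; i++) {
        if (buffer_append(&buf, record, sizeof(record)) != 0) {
            close(fd);
            return -1;
        }
    }
    int rc = buffer_flush(&buf);
    close(fd);
    return rc == 0 ? buf.flush_count : -1;
}
```

The trade-off is the usual one for deferred writes: anything still in the buffer is lost if the process dies before the next flush, so the capacity should reflect how much data the app can afford to lose.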

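For NSCache usage, look at the image loading framework SDWebImage. When it reads from the image cache, it does not go straight to the file on disk (I/O); it uses the system's NSCache.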

- (nullable UIImage *)imageFromMemoryCacheForKey:(nullable NSString *)key {
    return [self.memoryCache objectForKey:key];
}

- (nullable UIImage *)imageFromDiskCacheForKey:(nullable NSString *)key {
    UIImage *diskImage = [self diskImageForKey:key];
    if (diskImage && self.config.shouldCacheImagesInMemory) {
        NSUInteger cost = diskImage.sd_memoryCost;
        [self.memoryCache setObject:diskImage forKey:key cost:cost];
    }

    return diskImage;
}

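As the code shows, the main logic reads the image from disk and, if the configuration allows memory caching, stores it in NSCache; later lookups read from NSCache as well. NSCache's totalCostLimit and countLimit properties, together with the - (void)setObject:(ObjectType)obj forKey:(KeyType)key cost:(NSUInteger)g; method, set the caching limits. We can borrow this strategy when writing file operations that touch disk and memory, in order to reduce power consumption.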

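7. Crash Monitoring

1. A review of exception basics

1.1 Exception handling in the Mach layer

Mach implements its own exception handling on top of message passing. Its design had the following goals:

A single exception-handling facility with consistent semantics: Mach provides only one exception-handling mechanism for all types of exceptions (user-defined, platform-independent, and platform-specific). Exceptions are grouped by type, and a specific platform can define its own subtypes.
Clarity and simplicity: the exception-handling interface relies on Mach's existing, well-defined message and port architecture, which makes it elegant without hurting efficiency. This allows debuggers and external handlers to be added, and in theory even allows network-based exception handling.

In Mach, exceptions are handled through the kernel's basic facility, message passing. An exception is not much more complex than a message: it is raised by the faulting thread or task (through msg_send()) and caught by a handler (through msg_recv()). The handler can handle the exception, clear it (mark it as completed and continue), or decide to terminate the thread.

Mach's exception-handling model differs from others. In other models the handler runs in the context of the faulting thread, whereas a Mach handler runs in a different context: the faulting thread sends a message to a pre-designated exception port and waits for the reply. Every task can register an exception port, which applies to all threads in that task. In addition, each thread can register its own exception port through thread_set_exception_ports(thread, exception_mask, new_port, behavior, new_flavor). By default the task and thread exception ports are NULL, meaning the exception is not handled at that level. Once created, exception ports behave like any other port in the system and can be handed over to another task or another host. (With a port in place, an application on another host could in principle handle the exception over the network, for example via UDP.)

When an exception occurs, the kernel first tries to deliver it to the thread's exception port, then to the task's exception port, and finally to the host's exception port (the default port registered by the host). If none of the ports returns KERN_SUCCESS, the whole task is terminated. In other words, Mach does not provide the exception-handling logic itself, only the framework for delivering exception notifications.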
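
The delivery order described above can be modelled in a few lines of C. This is only a conceptual model, not the Mach API: ports are represented as plain callbacks, and every name here (exception_ports, deliver_exception, HANDLER_SUCCESS, and so on) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kern_return_t values; not the Mach constants. */
typedef enum { HANDLER_SUCCESS = 0, HANDLER_FAILURE = 1 } handler_result;

/* A port is modelled as a callback; NULL means "no port registered". */
typedef handler_result (*exception_handler)(int exception_type);

typedef struct {
    exception_handler thread_port;
    exception_handler task_port;
    exception_handler host_port;
} exception_ports;

typedef enum {
    HANDLED_BY_THREAD,
    HANDLED_BY_TASK,
    HANDLED_BY_HOST,
    TASK_TERMINATED
} delivery_outcome;

/* Tries the thread, task, and host ports in that order. The first port
 * that reports success ends the delivery; otherwise the task is terminated. */
static delivery_outcome deliver_exception(const exception_ports *ports,
                                          int exception_type)
{
    assert(ports != NULL);
    const exception_handler order[] = {
        ports->thread_port, ports->task_port, ports->host_port
    };
    const delivery_outcome outcome[] = {
        HANDLED_BY_THREAD, HANDLED_BY_TASK, HANDLED_BY_HOST
    };
    for (size_t i = 0; i < 3; i++) {
        if (order[i] != NULL && order[i](exception_type) == HANDLER_SUCCESS) {
            return outcome[i];
        }
    }
    return TASK_TERMINATED;
}

/* Two sample handlers for experimenting with the model. */
static handler_result decline_exception(int exception_type)
{
    (void)exception_type;
    return HANDLER_FAILURE;
}

static handler_result accept_exception(int exception_type)
{
    (void)exception_type;
    return HANDLER_SUCCESS;
}
```

A crash reporter that replies with a failure code, as KSCrash does after recording the crash, corresponds to decline_exception here: the exception keeps travelling to the next level.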

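Exceptions are first raised by processor traps. To handle traps, every modern kernel installs trap handlers; these low-level functions are installed by the assembly portion of the kernel.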

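1.2 Exception handling in the BSD layer

The BSD layer is the main XNU interface used by user space, and it exposes a POSIX-compliant interface. Developers can use all the facilities of a UNIX system without knowing the implementation details of the Mach layer.

Mach already provides low-level trap handling through its exception mechanism, and BSD builds signal handling on top of it. Signals originating in hardware are caught by the Mach layer and then converted into the corresponding UNIX signals. To keep a single unified mechanism, signals generated by the operating system and by users are also first converted into Mach exceptions and then into signals.

All Mach exceptions are converted into the corresponding UNIX signals by ux_exception at the host layer, and threadsignal delivers the signal to the faulting thread.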
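
To make the conversion concrete, here is a simplified C sketch of the kind of mapping ux_exception performs. It is an approximation that covers only a few exception types; the APM_-prefixed constants are local copies of the values in <mach/exception_types.h> and <mach/kern_return.h>, repeated so the sketch compiles without the Mach headers, and the function name is hypothetical.

```c
#include <assert.h>
#include <signal.h>

/* Local copies of Mach values (assumption: kept in sync by hand). */
enum {
    APM_EXC_BAD_ACCESS      = 1,
    APM_EXC_BAD_INSTRUCTION = 2,
    APM_EXC_ARITHMETIC      = 3,
    APM_EXC_BREAKPOINT      = 6
};
enum { APM_KERN_INVALID_ADDRESS = 1 };

/* Returns the Unix signal for a Mach exception, or 0 if this simplified
 * table has no entry for it. */
static int signal_for_mach_exception(int exception, long code)
{
    assert(exception >= 0);
    switch (exception) {
        case APM_EXC_BAD_ACCESS:
            /* An invalid address becomes SIGSEGV; other access faults SIGBUS. */
            return code == APM_KERN_INVALID_ADDRESS ? SIGSEGV : SIGBUS;
        case APM_EXC_BAD_INSTRUCTION:
            return SIGILL;
        case APM_EXC_ARITHMETIC:
            return SIGFPE;
        case APM_EXC_BREAKPOINT:
            return SIGTRAP;
        default:
            return 0;
    }
}
```

EXC_SOFTWARE is left out on purpose: its signal depends on the exception code, so a full table needs a second level of lookup.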

2. Ways to collect crashes

Apple's Crash Reporter, built into iOS, records crash logs that can be viewed in Settings. Let's look at a crash log first:

Incident Identifier: 7FA6736D-09E8-47A1-95EC-76C4522BDE1A
CrashReporter Key:   4e2d36419259f14413c3229e8b7235bcc74847f3
Hardware Model:      iPhone7,1
Process:         CMMonitorExample [3608]
Path:            /var/containers/Bundle/Application/9518A4F4-59B7-44E9-BDDA-9FBEE8CA18E5/CMMonitorExample.app/CMMonitorExample
Identifier:      com.Wacai.CMMonitorExample
Version:         1.0 (1)
Code Type:       ARM-64
Parent Process:  ? [1]

Date/Time:       2017-01-03 11:43:03.000 +0800
OS Version:      iOS 10.2 (14C92)
Report Version:  104

Exception Type:  EXC_CRASH (SIGABRT)
Exception Codes: 0x00000000 at 0x0000000000000000
Crashed Thread:  0

Application Specific Information:
*** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '-[__NSSingleObjectArrayI objectForKey:]: unrecognized selector sent to instance 0x174015060'

Thread 0 Crashed:
0   CoreFoundation                  0x0000000188f291b8 0x188df9000 + 1245624 (<redacted> + 124)
1   libobjc.A.dylib                 0x000000018796055c 0x187958000 + 34140 (objc_exception_throw + 56)
2   CoreFoundation                  0x0000000188f30268 0x188df9000 + 1274472 (<redacted> + 140)
3   CoreFoundation                  0x0000000188f2d270 0x188df9000 + 1262192 (<redacted> + 916)
4   CoreFoundation                  0x0000000188e2680c 0x188df9000 + 186380 (_CF_forwarding_prep_0 + 92)
5   CMMonitorExample                0x000000010004c618 0x100044000 + 34328 (-[MakeCrashHandler throwUncaughtNSException] + 80)

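The Exception Type entry in the crash log consists of two parts: the Mach exception and the Unix signal.

So Exception Type: EXC_CRASH (SIGABRT) means that an EXC_CRASH exception occurred in the Mach layer and was converted at the host layer into a SIGABRT signal delivered to the faulting thread.

Question: both catching Mach exceptions and registering Unix signal handlers can capture a crash. How do we choose between them?

Answer: prefer intercepting Mach exceptions. As described in 1.2, Mach exception handling happens earlier; if the Mach exception handler makes the process exit, the Unix signal is never raised.

There are many open source projects for collecting crash logs. Well-known ones include KSCrash and plcrashreporter, and there are full-service offerings such as Bugly and Umeng. Teams usually build on an open source project to create a bug collection tool that fits internal needs. After comparing them we chose KSCrash; the reasons are outside the scope of this article.

KSCrash is full-featured and can capture the following types of crash: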

Mach kernel exceptions
Fatal signals
C++ exceptions
Objective-C exceptions
Main thread deadlock (experimental)
Custom crashes (e.g. from scripting languages)

Analyzing the crash collection approach on iOS therefore comes down to analyzing how KSCrash implements crash monitoring.

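2.1. Mach exception handling

The overall idea: create an exception port, request rights for it, set it as the exception port, then create a kernel thread that loops waiting for exceptions. To keep our Mach exception handler from overriding logic installed by other SDKs or by business-line developers, we save the existing exception ports at the very beginning and, once our logic has run, hand the exception over to the handlers behind those ports. After the crash information is collected, it is assembled and written to a JSON file.

The flow chart is as follows:

For capturing Mach exceptions, we can register an exception port that listens to all threads of the current task.

Here is the key code: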

static bool installExceptionHandler()
{
    KSLOG_DEBUG("Installing mach exception handler.");

    bool attributes_created = false;
    pthread_attr_t attr;

    kern_return_t kr;
    int error;
    // Get the current task
    const task_t thisTask = mach_task_self();
    exception_mask_t mask = EXC_MASK_BAD_ACCESS |
    EXC_MASK_BAD_INSTRUCTION |
    EXC_MASK_ARITHMETIC |
    EXC_MASK_SOFTWARE |
    EXC_MASK_BREAKPOINT;

    KSLOG_DEBUG("Backing up original exception ports.");
    // Get the exception ports already registered on this task
    kr = task_get_exception_ports(thisTask,
                                  mask,
                                  g_previousExceptionPorts.masks,
                                  &g_previousExceptionPorts.count,
                                  g_previousExceptionPorts.ports,
                                  g_previousExceptionPorts.behaviors,
                                  g_previousExceptionPorts.flavors);
    // On failure, go to the failed path
    if(kr != KERN_SUCCESS)
    {
        KSLOG_ERROR("task_get_exception_ports: %s", mach_error_string(kr));
        goto failed;
    }
    // If KSCrash's exception port has not been allocated yet, allocate it
    if(g_exceptionPort == MACH_PORT_NULL)
    {
        KSLOG_DEBUG("Allocating new port with receive rights.");
        // Allocate the exception port
        kr = mach_port_allocate(thisTask,
                                MACH_PORT_RIGHT_RECEIVE,
                                &g_exceptionPort);
        if(kr != KERN_SUCCESS)
        {
            KSLOG_ERROR("mach_port_allocate: %s", mach_error_string(kr));
            goto failed;
        }

        KSLOG_DEBUG("Adding send rights to port.");
        // Insert a send right (MACH_MSG_TYPE_MAKE_SEND) for the exception port
        kr = mach_port_insert_right(thisTask,
                                    g_exceptionPort,
                                    g_exceptionPort,
                                    MACH_MSG_TYPE_MAKE_SEND);
        if(kr != KERN_SUCCESS)
        {
            KSLOG_ERROR("mach_port_insert_right: %s", mach_error_string(kr));
            goto failed;
        }
    }

    KSLOG_DEBUG("Installing port as exception handler.");
    // Set the exception port for this task
    kr = task_set_exception_ports(thisTask,
                                  mask,
                                  g_exceptionPort,
                                  EXCEPTION_DEFAULT,
                                  THREAD_STATE_NONE);
    if(kr != KERN_SUCCESS)
    {
        KSLOG_ERROR("task_set_exception_ports: %s", mach_error_string(kr));
        goto failed;
    }

    KSLOG_DEBUG("Creating secondary exception thread (suspended).");
    pthread_attr_init(&attr);
    attributes_created = true;
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    // Create the monitoring thread
    error = pthread_create(&g_secondaryPThread,
                           &attr,
                           &handleExceptions,
                           kThreadSecondary);
    if(error != 0)
    {
        KSLOG_ERROR("pthread_create_suspended_np: %s", strerror(error));
        goto failed;
    }
    // Get the Mach thread port of the pthread
    g_secondaryMachThread = pthread_mach_thread_np(g_secondaryPThread);
    ksmc_addReservedThread(g_secondaryMachThread);

    KSLOG_DEBUG("Creating primary exception thread.");
    error = pthread_create(&g_primaryPThread,
                           &attr,
                           &handleExceptions,
                           kThreadPrimary);
    if(error != 0)
    {
        KSLOG_ERROR("pthread_create: %s", strerror(error));
        goto failed;
    }
    pthread_attr_destroy(&attr);
    g_primaryMachThread = pthread_mach_thread_np(g_primaryPThread);
    ksmc_addReservedThread(g_primaryMachThread);
    
    KSLOG_DEBUG("Mach exception handler installed.");
    return true;


failed:
    KSLOG_DEBUG("Failed to install mach exception handler.");
    if(attributes_created)
    {
        pthread_attr_destroy(&attr);
    }
    // Restore the previously registered exception ports and hand control back
    uninstallExceptionHandler();
    return false;
}

Handling the exception and assembling the crash information

/** Our exception handler thread routine.
 * Wait for an exception message, uninstall our exception port, record the
 * exception information, and write a report.
 */
static void* handleExceptions(void* const userData)
{
    MachExceptionMessage exceptionMessage = {{0}};
    MachReplyMessage replyMessage = {{0}};
    char* eventID = g_primaryEventID;

    const char* threadName = (const char*) userData;
    pthread_setname_np(threadName);
    if(threadName == kThreadSecondary)
    {
        KSLOG_DEBUG("This is the secondary thread. Suspending.");
        thread_suspend((thread_t)ksthread_self());
        eventID = g_secondaryEventID;
    }
    // Loop, reading messages from the registered exception port
    for(;;)
    {
        KSLOG_DEBUG("Waiting for mach exception");

        // Wait for a message.
        kern_return_t kr = mach_msg(&exceptionMessage.header,
                                    MACH_RCV_MSG,
                                    0,
                                    sizeof(exceptionMessage),
                                    g_exceptionPort,
                                    MACH_MSG_TIMEOUT_NONE,
                                    MACH_PORT_NULL);
        // Receiving a message means a Mach exception occurred: leave the loop and assemble the data
        if(kr == KERN_SUCCESS)
        {
            break;
        }

        // Loop and try again on failure.
        KSLOG_ERROR("mach_msg: %s", mach_error_string(kr));
    }

    KSLOG_DEBUG("Trapped mach exception code 0x%x, subcode 0x%x",
                exceptionMessage.code[0], exceptionMessage.code[1]);
    if(g_isEnabled)
    {
        // Suspend all other threads
        ksmc_suspendEnvironment();
        g_isHandlingCrash = true;
        // Notify that a fatal exception has been captured
        kscm_notifyFatalExceptionCaptured(true);

        KSLOG_DEBUG("Exception handler is installed. Continuing exception handling.");


        // Switch to the secondary thread if necessary, or uninstall the handler
        // to avoid a death loop.
        if(ksthread_self() == g_primaryMachThread)
        {
            KSLOG_DEBUG("This is the primary exception thread. Activating secondary thread.");
// TODO: This was put here to avoid a freeze. Does secondary thread ever fire?
            restoreExceptionPorts();
            if(thread_resume(g_secondaryMachThread) != KERN_SUCCESS)
            {
                KSLOG_DEBUG("Could not activate secondary thread. Restoring original exception ports.");
            }
        }
        else
        {
            KSLOG_DEBUG("This is the secondary exception thread. Restoring original exception ports.");
//            restoreExceptionPorts();
        }

        // Fill out crash information
        // Gather the crash-site information needed for the report
        KSLOG_DEBUG("Fetching machine state.");
        KSMC_NEW_CONTEXT(machineContext);
        KSCrash_MonitorContext* crashContext = &g_monitorContext;
        crashContext->offendingMachineContext = machineContext;
        kssc_initCursor(&g_stackCursor, NULL, NULL);
        if(ksmc_getContextForThread(exceptionMessage.thread.name, machineContext, true))
        {
            kssc_initWithMachineContext(&g_stackCursor, 100, machineContext);
            KSLOG_TRACE("Fault address 0x%x, instruction address 0x%x", kscpu_faultAddress(machineContext), kscpu_instructionAddress(machineContext));
            if(exceptionMessage.exception == EXC_BAD_ACCESS)
            {
                crashContext->faultAddress = kscpu_faultAddress(machineContext);
            }
            else
            {
                crashContext->faultAddress = kscpu_instructionAddress(machineContext);
            }
        }

        KSLOG_DEBUG("Filling out context.");
        crashContext->crashType = KSCrashMonitorTypeMachException;
        crashContext->eventID = eventID;
        crashContext->registersAreValid = true;
        crashContext->mach.type = exceptionMessage.exception;
        crashContext->mach.code = exceptionMessage.code[0];
        crashContext->mach.subcode = exceptionMessage.code[1];
        if(crashContext->mach.code == KERN_PROTECTION_FAILURE && crashContext->isStackOverflow)
        {
            // A stack overflow should return KERN_INVALID_ADDRESS, but
            // when a stack blasts through the guard pages at the top of the stack,
            // it generates KERN_PROTECTION_FAILURE. Correct for this.
            crashContext->mach.code = KERN_INVALID_ADDRESS;
        }
        crashContext->signal.signum = signalForMachException(crashContext->mach.type, crashContext->mach.code);
        crashContext->stackCursor = &g_stackCursor;

        kscm_handleException(crashContext);

        KSLOG_DEBUG("Crash handling complete. Restoring original handlers.");
        g_isHandlingCrash = false;
        ksmc_resumeEnvironment();
    }

    KSLOG_DEBUG("Replying to mach exception message.");
    // Send a reply saying "I didn't handle this exception".
    replyMessage.header = exceptionMessage.header;
    replyMessage.NDR = exceptionMessage.NDR;
    replyMessage.returnCode = KERN_FAILURE;

    mach_msg(&replyMessage.header,
             MACH_SEND_MSG,
             sizeof(replyMessage),
             0,
             MACH_PORT_NULL,
             MACH_MSG_TIMEOUT_NONE,
             MACH_PORT_NULL);

    return NULL;
}

Restoring the exception ports and handing over control

/** Restore the original mach exception ports.
 */
static void restoreExceptionPorts(void)
{
    KSLOG_DEBUG("Restoring original exception ports.");
    if(g_previousExceptionPorts.count == 0)
    {
        KSLOG_DEBUG("Original exception ports were already restored.");
        return;
    }

    const task_t thisTask = mach_task_self();
    kern_return_t kr;

    // Reinstall old exception ports.
    // Loop over the exception ports saved before KSCrash installed its own and register each one again
    for(mach_msg_type_number_t i = 0; i < g_previousExceptionPorts.count; i++)
    {
        KSLOG_TRACE("Restoring port index %d", i);
        kr = task_set_exception_ports(thisTask,
                                      g_previousExceptionPorts.masks[i],
                                      g_previousExceptionPorts.ports[i],
                                      g_previousExceptionPorts.behaviors[i],
                                      g_previousExceptionPorts.flavors[i]);
        if(kr != KERN_SUCCESS)
        {
            KSLOG_ERROR("task_set_exception_ports: %s",
                        mach_error_string(kr));
        }
    }
    KSLOG_DEBUG("Exception ports restored.");
    g_previousExceptionPorts.count = 0;
}

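2.2. Signal handling

The operating system converts Mach exceptions into the corresponding Unix signals, so developers can also handle them by registering a signal handler.

KSCrash's handling logic here is shown in the following figure:

Here is the key code:

Installing the signal handlers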

static bool installSignalHandler()
{
    KSLOG_DEBUG("Installing signal handler.");

#if KSCRASH_HAS_SIGNAL_STACK
    // Allocate a block of memory on the heap for the signal stack
    if(g_signalStack.ss_size == 0)
    {
        KSLOG_DEBUG("Allocating signal stack area.");
        g_signalStack.ss_size = SIGSTKSZ;
        g_signalStack.ss_sp = malloc(g_signalStack.ss_size);
    }
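    // Move the signal handler's stack onto the heap instead of sharing the stack of the interrupted thread.
    // sigaltstack(): the first argument points to a stack_t describing the location and attributes of the alternate signal stack; the second argument, also a stack_t pointer, receives the previously established alternate signal stack, if any.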
    KSLOG_DEBUG("Setting signal stack area.");
    // The second argument may be NULL; if it is not NULL, the old alternate signal stack is stored in it. Returns 0 on success and -1 on failure.
    if(sigaltstack(&g_signalStack, NULL) != 0)
    {
        KSLOG_ERROR("signalstack: %s", strerror(errno));
        goto failed;
    }
#endif

    const int* fatalSignals = kssignal_fatalSignals();
    int fatalSignalsCount = kssignal_numFatalSignals();

    if(g_previousSignalHandlers == NULL)
    {
        KSLOG_DEBUG("Allocating memory to store previous signal handlers.");
        g_previousSignalHandlers = malloc(sizeof(*g_previousSignalHandlers)
                                          * (unsigned)fatalSignalsCount);
    }

    // Prepare the second argument of sigaction(), a struct sigaction
    struct sigaction action = {{0}};
    // SA_ONSTACK in sa_flags tells the kernel to build the signal handler's stack frame on the alternate signal stack.
    action.sa_flags = SA_SIGINFO | SA_ONSTACK;
#if KSCRASH_HOST_APPLE && defined(__LP64__)
    action.sa_flags |= SA_64REGSET;
#endif
    sigemptyset(&action.sa_mask);
    action.sa_sigaction = &handleSignal;

    // Iterate over the array of signals to handle
    for(int i = 0; i < fatalSignalsCount; i++)
    {
        // Bind each signal to the action declared above, and save the current handler in g_previousSignalHandlers
        KSLOG_DEBUG("Assigning handler for signal %d", fatalSignals[i]);
        if(sigaction(fatalSignals[i], &action, &g_previousSignalHandlers[i]) != 0)
        {
            char sigNameBuff[30];
            const char* sigName = kssignal_signalName(fatalSignals[i]);
            if(sigName == NULL)
            {
                snprintf(sigNameBuff, sizeof(sigNameBuff), "%d", fatalSignals[i]);
                sigName = sigNameBuff;
            }
            KSLOG_ERROR("sigaction (%s): %s", sigName, strerror(errno));
            // Try to reverse the damage
            for(i--;i >= 0; i--)
            {
                sigaction(fatalSignals[i], &g_previousSignalHandlers[i], NULL);
            }
            goto failed;
        }
    }
    KSLOG_DEBUG("Signal handlers installed.");
    return true;

failed:
    KSLOG_DEBUG("Failed to install signal handlers.");
    return false;
}

Recording thread and other context information while handling the signal

static void handleSignal(int sigNum, siginfo_t* signalInfo, void* userContext)
{
    KSLOG_DEBUG("Trapped signal %d", sigNum);
    if(g_isEnabled)
    {
        ksmc_suspendEnvironment();
        kscm_notifyFatalExceptionCaptured(false);
        
        KSLOG_DEBUG("Filling out context.");
        KSMC_NEW_CONTEXT(machineContext);
        ksmc_getContextForSignal(userContext, machineContext);
        kssc_initWithMachineContext(&g_stackCursor, 100, machineContext);
        // Record the context information at the time the signal is handled
        KSCrash_MonitorContext* crashContext = &g_monitorContext;
        memset(crashContext, 0, sizeof(*crashContext));
        crashContext->crashType = KSCrashMonitorTypeSignal;
        crashContext->eventID = g_eventID;
        crashContext->offendingMachineContext = machineContext;
        crashContext->registersAreValid = true;
        crashContext->faultAddress = (uintptr_t)signalInfo->si_addr;
        crashContext->signal.userContext = userContext;
        crashContext->signal.signum = signalInfo->si_signo;
        crashContext->signal.sigcode = signalInfo->si_code;
        crashContext->stackCursor = &g_stackCursor;

        kscm_handleException(crashContext);
        ksmc_resumeEnvironment();
    }

    KSLOG_DEBUG("Re-raising signal for regular handlers to catch.");
    // This is technically not allowed, but it works in OSX and iOS.
    raise(sigNum);
}

After handling the signal, KSCrash restores the previous signal handlers

static void uninstallSignalHandler(void)
{
    KSLOG_DEBUG("Uninstalling signal handlers.");

    const int* fatalSignals = kssignal_fatalSignals();
    int fatalSignalsCount = kssignal_numFatalSignals();
    // Iterate over the array of signals and restore the previous handler for each
    for(int i = 0; i < fatalSignalsCount; i++)
    {
        KSLOG_DEBUG("Restoring original handler for signal %d", fatalSignals[i]);
        sigaction(fatalSignals[i], &g_previousSignalHandlers[i], NULL);
    }
    
    KSLOG_DEBUG("Signal handlers uninstalled.");
}

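Notes:

First, a block of memory is allocated on the heap, called the alternate signal stack. The goal is to stop the signal handler from using the regular stack: it runs on the heap memory instead of sharing the stack of the interrupted thread.

Why do this? A process may have n threads, each with its own work; if one thread fails, the whole process crashes. For the signal handler to run properly, it needs its own separate space to run in. Another case is a recursive function that has exhausted the default stack: since the signal handler uses a stack allocated in advance on the heap rather than the default stack, it can still work.
int sigaltstack(const stack_t * __restrict, stack_t * __restrict): both parameters are pointers to stack_t, which holds the information of an alternate signal stack (start address, length, state). The first parameter describes the location and attributes of the new alternate signal stack. The second returns the information of the previously established alternate signal stack, if any.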
```
_STRUCT_SIGALTSTACK
{
	void            *ss_sp;         /* signal stack base */
	__darwin_size_t ss_size;        /* signal stack length */
	int             ss_flags;       /* SA_DISABLE and/or SA_ONSTACK */
};
typedef _STRUCT_SIGALTSTACK     stack_t; /* [???] signal stack */
```
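For a newly created alternate signal stack, ss_flags must be set to 0. The system defines the constant SIGSTKSZ, which satisfies the needs of the vast majority of alternate signal stacks.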
```
/* Structure used in sigaltstack call. */

#define SS_ONSTACK 0x0001 /* take signal on signal stack */
#define SS_DISABLE 0x0004 /* disable taking signals on alternate stack */
#define MINSIGSTKSZ 32768 /* (32K)minimum allowable stack */
#define SIGSTKSZ 131072 /* (128K)recommended stack size */
```
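The sigaltstack system call tells the kernel that an alternate signal stack has been established.

When ss_flags is SS_ONSTACK, the process is currently executing on the alternate signal stack; an attempt to establish a new alternate signal stack at that point fails with EPERM (operation not permitted). SS_DISABLE means that no alternate signal stack is currently established and that using one is disabled.
int sigaction(int, const struct sigaction * __restrict, struct sigaction * __restrict);

The first parameter is the signal to handle. It cannot be SIGKILL or SIGSTOP: their handlers cannot be overridden by the user, because they give the superuser a way to terminate a program (SIGKILL and SIGSTOP cannot be caught, blocked, or ignored).

The second and third parameters point to struct sigaction. If the second parameter is not NULL, it specifies the new signal handler; if the third is not NULL, the previous handler is stored there. If the second is NULL and the third is not, the call simply retrieves the current handler.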
```
/* Signal vector "template" used in sigaction call. */
struct sigaction {
	union __sigaction_u __sigaction_u;  /* signal handler */
	sigset_t sa_mask;               /* signal mask to apply */
	int     sa_flags;               /* see signal options below */
};
```
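The sa_flags field of the struct passed to sigaction needs the SA_ONSTACK flag set, telling the kernel to build the signal handler's stack frame on the alternate signal stack.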

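2.3. C++ exception handling

C++ exception handling relies on the standard library function std::set_terminate(CPPExceptionTerminate).

Some features in an iOS project may be implemented in C or C++. When a C++ exception is thrown, if it can be converted into an NSException it goes through the Objective-C exception handling path; if not, it continues along the C++ path, that is, default_terminate_handler. This default C++ terminate function calls abort_message internally and finally triggers abort, so the system raises a SIGABRT signal.

After a C++ exception is thrown, the system adds a layer of try...catch... to decide whether the exception can be converted into an NSException, and then rethrows the C++ exception. By then the original stack of the exception is gone, so catching SIGABRT at the upper level cannot reconstruct the scene where the exception occurred; the exception stack is missing.

Why? The try...catch... statement calls __cxa_rethrow() internally to rethrow the exception, and __cxa_rethrow() in turn calls unwind. Unwinding can be thought of as the reverse of the function calls: it cleans up the local variables created by each function along the call chain, up to the function containing the outermost catch statement, and hands control to that catch. This is why the stack of a C++ exception disappears.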

static void setEnabled(bool isEnabled)
{
    if(isEnabled != g_isEnabled)
    {
        g_isEnabled = isEnabled;
        if(isEnabled)
        {
            initialize();

            ksid_generate(g_eventID);
            g_originalTerminateHandler = std::set_terminate(CPPExceptionTerminate);
        }
        else
        {
            std::set_terminate(g_originalTerminateHandler);
        }
        g_captureNextStackTrace = isEnabled;
    }
}

static void initialize()
{
    static bool isInitialized = false;
    if(!isInitialized)
    {
        isInitialized = true;
        kssc_initCursor(&g_stackCursor, NULL, NULL);
    }
}

void kssc_initCursor(KSStackCursor *cursor,
                     void (*resetCursor)(KSStackCursor*),
                     bool (*advanceCursor)(KSStackCursor*))
{
    cursor->symbolicate = kssymbolicator_symbolicate;
    cursor->advanceCursor = advanceCursor != NULL ? advanceCursor : g_advanceCursor;
    cursor->resetCursor = resetCursor != NULL ? resetCursor : kssc_resetCursor;
    cursor->resetCursor(cursor);
}

static void CPPExceptionTerminate(void) {
    ksmc_suspendEnvironment();
    KSLOG_DEBUG("Trapped c++ exception");
    const char* name = NULL;
    std::type_info* tinfo = __cxxabiv1::__cxa_current_exception_type();
    if(tinfo != NULL)
    {
        name = tinfo->name();
    }
    
    if(name == NULL || strcmp(name, "NSException") != 0)
    {
        kscm_notifyFatalExceptionCaptured(false);
        KSCrash_MonitorContext* crashContext = &g_monitorContext;
        memset(crashContext, 0, sizeof(*crashContext));

        char descriptionBuff[DESCRIPTION_BUFFER_LENGTH];
        const char* description = descriptionBuff;
        descriptionBuff[0] = 0;

        KSLOG_DEBUG("Discovering what kind of exception was thrown.");
        g_captureNextStackTrace = false;
        try
        {
            throw;
        }
        catch(std::exception& exc)
        {
            strncpy(descriptionBuff, exc.what(), sizeof(descriptionBuff));
        }
#define CATCH_VALUE(TYPE, PRINTFTYPE) \
        catch(TYPE value) \
        { \
            snprintf(descriptionBuff, sizeof(descriptionBuff), "%" #PRINTFTYPE, value); \
        }
        CATCH_VALUE(char,                 d)
        CATCH_VALUE(short,                d)
        CATCH_VALUE(int,                  d)
        CATCH_VALUE(long,                ld)
        CATCH_VALUE(long long,          lld)
        CATCH_VALUE(unsigned char,        u)
        CATCH_VALUE(unsigned short,       u)
        CATCH_VALUE(unsigned int,         u)
        CATCH_VALUE(unsigned long,       lu)
        CATCH_VALUE(unsigned long long, llu)
        CATCH_VALUE(float,                f)
        CATCH_VALUE(double,               f)
        CATCH_VALUE(long double,         Lf)
        CATCH_VALUE(char*,                s)
        catch(...)
        {
            description = NULL;
        }
        g_captureNextStackTrace = g_isEnabled;

        // TODO: Should this be done here? Maybe better in the exception handler?
        KSMC_NEW_CONTEXT(machineContext);
        ksmc_getContextForThread(ksthread_self(), machineContext, true);

        KSLOG_DEBUG("Filling out context.");
        crashContext->crashType = KSCrashMonitorTypeCPPException;
        crashContext->eventID = g_eventID;
        crashContext->registersAreValid = false;
        crashContext->stackCursor = &g_stackCursor;
        crashContext->CPPException.name = name;
        crashContext->exceptionName = name;
        crashContext->crashReason = description;
        crashContext->offendingMachineContext = machineContext;

        kscm_handleException(crashContext);
    }
    else
    {
        KSLOG_DEBUG("Detected NSException. Letting the current NSException handler deal with it.");
    }
    ksmc_resumeEnvironment();

    KSLOG_DEBUG("Calling original terminate handler.");
    g_originalTerminateHandler();
}

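2.4. Objective-C exception handling

NSException handling at the Objective-C level is comparatively easy: register an NSUncaughtExceptionHandler to capture the exception, collect the crash information from the NSException argument, and hand it to the data reporting component.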

static void setEnabled(bool isEnabled) {
    if(isEnabled != g_isEnabled)
    {
        g_isEnabled = isEnabled;
        if(isEnabled)
        {
            KSLOG_DEBUG(@"Backing up original handler.");
            // Save the previous Objective-C exception handler
            g_previousUncaughtExceptionHandler = NSGetUncaughtExceptionHandler();
            
            KSLOG_DEBUG(@"Setting new handler.");
            // Install the new Objective-C exception handler
            NSSetUncaughtExceptionHandler(&handleException);
            KSCrash.sharedInstance.uncaughtExceptionHandler = &handleException;
        }
        else
        {
            KSLOG_DEBUG(@"Restoring original handler.");
            NSSetUncaughtExceptionHandler(g_previousUncaughtExceptionHandler);
        }
    }
}

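2.5. Main thread deadlock

Detecting a main thread deadlock is somewhat similar to detecting an ANR.

Create a thread whose run method handles the logic in a do...while... loop, wrapped in an autorelease pool to keep memory from growing too high.

There is an awaitingResponse property and a watchdogPulse method. watchdogPulse sets awaitingResponse to YES and then switches to the main thread, where awaitingResponse is set back to NO: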

- (void) watchdogPulse
{
    __block id blockSelf = self;
    self.awaitingResponse = YES;
    dispatch_async(dispatch_get_main_queue(), ^
                   {
                       [blockSelf watchdogAnswer];
                   });
}

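The thread's run method loops continuously: after waiting for the configured g_watchdogInterval, it checks whether awaitingResponse is back to its initial value; if it is not, a deadlock is assumed.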

- (void) runMonitor
{
    BOOL cancelled = NO;
    do
    {
        // Only do a watchdog check if the watchdog interval is > 0.
        // If the interval is <= 0, just idle until the user changes it.
        @autoreleasepool {
            NSTimeInterval sleepInterval = g_watchdogInterval;
            BOOL runWatchdogCheck = sleepInterval > 0;
            if(!runWatchdogCheck)
            {
                sleepInterval = kIdleInterval;
            }
            [NSThread sleepForTimeInterval:sleepInterval];
            cancelled = self.monitorThread.isCancelled;
            if(!cancelled && runWatchdogCheck)
            {
                if(self.awaitingResponse)
                {
                    [self handleDeadlock];
                }
                else
                {
                    [self watchdogPulse];
                }
            }
        }
    } while (!cancelled);
}
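
The pulse/answer handshake can be modeled without any threads. The following is a minimal JavaScript simulation, not KSCrash code: createWatchdog, tick, and answer are hypothetical names. tick stands in for one iteration of the runMonitor loop, and answer for the block that the main queue would eventually run.

```javascript
// Simulation of the watchdog handshake; all names here are illustrative.
function createWatchdog(onDeadlock) {
  let awaitingResponse = false;

  return {
    // What the main thread does once it gets around to running the block.
    answer() {
      awaitingResponse = false;
    },
    // One iteration of the monitor loop, run after each sleep interval.
    tick() {
      if (awaitingResponse) {
        // The previous pulse was never answered: treat it as a deadlock.
        onDeadlock();
        return false;
      }
      // Send a new pulse; a responsive main thread will call answer().
      awaitingResponse = true;
      return true;
    },
  };
}

const events = [];
const watchdog = createWatchdog(() => events.push('deadlock'));
watchdog.tick();   // pulse #1
watchdog.answer(); // the main thread responded in time
watchdog.tick();   // pulse #2, and this time nobody answers
watchdog.tick();   // the flag is still set, so the deadlock handler fires
console.log(events); // [ 'deadlock' ]
```

One consequence of this design is that a stall is only noticed on the tick after the unanswered pulse, so detection takes between one and two intervals.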

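2.6 Generating and saving crash reports

2.6.1 How crash logs are generated

The sections above covered the various crash monitors used in iOS development. The next step is to look at how the crash information is recorded once a crash has been caught, that is, how it is saved to the app sandbox.

Take the main thread deadlock as an example and see how KSCrash records the crash information.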

// KSCrashMonitor_Deadlock.m
- (void) handleDeadlock
{
    ksmc_suspendEnvironment();
    kscm_notifyFatalExceptionCaptured(false);

    KSMC_NEW_CONTEXT(machineContext);
    ksmc_getContextForThread(g_mainQueueThread, machineContext, false);
    KSStackCursor stackCursor;
    kssc_initWithMachineContext(&stackCursor, 100, machineContext);
    char eventID[37];
    ksid_generate(eventID);

    KSLOG_DEBUG(@"Filling out context.");
    KSCrash_MonitorContext* crashContext = &g_monitorContext;
    memset(crashContext, 0, sizeof(*crashContext));
    crashContext->crashType = KSCrashMonitorTypeMainThreadDeadlock;
    crashContext->eventID = eventID;
    crashContext->registersAreValid = false;
    crashContext->offendingMachineContext = machineContext;
    crashContext->stackCursor = &stackCursor;
    
    kscm_handleException(crashContext);
    ksmc_resumeEnvironment();

    KSLOG_DEBUG(@"Calling abort()");
    abort();
}

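The other crash types work the same way: the exception information is packaged into a context and handed to kscm_handleException(), which every monitor calls once it has caught a crash.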

/** Start general exception processing.
 *
 * @param context Contextual information about the exception.
 */
void kscm_handleException(struct KSCrash_MonitorContext* context) {
    context->requiresAsyncSafety = g_requiresAsyncSafety;
    if(g_crashedDuringExceptionHandling)
    {
        context->crashedDuringCrashHandling = true;
    }
    for(int i = 0; i < g_monitorsCount; i++)
    {
        Monitor* monitor = &g_monitors[i];
        // Only consider monitors that are currently enabled
        if(isMonitorEnabled(monitor))
        {
            // Let each monitor add its own supplementary info to the event
            addContextualInfoToEvent(monitor, context);
        }
    }
    // Do the real work: save the crash information in JSON format
    g_onExceptionEvent(context);

    
    if(g_handlingFatalException && !g_crashedDuringExceptionHandling)
    {
        KSLOG_DEBUG("Exception is fatal. Restoring original handlers.");
        kscm_setActiveMonitors(KSCrashMonitorTypeNone);
    }
}

g_onExceptionEvent is a C function pointer (not a block), declared as static void (*g_onExceptionEvent)(struct KSCrash_MonitorContext* monitorContext); it is assigned in KSCrashMonitor.c:

void kscm_setEventCallback(void (*onEvent)(struct KSCrash_MonitorContext* monitorContext))
{
    g_onExceptionEvent = onEvent;
}

kscm_setEventCallback() is called in KSCrashC.c:

KSCrashMonitorType kscrash_install(const char* appName, const char* const installPath) {
    KSLOG_DEBUG("Installing crash reporter.");

    if(g_installed)
    {
        KSLOG_DEBUG("Crash reporter already installed.");
        return g_monitoring;
    }
    g_installed = 1;

    char path[KSFU_MAX_PATH_LENGTH];
    snprintf(path, sizeof(path), "%s/Reports", installPath);
    ksfu_makePath(path);
    kscrs_initialize(appName, path);

    snprintf(path, sizeof(path), "%s/Data", installPath);
    ksfu_makePath(path);
    snprintf(path, sizeof(path), "%s/Data/CrashState.json", installPath);
    kscrashstate_initialize(path);

    snprintf(g_consoleLogPath, sizeof(g_consoleLogPath), "%s/Data/ConsoleLog.txt", installPath);
    if(g_shouldPrintPreviousLog)
    {
        printPreviousLog(g_consoleLogPath);
    }
    kslog_setLogFilename(g_consoleLogPath, true);
    
    ksccd_init(60);
    // Set the callback invoked when a crash occurs
    kscm_setEventCallback(onCrash);
    KSCrashMonitorType monitors = kscrash_setMonitoring(g_monitoring);

    KSLOG_DEBUG("Installation complete.");
    return monitors;
}

/** Called when a crash occurs.
 *
 * This function gets passed as a callback to a crash handler.
 */
static void onCrash(struct KSCrash_MonitorContext* monitorContext) {
    KSLOG_DEBUG("Updating application state to note crash.");
    kscrashstate_notifyAppCrash();
    monitorContext->consoleLogPath = g_shouldAddConsoleLogToReport ? g_consoleLogPath : NULL;

    // A second crash happened while the first one was being handled
    if(monitorContext->crashedDuringCrashHandling)
    {
        kscrashreport_writeRecrashReport(monitorContext, g_lastCrashReportFilePath);
    }
    else
    {
        // 1. Generate the file path for the new crash report
        char crashReportFilePath[KSFU_MAX_PATH_LENGTH];
        kscrs_getNextCrashReportPath(crashReportFilePath);
        // 2. Remember the new path in g_lastCrashReportFilePath
        strncpy(g_lastCrashReportFilePath, crashReportFilePath, sizeof(g_lastCrashReportFilePath));
        // 3. Write the crash report to the new path
        kscrashreport_writeStandardReport(monitorContext, crashReportFilePath);
    }
}

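Next come the functions that actually write the log to a file. The two functions do similar things: both format the data as JSON and write it to a file. The difference is that if another crash occurs while a crash is being written, the minimal path kscrashreport_writeRecrashReport() is taken; otherwise the standard path kscrashreport_writeStandardReport() is taken.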

bool ksfu_openBufferedWriter(KSBufferedWriter* writer, const char* const path, char* writeBuffer, int writeBufferLength) {
    writer->buffer = writeBuffer;
    writer->bufferLength = writeBufferLength;
    writer->position = 0;
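    /* The second argument of open() is the flag set (access mode plus creation flags):
     *   #define O_RDONLY  0x0000  open for reading only
     *   #define O_WRONLY  0x0001  open for writing only
     *   #define O_RDWR    0x0002  open for reading and writing
     *   #define O_ACCMODE 0x0003  mask for above modes
     *   #define O_CREAT   0x0200  create if nonexistent
     *   #define O_TRUNC   0x0400  truncate to zero length
     *   #define O_EXCL    0x0800  error if already exists
     * The third argument is the permission mode used when the file is created:
     *   0755: owner can read/write/execute; group and others can read/execute
     *   0644: owner can read/write; group and others can only read
     * Returns the file descriptor on success, or -1 on error.
     */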
    writer->fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0644);
    if(writer->fd < 0)
    {
        KSLOG_ERROR("Could not open crash report file %s: %s", path, strerror(errno));
        return false;
    }
    return true;
}
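
The O_EXCL flag is the part of this call that is easiest to overlook: opening a path that already exists fails instead of overwriting it. The sketch below reproduces the same flag combination with Node's fs module. It is only an illustration; openReportExclusive and the temporary file name are hypothetical, and the real writer is the C function above.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Open a report file read/write, create it, and fail if it already exists.
function openReportExclusive(filePath) {
  const flags = fs.constants.O_RDWR | fs.constants.O_CREAT | fs.constants.O_EXCL;
  try {
    return fs.openSync(filePath, flags, 0o644);
  } catch (err) {
    // EEXIST is what O_EXCL reports when the path is already taken.
    console.error(`Could not open crash report file ${filePath}: ${err.code}`);
    return -1;
  }
}

// Demo in a throwaway directory (the file name is made up for illustration).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'reports-'));
const reportPath = path.join(dir, 'Test-report-0001.json');
const firstFd = openReportExclusive(reportPath);  // creates the file
const secondFd = openReportExclusive(reportPath); // -1: the file already exists
if (firstFd >= 0) fs.closeSync(firstFd);
fs.rmSync(dir, { recursive: true, force: true });
```

This is presumably why the recrash writer renames the existing report before opening the same path again: with O_EXCL, reopening a path that is still occupied would fail.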

/**
 * Write a standard crash report to a file.
 *
 * @param monitorContext Contextual information about the crash and environment.
 *                       The caller must fill this out before passing it in.
 *
 * @param path The file to write to.
 */
void kscrashreport_writeStandardReport(const struct KSCrash_MonitorContext* const monitorContext, const char* path) {
    KSLOG_INFO("Writing crash report to %s", path);
    char writeBuffer[1024];
    KSBufferedWriter bufferedWriter;

    if(!ksfu_openBufferedWriter(&bufferedWriter, path, writeBuffer, sizeof(writeBuffer)))
    {
        return;
    }

    ksccd_freeze();
    
    KSJSONEncodeContext jsonContext;
    jsonContext.userData = &bufferedWriter;
    KSCrashReportWriter concreteWriter;
    KSCrashReportWriter* writer = &concreteWriter;
    prepareReportWriter(writer, &jsonContext);

    ksjson_beginEncode(getJsonContext(writer), true, addJSONData, &bufferedWriter);

    writer->beginObject(writer, KSCrashField_Report);
    {
        writeReportInfo(writer,
                        KSCrashField_Report,
                        KSCrashReportType_Standard,
                        monitorContext->eventID,
                        monitorContext->System.processName);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writeBinaryImages(writer, KSCrashField_BinaryImages);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writeProcessState(writer, KSCrashField_ProcessState, monitorContext);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writeSystemInfo(writer, KSCrashField_System, monitorContext);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writer->beginObject(writer, KSCrashField_Crash);
        {
            writeError(writer, KSCrashField_Error, monitorContext);
            ksfu_flushBufferedWriter(&bufferedWriter);
            writeAllThreads(writer,
                            KSCrashField_Threads,
                            monitorContext,
                            g_introspectionRules.enabled);
            ksfu_flushBufferedWriter(&bufferedWriter);
        }
        writer->endContainer(writer);

        if(g_userInfoJSON != NULL)
        {
            addJSONElement(writer, KSCrashField_User, g_userInfoJSON, false);
            ksfu_flushBufferedWriter(&bufferedWriter);
        }
        else
        {
            writer->beginObject(writer, KSCrashField_User);
        }
        if(g_userSectionWriteCallback != NULL)
        {
            ksfu_flushBufferedWriter(&bufferedWriter);
            g_userSectionWriteCallback(writer);
        }
        writer->endContainer(writer);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writeDebugInfo(writer, KSCrashField_Debug, monitorContext);
    }
    writer->endContainer(writer);
    
    ksjson_endEncode(getJsonContext(writer));
    ksfu_closeBufferedWriter(&bufferedWriter);
    ksccd_unfreeze();
}

/** Write a minimal crash report to a file.
 *
 * @param monitorContext Contextual information about the crash and environment.
 *                       The caller must fill this out before passing it in.
 *
 * @param path The file to write to.
 */
void kscrashreport_writeRecrashReport(const struct KSCrash_MonitorContext* const monitorContext, const char* path) {
    char writeBuffer[1024];
    KSBufferedWriter bufferedWriter;
    static char tempPath[KSFU_MAX_PATH_LENGTH];
    // `path` is the previous crash report, e.g.
    // /var/mobile/Containers/Data/Application/******/Library/Caches/KSCrash/Test/Reports/Test-report-******.json
    // Strip ".json" and append ".old" to get the temporary path:
    // /var/mobile/Containers/Data/Application/******/Library/Caches/KSCrash/Test/Reports/Test-report-******.old

    strncpy(tempPath, path, sizeof(tempPath) - 10);
    strncpy(tempPath + strlen(tempPath) - 5, ".old", 5);
    KSLOG_INFO("Writing recrash report to %s", path);

    if(rename(path, tempPath) < 0)
    {
        KSLOG_ERROR("Could not rename %s to %s: %s", path, tempPath, strerror(errno));
    }
    // Open the file at the given path for the buffered writer
    if(!ksfu_openBufferedWriter(&bufferedWriter, path, writeBuffer, sizeof(writeBuffer)))
    {
        return;
    }

    ksccd_freeze();
    // C code for JSON encoding
    KSJSONEncodeContext jsonContext;
    jsonContext.userData = &bufferedWriter;
    KSCrashReportWriter concreteWriter;
    KSCrashReportWriter* writer = &concreteWriter;
    prepareReportWriter(writer, &jsonContext);

    ksjson_beginEncode(getJsonContext(writer), true, addJSONData, &bufferedWriter);

    writer->beginObject(writer, KSCrashField_Report);
    {
        writeRecrash(writer, KSCrashField_RecrashReport, tempPath);
        ksfu_flushBufferedWriter(&bufferedWriter);
        if(remove(tempPath) < 0)
        {
            KSLOG_ERROR("Could not remove %s: %s", tempPath, strerror(errno));
        }
        writeReportInfo(writer,
                        KSCrashField_Report,
                        KSCrashReportType_Minimal,
                        monitorContext->eventID,
                        monitorContext->System.processName);
        ksfu_flushBufferedWriter(&bufferedWriter);

        writer->beginObject(writer, KSCrashField_Crash);
        {
            writeError(writer, KSCrashField_Error, monitorContext);
            ksfu_flushBufferedWriter(&bufferedWriter);
            int threadIndex = ksmc_indexOfThread(monitorContext->offendingMachineContext,
                                                 ksmc_getThreadFromContext(monitorContext->offendingMachineContext));
            writeThread(writer,
                        KSCrashField_CrashedThread,
                        monitorContext,
                        monitorContext->offendingMachineContext,
                        threadIndex,
                        false);
            ksfu_flushBufferedWriter(&bufferedWriter);
        }
        writer->endContainer(writer);
    }
    writer->endContainer(writer);

    ksjson_endEncode(getJsonContext(writer));
    ksfu_closeBufferedWriter(&bufferedWriter);
    ksccd_unfreeze();
}
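
The two strncpy calls at the top of kscrashreport_writeRecrashReport() only swap the file extension. As a rough JavaScript equivalent (toOldReportPath is a hypothetical helper, and unlike the C code it rejects paths that do not end in .json):

```javascript
// Derive the temporary ".old" path for a report path ending in ".json".
function toOldReportPath(reportPath) {
  const suffix = '.json';
  if (!reportPath.endsWith(suffix)) {
    throw new Error(`Expected a ${suffix} path, got: ${reportPath}`);
  }
  return reportPath.slice(0, -suffix.length) + '.old';
}

console.log(toOldReportPath('/Reports/Test-report-0001.json'));
// /Reports/Test-report-0001.old
```

The overall sequence in the C function is then: rename the previous report to the .old path, open the original path again for writing, embed the old report through writeRecrash, and remove the .old file.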

2.6.2 How crash logs are read

After the app crashes, KSCrash saves the data in the app sandbox. On the next launch we read the stored crash files, process the data, and upload it.

Call chain after the app launches:

[KSCrashInstallation sendAllReportsWithCompletion:] -> [KSCrash sendAllReportsWithCompletion:] -> [KSCrash allReports] -> [KSCrash reportWithIntID:] -> [KSCrash loadCrashReportJSONWithID:] -> kscrs_readReport

sendAllReportsWithCompletion reads the crash data stored in the sandbox.

// Count the crash reports by opening the reports directory and iterating over its files
static int getReportCount()
{
    int count = 0;
    DIR* dir = opendir(g_reportsPath);
    if(dir == NULL)
    {
        KSLOG_ERROR("Could not open directory %s", g_reportsPath);
        goto done;
    }
    struct dirent* ent;
    while((ent = readdir(dir)) != NULL)
    {
        if(getReportIDFromFilename(ent->d_name) > 0)
        {
            count++;
        }
    }

done:
    if(dir != NULL)
    {
        closedir(dir);
    }
    return count;
}

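// Using the report count and the reports directory, collect the report IDs (the last part of each file name is the reportID), then read each report's contents by reportID and add it to the array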
- (NSArray*) allReports
{
    int reportCount = kscrash_getReportCount();
    int64_t reportIDs[reportCount];
    reportCount = kscrash_getReportIDs(reportIDs, reportCount);
    NSMutableArray* reports = [NSMutableArray arrayWithCapacity:(NSUInteger)reportCount];
    for(int i = 0; i < reportCount; i++)
    {
        NSDictionary* report = [self reportWithIntID:reportIDs[i]];
        if(report != nil)
        {
            [reports addObject:report];
        }
    }
    
    return reports;
}

// Look up the crash information by reportID
- (NSDictionary*) reportWithIntID:(int64_t) reportID
{
    NSData* jsonData = [self loadCrashReportJSONWithID:reportID];
    if(jsonData == nil)
    {
        return nil;
    }

    NSError* error = nil;
    NSMutableDictionary* crashReport = [KSJSONCodec decode:jsonData
                                                   options:KSJSONDecodeOptionIgnoreNullInArray |
                                                           KSJSONDecodeOptionIgnoreNullInObject |
                                                           KSJSONDecodeOptionKeepPartialObject
                                                     error:&error];
    if(error != nil)
    {
        KSLOG_ERROR(@"Encountered error loading crash report %" PRIx64 ": %@", reportID, error);
    }
    if(crashReport == nil)
    {
        KSLOG_ERROR(@"Could not load crash report");
        return nil;
    }
    [self doctorReport:crashReport];

    return crashReport;
}

// Read the crash report for a reportID and wrap it in NSData
- (NSData*) loadCrashReportJSONWithID:(int64_t) reportID
{
    char* report = kscrash_readReport(reportID);
    if(report != NULL)
    {
        return [NSData dataWithBytesNoCopy:report length:strlen(report) freeWhenDone:YES];
    }
    return nil;
}

// Read the crash report for a reportID into a char buffer
char* kscrash_readReport(int64_t reportID)
{
    if(reportID <= 0)
    {
        KSLOG_ERROR("Report ID was %" PRIx64, reportID);
        return NULL;
    }

    char* rawReport = kscrs_readReport(reportID);
    if(rawReport == NULL)
    {
        KSLOG_ERROR("Failed to load report ID %" PRIx64, reportID);
        return NULL;
    }

    char* fixedReport = kscrf_fixupCrashReport(rawReport);
    if(fixedReport == NULL)
    {
        KSLOG_ERROR("Failed to fixup report ID %" PRIx64, reportID);
    }

    free(rawReport);
    return fixedReport;
}

// Under a mutex, resolve the file path for reportID with the C function getCrashReportPathByID, then read the whole crash report into result with ksfu_readEntireFile
char* kscrs_readReport(int64_t reportID)
{
    pthread_mutex_lock(&g_mutex);
    char path[KSCRS_MAX_PATH_LENGTH];
    getCrashReportPathByID(reportID, path);
    char* result;
    ksfu_readEntireFile(path, &result, NULL, 2000000);
    pthread_mutex_unlock(&g_mutex);
    return result;
}

int kscrash_getReportIDs(int64_t* reportIDs, int count)
{
    return kscrs_getReportIDs(reportIDs, count);
}

int kscrs_getReportIDs(int64_t* reportIDs, int count)
{
    pthread_mutex_lock(&g_mutex);
    count = getReportIDs(reportIDs, count);
    pthread_mutex_unlock(&g_mutex);
    return count;
}
// Iterate over the directory entries, call getReportIDFromFilename on each ent->d_name to get the reportID, and fill the array inside the loop
static int getReportIDs(int64_t* reportIDs, int count)
{
    int index = 0;
    DIR* dir = opendir(g_reportsPath);
    if(dir == NULL)
    {
        KSLOG_ERROR("Could not open directory %s", g_reportsPath);
        goto done;
    }

    struct dirent* ent;
    while((ent = readdir(dir)) != NULL && index < count)
    {
        int64_t reportID = getReportIDFromFilename(ent->d_name);
        if(reportID > 0)
        {
            reportIDs[index++] = reportID;
        }
    }

    qsort(reportIDs, (unsigned)count, sizeof(reportIDs[0]), compareInt64);

done:
    if(dir != NULL)
    {
        closedir(dir);
    }
    return index;
}

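// sprintf(buffer, format, ...) writes the formatted scan pattern into buffer; sscanf(input, format, &out) then parses input according to that pattern and stores the result in out. Crash files are named "<app name>-report-<reportID>.json"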
static int64_t getReportIDFromFilename(const char* filename)
{
    char scanFormat[100];
    sprintf(scanFormat, "%s-report-%%" PRIx64 ".json", g_appName);
    
    int64_t reportID = 0;
    sscanf(filename, scanFormat, &reportID);
    return reportID;
}
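
The naming convention is easy to try out in isolation. The sketch below parses the same "<app name>-report-<hex id>.json" pattern with a regular expression; makeReportIdParser is a hypothetical helper, and the app name Test is taken from the example path earlier in this section. Note that the pattern is anchored at both ends, which is a stricter check than the quoted sscanf call performs.

```javascript
// Build a matcher for "<appName>-report-<hex id>.json" (appName is escaped).
function makeReportIdParser(appName) {
  const escaped = appName.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const pattern = new RegExp(`^${escaped}-report-([0-9a-fA-F]{1,16})\\.json$`);
  // Returns the ID as a BigInt, or 0n when the name is not a report file.
  return (filename) => {
    const match = pattern.exec(filename);
    return match ? BigInt(`0x${match[1]}`) : 0n;
  };
}

const parseId = makeReportIdParser('Test');
console.log(parseId('Test-report-ff.json')); // 255n
console.log(parseId('CrashState.json'));     // 0n
```

BigInt is used because a 64-bit report ID does not fit safely into a JavaScript number. As in the C code, a result of zero means "not a report file", which is why callers only accept IDs greater than zero.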

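2.7 Monitoring front-end JS crashes

2.7.1 JavaScriptCore exception monitoring

This part is straightforward: monitor exceptions through the exceptionHandler property of the JSContext object, as in the code below.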

jsContext.exceptionHandler = ^(JSContext *context, JSValue *exception) {
    // Handle the JavaScriptCore exception here
};

2.7.2 H5 page exception monitoring

When JavaScript inside an H5 page throws at runtime, the window object fires an error event (an ErrorEvent) and window.onerror() is invoked.

window.onerror = function (msg, url, lineNumber, columnNumber, error) {
   // Handle the error information
};
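
What the handler does with its five arguments is left open above. One possible shape, given here only as a sketch, is to normalize them into a plain record that can be serialized and passed on; buildErrorReport and the field names are assumptions, not part of any API.

```javascript
// Turn the five window.onerror arguments into a plain, serializable record.
function buildErrorReport(msg, url, lineNumber, columnNumber, error) {
  return {
    message: String(msg),
    source: url || '',
    line: Number(lineNumber) || 0,
    column: Number(columnNumber) || 0,
    // `error` can be missing, for example for errors in cross-origin scripts.
    stack: error && error.stack ? String(error.stack) : '',
  };
}

const sample = buildErrorReport('ReferenceError: qw is not defined', 'index.html', 12, 5, null);
console.log(JSON.stringify(sample));
```

Inside a page, the handler would then look like window.onerror = (...args) => send(buildErrorReport(...args)), where send is whatever bridge or upload function the app provides.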

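2.7.3 React Native exception monitoring

A small experiment: in an RN demo project, a press handler was added to the Debug Text component, and the handler deliberately triggers a crash.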

<Text style={styles.sectionTitle} onPress={()=>{1+qw;}}>Debug</Text>

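Comparison 1:

Conditions: iOS project in debug mode, with exception handling code added on the RN side.

In the simulator, press Command + D to bring up the developer menu and choose Debug. This opens Chrome; on a Mac, press Command + Option + J to open the developer tools, and the RN code can then be debugged just like a React app.

Once the crash stack is visible, clicking a frame jumps to the corresponding location resolved through the sourceMap.

Tips: building a Release bundle for an RN project

Create a folder in the project root (release_iOS) to serve as the output folder for resources.

In a terminal, switch to the project directory and run the following command: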

react-native bundle --entry-file index.js --platform ios --dev false --bundle-output release_ios/main.jsbundle --assets-dest release_iOS --sourcemap-output release_ios/index.ios.map;

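Then drag the .jsbundle file and the contents of the assets folder from release_iOS into the iOS project.

Comparison 2:

Conditions: iOS project in release mode, with no exception handling code on the RN side.

Action: run the iOS project and tap the button to simulate a crash.

Result: the iOS app crashes. The log is as follows: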

2020-06-22 22:26:03.318 [info][tid:main][RCTRootView.m:294] Running application todos ({
    initialProps =     {
    };
    rootTag = 1;
})
2020-06-22 22:26:03.490 [info][tid:com.facebook.react.JavaScript] Running "todos" with {"rootTag":1,"initialProps":{}}
2020-06-22 22:27:38.673 [error][tid:com.facebook.react.JavaScript] ReferenceError: Can't find variable: qw
2020-06-22 22:27:38.675 [fatal][tid:com.facebook.react.ExceptionsManagerQueue] Unhandled JS Exception: ReferenceError: Can't find variable: qw
2020-06-22 22:27:38.691300+0800 todos[16790:314161] *** Terminating app due to uncaught exception 'RCTFatalException: Unhandled JS Exception: ReferenceError: Can't find variable: qw', reason: 'Unhandled JS Exception: ReferenceError: Can't find variable: qw, stack:
onPress@397:1821
<unknown>@203:3896
_performSideEffectsForTransition@210:9689
_performSideEffectsForTransition@(null):(null)
_receiveSignal@210:8425
_receiveSignal@(null):(null)
touchableHandleResponderRelease@210:5671
touchableHandleResponderRelease@(null):(null)
onResponderRelease@203:3006
b@97:1125
S@97:1268
w@97:1322
R@97:1617
M@97:2401
forEach@(null):(null)
U@97:2201
<unknown>@97:13818
Pe@97:90199
Re@97:13478
Ie@97:13664
receiveTouches@97:14448
value@27:3544
<unknown>@27:840
value@27:2798
value@27:812
value@(null):(null)
'
*** First throw call stack:
(
	0   CoreFoundation                      0x00007fff23e3cf0e __exceptionPreprocess + 350
	1   libobjc.A.dylib                     0x00007fff50ba89b2 objc_exception_throw + 48
	2   todos                               0x00000001017b0510 RCTFormatError + 0
	3   todos                               0x000000010182d8ca -[RCTExceptionsManager reportFatal:stack:exceptionId:suppressRedBox:] + 503
	4   todos                               0x000000010182e34e -[RCTExceptionsManager reportException:] + 1658
	5   CoreFoundation                      0x00007fff23e43e8c __invoking___ + 140
	6   CoreFoundation                      0x00007fff23e41071 -[NSInvocation invoke] + 321
	7   CoreFoundation                      0x00007fff23e41344 -[NSInvocation invokeWithTarget:] + 68
	8   todos                               0x00000001017e07fa -[RCTModuleMethod invokeWithBridge:module:arguments:] + 578
	9   todos                               0x00000001017e2a84 _ZN8facebook5reactL11invokeInnerEP9RCTBridgeP13RCTModuleDatajRKN5folly7dynamicE + 246
	10  todos                               0x00000001017e280c ___ZN8facebook5react15RCTNativeModule6invokeEjON5folly7dynamicEi_block_invoke + 78
	11  libdispatch.dylib                   0x00000001025b5f11 _dispatch_call_block_and_release + 12
	12  libdispatch.dylib                   0x00000001025b6e8e _dispatch_client_callout + 8
	13  libdispatch.dylib                   0x00000001025bd6fd _dispatch_lane_serial_drain + 788
	14  libdispatch.dylib                   0x00000001025be28f _dispatch_lane_invoke + 422
	15  libdispatch.dylib                   0x00000001025c9b65 _dispatch_workloop_worker_thread + 719
	16  libsystem_pthread.dylib             0x00007fff51c08a3d _pthread_wqthread + 290
	17  libsystem_pthread.dylib             0x00007fff51c07b77 start_wqthread + 15
)
libc++abi.dylib: terminating with uncaught exception of type NSException
(lldb) 

Tips: how to debug RN in release mode (to see console output from the JS side)

In AppDelegate.m, add #import <React/RCTLog.h>
In - (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions, add RCTSetLogThreshold(RCTLogLevelTrace);

Comparison 3:

Conditions: iOS project in release mode, with exception handling code added on the RN side.

import axios from 'axios';

// setGlobalHandler takes only the handler; isFatal is passed to the handler itself
global.ErrorUtils.setGlobalHandler((e, isFatal) => {
  console.log(e);
  const message = {
    name: e.name,
    message: e.message,
    stack: e.stack
  };
  // Report the error to a test endpoint
  axios.get('http://192.168.1.100:8888/test.php', {
    params: { 'message': JSON.stringify(message) }
  }).then(function (response) {
    console.log(response);
  }).catch(function (error) {
    console.log(error);
  });
});

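Action: run the iOS project and tap the button to simulate a crash.

Result: the iOS app does not crash. The logged error can be compared against the JS in the bundle.

Conclusion:

In an RN project, a JS crash shows up on the Native side. If the RN side installs a crash handler, the Native side does not crash; if the RN-side crash goes uncaught, the Native app crashes outright.

With crash monitoring in place on the RN side, printing the captured stack shows that it refers to JS that has already been processed by the bundler (Metro, in a standard RN setup), which makes the crash very hard to analyze. So for RN crashes we need monitoring code on the RN side, the captured data needs to be reported, and a dedicated step is needed to restore the original location from the captured information, namely sourceMap parsing.

2.7.3.1 JS logic errors

Anyone who has written RN knows that in DEBUG mode a problem in the JS code produces a red screen, while in RELEASE mode it leads to a white screen or a crash. Exception monitoring is therefore needed to keep the experience and quality under control.

While reading the RN source I came across ErrorUtils; judging from the code, it can be used to install a handler for errors.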

/**
 * Copyright (c) Facebook, Inc. and its affiliates.
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 *
 * @format
 * @flow strict
 * @polyfill
 */

let _inGuard = 0;

type ErrorHandler = (error: mixed, isFatal: boolean) => void;
type Fn<Args, Return> = (...Args) => Return;

/**
 * This is the error handler that is called when we encounter an exception
 * when loading a module. This will report any errors encountered before
 * ExceptionsManager is configured.
 */
let _globalHandler: ErrorHandler = function onError( e: mixed, isFatal: boolean, ) {
  throw e;
};

/**
 * The particular require runtime that we are using looks for a global
 * `ErrorUtils` object and if it exists, then it requires modules with the
 * error handler specified via ErrorUtils.setGlobalHandler by calling the
 * require function with applyWithGuard. Since the require module is loaded
 * before any of the modules, this ErrorUtils must be defined (and the handler
 * set) globally before requiring anything.
 */
const ErrorUtils = {
  setGlobalHandler(fun: ErrorHandler): void {
    _globalHandler = fun;
  },
  getGlobalHandler(): ErrorHandler {
    return _globalHandler;
  },
  reportError(error: mixed): void {
    _globalHandler && _globalHandler(error, false);
  },
  reportFatalError(error: mixed): void {
    // NOTE: This has an untyped call site in Metro.
    _globalHandler && _globalHandler(error, true);
  },
  applyWithGuard<TArgs: $ReadOnlyArray<mixed>, TOut>(
    fun: Fn<TArgs, TOut>,
    context?: ?mixed,
    args?: ?TArgs,
    // Unused, but some code synced from www sets it to null.
    unused_onError?: null,
    // Some callers pass a name here, which we ignore.
    unused_name?: ?string,
  ): ?TOut {
    try {
      _inGuard++;
      // $FlowFixMe: TODO T48204745 (1) apply(context, null) is fine. (2) array -> rest array should work
      return fun.apply(context, args);
    } catch (e) {
      ErrorUtils.reportError(e);
    } finally {
      _inGuard--;
    }
    return null;
  },
  applyWithGuardIfNeeded<TArgs: $ReadOnlyArray<mixed>, TOut>(
    fun: Fn<TArgs, TOut>,
    context?: ?mixed,
    args?: ?TArgs,
  ): ?TOut {
    if (ErrorUtils.inGuard()) {
      // $FlowFixMe: TODO T48204745 (1) apply(context, null) is fine. (2) array -> rest array should work
      return fun.apply(context, args);
    } else {
      ErrorUtils.applyWithGuard(fun, context, args);
    }
    return null;
  },
  inGuard(): boolean {
    return !!_inGuard;
  },
  guard<TArgs: $ReadOnlyArray<mixed>, TOut>(
    fun: Fn<TArgs, TOut>,
    name?: ?string,
    context?: ?mixed,
  ): ?(...TArgs) => ?TOut {
    // TODO: (moti) T48204753 Make sure this warning is never hit and remove it - types
    // should be sufficient.
    if (typeof fun !== 'function') {
      console.warn('A function must be passed to ErrorUtils.guard, got ', fun);
      return null;
    }
    const guardName = name ?? fun.name ?? '<generated guard>';
    function guarded(...args: TArgs): ?TOut {
      return ErrorUtils.applyWithGuard(
        fun,
        context ?? this,
        args,
        null,
        guardName,
      );
    }

    return guarded;
  },
};

global.ErrorUtils = ErrorUtils;

export type ErrorUtilsT = typeof ErrorUtils;

So RN exceptions can be handled by installing a handler through global.ErrorUtils. For example:

global.ErrorUtils.setGlobalHandler((e, isFatal) => {
   // e.name e.message e.stack
});
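
Since setGlobalHandler replaces whatever handler was installed before, it can be worth reading the previous one with getGlobalHandler first. The sketch below shows one way to do that; installReportingHandler, its callPrevious flag, and makeFakeErrorUtils are hypothetical, and the fake object exists only so the example runs outside RN.

```javascript
// Install a reporting handler on an ErrorUtils-like object.
// When callPrevious is true, the handler that was installed before also runs.
function installReportingHandler(errorUtils, report, callPrevious = false) {
  const previous = errorUtils.getGlobalHandler();
  errorUtils.setGlobalHandler((error, isFatal) => {
    report({
      name: error && error.name,
      message: error && error.message,
      stack: error && error.stack,
      isFatal: Boolean(isFatal),
    });
    if (callPrevious && typeof previous === 'function') {
      previous(error, isFatal);
    }
  });
}

// Stand-in exposing the same methods as the ErrorUtils source quoted above.
function makeFakeErrorUtils() {
  let handler = () => {};
  return {
    setGlobalHandler(fn) { handler = fn; },
    getGlobalHandler() { return handler; },
    reportFatalError(e) { handler(e, true); },
  };
}

const fakeErrorUtils = makeFakeErrorUtils();
const reported = [];
installReportingHandler(fakeErrorUtils, (r) => reported.push(r));
fakeErrorUtils.reportFatalError(new TypeError('boom'));
console.log(reported[0].name, reported[0].isFatal); // TypeError true
```

In an app this would be called as installReportingHandler(global.ErrorUtils, upload) with an upload function of your own. Whether to pass callPrevious is a design choice: chaining keeps RN's default handling, which in the release experiment above ends with the app terminating, while skipping it leaves the error handled only by your reporter.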

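2.7.3.2 Component errors

Another thing to keep in mind when handling RN crashes is React Error Boundaries (see the React documentation for details).

In the past, JavaScript errors inside components would corrupt React's internal state and cause errors on the next render that could be impossible to trace. These errors were generally caused by an earlier error in other code (outside React component code), but React did not provide a way to handle them gracefully in components, nor could it recover from them.

A JavaScript error in one part of the UI should not break the whole app. To solve this problem, React 16 introduced a new concept: the error boundary.

An error boundary is a React component that catches JavaScript errors anywhere in its child component tree, logs them, and renders a fallback UI instead of the component tree that crashed. Error boundaries catch errors during rendering, in lifecycle methods, and in the constructors of the whole tree below them.

In other words, an error boundary catches exceptions thrown in the lifecycle functions of its child components, including the constructor and the render function.

It does not catch the following:

Event handlers
Asynchronous code (for example setTimeout or promise callbacks)
Server side rendering
Errors thrown in the error boundary itself (rather than its children)

So an error boundary component can catch the exceptions thrown during the lifecycle of the components below it and render a fallback component, which keeps the app from crashing and improves the user experience. It can also guide users to report the problem, which makes troubleshooting and fixing easier.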
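
A minimal error boundary along these lines might look as follows. This is a sketch under assumptions: createErrorBoundary, reportError, and the fallback prop are hypothetical names, and the React module is passed in as a parameter so the block stays self-contained. FakeReact is only a stand-in used to exercise the class here; in an app you would pass require('react').

```javascript
// Build an error boundary class from a React-like module passed in by the caller.
function createErrorBoundary(React, reportError) {
  return class ErrorBoundary extends React.Component {
    constructor(props) {
      super(props);
      this.state = { hasError: false };
    }

    // Called during rendering after a descendant throws; switches to the fallback.
    static getDerivedStateFromError() {
      return { hasError: true };
    }

    // Called after the error has been caught; a natural place to report it.
    componentDidCatch(error, info) {
      reportError({
        message: error.message,
        stack: error.stack,
        componentStack: info && info.componentStack,
      });
    }

    render() {
      return this.state.hasError ? this.props.fallback : this.props.children;
    }
  };
}

// Stand-in base class so the sketch can be exercised without React installed.
const FakeReact = { Component: class { constructor(props) { this.props = props; } } };
const caught = [];
const DemoBoundary = createErrorBoundary(FakeReact, (r) => caught.push(r));
const boundary = new DemoBoundary({ fallback: 'fallback UI', children: 'children' });
console.log(boundary.render()); // children
boundary.state = DemoBoundary.getDerivedStateFromError(new Error('render failed'));
console.log(boundary.render()); // fallback UI
```

Note that the onPress crash used in the experiment earlier is an event handler error, which falls under the exceptions listed above, so a boundary like this would not see it; that kind of error still has to be picked up through ErrorUtils.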

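With that, both kinds of RN crashes, JS logic errors and component JS errors, are monitored and handled. Next, let's look at how to address them at the engineering level.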

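2.7.4 Restoring RN crashes

SourceMap files are crucial for parsing front-end logs. The meaning of each field in a SourceMap file and the steps for computing positions are explained in a separate article (linked in the original post).

With the SourceMap file and Mozilla's source-map project, RN crash logs can be restored quite well.

I wrote a NodeJS script for this: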

const fs = require('fs');
const sourceMap = require('source-map');
// Command-line arguments: <line> <column>
const args = process.argv.slice(2);

function parseJSError(aLine, aColumn) {
    fs.readFile('./index.ios.map', 'utf8', function (err, data) {
        if (err) {
            console.log(err);
            return;
        }
        sourceMap.SourceMapConsumer.with(data, null, consumer => {
            // Map the line and column from the crash log back to the original source
            const parseData = consumer.originalPositionFor({
                line: parseInt(aLine, 10),
                column: parseInt(aColumn, 10)
            });
            // Print to the console
            console.log(parseData);
            // Write to a file (writeFileSync throws on failure and takes no callback)
            fs.writeFileSync('./parsed.txt', JSON.stringify(parseData) + '\n', 'utf8');
        });
    });
}

const line = args[0];
const column = args[1];
parseJSError(line, column);
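
The script above expects a line and a column on the command line, which still have to be pulled out of the crash log by hand. A small helper can extract them from frames in the format seen in the release log earlier (for example onPress@397:1821). parseStackFrames is a hypothetical name, and the sketch only handles that one frame format.

```javascript
// Parse JavaScriptCore-style frames such as "onPress@397:1821".
function parseStackFrames(stack) {
  const frames = [];
  for (const rawLine of stack.split('\n')) {
    const match = /^(.*)@(\d+):(\d+)$/.exec(rawLine.trim());
    // Frames like "forEach@(null):(null)" carry no position and are skipped.
    if (match) {
      frames.push({
        name: match[1],
        line: Number(match[2]),
        column: Number(match[3]),
      });
    }
  }
  return frames;
}

const frames = parseStackFrames('onPress@397:1821\n<unknown>@203:3896\nforEach@(null):(null)');
console.log(frames.length); // 2
console.log(frames[0]);     // { name: 'onPress', line: 397, column: 1821 }
```

Each resulting frame can then be fed to originalPositionFor in turn, so a whole stack is restored in one pass instead of one frame per invocation.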

Now for an experiment, again with the todos project from above.

Simulate a crash in the Text component's press handler:

<Text style={styles.sectionTitle} onPress={()=>{1+qw;}}>Debug</Text>

Bundle the RN project and generate the sourceMap file by running:

react-native bundle --entry-file index.js --platform ios --dev false --bundle-output release_ios/main.jsbundle --assets-dest release_iOS --sourcemap-output release_ios/index.ios.map;

Because this command is used often, add an alias for it in iTerm2 by editing the .zshrc file:

alias RNRelease='react-native bundle --entry-file index.js --platform ios --dev false --bundle-output release_ios/main.jsbundle --assets-dest release_iOS --sourcemap-output release_ios/index.ios.map;' # build the RN Release bundle

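Copy the JS bundle and image resources into the Xcode project.
Tap to simulate the crash, copy the line and column numbers from the bottom of the log, and run the following command in the Node project: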
```
node index.js 397 1822
```
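Comparing the line number, column number, and file information produced by the script against the source files shows that the result is correct.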

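2.7.5 SourceMap parsing system design

Goal: through the platform, an online crash in an RN project can be traced back to the specific file, line, and column. The platform shows the actual code and the RN stack trace, and offers source file downloads.

Servers managed under the build system:
- source map files are generated only for production builds
- all files from before bundling are stored (install)
On the developer-facing product side there is an RN analysis view. Clicking a collected RN crash opens a detail page showing the specific file, line, and column, along with the actual code, the RN stack trace, and the Native stack trace. (The technical implementation was covered above.)
Because source map files are large, parsing them does not take long but still consumes computing resources, so an efficient way of reading them needs to be designed.
SourceMaps differ between iOS and Android, so SourceMap storage needs to be keyed by OS.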
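
One way to approach the last two points is a small in-memory cache of parsed source maps, keyed by OS and bundle version, so that a large map is parsed once and reused across crash reports. The sketch below is an assumption about how such a service could be structured, not a description of an existing system: SourceMapCache, loadConsumer, and the key format are all hypothetical.

```javascript
// Small LRU cache for parsed source maps, keyed by platform and bundle version.
class SourceMapCache {
  constructor(maxEntries, loadConsumer) {
    this.maxEntries = maxEntries;
    this.loadConsumer = loadConsumer; // hypothetical loader: (os, version) => parsed map
    this.entries = new Map();         // Map keeps insertion order, used as LRU order
  }

  get(os, version) {
    const key = `${os}@${version}`;
    if (this.entries.has(key)) {
      // Re-insert to mark the entry as most recently used.
      const cached = this.entries.get(key);
      this.entries.delete(key);
      this.entries.set(key, cached);
      return cached;
    }
    const consumer = this.loadConsumer(os, version);
    this.entries.set(key, consumer);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry (the first key in the Map).
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
    return consumer;
  }
}

let loadCount = 0;
const mapCache = new SourceMapCache(2, (os, version) => {
  loadCount += 1;
  return { os, version };
});
mapCache.get('ios', '1.0.0');
mapCache.get('ios', '1.0.0'); // served from the cache
console.log(loadCount); // 1
```

Keeping the OS in the key follows directly from the point that iOS and Android maps differ. A real service would also have to release evicted entries properly and bound memory by size rather than by entry count, which this sketch leaves out.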

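3. Wrapping KSCrash for use

Next, wrap your own crash handling logic around KSCrash. The things to do are, for example:

Subclass the abstract class KSCrashInstallation and do the initialization work (like NSURLProtocol, an abstract class must be subclassed before it can be used), and implement the abstract class's sink method.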

/**
 * Crash system installation which handles backend-specific details.
 *
 * Only one installation can be installed at a time.
 *
 * This is an abstract class.
 */
@interface KSCrashInstallation : NSObject

#import "CMCrashInstallation.h"
#import <KSCrash/KSCrashInstallation+Private.h>
#import "CMCrashReporterSink.h"

@implementation CMCrashInstallation

+ (instancetype)sharedInstance {
    static CMCrashInstallation *sharedInstance = nil;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        sharedInstance = [[CMCrashInstallation alloc] init];
    });
    return sharedInstance;
}

- (id)init {
    return [super initWithRequiredProperties: nil];
}

- (id<KSCrashReportFilter>)sink {
    CMCrashReporterSink *sink = [[CMCrashReporterSink alloc] init];
    return [sink defaultCrashReportFilterSetAppleFmt];
}

@end

The CMCrashReporterSink class used inside the sink method conforms to the KSCrashReportFilter protocol and declares a public method, defaultCrashReportFilterSetAppleFmt.

// .h
#import <Foundation/Foundation.h>
#import <KSCrash/KSCrashReportFilter.h>

@interface CMCrashReporterSink : NSObject<KSCrashReportFilter>

- (id <KSCrashReportFilter>) defaultCrashReportFilterSetAppleFmt;

@end

// .m
#pragma mark - public Method

- (id <KSCrashReportFilter>) defaultCrashReportFilterSetAppleFmt
{
    return [KSCrashReportFilterPipeline filterWithFilters:
            [CMCrashReportFilterAppleFmt filterWithReportStyle:KSAppleReportStyleSymbolicatedSideBySide],
            self,
            nil];
}

defaultCrashReportFilterSetAppleFmt returns the result of the KSCrashReportFilterPipeline class method filterWithFilters.

CMCrashReportFilterAppleFmt is a subclass of KSCrashReportFilterAppleFmt and conforms to the KSCrashReportFilter protocol. The protocol method lets developers process the crash data format.

/** Filter the specified reports.
 *
 * @param reports The reports to process.
 * @param onCompletion Block to call when processing is complete.
 */
- (void) filterReports:(NSArray*) reports
          onCompletion:(KSCrashReportFilterCompletion) onCompletion;

#import <KSCrash/KSCrashReportFilterAppleFmt.h>

@interface CMCrashReportFilterAppleFmt : KSCrashReportFilterAppleFmt<KSCrashReportFilter>

@end
  
// .m
- (void) filterReports:(NSArray*)reports onCompletion:(KSCrashReportFilterCompletion)onCompletion
{
    NSMutableArray* filteredReports = [NSMutableArray arrayWithCapacity:[reports count]];
    for(NSDictionary *report in reports){
      if([self majorVersion:report] == kExpectedMajorVersion){
        id monitorInfo = [self generateMonitorInfoFromCrashReport:report];
        if(monitorInfo != nil){
          [filteredReports addObject:monitorInfo];
        }
      }
    }
    kscrash_callCompletion(onCompletion, filteredReports, YES, nil);
}

/**
 @brief Extract the crash time, mach name, signal name, and apple report from the crash JSON
 */
- (NSDictionary *)generateMonitorInfoFromCrashReport:(NSDictionary *)crashReport
{
    NSDictionary *infoReport = [crashReport objectForKey:@"report"];
    // ...
    id appleReport = [self toAppleFormat:crashReport];
    
    NSMutableDictionary *info = [NSMutableDictionary dictionary];
    [info setValue:crashTime forKey:@"crashTime"];
    [info setValue:appleReport forKey:@"appleReport"];
    [info setValue:userException forKey:@"userException"];
    [info setValue:userInfo forKey:@"custom"];
    
    return [info copy];
}

/**
 * A pipeline of filters. Reports get passed through each subfilter in order.
 *
 * Input: Depends on what's in the pipeline.
 * Output: Depends on what's in the pipeline.
 */
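@interface KSCrashReportFilterPipeline : NSObject <KSCrashReportFilter>

Within the APM, the Crash module gets its own launcher. The launcher initializes KSCrash and assembles the data the monitor needs when a crash occurs: basic information such as SESSION_ID, App launch time, App name, crash time, App version, and the current page.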

/** C Function to call during a crash report to give the callee an opportunity to
 * add to the report. NULL = ignore.
 *
 * WARNING: Only call async-safe functions from this function! DO NOT call
 * Objective-C methods!!!
 */
@property(atomic,readwrite,assign) KSReportWriteCallback onCrash;

+ (instancetype)sharedInstance
{
    static CMCrashMonitor *_sharedManager = nil;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        _sharedManager = [[CMCrashMonitor alloc] init];
    });
    return _sharedManager;
}


#pragma mark - public Method

- (void)startMonitor
{
    CMMLog(@"crash monitor started");

#ifdef DEBUG
    BOOL _trackingCrashOnDebug = [CMMonitorConfig sharedInstance].trackingCrashOnDebug;
    if (_trackingCrashOnDebug) {
        [self installKSCrash];
    }
#else
    [self installKSCrash];
#endif
}

#pragma mark - private method

static void onCrash(const KSCrashReportWriter* writer)
{
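    // NOTE: KSCrash documents this callback as async-safe only; the Objective-C calls below follow the original example.
    NSString *sessionId = [NSString stringWithFormat:@"\"%@\"", ***];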
    writer->addJSONElement(writer, "SESSION_ID", [sessionId UTF8String], true);
    
    NSString *appLaunchTime = ***;
    writer->addJSONElement(writer, "USER_APP_START_DATE", [[NSString stringWithFormat:@"\"%@\"", appLaunchTime] UTF8String], true);
    // ...
}

- (void)installKSCrash
{
    [[CMCrashInstallation sharedInstance] install];
    [[CMCrashInstallation sharedInstance] sendAllReportsWithCompletion:nil];
    [CMCrashInstallation sharedInstance].onCrash = onCrash;
    dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(5.f * NSEC_PER_SEC)), dispatch_get_main_queue(), ^{
        _isCanAddCrashCount = NO;
    });
}

installKSCrash calls [[CMCrashInstallation sharedInstance] sendAllReportsWithCompletion:nil], which is implemented as follows:

- (void) sendAllReportsWithCompletion:(KSCrashReportFilterCompletion) onCompletion
{
    NSError* error = [self validateProperties];
    if(error != nil)
    {
        if(onCompletion != nil)
        {
            onCompletion(nil, NO, error);
        }
        return;
    }

    id<KSCrashReportFilter> sink = [self sink];
    if(sink == nil)
    {
        onCompletion(nil, NO, [NSError errorWithDomain:[[self class] description]
                                                  code:0
                                           description:@"Sink was nil (subclasses must implement method \"sink\")"]);
        return;
    }
    
    sink = [KSCrashReportFilterPipeline filterWithFilters:self.prependedFilters, sink, nil];

    KSCrash* handler = [KSCrash sharedInstance];
    handler.sink = sink;
    [handler sendAllReportsWithCompletion:onCompletion];
}

The method assigns the KSCrashInstallation's sink to the KSCrash object and then calls KSCrash's sendAllReportsWithCompletion: method, implemented as follows:

- (void) sendAllReportsWithCompletion:(KSCrashReportFilterCompletion) onCompletion
{
    NSArray* reports = [self allReports];
    
    KSLOG_INFO(@"Sending %d crash reports", [reports count]);
    
    [self sendReports:reports
         onCompletion:^(NSArray* filteredReports, BOOL completed, NSError* error)
     {
         KSLOG_DEBUG(@"Process finished with completion: %d", completed);
         if(error != nil)
         {
             KSLOG_ERROR(@"Failed to send reports: %@", error);
         }
         if((self.deleteBehaviorAfterSendAll == KSCDeleteOnSucess && completed) ||
            self.deleteBehaviorAfterSendAll == KSCDeleteAlways)
         {
             kscrash_deleteAllReports();
         }
         kscrash_callCompletion(onCompletion, filteredReports, completed, error);
     }];
}

That method in turn calls the instance method sendReports:onCompletion:, shown below:

- (void) sendReports:(NSArray*) reports onCompletion:(KSCrashReportFilterCompletion) onCompletion
{
    if([reports count] == 0)
    {
        kscrash_callCompletion(onCompletion, reports, YES, nil);
        return;
    }
    
    if(self.sink == nil)
    {
        kscrash_callCompletion(onCompletion, reports, NO,
                                 [NSError errorWithDomain:[[self class] description]
                                                     code:0
                                              description:@"No sink set. Crash reports not sent."]);
        return;
    }
    
    [self.sink filterReports:reports
                onCompletion:^(NSArray* filteredReports, BOOL completed, NSError* error)
     {
         kscrash_callCompletion(onCompletion, filteredReports, completed, error);
     }];
}

The sink that receives [self.sink filterReports:onCompletion:] comes from the sink getter set up in CMCrashInstallation, which returns the value of defaultCrashReportFilterSetAppleFmt on a CMCrashReporterSink object. Its implementation is:

- (id <KSCrashReportFilter>) defaultCrashReportFilterSetAppleFmt
{
    return [KSCrashReportFilterPipeline filterWithFilters:
            [CMCrashReportFilterAppleFmt filterWithReportStyle:KSAppleReportStyleSymbolicatedSideBySide],
            self,
            nil];
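}

As you can see, this method sets up multiple filters, one of which is self, the CMCrashReporterSink object. So the [self.sink filterReports:onCompletion:] call above ends up in the data-processing method of CMCrashReporterSink. When it is done, it calls kscrash_callCompletion(onCompletion, reports, YES, nil); to tell KSCrash that the locally saved crash logs have been processed and can be deleted.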

- (void)filterReports:(NSArray *)reports onCompletion:(KSCrashReportFilterCompletion)onCompletion
{
    for (NSDictionary *report in reports) {
        // Process the crash data and hand it to the shared data-reporting component...
    }
    kscrash_callCompletion(onCompletion, reports, YES, nil);
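}

To summarize what KSCrash does: it monitors the various kinds of crashes; after a crash it uses C code to efficiently convert the process, basic, exception, and thread information to JSON and write it to a file; on the next App launch it reads the crash logs from the local crash folder, lets developers add custom key/value pairs and upload the logs to the APM system, and then deletes the logs from the local crash folder.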
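The filter chain described above (reports pass through each filter in order, and the last filter acts as the sink) can be sketched in a few lines. This is a Python illustration of the pattern only, not KSCrash code: the class names `Filter`, `AppleFormatFilter`, `UploadSink`, and `Pipeline`, and the report fields, are hypothetical.

```python
from typing import Callable, List, Optional

# completion(filtered_reports, completed, error)
Completion = Callable[[List[dict], bool, Optional[Exception]], None]


class Filter:
    """Hypothetical stand-in for the KSCrashReportFilter protocol."""

    def filter_reports(self, reports: List[dict], on_completion: Completion) -> None:
        raise NotImplementedError


class AppleFormatFilter(Filter):
    """Keeps only the fields the monitor needs (illustrative field names)."""

    def filter_reports(self, reports, on_completion):
        slim = [{"crashTime": r.get("crashTime"), "appleReport": r.get("raw")} for r in reports]
        on_completion(slim, True, None)


class UploadSink(Filter):
    """Final stage: hands reports to an uploader, then reports success."""

    def __init__(self):
        self.uploaded = []

    def filter_reports(self, reports, on_completion):
        self.uploaded.extend(reports)
        on_completion(reports, True, None)


class Pipeline(Filter):
    """Passes reports through each sub-filter in order; stops at the first failure."""

    def __init__(self, *filters: Filter):
        self.filters = list(filters)

    def filter_reports(self, reports, on_completion):
        def run(index: int, current: List[dict]) -> None:
            if index == len(self.filters):
                on_completion(current, True, None)
                return

            def step(filtered, completed, error):
                if not completed or error is not None:
                    on_completion(filtered, False, error)
                else:
                    run(index + 1, filtered)

            self.filters[index].filter_reports(current, step)

        run(0, reports)


sink = UploadSink()
result = {}
Pipeline(AppleFormatFilter(), sink).filter_reports(
    [{"crashTime": "2020-05-28 19:47", "raw": "Incident Identifier: ..."}],
    lambda reports, completed, error: result.update(completed=completed, count=len(reports)),
)
```

Only when the final completion reports success would the caller go on to delete the local reports, which mirrors the delete-after-send behavior shown earlier.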

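4. Symbolication

After an App crashes, the system generates a crash log that is stored on the device and can be viewed from Settings. It records the App's running state, call stacks, threads, and so on. But the log consists of raw addresses and is not readable, so it has to be symbolicated.

4.1 .dSYM files

A .dSYM (debugging symbols) file is an intermediate file that stores the mapping for hexadecimal function addresses; the debugging information (symbols) lives in this file. A new .dSYM is generated on every build that has dSYM generation enabled. By default, Debug builds do not generate a .dSYM; to get one, change Build Settings -> Build Options -> Debug Information Format from DWARF to DWARF with dSYM File, then build and run again.

So every time the App is packaged, the .dSYM file of that version must be kept.

The .dSYM contains DWARF information. In the package contents, Test.app.dSYM/Contents/Resources/DWARF/Test is the DWARF file.

The .dSYM is produced by extracting the debugging information from the Mach-O file. For security, release builds keep the debugging information in a separate file. The .dSYM is actually a directory (a bundle), with the DWARF file under Contents/Resources/DWARF/ as in the path above.

4.2 DWARF files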

DWARF is a debugging file format used by many compilers and debuggers to support source level debugging. It addresses the requirements of a number of procedural languages, such as C, C++, and Fortran, and is designed to be extensible to other languages. DWARF is architecture independent and applicable to any processor or operating system. It is widely used on Unix, Linux and other operating systems, as well as in stand-alone environments.

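DWARF is usually expanded as Debugging With Attributed Record Formats: a debugging format built from attributed records.

DWARF is a compact representation of the relationship between the executable program and the source code.

Most modern programming languages are block structured: each entity (a class, a function) is contained in another entity. In a C program, each file may contain multiple data definitions, variables, and functions, so DWARF follows this model and is block structured as well. The basic descriptive item in DWARF is the Debugging Information Entry (DIE). A DIE has a tag that says what it describes and a list of attributes that fill in the details (compare the structure of HTML or XML). Every DIE except the topmost is contained in a parent DIE and may have sibling or child DIEs. An attribute value may be a constant (such as a function name), a variable (such as a function's start address), or a reference to another DIE (such as a function's return type).

The sections in a DWARF file include:

Section	Description
.debug_loc	Location lists used in DW_AT_location attributes
.debug_macinfo	Macro information
.debug_pubnames	Lookup table for global objects and functions
.debug_pubtypes	Lookup table for global types
.debug_ranges	Address ranges used in DW_AT_ranges attributes
.debug_str	String table used by .debug_info
.debug_types	Type descriptions

Common tags and attributes:

Tag / attribute	Description
DW_TAG_class_type	Class name and type information
DW_TAG_structure_type	Structure name and type information
DW_TAG_union_type	Union name and type information
DW_TAG_enumeration_type	Enumeration name and type information
DW_TAG_typedef	Typedef name and type information
DW_TAG_array_type	Array name and type information
DW_TAG_subrange_type	Array bounds (size) information
DW_TAG_inheritance	Inherited class name and type information
DW_TAG_member	Member of a class
DW_TAG_subprogram	Function name information
DW_TAG_formal_parameter	Function parameter information
DW_AT_name	Name string
DW_AT_type	Type information
DW_AT_artificial	Set by the compiler for entities it creates
DW_AT_sibling	Location of the sibling entry
DW_AT_data_member_location	Location of a data member
DW_AT_virtuality	Set for virtual entities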

A quick DWARF example: parse the DWARF file inside the test project's .dSYM folder with the command below.

dwarfdump -F --debug-info Test.app.dSYM/Contents/Resources/DWARF/Test > debug-info.txt

The output looks like this:

Test.app.dSYM/Contents/Resources/DWARF/Test:	file format Mach-O arm64

.debug_info contents:
0x00000000: Compile Unit: length = 0x0000004f version = 0x0004 abbr_offset = 0x0000 addr_size = 0x08 (next unit at 0x00000053)

0x0000000b: DW_TAG_compile_unit
              DW_AT_producer [DW_FORM_strp]	("Apple clang version 11.0.3 (clang-1103.0.32.62)")
              DW_AT_language [DW_FORM_data2]	(DW_LANG_ObjC)
              DW_AT_name [DW_FORM_strp]	("_Builtin_stddef_max_align_t")
              DW_AT_stmt_list [DW_FORM_sec_offset]	(0x00000000)
              DW_AT_comp_dir [DW_FORM_strp]	("/Users/lbp/Desktop/Test")
              DW_AT_APPLE_major_runtime_vers [DW_FORM_data1]	(0x02)
              DW_AT_GNU_dwo_id [DW_FORM_data8]	(0x392b5344d415340c)

0x00000027:   DW_TAG_module
                DW_AT_name [DW_FORM_strp]	("_Builtin_stddef_max_align_t")
                DW_AT_LLVM_config_macros [DW_FORM_strp]	("\"-DDEBUG=1\" \"-DOBJC_OLD_DISPATCH_PROTOTYPES=1\"")
                DW_AT_LLVM_include_path [DW_FORM_strp]	("/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/11.0.3/include")
                DW_AT_LLVM_isysroot [DW_FORM_strp]	("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk")

0x00000038:     DW_TAG_typedef
                  DW_AT_type [DW_FORM_ref4]	(0x0000004b "long double")
                  DW_AT_name [DW_FORM_strp]	("max_align_t")
                  DW_AT_decl_file [DW_FORM_data1]	("/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/11.0.3/include/__stddef_max_align_t.h")
                  DW_AT_decl_line [DW_FORM_data1]	(16)

0x00000043:     DW_TAG_imported_declaration
                  DW_AT_decl_file [DW_FORM_data1]	("/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/11.0.3/include/__stddef_max_align_t.h")
                  DW_AT_decl_line [DW_FORM_data1]	(27)
                  DW_AT_import [DW_FORM_ref_addr]	(0x0000000000000027)

0x0000004a:     NULL

0x0000004b:   DW_TAG_base_type
                DW_AT_name [DW_FORM_strp]	("long double")
                DW_AT_encoding [DW_FORM_data1]	(DW_ATE_float)
                DW_AT_byte_size [DW_FORM_data1]	(0x08)

0x00000052:   NULL
0x00000053: Compile Unit: length = 0x000183dc version = 0x0004 abbr_offset = 0x0000 addr_size = 0x08 (next unit at 0x00018433)

0x0000005e: DW_TAG_compile_unit
              DW_AT_producer [DW_FORM_strp]	("Apple clang version 11.0.3 (clang-1103.0.32.62)")
              DW_AT_language [DW_FORM_data2]	(DW_LANG_ObjC)
              DW_AT_name [DW_FORM_strp]	("Darwin")
              DW_AT_stmt_list [DW_FORM_sec_offset]	(0x000000a7)
              DW_AT_comp_dir [DW_FORM_strp]	("/Users/lbp/Desktop/Test")
              DW_AT_APPLE_major_runtime_vers [DW_FORM_data1]	(0x02)
              DW_AT_GNU_dwo_id [DW_FORM_data8]	(0xa4a1d339379e18a5)

0x0000007a:   DW_TAG_module
                DW_AT_name [DW_FORM_strp]	("Darwin")
                DW_AT_LLVM_config_macros [DW_FORM_strp]	("\"-DDEBUG=1\" \"-DOBJC_OLD_DISPATCH_PROTOTYPES=1\"")
                DW_AT_LLVM_include_path [DW_FORM_strp]	("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include")
                DW_AT_LLVM_isysroot [DW_FORM_strp]	("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk")

0x0000008b:     DW_TAG_module
                  DW_AT_name [DW_FORM_strp]	("C")
                  DW_AT_LLVM_config_macros [DW_FORM_strp]	("\"-DDEBUG=1\" \"-DOBJC_OLD_DISPATCH_PROTOTYPES=1\"")
                  DW_AT_LLVM_include_path [DW_FORM_strp]	("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include")
                  DW_AT_LLVM_isysroot [DW_FORM_strp]	("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk")

0x0000009c:       DW_TAG_module
                    DW_AT_name [DW_FORM_strp]	("fenv")
                    DW_AT_LLVM_config_macros [DW_FORM_strp]	("\"-DDEBUG=1\" \"-DOBJC_OLD_DISPATCH_PROTOTYPES=1\"")
                    DW_AT_LLVM_include_path [DW_FORM_strp]	("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include")
                    DW_AT_LLVM_isysroot [DW_FORM_strp]	("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk")

0x000000ad:         DW_TAG_enumeration_type
                      DW_AT_type [DW_FORM_ref4]	(0x00017276 "unsigned int")
                      DW_AT_byte_size [DW_FORM_data1]	(0x04)
                      DW_AT_decl_file [DW_FORM_data1]	("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/fenv.h")
                      DW_AT_decl_line [DW_FORM_data1]	(154)

0x000000b5:           DW_TAG_enumerator
                        DW_AT_name [DW_FORM_strp]	("__fpcr_trap_invalid")
                        DW_AT_const_value [DW_FORM_udata]	(256)

0x000000bc:           DW_TAG_enumerator
                        DW_AT_name [DW_FORM_strp]	("__fpcr_trap_divbyzero")
                        DW_AT_const_value [DW_FORM_udata]	(512)

0x000000c3:           DW_TAG_enumerator
                        DW_AT_name [DW_FORM_strp]	("__fpcr_trap_overflow")
                        DW_AT_const_value [DW_FORM_udata]	(1024)

0x000000ca:           DW_TAG_enumerator
                        DW_AT_name [DW_FORM_strp]	("__fpcr_trap_underflow")
// ......
0x000466ee:   DW_TAG_subprogram
                DW_AT_name [DW_FORM_strp]	("CFBridgingRetain")
                DW_AT_decl_file [DW_FORM_data1]	("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/System/Library/Frameworks/Foundation.framework/Headers/NSObject.h")
                DW_AT_decl_line [DW_FORM_data1]	(105)
                DW_AT_prototyped [DW_FORM_flag_present]	(true)
                DW_AT_type [DW_FORM_ref_addr]	(0x0000000000019155 "CFTypeRef")
                DW_AT_inline [DW_FORM_data1]	(DW_INL_inlined)

0x000466fa:     DW_TAG_formal_parameter
                  DW_AT_name [DW_FORM_strp]	("X")
                  DW_AT_decl_file [DW_FORM_data1]	("/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/System/Library/Frameworks/Foundation.framework/Headers/NSObject.h")
                  DW_AT_decl_line [DW_FORM_data1]	(105)
                  DW_AT_type [DW_FORM_ref4]	(0x00046706 "id")

0x00046705:     NULL

0x00046706:   DW_TAG_typedef
                DW_AT_type [DW_FORM_ref4]	(0x00046711 "objc_object*")
                DW_AT_name [DW_FORM_strp]	("id")
                DW_AT_decl_file [DW_FORM_data1]	("/Users/lbp/Desktop/Test/Test/NetworkAPM/NSURLResponse+cm_FetchStatusLineFromCFNetwork.m")
                DW_AT_decl_line [DW_FORM_data1]	(44)

0x00046711:   DW_TAG_pointer_type
                DW_AT_type [DW_FORM_ref4]	(0x00046716 "objc_object")

0x00046716:   DW_TAG_structure_type
                DW_AT_name [DW_FORM_strp]	("objc_object")
                DW_AT_byte_size [DW_FORM_data1]	(0x00)

0x0004671c:     DW_TAG_member
                  DW_AT_name [DW_FORM_strp]	("isa")
                  DW_AT_type [DW_FORM_ref4]	(0x00046727 "objc_class*")
                  DW_AT_data_member_location [DW_FORM_data1]	(0x00)
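// ......

The full output is too long to paste here. A DIE can hold a function's start address, end address, function name, file name, and line number. Given an address, find the DIE whose start and end addresses enclose it, and the function name and file name can be recovered.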
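The lookup just described can be sketched as a walk over a DIE tree. This is a minimal Python model, not a DWARF parser: the `DIE` class is hypothetical, the addresses are made up for illustration, and DW_AT_high_pc is treated as an absolute end address (in real DWARF 4 output it is often an offset from DW_AT_low_pc).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DIE:
    tag: str       # e.g. "DW_TAG_subprogram"
    attrs: dict    # e.g. {"DW_AT_name": "...", "DW_AT_low_pc": 0x1000}
    children: List["DIE"] = field(default_factory=list)


def find_function(die: DIE, address: int) -> Optional[DIE]:
    """Depth-first search for the subprogram whose [low_pc, high_pc) contains address."""
    if die.tag == "DW_TAG_subprogram":
        low = die.attrs.get("DW_AT_low_pc")
        high = die.attrs.get("DW_AT_high_pc")
        if low is not None and high is not None and low <= address < high:
            return die
    for child in die.children:
        found = find_function(child, address)
        if found is not None:
            return found
    return None


# Illustrative compile unit; the address ranges are invented for this sketch.
unit = DIE("DW_TAG_compile_unit", {"DW_AT_name": "ViewController.mm"}, [
    DIE("DW_TAG_subprogram", {"DW_AT_name": "-[ViewController viewDidLoad]",
                              "DW_AT_low_pc": 0x100005800,
                              "DW_AT_high_pc": 0x100005900}),
    DIE("DW_TAG_subprogram", {"DW_AT_name": "-[ViewController testMonitorCrash]",
                              "DW_AT_low_pc": 0x100005900,
                              "DW_AT_high_pc": 0x100005980}),
])

hit = find_function(unit, 0x10000592C)
miss = find_function(unit, 0x100006000)
```

A production symbolicator would index the ranges instead of walking the tree for every address; the walk is kept here because it mirrors the parent/child structure of DIEs.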

debug_line recovers file line numbers and related information:

dwarfdump -F --debug-line Test.app.dSYM/Contents/Resources/DWARF/Test > debug-inline.txt

Part of the output:

Test.app.dSYM/Contents/Resources/DWARF/Test:	file format Mach-O arm64

.debug_line contents:
debug_line[0x00000000]
Line table prologue:
    total_length: 0x000000a3
         version: 4
 prologue_length: 0x0000009a
 min_inst_length: 1
max_ops_per_inst: 1
 default_is_stmt: 1
       line_base: -5
      line_range: 14
     opcode_base: 13
standard_opcode_lengths[DW_LNS_copy] = 0
standard_opcode_lengths[DW_LNS_advance_pc] = 1
standard_opcode_lengths[DW_LNS_advance_line] = 1
standard_opcode_lengths[DW_LNS_set_file] = 1
standard_opcode_lengths[DW_LNS_set_column] = 1
standard_opcode_lengths[DW_LNS_negate_stmt] = 0
standard_opcode_lengths[DW_LNS_set_basic_block] = 0
standard_opcode_lengths[DW_LNS_const_add_pc] = 0
standard_opcode_lengths[DW_LNS_fixed_advance_pc] = 1
standard_opcode_lengths[DW_LNS_set_prologue_end] = 0
standard_opcode_lengths[DW_LNS_set_epilogue_begin] = 0
standard_opcode_lengths[DW_LNS_set_isa] = 1
include_directories[  1] = "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/11.0.3/include"
file_names[  1]:
           name: "__stddef_max_align_t.h"
      dir_index: 1
       mod_time: 0x00000000
         length: 0x00000000

Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x0000000000000000      1      0      1   0             0  is_stmt end_sequence
debug_line[0x000000a7]
Line table prologue:
    total_length: 0x0000230a
         version: 4
 prologue_length: 0x00002301
 min_inst_length: 1
max_ops_per_inst: 1
 default_is_stmt: 1
       line_base: -5
      line_range: 14
     opcode_base: 13
standard_opcode_lengths[DW_LNS_copy] = 0
standard_opcode_lengths[DW_LNS_advance_pc] = 1
standard_opcode_lengths[DW_LNS_advance_line] = 1
standard_opcode_lengths[DW_LNS_set_file] = 1
standard_opcode_lengths[DW_LNS_set_column] = 1
standard_opcode_lengths[DW_LNS_negate_stmt] = 0
standard_opcode_lengths[DW_LNS_set_basic_block] = 0
standard_opcode_lengths[DW_LNS_const_add_pc] = 0
standard_opcode_lengths[DW_LNS_fixed_advance_pc] = 1
standard_opcode_lengths[DW_LNS_set_prologue_end] = 0
standard_opcode_lengths[DW_LNS_set_epilogue_begin] = 0
standard_opcode_lengths[DW_LNS_set_isa] = 1
include_directories[  1] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include"
include_directories[  2] = "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/11.0.3/include"
include_directories[  3] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/sys"
include_directories[  4] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/mach"
include_directories[  5] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/libkern"
include_directories[  6] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/architecture"
include_directories[  7] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/sys/_types"
include_directories[  8] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/_types"
include_directories[  9] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/arm"
include_directories[ 10] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/sys/_pthread"
include_directories[ 11] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/mach/arm"
include_directories[ 12] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/libkern/arm"
include_directories[ 13] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/uuid"
include_directories[ 14] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/netinet"
include_directories[ 15] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/netinet6"
include_directories[ 16] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/net"
include_directories[ 17] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/pthread"
include_directories[ 18] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/mach_debug"
include_directories[ 19] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/os"
include_directories[ 20] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/malloc"
include_directories[ 21] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/bsm"
include_directories[ 22] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/machine"
include_directories[ 23] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/mach/machine"
include_directories[ 24] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/secure"
include_directories[ 25] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/xlocale"
include_directories[ 26] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/arpa"
file_names[  1]:
           name: "fenv.h"
      dir_index: 1
       mod_time: 0x00000000
         length: 0x00000000
file_names[  2]:
           name: "stdatomic.h"
      dir_index: 2
       mod_time: 0x00000000
         length: 0x00000000
file_names[  3]:
           name: "wait.h"
      dir_index: 3
       mod_time: 0x00000000
         length: 0x00000000
// ......
Address            Line   Column File   ISA Discriminator Flags
------------------ ------ ------ ------ --- ------------- -------------
0x000000010000b588     14      0      2   0             0  is_stmt
0x000000010000b5b4     16      5      2   0             0  is_stmt prologue_end
0x000000010000b5d0     17     11      2   0             0  is_stmt
0x000000010000b5d4      0      0      2   0             0 
0x000000010000b5d8     17      5      2   0             0 
0x000000010000b5dc     17     11      2   0             0 
0x000000010000b5e8     18      1      2   0             0  is_stmt
0x000000010000b608     20      0      2   0             0  is_stmt
0x000000010000b61c     22      5      2   0             0  is_stmt prologue_end
0x000000010000b628     23      5      2   0             0  is_stmt
0x000000010000b644     24      1      2   0             0  is_stmt
0x000000010000b650     15      0      1   0             0  is_stmt
0x000000010000b65c     15     41      1   0             0  is_stmt prologue_end
0x000000010000b66c     11      0      2   0             0  is_stmt
0x000000010000b680     11     17      2   0             0  is_stmt prologue_end
0x000000010000b6a4     11     17      2   0             0  is_stmt end_sequence
debug_line[0x0000def9]
Line table prologue:
    total_length: 0x0000015a
         version: 4
 prologue_length: 0x000000eb
 min_inst_length: 1
max_ops_per_inst: 1
 default_is_stmt: 1
       line_base: -5
      line_range: 14
     opcode_base: 13
standard_opcode_lengths[DW_LNS_copy] = 0
standard_opcode_lengths[DW_LNS_advance_pc] = 1
standard_opcode_lengths[DW_LNS_advance_line] = 1
standard_opcode_lengths[DW_LNS_set_file] = 1
standard_opcode_lengths[DW_LNS_set_column] = 1
standard_opcode_lengths[DW_LNS_negate_stmt] = 0
standard_opcode_lengths[DW_LNS_set_basic_block] = 0
standard_opcode_lengths[DW_LNS_const_add_pc] = 0
standard_opcode_lengths[DW_LNS_fixed_advance_pc] = 1
standard_opcode_lengths[DW_LNS_set_prologue_end] = 0
standard_opcode_lengths[DW_LNS_set_epilogue_begin] = 0
standard_opcode_lengths[DW_LNS_set_isa] = 1
include_directories[  1] = "Test"
include_directories[  2] = "Test/NetworkAPM"
include_directories[  3] = "/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS13.5.sdk/usr/include/objc"
file_names[  1]:
           name: "AppDelegate.h"
      dir_index: 1
       mod_time: 0x00000000
         length: 0x00000000
file_names[  2]:
           name: "JMWebResourceURLProtocol.h"
      dir_index: 2
       mod_time: 0x00000000
         length: 0x00000000
file_names[  3]:
           name: "AppDelegate.m"
      dir_index: 1
       mod_time: 0x00000000
         length: 0x00000000
file_names[  4]:
           name: "objc.h"
      dir_index: 3
       mod_time: 0x00000000
         length: 0x00000000
// ......

As you can see, debug_line records the line number for each code address. The excerpt above includes the AppDelegate part.
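To make the address-to-line mapping concrete, here is a small Python sketch that searches a few (address, line) rows copied from the debug_line excerpt above. It assumes rows are sorted by address and ignores sequence boundaries (end_sequence), which a real line-table reader has to honor.

```python
import bisect

# (address, line) rows copied from the debug_line excerpt, in address order.
ROWS = [
    (0x10000B588, 14),
    (0x10000B5B4, 16),
    (0x10000B5D0, 17),
    (0x10000B5D4, 0),
    (0x10000B5D8, 17),
    (0x10000B5DC, 17),
    (0x10000B5E8, 18),
    (0x10000B608, 20),
]
ADDRESSES = [address for address, _ in ROWS]


def line_for_address(address: int):
    """Return the line of the last row at or before address, or None if it precedes the table."""
    index = bisect.bisect_right(ADDRESSES, address) - 1
    if index < 0:
        return None
    return ROWS[index][1]


inside_row = line_for_address(0x10000B5C0)   # falls between the rows for lines 16 and 17
exact_row = line_for_address(0x10000B5E8)    # exactly on a row
before_table = line_for_address(0x10000B000)
```

A row covers every address from its own address up to the next row, which is why the search takes the last row that does not exceed the target.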

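4.3 Symbols

In linking, functions and variables are collectively called symbols, and the function or variable name is the symbol name. Symbols can be seen as the glue of linking: the whole linking process completes correctly on the basis of symbols.

The passage above is from 《程序员的自我修养》. So "symbol" is the collective term for functions, variables, and classes.

By type, symbols fall into three categories:

Global symbols: visible outside the object file; they can be referenced by other object files, or need to be defined by other object files
Local symbols: functions and variables visible only inside the object file
Debugging symbols: debug symbol information including line numbers, which record the file and line for each function and variable.

Symbol table: a mapping from memory addresses to function names, file names, and line numbers. Every defined symbol has a corresponding value, called the symbol value; for variables and functions the symbol value is the address. A symbol table entry looks like this:

<start address> <end address> <function> [<file name:line number>]

4.4 How to get the address?

When an image is loaded it is relocated relative to a base address, and the base address differs on every load. The addresses in stack frames are absolute addresses after relocation; what we need are the relative addresses before relocation.

Binary Images

Take the test project's crash log as an example. Part of its Binary Images section: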

// ...
Binary Images:
0x102fe0000 - 0x102ff3fff Test arm64  <37eaa57df2523d95969e47a9a1d69ce5> /var/containers/Bundle/Application/643F0DFE-A710-4136-A278-A89D780B7208/Test.app/Test
0x1030e0000 - 0x1030ebfff libobjc-trampolines.dylib arm64  <181f3aa866d93165ac54344385ac6e1d> /usr/lib/libobjc-trampolines.dylib
0x103204000 - 0x103267fff dyld arm64  <6f1c86b640a3352a8529bca213946dd5> /usr/lib/dyld
0x189a78000 - 0x189a8efff libsystem_trace.dylib arm64  <b7477df8f6ab3b2b9275ad23c6cc0b75> /usr/lib/system/libsystem_trace.dylib
// ...

As you can see, each Binary Images line in the crash log contains an image's load start address, end address, image name, architecture, UUID, and image path.
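The fields listed above can be pulled out of a Binary Images line with a regular expression. This Python sketch assumes the line layout shown in the excerpt; the pattern and the `parse_image_line` helper are illustrative and do not handle image names that contain spaces.

```python
import re

IMAGE_LINE = re.compile(
    r"^\s*(0x[0-9a-fA-F]+)\s+-\s+(0x[0-9a-fA-F]+)\s+"   # load start - load end
    r"(\S+)\s+(\S+)\s+"                                  # image name, architecture
    r"<([0-9a-fA-F]{32})>\s+(.+)$"                       # UUID, image path
)


def parse_image_line(line: str):
    """Parse one Binary Images line; returns None if the line does not match."""
    match = IMAGE_LINE.match(line)
    if match is None:
        return None
    start, end, name, arch, uuid, path = match.groups()
    return {
        "start": int(start, 16),
        "end": int(end, 16),
        "name": name,
        "arch": arch,
        "uuid": uuid.lower(),
        "path": path,
    }


image = parse_image_line(
    "0x102fe0000 - 0x102ff3fff Test arm64  <37eaa57df2523d95969e47a9a1d69ce5> "
    "/var/containers/Bundle/Application/643F0DFE-A710-4136-A278-A89D780B7208/Test.app/Test"
)
```

With the images parsed, a frame address can be assigned to the image whose start and end addresses enclose it.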

Information in the crash log

Last Exception Backtrace:
// ...
5   Test                          	0x102fe592c -[ViewController testMonitorCrash] + 22828 (ViewController.mm:58)

Binary Images:
0x102fe0000 - 0x102ff3fff Test arm64  <37eaa57df2523d95969e47a9a1d69ce5> /var/containers/Bundle/Application/643F0DFE-A710-4136-A278-A89D780B7208/Test.app/Test

So the relative address of frame 5 is 0x102fe592c - 0x102fe0000 = 0x592c (22828 in decimal, which matches the + 22828 in the frame line). A command can then recover the symbol information.

Use atos to resolve it: 0x102fe0000 is the image's load start address, and 0x102fe592c is the frame address to be resolved.

atos -o Test.app.dSYM/Contents/Resources/DWARF/Test -arch arm64 -l 0x102fe0000 0x102fe592c
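The arithmetic behind the atos call is just a subtraction. The Python sketch below reuses the two addresses from the crash log above and assembles the atos argument list without running it; the variable names are illustrative.

```python
load_address = 0x102FE0000    # start of the Test image in Binary Images
frame_address = 0x102FE592C   # frame 5 in the backtrace

# Offset of the frame inside the image, independent of where the image was loaded.
offset = frame_address - load_address

# Argument list for the atos call shown above (not executed here).
atos_args = [
    "atos", "-o", "Test.app.dSYM/Contents/Resources/DWARF/Test",
    "-arch", "arm64", "-l", hex(load_address), hex(frame_address),
]
```

The offset 0x592c equals the + 22828 printed after the symbol name in frame 5, a quick sanity check that the right image line was picked.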

4.5 UUID

UUID of the crash file

grep --after-context=2 "Binary Images:" *.crash

Test  5-28-20, 7-47 PM.crash:Binary Images:
Test  5-28-20, 7-47 PM.crash-0x102fe0000 - 0x102ff3fff Test arm64  <37eaa57df2523d95969e47a9a1d69ce5> /var/containers/Bundle/Application/643F0DFE-A710-4136-A278-A89D780B7208/Test.app/Test
Test  5-28-20, 7-47 PM.crash-0x1030e0000 - 0x1030ebfff libobjc-trampolines.dylib arm64  <181f3aa866d93165ac54344385ac6e1d> /usr/lib/libobjc-trampolines.dylib
--
Test.crash:Binary Images:
Test.crash-0x102fe0000 - 0x102ff3fff Test arm64  <37eaa57df2523d95969e47a9a1d69ce5> /var/containers/Bundle/Application/643F0DFE-A710-4136-A278-A89D780B7208/Test.app/Test
Test.crash-0x1030e0000 - 0x1030ebfff libobjc-trampolines.dylib arm64  <181f3aa866d93165ac54344385ac6e1d> /usr/lib/libobjc-trampolines.dylib

The Test App's UUID is 37eaa57df2523d95969e47a9a1d69ce5.

UUID of the .dSYM file

dwarfdump --uuid Test.app.dSYM

Result:

UUID: 37EAA57D-F252-3D95-969E-47A9A1D69CE5 (arm64) Test.app.dSYM/Contents/Resources/DWARF/Test

UUID of the app

dwarfdump --uuid Test.app/Test

Result:

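UUID: 37EAA57D-F252-3D95-969E-47A9A1D69CE5 (arm64) Test.app/Test

4.6 Symbolication (parsing crash logs)

The sections above analyzed how to capture the various kinds of crashes. With Apps in users' hands, we can technically capture the crash-scene information and upload it through some mechanism, but those stacks are hexadecimal addresses that cannot be used to locate problems, so they must be symbolicated.

The role of the .dSYM file was explained above: combining symbol addresses with the dSYM file to recover the file name, line, and function name is called symbolication. The .dSYM file must strictly match the bundle id and version of the crash log.

Crash logs can be obtained from Xcode -> Window -> Devices and Simulators: select the device and locate the crash log file by time and App name.

The app and .dSYM files come from the archive products, under ~/Library/Developer/Xcode/Archives.

There are generally two ways to symbolicate:

Using symbolicatecrash

symbolicatecrash is the crash log analysis tool bundled with Xcode. First find where it lives by running the following command in a terminal: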
```
find /Applications/Xcode.app -name symbolicatecrash -type f
```
Several paths are returned; pick the line that contains iPhoneSimulator.platform:
```
/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/Library/PrivateFrameworks/DVTFoundation.framework/symbolicatecrash
```
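Copy symbolicatecrash into the folder that holds the app, dSYM, and crash files.

Run the command (write the output to a new file so the input crash log is not overwritten):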
```
./symbolicatecrash Test.crash Test.dSYM > Test.symbolicated.crash
```
The first time you do this you will probably hit Error: "DEVELOPER_DIR" is not defined at ./symbolicatecrash line 69. To fix it, run the following command in the terminal:
```
export DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer
```
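Using atos

Unlike symbolicatecrash, atos is more flexible: it only needs a matching .crash and .dSYM, or a matching .crash and .app.

Usage is as follows. -l is followed by the image load address, and the address to symbolicate comes last:
```
xcrun atos -o Test.app.dSYM/Contents/Resources/DWARF/Test -arch arm64 -l 0x102fe0000 0x102fe592c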
```
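atos can also resolve against the .app binary when no .dSYM is available, where xxx is the image load address and xx is the address to symbolicate: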
```
atos -arch architecture -o binary -l xxx xx
```

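Because we may have many Apps, and each may be on a different version in users' hands, the APM has to pair every crash file with its own .dSYM file before symbolicating; otherwise the result is wrong. The matching rule is that the UUIDs are identical.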
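The two tools print the same UUID in different spellings: lower-case without hyphens in the crash log, upper-case with hyphens in dwarfdump output. A small Python sketch of the comparison, using the values shown in section 4.5; the helper names are illustrative.

```python
import re


def normalize_uuid(text: str) -> str:
    """Lower-case a UUID and drop hyphens and angle brackets so both spellings compare equal."""
    return re.sub(r"[^0-9a-f]", "", text.lower())


def uuid_from_dwarfdump(output_line: str) -> str:
    """Extract the UUID from a 'dwarfdump --uuid' line such as 'UUID: XXXX (arm64) path'."""
    match = re.match(r"UUID:\s+([0-9A-Fa-f-]{36})\s", output_line)
    if match is None:
        raise ValueError("not a dwarfdump --uuid line")
    return normalize_uuid(match.group(1))


crash_uuid = normalize_uuid("<37eaa57df2523d95969e47a9a1d69ce5>")
dsym_uuid = uuid_from_dwarfdump(
    "UUID: 37EAA57D-F252-3D95-969E-47A9A1D69CE5 (arm64) "
    "Test.app.dSYM/Contents/Resources/DWARF/Test"
)
can_symbolicate = crash_uuid == dsym_uuid
```

A symbolication service could use the normalized UUID as the lookup key for stored dSYM files, so a crash report is never paired with the dSYM of a different build.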

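4.7 Symbolicating system libraries

Each time a device is connected to Xcode to run a program, Xcode asks you to wait. What happens is that, for stack symbolication, the symbol files of the device's current OS version are automatically imported into /Users/<your user name>/Library/Developer/Xcode/iOS DeviceSupport, which ends up holding the symbol files of a large number of system libraries. Have a look at this directory:

/Users/<your user name>/Library/Developer/Xcode/iOS DeviceSupport/

5. Server-side processing

5.1 The ELK logging system

Log monitoring systems in industry are commonly built on ELK, short for three open-source projects: Elasticsearch, Logstash, and Kibana. Elasticsearch is a distributed, near-real-time search platform accessed through a RESTful interface. Logstash is a central data-flow engine that collects data in different formats from different sources (files, data stores, MQ) and, after filtering, outputs it to different destinations (files, MQ, Redis, Elasticsearch, Kafka). Kibana presents Elasticsearch data on friendly pages and provides visual analysis. Together, ELK can form an efficient, enterprise-grade log analysis system.

In the early days of monolithic applications, almost all functionality ran on one machine. When something went wrong, operations staff opened a terminal, read the system logs directly, and located and fixed the problem. As systems grew more complex and user bases grew larger, a monolith could hardly keep up, so the architecture evolved: scale horizontally to support a huge user base, split the monolith into multiple applications, deploy each as a cluster, and schedule traffic with load balancing. If a submodule fails, do we go hunting for logs in a terminal on each server? That is clearly too backward, so log management platforms emerged. Logstash collects and parses the log files on every server, filters them with predefined regex templates, and ships them to Kafka or Redis; another Logstash reads the logs from Kafka or Redis, stores them in ES, and builds indexes; finally Kibana provides visual analysis. The collected data can also be analyzed further to support maintenance and decision-making.

The figure shows an ELK logging architecture. A brief explanation: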

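A Kafka layer sits between Logstash and ES. Logstash runs on the servers that produce the data and filters what it collects in real time, which costs time and memory; Kafka therefore acts as a buffer, since it has excellent read/write performance.
The next step is a Logstash that reads from Kafka, filters and processes the data, and ships the results to ES.
This design not only performs well and is loosely coupled, it is also extensible. For example, data can be read from n Logstash instances, shipped to n Kafka instances, and then filtered by n more Logstash instances. There can be m log sources, such as App logs, Tomcat logs, Nginx logs, and so on.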
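The buffering role of the Kafka layer can be illustrated with an in-process queue. This Python sketch is only an analogy: `queue.Queue` stands in for Kafka, a list stands in for Elasticsearch, and the log line format ("LEVEL message") is invented for the example.

```python
import queue
import threading

buffer = queue.Queue(maxsize=100)   # stand-in for the Kafka layer
indexed = []                        # stand-in for Elasticsearch


def collect(lines):
    """Shipper side: push raw lines into the buffer without processing them."""
    for line in lines:
        buffer.put(line)
    buffer.put(None)  # sentinel: no more data


def process():
    """Indexer side: read from the buffer, filter, and store structured records."""
    while True:
        line = buffer.get()
        if line is None:
            break
        level, _, message = line.partition(" ")
        if level in ("ERROR", "CRASH"):
            indexed.append({"level": level, "message": message})


worker = threading.Thread(target=process)
worker.start()
collect(["INFO app started", "ERROR request timed out", "CRASH EXC_BAD_ACCESS in Test"])
worker.join()
```

Because the collector only enqueues, a slow filtering stage delays indexing but does not slow down the servers that produce the logs, which is the decoupling the list above describes.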

下图贴一个 Elasticsearch 社区分享的一个 “Elastic APM 动手实战”主题的内容截图。

5.2 服务侧

Crash log 统一入库 Kibana 时是没有符号化的，因此须要符号化处理，以方便定位问题、crash 产生报表和后续处理。

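The whole flow is therefore: the client-side APM SDK collects the crash log -> Kafka stores it -> a Mac machine runs a scheduled symbolication job -> the data is sent back to Kafka -> the product side (the display end) classifies the data, builds reports, raises alerts, and so on.

Our company has several product lines and therefore several apps, and users run many different versions of each app. Crash log analysis requires the correct .dSYM file, and with that many apps and versions, automation becomes essential.

There are two ways to automate this. The first suits smaller companies or teams that want to keep things simple: add a Run Script phase in Xcode that automatically uploads the dSYM in Release builds.

The second is what we use. Our company has its own tooling, wax-cli, which manages iOS SDK, iOS App, Android SDK, Android App, Node, React, and React Native projects in one place: initialization, dependency management, building (continuous integration, unit tests, lint, unified-routing checks), testing, packaging, deployment, and dynamic capabilities (hot updates, delivery of unified routing rules). Extra steps can be plugged into each stage, so after packaging is invoked, the build machine uploads the .dSYM file to Qiniu cloud storage (the rule can be: AppName + Version as the key, the .dSYM file as the value).

Many architectures today are built on microservices; why microservices were chosen is outside the scope of this article. Crash log symbolication is therefore designed as a microservice. The architecture diagram is as follows: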

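Notes:

Symbolication Service is one component of Prism, the overall monitoring system. It is a microservice dedicated to symbolicating crash reports.
It receives requests from Mass that contain a preprocessed crash report and a dSYM index, pulls the matching dSYM from Qiniu, symbolicates the crash report, computes a hash, and returns the hash to Mass.
It receives requests from the Prism management system that contain a raw crash report and a dSYM index, pulls the matching dSYM from Qiniu, symbolicates the crash report, and returns the symbolicated crash report to the Prism management system.
Mass is a general-purpose data processing (streaming and batch) and task scheduling framework.
candle is a packaging system. The packaging capability of wax-cli mentioned above is really a call into candle's build and packaging capability. candle picks a suitable build machine based on the characteristics of the project (the packaging platform maintains multiple packaging tasks, and different tasks are dispatched to different build machines according to their characteristics; the task detail page shows dependency download, compilation, the run process, and so on; the build output includes the binary package, a download QR code, and more).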
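Two small pieces of the notes above can be sketched in Python: looking up a dSYM by an "AppName + Version" key, and computing a hash for a symbolicated crash report. The article does not say what the hash is computed over or which separator the key uses, so both choices below (SHA-256 over the top five symbolicated frames, `/` as separator) and the function names are assumptions for illustration only.

```python
import hashlib


def dsym_key(app_name: str, version: str) -> str:
    """Storage key for a .dSYM following the "AppName + Version" rule (separator is an assumption)."""
    return f"{app_name}/{version}"


def crash_fingerprint(symbolicated_frames, top_n: int = 5) -> str:
    """Hash the top frames of the crashed thread so identical crashes share one fingerprint.

    symbolicated_frames is a list of symbol strings such as
    "-[OrderViewController submit:]"; what the real service hashes is not
    stated in the article, so this is only one plausible choice.
    """
    top = [frame.strip() for frame in symbolicated_frames[:top_n]]
    return hashlib.sha256("\n".join(top).encode("utf-8")).hexdigest()
```

Hashing symbol names rather than raw addresses matters in a scheme like this, because load addresses differ from launch to launch while symbol names stay stable for a given build.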

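The symbolication service was built by the front-end team as part of the company's unified front-end effort, so it is implemented in NodeJS. The iOS symbolication machine is a dual-core Mac mini, which meant we had to run experiments to find out how many worker processes the symbolication service should start. The result: two processes handle crash logs nearly twice as efficiently as one, while four processes bring no noticeable improvement over two, which is what you would expect from a dual-core Mac mini. We therefore run two worker processes for symbolication.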
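The decision rule behind that experiment can be written down as a short Python sketch: given the measured throughput for each worker count, keep adding workers only while each step still pays off. The function name, the 20% threshold, and the numbers in the usage note are hypothetical; they only mirror the shape of the result described above and are not the author's measurements.

```python
def pick_worker_count(throughput_by_workers: dict, min_gain: float = 0.2) -> int:
    """Return the smallest worker count after which the next step gains less than min_gain.

    throughput_by_workers maps a worker count to a measured throughput
    (for example crash logs symbolicated per minute).
    """
    counts = sorted(throughput_by_workers)
    best = counts[0]
    for previous, current in zip(counts, counts[1:]):
        gain = throughput_by_workers[current] / throughput_by_workers[previous] - 1.0
        if gain < min_gain:
            break  # adding more workers no longer pays off
        best = current
    return best
```

With made-up measurements such as `{1: 100, 2: 190, 4: 200}`, the step from one to two workers gains 90% and the step from two to four gains about 5%, so the function settles on two workers.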

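The complete design is shown below.

In short, symbolication follows a master-slave model: one master machine and multiple slave machines. The master reads the cache of .dSYM files and crash results. Mass schedules the symbolication service (which runs two symbolicate workers internally), and the service fetches .dSYM files from Qiniu cloud storage at the same time.

The system architecture diagram is as follows.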

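8. APM summary

Monitoring capabilities usually differ from platform to platform, and the implementation details are not uniform either. The technical design review should therefore align the monitoring capabilities across platforms. For each capability, the data fields must match on every platform (number of fields, names, data types, and precision), because APM is a closed loop: after monitoring comes symbolication, data cleanup, productization, and finally display on a monitoring dashboard.
Depending on severity, some crashes or ANRs need to be reported to stakeholders by email, SMS, or the company's internal messaging tools, followed by a quick release, hot fix, and so on.
Every monitoring capability should be configurable so that it can be switched on and off flexibly.
Monitoring data has to be flushed from memory to file, and the flushing strategy needs care. Monitoring data also has to be stored in a database, which raises questions of database size, design rules, and so on. How the data is reported after it is stored, including the reporting mechanism, is covered in another article: "Building a general-purpose, configurable data reporting SDK".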
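One way to keep the fields aligned across platforms is to check every incoming payload against a shared schema. The Python sketch below shows the idea for a single capability; the schema, its field names, and the `schema_violations` helper are hypothetical and are not taken from the author's system.

```python
# Hypothetical shared schema for one capability (a stall/ANR event):
# field name -> expected Python type after JSON decoding.
STALL_EVENT_SCHEMA = {
    "session_id": str,
    "platform": str,
    "duration_ms": int,
    "stack": str,
}


def schema_violations(payload: dict, schema: dict) -> list:
    """List every way the payload deviates from the schema: missing, mistyped, or extra fields."""
    problems = []
    for name, expected in schema.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif type(payload[name]) is not expected:
            actual = type(payload[name]).__name__
            problems.append(f"wrong type for {name}: expected {expected.__name__}, got {actual}")
    for name in payload:
        if name not in schema:
            problems.append(f"unexpected field: {name}")
    return problems
```

Running such a check in the ingestion path, or in each platform's SDK tests, surfaces a drifting field (for example a duration reported as a float on one platform and an integer on another) before it reaches the dashboard.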

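After the technical review, try to write each platform's implementation into a document and share it with everyone involved. Take the ANR implementation as an example: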

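/*
Android

Thresholds depend on the device tier; in general, anything over 300 ms counts as one stall.
Hook the system Looper and instrument before and after message handling to measure how long each message takes.
Start a separate thread to dump the stack, and stop it once handling finishes.
*/
new ExceptionProcessor().init(this, new Runnable() {
            @Override
            public void run() {
                // Monitor stalls
                try {
                    // ProxyPrinter receives the main Looper's per-message log callbacks
                    ProxyPrinter proxyPrinter = new ProxyPrinter(PerformanceMonitor.this);
                    Looper.getMainLooper().setMessageLogging(proxyPrinter);
                    // Hold the printer weakly so the monitor does not keep it alive
                    mWeakPrinter = new WeakReference<ProxyPrinter>(proxyPrinter);
                } catch (FileNotFoundException e) {
                    // Exception is swallowed: if the printer cannot be created, stall monitoring stays off
                }
            }
        });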
        
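/*
iOS

A child thread pings the main thread to check whether the main thread is currently stalled.
The stall threshold is 300 ms; exceeding it counts as a stall.
On a stall, capture the main thread's stack, then store and upload it.
*/
- (void)main {
    // Assumes this class subclasses NSThread and self.semaphore was created with dispatch_semaphore_create(0)
    while (!self.isCancelled) {
        // Assume the main thread is blocked until the ping proves otherwise
        self.isMainThreadBlocked = YES;
        dispatch_async(dispatch_get_main_queue(), ^{
            // The ping ran, so the main thread is responsive
            self.isMainThreadBlocked = NO;
            dispatch_semaphore_signal(self.semaphore);
        });
        // Give the main thread 300 ms to answer
        [NSThread sleepForTimeInterval:0.3];
        if (self.isMainThreadBlocked) {
            [self handleMainThreadBlock];
        }
        // Do not send the next ping until the current one has been answered
        dispatch_semaphore_wait(self.semaphore, DISPATCH_TIME_FOREVER);
    }
}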

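The overall APM architecture is shown below.

Notes:
- The tracking SDK ties log data together through a sessionId.
- wax was introduced above. It is a multi-platform project management model, and every wax project carries basic information.

An APM design is adjusted and upgraded continuously as techniques and analysis needs change. The structure diagrams above come from several early versions; the system in use today has been upgraded and restructured on top of them. A few keywords: Hermes, Flink SQL, InfluxDB.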