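Think Thrice (三思) Series: Android's Message Mechanism, Fully Understood in One Article

The Think Thrice series is my latest format for learning and summarizing, focusing on problem analysis, technical accumulation, and broadening horizons. See also: About the Think Thrice series.

This time, you really can master it in one article:

Design of the Java-layer message queue
Dispatch by the Java-layer Looper
Relationship between the native-layer and Java-layer message queues
Dispatch by the native-layer Looper
Messages
epoll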

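Preface

As one of the most important mechanisms in Android, it has been analyzed in article after article for more than a decade, and much of it has already been dug into. Therefore:

Readers already quite familiar with this mechanism won't find anything new in this article.
Readers not yet familiar with the message mechanism can use this article as a base and keep digging.

However, a quick search and analysis shows that most articles revolve around:

the relationship between Handler, Looper, and MQ
source-code analysis of the upper-layer Handler, Looper, and MQ

Learning only from these angles is not enough to fully understand the message mechanism.

This article is still, at heart, a brainstorm. To keep the brainstorm from drifting off course, and to help readers sort out the structure of the content, here is the mind map first:

Brainstorm: how the OS solves inter-process communication

The programming world is full of communication scenarios. Drawing on what we know, inter-process communication (IPC) can be handled in the following ways:

This part can be skimmed; a general idea is enough, and it won't affect the rest of the article.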

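Pipes

Ordinary pipe (pipe): a half-duplex method; data flows in only one direction, and it can only be used between related processes (e.g. parent and child).

Stream pipe (s_pipe): full-duplex; data can flow in both directions at the same time.

Named pipe (FIFO): half-duplex, but allows communication between unrelated processes.

Message queue (MessageQueue):

A linked list of messages stored in the kernel and identified by a message queue identifier. Message queues overcome the shortcomings of signals (which carry little information) and of pipes (which only carry unformatted byte streams and have limited buffer sizes).

Shared memory (SharedMemory):

Maps a region of memory that other processes can access. The region is created by one process but can be accessed by several. Shared memory is the fastest IPC method and was designed specifically to address the low efficiency of the other methods. It is usually combined with another mechanism, such as semaphores, to synchronize the processes that communicate through it.

Semaphore:

A counter used to control access by multiple processes to a shared resource. It often serves as a lock, preventing other processes from accessing a shared resource while one process is using it, so that the process has exclusive access. It is therefore used mainly to synchronize processes, and threads within the same process.

Socket:

Unlike the other mechanisms, sockets allow processes on different machines to communicate over a network.

Signal:

Notifies the receiving process that some event has occurred. The mechanism is fairly complex.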

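We can imagine that Android also has many IPC scenarios, so the OS must adopt at least one IPC mechanism.

Digging deeper, we find that Android uses more than one. In addition, Android developed Binder, based on OpenBinder, for IPC between processes in user space.

For why Android doesn't simply use the existing Linux IPC mechanisms, see this Zhihu Q&A.

That article also briefly discusses "message queues in kernel space".

Let's leave a question here for later:

Does Android use the Linux kernel's MessageQueue mechanism for anything?

A message mechanism built on message queues has many advantages, and Android adopts this design in many communication scenarios.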

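The three elements of a message mechanism

Wherever we talk about a message mechanism, there are three elements:

Message queue
Message loop (dispatch)
Message handling

The message queue is a queue of message objects; its basic rule is FIFO.

The message loop (dispatch) is a largely generic mechanism: an infinite loop keeps taking the message at the head of the queue and dispatches it for execution.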
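To make the queue and the loop concrete, here is a minimal sketch in plain Java. It is not Android's implementation: a LinkedBlockingQueue stands in for the FIFO message queue, the loop keeps taking the head and dispatching it, and each message carries its own handling code as a Runnable. The class name MiniLoop and the sentinel QUIT message are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the three elements: queue, loop, handling (not Android code).
final class MiniLoop {
    // Sentinel "poison" message that tells the loop to stop (an assumption of this sketch).
    private static final Runnable QUIT = () -> { };

    // 1. Message queue: FIFO.
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    void send(Runnable message) {
        queue.add(message);
    }

    void quit() {
        queue.add(QUIT);
    }

    // 2. Message loop: keep taking the head of the queue and dispatching it.
    void loop() throws InterruptedException {
        for (;;) {
            Runnable message = queue.take(); // blocks while the queue is empty
            if (message == QUIT) {
                return;
            }
            // 3. Message handling: in this sketch the message handles itself.
            message.run();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        MiniLoop loop = new MiniLoop();
        List<String> log = new ArrayList<>();
        loop.send(() -> log.add("first"));
        loop.send(() -> log.add("second"));
        loop.quit();
        loop.loop(); // runs on the current thread until QUIT
        System.out.println(log); // prints [first, second]
    }
}
```

Using a blocking take() instead of busy-polling keeps the loop from burning CPU while the queue is empty. The native layer addresses the same concern later with epoll.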

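For message handling, note that messages come in two forms:

Enrichment: the message itself carries complete information
Query-Back: the message's information is incomplete, and the receiver has to query back for the rest

Which one to choose depends mainly on the trade-off between the cost of producing messages and the cost of querying back for information.

Once the information is complete, the receiver can handle the message.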
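Here is a small hypothetical Java sketch of the two forms. MessageForms and its nested classes are invented for illustration and are not Android APIs. EnrichedMessage carries everything the receiver needs. QueryBackMessage carries only an order ID, so the receiver pays for an extra lookup in a HashMap that stands in for a database or service.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch contrasting the two message forms (not an Android API).
final class MessageForms {
    // Enrichment: self-contained, but costs more to build and to carry.
    static final class EnrichedMessage {
        final String customer;
        final int amount;

        EnrichedMessage(String customer, int amount) {
            this.customer = customer;
            this.amount = amount;
        }
    }

    // Query-Back: carries only a key; the receiver must look up the rest.
    static final class QueryBackMessage {
        final long orderId;

        QueryBackMessage(long orderId) {
            this.orderId = orderId;
        }
    }

    // Stand-in for whatever the receiver would query (database, service, ...).
    private final Map<Long, EnrichedMessage> store = new HashMap<>();

    void save(long orderId, String customer, int amount) {
        store.put(orderId, new EnrichedMessage(customer, amount));
    }

    String handle(EnrichedMessage msg) {
        return msg.customer + " pays " + msg.amount; // no extra lookup needed
    }

    String handle(QueryBackMessage msg) {
        EnrichedMessage full = store.get(msg.orderId); // the extra "query back" cost
        if (full == null) {
            return "unknown order " + msg.orderId;
        }
        return handle(full);
    }
}
```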

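Message queues in the Android Framework

There are two message queues in the Android Framework:

Java layer: frameworks/base/core/java/android/os/MessageQueue.java
Native layer: frameworks/base/core/jni/android_os_MessageQueue.cpp

The Java-layer MQ is not built on a JDK data structure such as List or Queue. Instead, it maintains its own singly linked list of Message objects (mMessages, chained through Message.next).

For the native layer, I downloaded the Android 10 source. It isn't long, and you can read it in full.

The reasoning is straightforward: user space receives messages from kernel space, and as the figure below shows, these messages reach the native layer first. Therefore:

The message queue is built in the native layer, which gives it all the basic capabilities of a message queue

JNI bridges the runtime barrier between the Java and native layers, and the message queue is mirrored in the Java layer

Applications are built on top of the Java layer, so message dispatch and handling are implemented in the Java layer

PS: Before Android 2.3, the message queue was implemented purely in the Java layer. As for why it was moved to a native implementation about ten years ago, I suspect it has to do with avoiding wasted CPU cycles while idle, but I haven't dug further. If any reader knows, please leave a comment and clear this up for me.

PS: There is also a classic system startup architecture diagram that I couldn't find; it would make this more intuitive.

Code analysis

Let's briefly read and analyze the native MQ source.

Creating the native message queue: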

static jlong android_os_MessageQueue_nativeInit(JNIEnv* env, jclass clazz) {
    NativeMessageQueue* nativeMessageQueue = new NativeMessageQueue();
    if (!nativeMessageQueue) {
        jniThrowRuntimeException(env, "Unable to allocate native queue");
        return 0;
    }

    nativeMessageQueue->incStrong(env);
    return reinterpret_cast<jlong>(nativeMessageQueue);
}

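This is simple. It creates a native message queue. If creation fails, it throws an exception and returns 0; otherwise it converts the pointer to a Java long and returns it. That value is then held by the Java-layer MQ.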
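Java cannot hold a C++ pointer directly, so the native object travels to Java as an opaque long. The sketch below imitates that handle pattern in plain Java, purely for illustration. The hypothetical HandleTable plays the role of native memory. init() returns a non-zero handle like nativeInit, 0 means "no queue" (as mPtr == 0 does in MessageQueue), and destroy() invalidates the handle like nativeDestroy. None of these names are JNI APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical imitation of "a native pointer stored as a Java long" (not JNI).
final class HandleTable<T> {
    private final Map<Long, T> objects = new HashMap<>();
    private long nextHandle = 1; // 0 is reserved to mean "no object", like mPtr == 0

    // Like nativeInit(): create/register the object and return an opaque handle.
    synchronized long init(T object) {
        long handle = nextHandle++;
        objects.put(handle, object);
        return handle;
    }

    // Like reinterpret_cast<NativeMessageQueue*>(ptr): turn the handle back into the object.
    synchronized T resolve(long handle) {
        T object = objects.get(handle);
        if (object == null) {
            throw new IllegalStateException("invalid handle " + handle);
        }
        return object;
    }

    // Like nativeDestroy(): release the object; the owner should then reset its field to 0.
    synchronized void destroy(long handle) {
        objects.remove(handle);
    }
}
```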

The NativeMessageQueue constructor:

NativeMessageQueue::NativeMessageQueue() :
        mPollEnv(NULL), mPollObj(NULL), mExceptionObj(NULL) {
    mLooper = Looper::getForThread();
    if (mLooper == NULL) {
        mLooper = new Looper(false);
        Looper::setForThread(mLooper);
    }
}

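The Looper here is the native Looper. Its instance is obtained through the static method Looper::getForThread(). If there is none yet, a new instance is created and bound to the current thread through the static method Looper::setForThread().

Now let's look at the native methods used by the Java-layer MQ: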

class MessageQueue {
    private long mPtr; // used by native code

    private native static long nativeInit();

    private native static void nativeDestroy(long ptr);

    private native void nativePollOnce(long ptr, int timeoutMillis); /*non-static for callbacks*/

    private native static void nativeWake(long ptr);

    private native static boolean nativeIsPolling(long ptr);

    private native static void nativeSetFileDescriptorEvents(long ptr, int fd, int events);
}

The corresponding JNI method table (name, signature, function pointer):

static const JNINativeMethod gMessageQueueMethods[] = {
    /* name, signature, funcPtr */
    { "nativeInit", "()J", (void*)android_os_MessageQueue_nativeInit },
    { "nativeDestroy", "(J)V", (void*)android_os_MessageQueue_nativeDestroy },
    { "nativePollOnce", "(JI)V", (void*)android_os_MessageQueue_nativePollOnce },
    { "nativeWake", "(J)V", (void*)android_os_MessageQueue_nativeWake },
    { "nativeIsPolling", "(J)Z", (void*)android_os_MessageQueue_nativeIsPolling },
    { "nativeSetFileDescriptorEvents", "(JII)V",
            (void*)android_os_MessageQueue_nativeSetFileDescriptorEvents },
};

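mPtr holds the address of the native MQ, mapped into the Java layer as a long.

How the Java layer checks whether the MQ is currently polling, i.e. whether the loop is alive and waiting for work: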

private boolean isPollingLocked() {
    // If the loop is quitting then it must not be idling.
    // We can assume mPtr != 0 when mQuitting is false.
    return !mQuitting && nativeIsPolling(mPtr);
}

static jboolean android_os_MessageQueue_nativeIsPolling(JNIEnv* env, jclass clazz, jlong ptr) {
    NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
    return nativeMessageQueue->getLooper()->isPolling();
}

/**
 * Returns whether this looper's thread is currently polling for more work to do.
 * This is a good signal that the loop is still alive rather than being stuck
 * handling a callback. Note that this method is intrinsically racy, since the
 * state of the loop can change before you get the result back.
 */
bool isPolling() const;

Waking up the native MQ:

static void android_os_MessageQueue_nativeWake(JNIEnv* env, jclass clazz, jlong ptr) {
    NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
    nativeMessageQueue->wake();
}

void NativeMessageQueue::wake() {
    mLooper->wake();
}

Native-layer poll:

static void android_os_MessageQueue_nativePollOnce(JNIEnv* env, jobject obj, jlong ptr, jint timeoutMillis) {
    NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
    nativeMessageQueue->pollOnce(env, obj, timeoutMillis);
}

void NativeMessageQueue::pollOnce(JNIEnv* env, jobject pollObj, int timeoutMillis) {
    mPollEnv = env;
    mPollObj = pollObj;
    mLooper->pollOnce(timeoutMillis);
    mPollObj = NULL;
    mPollEnv = NULL;

    if (mExceptionObj) {
        env->Throw(mExceptionObj);
        env->DeleteLocalRef(mExceptionObj);
        mExceptionObj = NULL;
    }
}

This part is important. Let's first take a rough look at how the native Looper dispatches messages:

//Looper.h

int pollOnce(int timeoutMillis, int* outFd, int* outEvents, void** outData);
inline int pollOnce(int timeoutMillis) {
    return pollOnce(timeoutMillis, NULL, NULL, NULL);
}

// Implementation

int Looper::pollOnce(int timeoutMillis, int* outFd, int* outEvents, void** outData) {
    int result = 0;
    for (;;) {
        while (mResponseIndex < mResponses.size()) {
            const Response& response = mResponses.itemAt(mResponseIndex++);
            int ident = response.request.ident;
            if (ident >= 0) {
                int fd = response.request.fd;
                int events = response.events;
                void* data = response.request.data;
#if DEBUG_POLL_AND_WAKE
                ALOGD("%p ~ pollOnce - returning signalled identifier %d: "
                        "fd=%d, events=0x%x, data=%p",
                        this, ident, fd, events, data);
#endif
                if (outFd != NULL) *outFd = fd;
                if (outEvents != NULL) *outEvents = events;
                if (outData != NULL) *outData = data;
                return ident;
            }
        }

        if (result != 0) {
#if DEBUG_POLL_AND_WAKE
            ALOGD("%p ~ pollOnce - returning result %d", this, result);
#endif
            if (outFd != NULL) *outFd = 0;
            if (outEvents != NULL) *outEvents = 0;
            if (outData != NULL) *outData = NULL;
            return result;
        }

        result = pollInner(timeoutMillis);
    }
}


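It first handles Responses left pending in the native layer, then calls pollInner. The details are fairly complex; we'll brainstorm them later in the native Looper analysis.

Before going into those details, recall that calling a method is blocking: in plain words, the caller waits until the method returns.

So while the Java layer calls native void nativePollOnce(long ptr, int timeoutMillis);, it is blocked.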
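What "blocking with a timeout" means for nativePollOnce can be imitated in pure Java with Object.wait and notifyAll. This is a hypothetical sketch (PollWaker is an invented class). It assumes the same convention next() uses for nextPollTimeoutMillis: a negative value waits until woken, 0 returns immediately, and a positive value waits at most that many milliseconds. A wake() that arrives before the poll is remembered, so it is not lost.

```java
// Hypothetical pure-Java imitation of pollOnce/wake semantics (not Android code).
final class PollWaker {
    private boolean woken; // set by wake(), consumed by pollOnce()

    /**
     * timeoutMillis < 0: wait until woken; 0: don't wait; > 0: wait at most that long.
     * Returns true if a wake-up was consumed, false otherwise.
     */
    synchronized boolean pollOnce(int timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!woken) {
            if (timeoutMillis == 0) {
                return false; // non-blocking check
            }
            if (timeoutMillis < 0) {
                wait(); // block until wake()
            } else {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return false; // timed out
                }
                // At least 1 ms: wait(0) would mean "wait forever".
                wait(Math.max(1L, remainingNanos / 1_000_000L));
            }
        }
        woken = false;
        return true;
    }

    synchronized void wake() {
        woken = true;
        notifyAll();
    }
}
```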

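Now let's read how the Java-layer MQ fetches messages. The code is long, so the key points are annotated directly in the code.

Before reading it, let's think purely from a TDD perspective about the main scenarios. Of course, these scenarios don't necessarily all match Android's actual design:

Is the message queue working?
- Working: expect a message to be returned
- Not working: expect null to be returned
Does the working queue currently have messages?
- No messages: block, or return null? If it returns null, the caller has to keep spinning or provide a wake-up mechanism to keep things running. From an encapsulation standpoint, the queue should keep waiting and solve the problem itself
- Messages exist
  - Special internal functional messages: expect the MQ to handle them internally
  - Messages whose processing time has arrived: return the message
  - Messages not yet due: if the queue is sorted, keep blocking, or return silently and set a wake-up? Following the discussion above, we expect it to keep waiting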
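Those scenarios can be condensed into a single decision step. The following simplified sketch is hypothetical (NextDecision is an invented name, and barriers and idle handlers are left out). For one loop iteration, it decides whether to return the head message, return null because the queue is quitting, or block with a timeout. A timeout of -1 means "until woken"; otherwise the timeout is the delay until the head is due.

```java
// Hypothetical simplification of one iteration of MessageQueue.next() (not Android code).
final class NextDecision {
    static final class Msg {
        final long when;

        Msg(long when) {
            this.when = when;
        }
    }

    final Msg message;               // non-null: next() returns this message
    final boolean quit;              // true: next() returns null
    final int nextPollTimeoutMillis; // otherwise: how long the next poll may block

    private NextDecision(Msg message, boolean quit, int nextPollTimeoutMillis) {
        this.message = message;
        this.quit = quit;
        this.nextPollTimeoutMillis = nextPollTimeoutMillis;
    }

    static NextDecision decide(Msg head, long now, boolean quitting) {
        if (head != null && now >= head.when) {
            return new NextDecision(head, false, 0);  // due: return it
        }
        if (quitting) {
            return new NextDecision(null, true, 0);   // quitting: return null
        }
        if (head == null) {
            return new NextDecision(null, false, -1); // empty: block until woken
        }
        long delay = head.when - now;                 // not due yet: sleep until it is
        return new NextDecision(null, false, (int) Math.min(delay, Integer.MAX_VALUE));
    }
}
```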

class MessageQueue {
    Message next() {
        // Return here if the message loop has already quit and been disposed.
        // This can happen if the application tries to restart a looper after quit
        // which is not supported.
        // 1. If the mapped native queue pointer is already 0, the queue has quit and
        // been disposed, so there are no more messages: return null.
        final long ptr = mPtr;
        if (ptr == 0) {
            return null;
        }

        int pendingIdleHandlerCount = -1; // -1 only during first iteration
        int nextPollTimeoutMillis = 0;
        
        // 2. Infinite loop: keep waiting until a message that needs to be dispatched is found
        for (;;) {
            if (nextPollTimeoutMillis != 0) {
                Binder.flushPendingCommands();
            }

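            // 3. Call into native code: block until woken up or until nextPollTimeoutMillis expires.
            // Native messages and fd events are processed there; Java messages stay in mMessages.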
            nativePollOnce(ptr, nextPollTimeoutMillis);

            synchronized (this) {
                // Try to retrieve the next message. Return if found.
                final long now = SystemClock.uptimeMillis();
                Message prevMsg = null;
                Message msg = mMessages;
                
                // 4. If a barrier (sync barrier) is at the head, look for the next asynchronous message in the queue, if any
                if (msg != null && msg.target == null) {
                    // Stalled by a barrier. Find the next asynchronous message in the queue.
                    do {
                        prevMsg = msg;
                        msg = msg.next;
                    } while (msg != null && !msg.isAsynchronous());
                }
                
                if (msg != null) {
                    // 5. A message was found.
                    // If it isn't due yet, compute how long to wait before the next wake-up;
                    // otherwise unlink it from the singly linked list and return it
                    
                    if (now < msg.when) {
                        // Next message is not ready. Set a timeout to wake up when it is ready.
                        nextPollTimeoutMillis = (int) Math.min(msg.when - now, Integer.MAX_VALUE);
                    } else {
                        // Found a message that is due: unlink it from the singly linked list and return it
                        // Got a message.
                        mBlocked = false;
                        if (prevMsg != null) {
                            prevMsg.next = msg.next;
                        } else {
                            mMessages = msg.next;
                        }
                        msg.next = null;
                        if (DEBUG) Log.v(TAG, "Returning message: " + msg);
                        msg.markInUse();
                        return msg;
                    }
                } else {
                    // No more messages.
                    nextPollTimeoutMillis = -1;
                }

                // Check whether the queue needs to quit
                // Process the quit message now that all pending messages have been handled.
                if (mQuitting) {
                    dispose();
                    return null;
                }

                // Work out which IdleHandlers need to run:
                // if there are none, go straight to the next round of fetching;
                // otherwise run the IdleHandlers
                // If first time idle, then get the number of idlers to run.
                // Idle handles only run if the queue is empty or if the first message
                // in the queue (possibly a barrier) is due to be handled in the future.
                if (pendingIdleHandlerCount < 0
                        && (mMessages == null || now < mMessages.when)) {
                    pendingIdleHandlerCount = mIdleHandlers.size();
                }
                if (pendingIdleHandlerCount <= 0) {
                    // No idle handlers to run. Loop and wait some more.
                    mBlocked = true;
                    continue;
                }

                if (mPendingIdleHandlers == null) {
                    mPendingIdleHandlers = new IdleHandler[Math.max(pendingIdleHandlerCount, 4)];
                }
                mPendingIdleHandlers = mIdleHandlers.toArray(mPendingIdleHandlers);
            }

            // Run the IdleHandlers
            // Run the idle handlers.
            // We only ever reach this code block during the first iteration.
            for (int i = 0; i < pendingIdleHandlerCount; i++) {
                final IdleHandler idler = mPendingIdleHandlers[i];
                mPendingIdleHandlers[i] = null; // release the reference to the handler

                boolean keep = false;
                try {
                    keep = idler.queueIdle();
                } catch (Throwable t) {
                    Log.wtf(TAG, "IdleHandler threw exception", t);
                }

                if (!keep) {
                    synchronized (this) {
                        mIdleHandlers.remove(idler);
                    }
                }
            }

            // Reset the idle handler count to 0 so we do not run them again.
            pendingIdleHandlerCount = 0;

            // While calling an idle handler, a new message could have been delivered
            // so go back and look again for a pending message without waiting.
            nextPollTimeoutMillis = 0;
        }
    }
}
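The IdleHandler contract in the code above works as follows. If queueIdle() returns true, the handler stays registered. If it returns false, the handler is removed. A handler that throws is removed too, because keep stays false. The hypothetical model below reproduces just that pass in plain Java. IdlePass and its nested IdleHandler interface only mirror the shape of MessageQueue.IdleHandler; they are not the Android classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the idle-handler pass in next() (not the Android classes).
final class IdlePass {
    interface IdleHandler {
        boolean queueIdle(); // return true to stay registered
    }

    private final List<IdleHandler> idleHandlers = new ArrayList<>();

    void addIdleHandler(IdleHandler handler) {
        idleHandlers.add(handler);
    }

    int size() {
        return idleHandlers.size();
    }

    void runIdleHandlers() {
        // Snapshot first, like mPendingIdleHandlers, so handlers can change the list safely.
        IdleHandler[] pending = idleHandlers.toArray(new IdleHandler[0]);
        for (IdleHandler idler : pending) {
            boolean keep = false;
            try {
                keep = idler.queueIdle();
            } catch (RuntimeException e) {
                // The real code catches Throwable and logs it with Log.wtf; here we just drop it.
            }
            if (!keep) {
                idleHandlers.remove(idler);
            }
        }
    }
}
```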

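Java layer: enqueueing messages

This part is simpler. Assuming the message itself is valid and the queue is still working, let's again start from a TDD perspective:

If the queue has no head, the message is expected to become the head directly
If there is a head
- If the message is due before the head message, or must be handled immediately (when == 0), it becomes the new head
- Otherwise, it is inserted at the proper position according to its processing time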

boolean enqueueMessage(Message msg, long when) {
        if (msg.target == null) {
            throw new IllegalArgumentException("Message must have a target.");
        }

        synchronized (this) {
            if (msg.isInUse()) {
                throw new IllegalStateException(msg + " This message is already in use.");
            }

            if (mQuitting) {
                IllegalStateException e = new IllegalStateException(
                        msg.target + " sending message to a Handler on a dead thread");
                Log.w(TAG, e.getMessage(), e);
                msg.recycle();
                return false;
            }

            msg.markInUse();
            msg.when = when;
            Message p = mMessages;
            boolean needWake;
            if (p == null || when == 0 || when < p.when) {
                // New head, wake up the event queue if blocked.
                msg.next = p;
                mMessages = msg;
                needWake = mBlocked;
            } else {
                // Inserted within the middle of the queue. Usually we don't have to wake
                // up the event queue unless there is a barrier at the head of the queue
                // and the message is the earliest asynchronous message in the queue.
                needWake = mBlocked && p.target == null && msg.isAsynchronous();
                Message prev;
                for (;;) {
                    prev = p;
                    p = p.next;
                    if (p == null || when < p.when) {
                        break;
                    }
                    if (needWake && p.isAsynchronous()) {
                        needWake = false;
                    }
                }
                msg.next = p; // invariant: p == prev.next
                prev.next = msg;
            }

            // We can assume mPtr != 0 because mQuitting is false.
            if (needWake) {
                nativeWake(mPtr);
            }
        }
        return true;
    }
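Without locking, barriers, or wake-up logic, the insertion rule of enqueueMessage reduces to an ordered insert into a singly linked list. In this hypothetical sketch (SortedQueue and Node are invented names), a message becomes the new head in three cases: the list is empty, its when is 0, or it is due before the head. Otherwise it is placed after every node whose when is earlier or equal, so messages with the same time keep FIFO order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of enqueueMessage's insertion rule (no locking, barriers, or wake-ups).
final class SortedQueue {
    static final class Node {
        final String name;
        final long when;
        Node next;

        Node(String name, long when) {
            this.name = name;
            this.when = when;
        }
    }

    private Node head;

    void enqueue(String name, long when) {
        Node msg = new Node(name, when);
        Node p = head;
        if (p == null || when == 0 || when < p.when) {
            // New head.
            msg.next = p;
            head = msg;
            return;
        }
        // Walk to the first node due strictly later; equal times keep FIFO order.
        Node prev;
        do {
            prev = p;
            p = p.next;
        } while (p != null && when >= p.when);
        msg.next = p; // invariant: p == prev.next
        prev.next = msg;
    }

    List<String> names() {
        List<String> out = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) {
            out.add(n.name);
        }
        return out;
    }
}
```

For example, enqueueing b@20, a@10, c@20, now@0, d@15 yields the order now, a, d, b, c.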

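The sync barrier will get its own brainstorm later; we skip the other parts for now.

Java-layer message dispatch

From this section on, we brainstorm message dispatch. We've already looked at MessageQueue. Message dispatch means continuously taking messages out of the MessageQueue and assigning them to their handlers, and the Looper does this job.

We already know the native layer also has a Looper, but the division of labor is clear:

The message queue needs a bridge connecting the Java and native layers
Each Looper only needs to handle dispatch for its own message queue on its own side

So for Java-layer message dispatch, looking at the Java Looper is enough.

We focus on three main methods:

Leaving for work
Working
Going home

Leaving for work: prepare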

class Looper {

    public static void prepare() {
        prepare(true);
    }

    private static void prepare(boolean quitAllowed) {
        if (sThreadLocal.get() != null) {
            throw new RuntimeException("Only one Looper may be created per thread");
        }
        sThreadLocal.set(new Looper(quitAllowed));
    }
}

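Two points to note:

Once you've left home, you can't leave again unless you first go back in. Likewise, one Looper per thread is enough; as long as it is alive there is no need to create another, and calling prepare() a second time on the same thread throws a RuntimeException.
Clear responsibility: one Looper serves one Thread. This requires registration, which records that the Thread is now served by this Looper. ThreadLocal is used for this. When multiple threads access a shared collection, contention always has to be considered, which is unpleasant. So the data is split per thread instead: each Thread works on its own copy without interfering with the others, and there is no contention. That is exactly what ThreadLocal encapsulates.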
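The "one Looper per thread, registered via ThreadLocal" rule can be reproduced in plain Java. In this hypothetical sketch, MiniLooper is an invented class that copies only the shape of Looper.prepare() and myLooper(). A second prepare() on the same thread throws, and each other thread sees its own slot, which starts out empty.

```java
// Hypothetical sketch of Looper.prepare()/myLooper() built on ThreadLocal (not android.os.Looper).
final class MiniLooper {
    private static final ThreadLocal<MiniLooper> sThreadLocal = new ThreadLocal<>();

    // The thread this looper serves: the one that called prepare().
    final Thread thread = Thread.currentThread();

    private MiniLooper() { }

    static void prepare() {
        if (sThreadLocal.get() != null) {
            throw new RuntimeException("Only one Looper may be created per thread");
        }
        sThreadLocal.set(new MiniLooper());
    }

    static MiniLooper myLooper() {
        return sThreadLocal.get(); // null if prepare() wasn't called on this thread
    }
}
```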

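Working: loop

Note that the job is dispatching; the Looper doesn't handle messages itself.

Without registration, there is nobody to take this job: loop() throws a RuntimeException if prepare() was not called on this thread.
Don't hurry someone who is already working; doing so causes mistakes and out-of-order work. Calling loop() again while already looping would run queued messages before the current one finishes, so the code logs a warning.
The work is to keep taking instructions (Message) from the boss (MQ), hand them to the person in charge (Handler), and record information.
It works "007" (midnight to midnight, seven days a week) without rest. Only when the MQ stops issuing messages (next() returns null) does the work end, and everyone goes home.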

class Looper {
    public static void loop() {
        final Looper me = myLooper();
        if (me == null) {
            throw new RuntimeException("No Looper; Looper.prepare() wasn't called on this thread.");
        }
        if (me.mInLoop) {
            Slog.w(TAG, "Loop again would have the queued messages be executed"
                    + " before this one completed.");
        }

        me.mInLoop = true;
        final MessageQueue queue = me.mQueue;

        // Make sure the identity of this thread is that of the local process,
        // and keep track of what that identity token actually is.
        Binder.clearCallingIdentity();
        final long ident = Binder.clearCallingIdentity();

        // Allow overriding a threshold with a system prop. e.g.
        // adb shell 'setprop log.looper.1000.main.slow 1 && stop && start'
        final int thresholdOverride =
                SystemProperties.getInt("log.looper."
                        + Process.myUid() + "."
                        + Thread.currentThread().getName()
                        + ".slow", 0);

        boolean slowDeliveryDetected = false;

        for (;;) {
            Message msg = queue.next(); // might block
            if (msg == null) {
                // No message indicates that the message queue is quitting.
                return;
            }

            // This must be in a local variable, in case a UI event sets the logger
            final Printer logging = me.mLogging;
            if (logging != null) {
                logging.println(">>>>> Dispatching to " + msg.target + " " +
                        msg.callback + ": " + msg.what);
            }
            // Make sure the observer won't change while processing a transaction.
            final Observer observer = sObserver;

            final long traceTag = me.mTraceTag;
            long slowDispatchThresholdMs = me.mSlowDispatchThresholdMs;
            long slowDeliveryThresholdMs = me.mSlowDeliveryThresholdMs;
            if (thresholdOverride > 0) {
                slowDispatchThresholdMs = thresholdOverride;
                slowDeliveryThresholdMs = thresholdOverride;
            }
            final boolean logSlowDelivery = (slowDeliveryThresholdMs > 0) && (msg.when > 0);
            final boolean logSlowDispatch = (slowDispatchThresholdMs > 0);

            final boolean needStartTime = logSlowDelivery || logSlowDispatch;
            final boolean needEndTime = logSlowDispatch;

            if (traceTag != 0 && Trace.isTagEnabled(traceTag)) {
                Trace.traceBegin(traceTag, msg.target.getTraceName(msg));
            }

            final long dispatchStart = needStartTime ? SystemClock.uptimeMillis() : 0;
            final long dispatchEnd;
            Object token = null;
            if (observer != null) {
                token = observer.messageDispatchStarting();
            }
            long origWorkSource = ThreadLocalWorkSource.setUid(msg.workSourceUid);
            try {
                // Note this line: the message is dispatched to its target Handler
                msg.target.dispatchMessage(msg);
                if (observer != null) {
                    observer.messageDispatched(token, msg);
                }
                dispatchEnd = needEndTime ? SystemClock.uptimeMillis() : 0;
            } catch (Exception exception) {
                if (observer != null) {
                    observer.dispatchingThrewException(token, msg, exception);
                }
                throw exception;
            } finally {
                ThreadLocalWorkSource.restore(origWorkSource);
                if (traceTag != 0) {
                    Trace.traceEnd(traceTag);
                }
            }
            if (logSlowDelivery) {
                if (slowDeliveryDetected) {
                    if ((dispatchStart - msg.when) <= 10) {
                        Slog.w(TAG, "Drained");
                        slowDeliveryDetected = false;
                    }
                } else {
                    if (showSlowLog(slowDeliveryThresholdMs, msg.when, dispatchStart, "delivery",
                            msg)) {
                        // Once we write a slow delivery log, suppress until the queue drains.
                        slowDeliveryDetected = true;
                    }
                }
            }
            if (logSlowDispatch) {
                showSlowLog(slowDispatchThresholdMs, dispatchStart, dispatchEnd, "dispatch", msg);
            }

            if (logging != null) {
                logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
            }

            // Make sure that during the course of dispatching the
            // identity of the thread wasn't corrupted.
            final long newIdent = Binder.clearCallingIdentity();
            if (ident != newIdent) {
                Log.wtf(TAG, "Thread identity changed from 0x"
                        + Long.toHexString(ident) + " to 0x"
                        + Long.toHexString(newIdent) + " while dispatching to "
                        + msg.target.getClass().getName() + " "
                        + msg.callback + " what=" + msg.what);
            }

            msg.recycleUnchecked();
        }
    }
}


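Going home: quit/quitSafely

This is rather drastic: once the Looper quits, the MQ can no longer work normally, so going home means resigning. quit() discards all pending messages. quitSafely() still delivers the messages that are already due and discards only those scheduled for the future.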

class Looper {
    public void quit() {
        mQueue.quit(false);
    }
    
    public void quitSafely() {
        mQueue.quit(true);
    }
}
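The practical difference between quit() and quitSafely() is which pending messages survive. The sketch below is hypothetical: QuitPolicy works on a sorted list of due times rather than on Message objects. In the real MessageQueue, quit(boolean safe) calls removeAllMessagesLocked() or removeAllFutureMessagesLocked() on its linked list. quit drops everything, while quitSafely keeps the messages that are already due so they are still delivered before loop() returns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of which pending messages survive quit() vs quitSafely().
final class QuitPolicy {
    /** Returns the due times that are still delivered after quitting at time `now`. */
    static List<Long> survivors(List<Long> sortedWhens, long now, boolean safe) {
        List<Long> kept = new ArrayList<>();
        if (!safe) {
            return kept; // quit(): every pending message is discarded
        }
        for (long when : sortedWhens) {
            if (when > now) {
                break; // quitSafely(): the list is sorted, so the rest is in the future
            }
            kept.add(when); // already due: still delivered
        }
        return kept;
    }
}
```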

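Message handling: Handler

This part is fairly clear. The APIs fall roughly into these categories:

For users:

Creating a Message, reusing objects through Message's flyweight pattern (Message.obtain())
Sending messages; note that post(Runnable) also sends a message
Removing messages
Quitting, etc.

For message handling: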

class Handler {
    /**
     * Subclasses must implement this to receive messages.
     */
    public void handleMessage(@NonNull Message msg) {
    }

    /**
     * Handle system messages here.
     * This is the API the Looper calls when dispatching.
     */
    public void dispatchMessage(@NonNull Message msg) {
        if (msg.callback != null) {
            handleCallback(msg);
        } else {
            if (mCallback != null) {
                if (mCallback.handleMessage(msg)) {
                    return;
                }
            }
            handleMessage(msg);
        }
    }
}

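If the message carries a Runnable callback (for example from post()), that callback runs. Otherwise, if the Handler has a Callback (mCallback) and it returns true, handling stops there; if not, the Handler calls its own handleMessage. If handleMessage is not overridden, the message is effectively dropped.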
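The dispatch order can be checked with a small stand-alone model. This hypothetical sketch copies the branching of dispatchMessage into plain Java. MiniHandler, MiniMessage, and Callback are local stand-ins for the Android types, and a log records which path handled each message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins mirroring the priority order of Handler.dispatchMessage.
final class MiniHandler {
    interface Callback {
        boolean handleMessage(MiniMessage msg); // true = fully handled
    }

    static final class MiniMessage {
        final int what;
        final Runnable callback; // what post(Runnable) would set

        MiniMessage(int what, Runnable callback) {
            this.what = what;
            this.callback = callback;
        }
    }

    final List<String> log = new ArrayList<>();
    private final Callback mCallback;

    MiniHandler(Callback callback) {
        mCallback = callback;
    }

    void handleMessage(MiniMessage msg) {
        log.add("handleMessage " + msg.what); // a subclass would override this
    }

    void dispatchMessage(MiniMessage msg) {
        if (msg.callback != null) {
            msg.callback.run();              // 1. Runnable attached to the message
        } else {
            if (mCallback != null && mCallback.handleMessage(msg)) {
                return;                      // 2. Callback consumed the message
            }
            handleMessage(msg);              // 3. Fallback
        }
    }
}
```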

The message-sending side can be traced with the figure below:

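Interim summary: at this point we have a complete picture of the Framework-layer message mechanism. So far we have covered:

Both the native and Java layers have message queues, and they correspond to each other through JNI and pointer mapping

The rough process of fetching messages in the native and Java MQs

How the Java-layer Looper works

An overview of the Java-layer Handler

From what we've covered, we can conclude, from the Java runtime's perspective:

The message queue mechanism works at the thread level: a thread needs one working message queue, or none at all.

That is, a Thread has at most one working Looper.

Looper and the Java-layer MQ correspond one-to-one

Handler is the entry point to the MQ and also the handler of messages

Message applies the flyweight pattern: each message carries enough information to be self-contained, and since creating message objects is relatively costly, the flyweight pattern is used to reuse them.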
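Message.obtain() and recycleUnchecked() reuse objects through a pool kept as a singly linked free list. The sketch below is hypothetical (PooledMessage is an invented class). Its pool limit of 50 mirrors MAX_POOL_SIZE in AOSP's Message.java, but treat that value, like the simplified field set, as an assumption.

```java
// Hypothetical sketch of a Message-style object pool: a free list with a size cap.
final class PooledMessage {
    private static final int MAX_POOL_SIZE = 50; // assumption, mirroring AOSP's Message
    private static final Object sPoolSync = new Object();
    private static PooledMessage sPool; // head of the free list
    private static int sPoolSize = 0;

    int what;
    Object obj;
    private PooledMessage next; // only used while the object sits in the pool

    private PooledMessage() { }

    static PooledMessage obtain() {
        synchronized (sPoolSync) {
            if (sPool != null) {
                PooledMessage m = sPool; // reuse the head of the free list
                sPool = m.next;
                m.next = null;
                sPoolSize--;
                return m;
            }
        }
        return new PooledMessage(); // pool empty: allocate a new one
    }

    void recycle() {
        // Clear state so stale data never leaks to the next user.
        what = 0;
        obj = null;
        synchronized (sPoolSync) {
            if (sPoolSize < MAX_POOL_SIZE) {
                next = sPool;
                sPool = this;
                sPoolSize++;
            }
        }
    }

    static int poolSize() {
        synchronized (sPoolSync) {
            return sPoolSize;
        }
    }
}
```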

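Next, let's dig into the details and resolve the questions left by the parts glossed over earlier:

The types and nature of messages
pollInner of the native Looper

The types and nature of messages

Several important member variables of Message: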

class Message {
   
    public int what;
    
    public int arg1;
    
    public int arg2;
    
    public Object obj;

    public Messenger replyTo;

    /*package*/ int flags;
    
    public long when;

    /*package*/ Bundle data;

    /*package*/ Handler target;

    /*package*/ Runnable callback;

}

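target is the destination. A message without a target is a special message: the sync barrier.

what identifies the message. arg1 and arg2 carry cheap integer data; if they are not enough to express the information, it can go into Bundle data.

replyTo is used with Messenger for cross-process messaging, and obj can also cross processes when it holds a Parcelable; we skip them for now.

flags holds the message's state bits, e.g. whether it is in use and whether it is asynchronous.

The sync barrier mentioned above stops the synchronous messages behind it from being fetched, as we saw when reading the Java MQ's next() method.

Recall that next() uses an infinite loop to try to read a message that meets the processing conditions; if it can't get one, the caller (Looper) stays blocked because of that loop.

This confirms that, by function, messages fall into three kinds:

Ordinary (synchronous) messages
Sync barrier messages
Asynchronous messages

The sync barrier is an internal mechanism. After a barrier is set, it must be removed at an appropriate time; otherwise ordinary messages will never be processed. Removing it requires the token returned when the barrier was set.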
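The barrier rule can be modelled in a few lines. In this hypothetical sketch, BarrierQueue is invented; in Android, postSyncBarrier() and removeSyncBarrier(int) are hidden MessageQueue APIs. A barrier is a node without a target that carries a token. While a barrier is at the head, only asynchronous messages can be taken; removing it by token lets synchronous messages through again. Time ordering is ignored for brevity.

```java
import java.util.Iterator;
import java.util.LinkedList;

// Hypothetical model of sync barriers (not Android code); time ordering is ignored.
final class BarrierQueue {
    static final class Msg {
        final String name;
        final boolean async;
        final boolean barrier; // in Android, a barrier is a Message whose target is null
        final int token;

        Msg(String name, boolean async, boolean barrier, int token) {
            this.name = name;
            this.async = async;
            this.barrier = barrier;
            this.token = token;
        }
    }

    private final LinkedList<Msg> queue = new LinkedList<>();
    private int nextToken = 1;

    void send(String name, boolean async) {
        queue.add(new Msg(name, async, false, 0));
    }

    /** Inserts a barrier and returns the token needed to remove it. */
    int postSyncBarrier() {
        int token = nextToken++;
        queue.add(new Msg(null, false, true, token));
        return token;
    }

    void removeSyncBarrier(int token) {
        queue.removeIf(m -> m.barrier && m.token == token);
    }

    /** Returns the next deliverable message name, or null if nothing can be taken now. */
    String next() {
        Msg head = queue.peekFirst();
        if (head == null) {
            return null;
        }
        if (!head.barrier) {
            queue.removeFirst();
            return head.name;
        }
        // Stalled by a barrier: find the first asynchronous message behind it.
        for (Iterator<Msg> it = queue.iterator(); it.hasNext(); ) {
            Msg m = it.next();
            if (m.async) {
                it.remove();
                return m.name;
            }
        }
        return null; // only synchronous messages remain: they must wait
    }
}
```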

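The native Looper

By now you're probably curious about the native Looper and what it does in the native layer.

If you're interested in the complete source, see here; below we read selected parts.

As mentioned, after handling the pending Responses, Looper::pollOnce calls pollInner to fetch messages: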

int Looper::pollInner(int timeoutMillis) {
#if DEBUG_POLL_AND_WAKE
    ALOGD("%p ~ pollOnce - waiting: timeoutMillis=%d", this, timeoutMillis);
#endif

    // Adjust the timeout based on when the next message is due.
    if (timeoutMillis != 0 && mNextMessageUptime != LLONG_MAX) {
        nsecs_t now = systemTime(SYSTEM_TIME_MONOTONIC);
        int messageTimeoutMillis = toMillisecondTimeoutDelay(now, mNextMessageUptime);
        if (messageTimeoutMillis >= 0
                && (timeoutMillis < 0 || messageTimeoutMillis < timeoutMillis)) {
            timeoutMillis = messageTimeoutMillis;
        }
#if DEBUG_POLL_AND_WAKE
        ALOGD("%p ~ pollOnce - next message in %lldns, adjusted timeout: timeoutMillis=%d",
                this, mNextMessageUptime - now, timeoutMillis);
#endif
    }

    // Poll.
    int result = ALOOPER_POLL_WAKE;
    mResponses.clear();
    mResponseIndex = 0;

    struct epoll_event eventItems[EPOLL_MAX_EVENTS];
    
    // Note 1
    int eventCount = epoll_wait(mEpollFd, eventItems, EPOLL_MAX_EVENTS, timeoutMillis);

    // Acquire lock.
    mLock.lock();

// Note 2
    // Check for poll error.
    if (eventCount < 0) {
        if (errno == EINTR) {
            goto Done;
        }
        ALOGW("Poll failed with an unexpected error, errno=%d", errno);
        result = ALOOPER_POLL_ERROR;
        goto Done;
    }

// Note 3
    // Check for poll timeout.
    if (eventCount == 0) {
#if DEBUG_POLL_AND_WAKE
        ALOGD("%p ~ pollOnce - timeout", this);
#endif
        result = ALOOPER_POLL_TIMEOUT;
        goto Done;
    }

// Note 4
    // Handle all events.
#if DEBUG_POLL_AND_WAKE
    ALOGD("%p ~ pollOnce - handling events from %d fds", this, eventCount);
#endif

    for (int i = 0; i < eventCount; i++) {
        int fd = eventItems[i].data.fd;
        uint32_t epollEvents = eventItems[i].events;
        if (fd == mWakeReadPipeFd) {
            if (epollEvents & EPOLLIN) {
                awoken();
            } else {
                ALOGW("Ignoring unexpected epoll events 0x%x on wake read pipe.", epollEvents);
            }
        } else {
            ssize_t requestIndex = mRequests.indexOfKey(fd);
            if (requestIndex >= 0) {
                int events = 0;
                if (epollEvents & EPOLLIN) events |= ALOOPER_EVENT_INPUT;
                if (epollEvents & EPOLLOUT) events |= ALOOPER_EVENT_OUTPUT;
                if (epollEvents & EPOLLERR) events |= ALOOPER_EVENT_ERROR;
                if (epollEvents & EPOLLHUP) events |= ALOOPER_EVENT_HANGUP;
                pushResponse(events, mRequests.valueAt(requestIndex));
            } else {
                ALOGW("Ignoring unexpected epoll events 0x%x on fd %d that is "
                        "no longer registered.", epollEvents, fd);
            }
        }
    }
Done: ;

// Note 5
    // Invoke pending message callbacks.
    mNextMessageUptime = LLONG_MAX;
    while (mMessageEnvelopes.size() != 0) {
        nsecs_t now = systemTime(SYSTEM_TIME_MONOTONIC);
        const MessageEnvelope& messageEnvelope = mMessageEnvelopes.itemAt(0);
        if (messageEnvelope.uptime <= now) {
            // Remove the envelope from the list.
            // We keep a strong reference to the handler until the call to handleMessage
            // finishes. Then we drop it so that the handler can be deleted *before*
            // we reacquire our lock.
            { // obtain handler
                sp<MessageHandler> handler = messageEnvelope.handler;
                Message message = messageEnvelope.message;
                mMessageEnvelopes.removeAt(0);
                mSendingMessage = true;
                mLock.unlock();

#if DEBUG_POLL_AND_WAKE || DEBUG_CALLBACKS
                ALOGD("%p ~ pollOnce - sending message: handler=%p, what=%d",
                        this, handler.get(), message.what);
#endif
                handler->handleMessage(message);
            } // release handler

            mLock.lock();
            mSendingMessage = false;
            result = ALOOPER_POLL_CALLBACK;
        } else {
            // The last message left at the head of the queue determines the next wakeup time.
            mNextMessageUptime = messageEnvelope.uptime;
            break;
        }
    }

    // Release lock.
    mLock.unlock();

// Note 6
    // Invoke all response callbacks.
    for (size_t i = 0; i < mResponses.size(); i++) {
        Response& response = mResponses.editItemAt(i);
        if (response.request.ident == ALOOPER_POLL_CALLBACK) {
            int fd = response.request.fd;
            int events = response.events;
            void* data = response.request.data;
#if DEBUG_POLL_AND_WAKE || DEBUG_CALLBACKS
            ALOGD("%p ~ pollOnce - invoking fd event callback %p: fd=%d, events=0x%x, data=%p",
                    this, response.request.callback.get(), fd, events, data);
#endif
            int callbackResult = response.request.callback->handleEvent(fd, events, data);
            if (callbackResult == 0) {
                removeFd(fd);
            }
            // Clear the callback reference in the response structure promptly because we
            // will not clear the response vector itself until the next poll.
            response.request.callback.clear();
            result = ALOOPER_POLL_CALLBACK;
        }
    }
    return result;
}

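Notes on the marked points:

1: The epoll mechanism waits for events on mEpollFd; the wait has a timeout.
2, 3, 4: the three possible outcomes of the wait; goto jumps straight to the Done label.
2: Check whether the poll failed; if so, jump to Done (EINTR is not treated as an error).
3: Check whether the poll timed out; if so, jump to Done.
4: Handle every event returned by epoll.
5: Invoke the callbacks of pending messages.
6: Invoke all Response callbacks.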
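The block before Note 1 shortens the caller's timeout when a native message is due sooner. That rule is a pure function. The hypothetical Java helper below restates it (adjustTimeout is an invented name). In place of mNextMessageUptime, it takes the delay until the next native message, or -1 when there is none.

```java
// Hypothetical restatement of pollInner's timeout adjustment (the block before Note 1).
final class TimeoutAdjust {
    /**
     * @param timeoutMillis        caller's timeout: < 0 waits forever, 0 doesn't wait
     * @param messageTimeoutMillis delay until the next native message, or -1 if there is none
     * @return the timeout actually passed to epoll_wait
     */
    static int adjustTimeout(int timeoutMillis, int messageTimeoutMillis) {
        if (timeoutMillis != 0 && messageTimeoutMillis >= 0
                && (timeoutMillis < 0 || messageTimeoutMillis < timeoutMillis)) {
            return messageTimeoutMillis; // wake up in time for the native message
        }
        return timeoutMillis;
    }
}
```

So an infinite wait (-1) becomes 30 ms if a native message is due in 30 ms, a non-blocking poll (0) stays non-blocking, and a shorter caller timeout wins.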

We can also see that the result is one of the following:

ALOOPER_POLL_CALLBACK

A pending message, or a Response whose request.ident is ALOOPER_POLL_CALLBACK, was handled. If not:

ALOOPER_POLL_WAKE: normal wake-up
ALOOPER_POLL_ERROR: epoll error
ALOOPER_POLL_TIMEOUT: epoll timed out

Looking up the enum values:

ALOOPER_POLL_WAKE = -1,
ALOOPER_POLL_CALLBACK = -2,
ALOOPER_POLL_TIMEOUT = -3,
ALOOPER_POLL_ERROR = -4

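Interim summary: we brainstormed messages and the native pollInner, which brought us to the epoll mechanism.

There are actually many more points in native Looper dispatch worth brainstorming, but let's hold off for now; I can't wait to brainstorm epoll.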

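Brainstorm: I/O models in Linux

For this part I recommend the article "使用 libevent 和 libev 提升网络应用性能——I/O模型演进变化史" (Using libevent and libev to Improve Network Application Performance: A History of I/O Model Evolution) by hguisu.

PS: Some images in this section are taken directly from that article. I was lazy and didn't track down the original sources to credit them.

Blocking I/O model (figure): when recv() is called, the process waits inside the kernel for data to arrive and then for it to be copied.

The implementation is very simple, but there is a problem: while blocked, the thread can't do any other computation. In network programming, multiple threads are then needed to handle concurrency.

Note: don't map Android's hardware-triggered events, such as screen taps, onto the network concurrency discussed here; they are two different things.

If multiple processes or threads are used to respond concurrently, the model looks like this:

So far, everything we've looked at is the blocking I/O model.

To brainstorm: blocking means calling a method and then waiting for its return value; execution in the thread seems to stall right there.

To eliminate this stall, we can't call a method and wait for the I/O result; the call has to return immediately!

An example:

You go to a tailor to have a suit made. After settling on the style and size, you sit in the shop and wait until it's finished and handed to you. That's blocking, and the wait could bore you to death.
You go to a tailor to have a suit made. After settling on the style and size, the clerk tells you not to wait around, since it will take many days; come back and check when you have time. That's non-blocking.

After switching to the non-blocking model, the response model looks like this:

Clearly, this requires the customer to poll. It's not customer-friendly, but the shop loses nothing, and its waiting area is less crowded.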
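In Java terms, "come back and check later" is a channel in non-blocking mode: read() returns 0 at once when no data is available, so the caller has to poll. The sketch below uses java.nio.channels.Pipe only as a convenient stand-in for a socket, and NonBlockingRead is an invented name.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

// Non-blocking read: the call returns at once, so the caller must keep polling.
final class NonBlockingRead {
    /** Returns {bytes read before any write, bytes read after the write}. */
    static int[] demo() throws IOException {
        Pipe pipe = Pipe.open();
        try (Pipe.SourceChannel source = pipe.source();
             Pipe.SinkChannel sink = pipe.sink()) {
            source.configureBlocking(false);
            ByteBuffer buffer = ByteBuffer.allocate(16);

            int first = source.read(buffer); // nothing written yet: returns 0 immediately

            sink.write(ByteBuffer.wrap(new byte[] {1, 2, 3}));
            int second = 0;
            while (second == 0) {            // "come back and check" until the data is there
                second = source.read(buffer);
            }
            return new int[] {first, second};
        }
    }
}
```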

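Some tailors reformed to be friendlier to customers:

You go to a tailor to have a suit made. After settling on the style and size, you leave your contact information, and the shop contacts you to pick up the suit once it's done.

This is the select or poll model:

Note: the reformed shop needs an extra employee, the user thread marked in the figure, whose job is to:

Record customer orders and contact information at the front desk
Take the notebook of orders to the workshop and keep checking whether orders are finished; finished ones can be collected and the customers contacted.

Moreover, while he is checking on orders, he can't record customer information at the front desk. That means he is blocked, and other work has to wait.

For the workshop, this approach isn't much different from the non-blocking model, and it adds a clerk. However, one clerk now solves the earlier problem of many clerks each running to the workshop to ask "Is the order ready?" for their own customers.

Worth mentioning: to improve service quality, every time this employee asks the workshop about an order, he has to record some information:

whether the inquiry about the order's progress got an answer;

whether the answer was a lie; and so on

Some shops keep a separate logbook for each kind of check item, which is similar to the select model (one fd_set per event type: read, write, exception).

Some shops use a single logbook with a table that records all the check items, which is similar to the poll model (one array of pollfd entries, each listing its own events).

The select and poll models are quite similar.

Before long, the boss noticed that this clerk was inefficient: every time, he had to take the whole order book and ask about every order. It isn't that the employee was lazy; the model itself is the problem.

So the boss reformed again:

A mail chute is installed between the front desk and the workshop.
When the workshop has progress to report, it sends a note to the front desk with the order number on it.
The front-desk clerk goes straight to that specific order.

This is the epoll model. It solves the traversal inefficiency of the select/poll models, because only the ready items are reported and nobody has to scan the whole list.

After this reform, the front-desk clerk no longer has to go through the order book one entry at a time. Efficiency improves, and as long as nothing happens, the clerk can gracefully slack off.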
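Java's counterpart to "only tell me which orders are ready" is java.nio.channels.Selector. On Linux the default JDK implementation is typically epoll-based, though that is a JDK detail this sketch does not depend on. Three pipes stand for three orders. Only one of them receives data, and select() reports exactly that one, so nobody has to walk the whole order book. ReadyOrders is an invented name.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;

// Readiness notification: select() reports only the channels that have data.
final class ReadyOrders {
    static List<Integer> readyOrderIds() throws IOException {
        List<Pipe> pipes = new ArrayList<>();
        List<Integer> ready = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            for (int orderId = 0; orderId < 3; orderId++) {
                Pipe pipe = Pipe.open();
                pipes.add(pipe);
                pipe.source().configureBlocking(false); // required before register()
                pipe.source().register(selector, SelectionKey.OP_READ, orderId);
            }

            // The "workshop" reports progress on order 1 only.
            pipes.get(1).sink().write(ByteBuffer.wrap(new byte[] {42}));

            selector.select(1000); // blocks until something is ready, or 1 s passes
            for (SelectionKey key : selector.selectedKeys()) {
                ready.add((Integer) key.attachment()); // only ready orders show up here
            }
            selector.selectedKeys().clear();
        } finally {
            for (Pipe pipe : pipes) {
                pipe.source().close();
                pipe.sink().close();
            }
        }
        return ready;
    }
}
```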

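Let's look at the native Looper's constructor. Like the pollInner excerpt above, this is from an older version that wakes up through a pipe; newer AOSP versions use an eventfd instead.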

Looper::Looper(bool allowNonCallbacks) :
        mAllowNonCallbacks(allowNonCallbacks), mSendingMessage(false),
        mResponseIndex(0), mNextMessageUptime(LLONG_MAX) {
    int wakeFds[2];
    int result = pipe(wakeFds);
    LOG_ALWAYS_FATAL_IF(result != 0, "Could not create wake pipe. errno=%d", errno);

    mWakeReadPipeFd = wakeFds[0];
    mWakeWritePipeFd = wakeFds[1];

    result = fcntl(mWakeReadPipeFd, F_SETFL, O_NONBLOCK);
    LOG_ALWAYS_FATAL_IF(result != 0, "Could not make wake read pipe non-blocking. errno=%d",
            errno);

    result = fcntl(mWakeWritePipeFd, F_SETFL, O_NONBLOCK);
    LOG_ALWAYS_FATAL_IF(result != 0, "Could not make wake write pipe non-blocking. errno=%d",
            errno);

    // Allocate the epoll instance and register the wake pipe.
    mEpollFd = epoll_create(EPOLL_SIZE_HINT);
    LOG_ALWAYS_FATAL_IF(mEpollFd < 0, "Could not create epoll instance. errno=%d", errno);

    struct epoll_event eventItem;
    memset(& eventItem, 0, sizeof(epoll_event)); // zero out unused members of data field union
    eventItem.events = EPOLLIN;
    eventItem.data.fd = mWakeReadPipeFd;
    result = epoll_ctl(mEpollFd, EPOLL_CTL_ADD, mWakeReadPipeFd, & eventItem);
    LOG_ALWAYS_FATAL_IF(result != 0, "Could not add wake read pipe to epoll instance. errno=%d",
            errno);
}
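The constructor above creates a pipe purely for wake-ups. wake() writes to the write end. The read end is registered with epoll, so a blocked epoll_wait returns, and awoken() then drains the pipe. The same pattern can be sketched in Java with a Pipe registered on a Selector. WakePipeLooper is an invented name. Selector.wakeup() could do the job directly; the explicit pipe only mirrors the native code. Note one difference in conventions: Selector.select(0) blocks forever, whereas epoll_wait with a timeout of 0 returns immediately.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical Java mirror of the native Looper's wake pipe (not Android code).
final class WakePipeLooper implements AutoCloseable {
    private final Selector selector = Selector.open(); // plays the role of mEpollFd
    private final Pipe wakePipe = Pipe.open();         // source ~ mWakeReadPipeFd

    WakePipeLooper() throws IOException {
        wakePipe.source().configureBlocking(false);
        wakePipe.sink().configureBlocking(false);
        wakePipe.source().register(selector, SelectionKey.OP_READ);
    }

    /** Like wake(): write one byte so that a blocked pollOnce() returns. */
    void wake() throws IOException {
        wakePipe.sink().write(ByteBuffer.wrap(new byte[] {1}));
    }

    /** Like pollOnce(); timeoutMillis must be > 0 here (select(0) would block forever). */
    boolean pollOnce(long timeoutMillis) throws IOException {
        int ready = selector.select(timeoutMillis);
        selector.selectedKeys().clear();
        if (ready == 0) {
            return false; // timed out
        }
        awoken();
        return true;
    }

    /** Like awoken(): drain the pipe so the next poll blocks again. */
    private void awoken() throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(16);
        while (wakePipe.source().read(buffer) > 0) {
            buffer.clear();
        }
    }

    @Override
    public void close() throws IOException {
        selector.close();
        wakePipe.source().close();
        wakePipe.sink().close();
    }
}
```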

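Summary

By now you have probably worked out the various questions yourself. As usual, let's still summarize: since this article is a brainstorm, the train of thought jumps around and the connections between parts aren't obvious.

Let's use one question to tie the content together.

Why do the Java-layer Looper and MQ use infinite loops without "blocking" the UI thread or causing an ANR, and while still responding to click events?

Android is event-driven and has built a complete message mechanism
The Java-layer message mechanism is only one part of it; it is responsible for the message queue: queue management, message dispatch, and message handling
The Looper's infinite loop keeps message dispatch running; without the loop, dispatch would stop.
The MessageQueue's infinite loop ensures the Looper gets valid messages: as long as there are messages, the Looper keeps running, and as soon as a valid message is found, next() leaves its loop and returns it.
Moreover, inside the infinite loop of next(), the Java MessageQueue calls the native MQ's pollOnce via JNI, which drives the native layer to process native messages; when there is nothing to do, the thread sleeps in epoll_wait rather than spinning
Notably, everything the UI thread does is also message-based, whether updating the UI or responding to click events.

So it is precisely the infinite loop started by Looper.loop() that keeps all of the UI thread's work running normally.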

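As for ANR, it is a detection mechanism Android uses to confirm that the main thread's message mechanism is running normally and healthily.

The main Looper drives UI rendering and the handling of interaction events through the message mechanism. If executing some message, or work derived from it, occupies the main thread for a long time, the main thread stays blocked and the user experience suffers.

So ANR detection uses a "planted time bomb" mechanism: a bomb planted in advance can only be defused by the Looper running efficiently. And this bomb is interesting: it only goes off if, when its timer runs out, it is found not to have been defused.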
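The "time bomb" idea can be illustrated with a hypothetical watchdog. This is not Android's ANR implementation, which lives in system services and differs in detail. Before the work runs, a bomb is scheduled with a deadline. If the work finishes in time, the bomb is defused by cancelling it; otherwise it goes off. ScheduledExecutorService supplies the timer, and TimeBomb is an invented name.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical watchdog illustrating "plant a bomb, defuse it in time" (not Android's ANR code).
final class TimeBomb implements AutoCloseable {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    final AtomicBoolean exploded = new AtomicBoolean(false);

    /** Runs `work` under a bomb; returns true if the bomb was defused before it went off. */
    boolean runWithBomb(Runnable work, long timeoutMillis) {
        // Plant: if nobody cancels it in time, the "ANR" fires.
        ScheduledFuture<?> bomb =
                timer.schedule(() -> exploded.set(true), timeoutMillis, TimeUnit.MILLISECONDS);
        work.run(); // stands in for a message handled on the main thread
        // Defuse: cancel() returns false if the bomb has already gone off.
        return bomb.cancel(false);
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```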

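As for responding to click events, such events always start in hardware, pass through the kernel, and then reach user space via IPC. They exist as messages in the native layer and, after processing, show up as:

ViewRootImpl receives input from InputManager and handles the event

Here I borrow a figure to summarize the whole flow of the message mechanism:

Figure from "Android7.0 MessageQueue详解" (Android 7.0 MessageQueue Explained) by Gaugamela

PS: This article is long, both in content and in the time it took: about 10 days. Some parts still didn't go as deep as I'd have liked, for example "In which situations does the Java layer invoke the native wake-up via JNI, and why?" and so on.

But considering the length, I decided not to dig any further.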