The Kernel Module Mechanism in Linux

Date: 2019-11-20

Originally posted 2017-06-20.

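Linux's kernel module mechanism lets developers add functionality to the kernel dynamically. Common components such as file systems and device drivers can be added as modules without recompiling the kernel, which greatly reduces the work involved. Because of modules, the kernel does not need to build in many unrelated features up front. It can be kept as lean as possible and extended later as needed.

Drivers are a special case. Because they deal with specific hardware, they are hard to make generic. They may also contain vendors' proprietary interfaces whose source code vendors are rarely willing to publish, which conflicts with the Linux license. The module mechanism works around this conflict by letting such drivers be added afterwards without being merged into the kernel.

With that, let's look at how the module mechanism is implemented, using the source code as a guide.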

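Much like an ordinary executable, a module is compiled into a .ko file. A .ko file is itself a relocatable object file, similar to the .o file produced by gcc -c (on Linux both are ELF files). For the concept of relocation, see the PE file format: it is a Windows format, but the principles are similar.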
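To make the "relocatable object" point concrete: the ELF header's e_type field records what kind of file it is. Both a gcc -c object and a .ko file report ET_REL (value 1). The sketch below is a user-space illustration; the helper names elf_type and is_relocatable_object are my own. It reads that field straight from the raw header bytes, using offsets fixed by the ELF specification, so it does not depend on <elf.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Offsets and values fixed by the ELF specification (same in ELF32 and ELF64). */
#define ELF_EI_DATA     5   /* e_ident[5]: byte order of the file */
#define ELF_DATA_LSB    1   /* little-endian */
#define ELF_DATA_MSB    2   /* big-endian */
#define ELF_E_TYPE_OFF  16  /* e_type follows the 16-byte e_ident */
#define ELF_ET_REL      1   /* relocatable object: gcc -c output and .ko files */

/* Return e_type of an in-memory ELF header, or -1 if buf is not ELF. */
int elf_type(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    /* "\x7f" "ELF" is split so the hex escape does not swallow the 'E'. */
    if (len < ELF_E_TYPE_OFF + 2 || memcmp(buf, "\x7f" "ELF", 4) != 0)
        return -1;
    const unsigned char *p = buf + ELF_E_TYPE_OFF;  /* 2-byte e_type */
    switch (buf[ELF_EI_DATA]) {
    case ELF_DATA_LSB:
        return p[0] | (p[1] << 8);
    case ELF_DATA_MSB:
        return (p[0] << 8) | p[1];
    default:
        return -1;  /* unknown byte order */
    }
}

/* Nonzero if the header describes a relocatable object (ET_REL). */
int is_relocatable_object(const unsigned char *buf, size_t len)
{
    return elf_type(buf, len) == ELF_ET_REL;
}
```

On a real system, `readelf -h` run on a .ko file reports the same field as `Type: REL (Relocatable file)`.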

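Since a module is a relocatable file, it has to be relocated when it is loaded into the kernel. Recall how relocation works for user programs. An executable can generally be loaded at its preferred address, so user executables rarely need relocating. Shared libraries are different: all libraries share the same file format, but a process may load many of them, so some inevitably cannot be placed at their preferred addresses and must be relocated. A kernel module shares one kernel address space with the kernel itself, so it is even less able to count on its preferred address being free. In general, kernel modules must therefore be relocated.

Loading a module also has another important job: resolving dependencies between modules. If module A calls functions from another module, A does not know those functions' addresses before it is loaded. It can only leave a marker, and the references are resolved against the symbol tables at load time. All of this is done by sys_init_module, the core system call for loading a module.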
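The two jobs described above can be sketched in user space. The first is resolving an undefined symbol by name; the second is patching the site that references it with the resolved address. This is a deliberately simplified toy, not the kernel's loader. The toy_symbol and toy_reloc records are assumptions for illustration, as is the single relocation kind: an absolute 64-bit store of S + A, in the spirit of x86-64's R_X86_64_64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for ELF symbol and relocation records. */
struct toy_symbol { const char *name; uint64_t value; };
struct toy_reloc  { size_t offset; const char *symbol; int64_t addend; };

static const struct toy_symbol *toy_lookup(const struct toy_symbol *tab, size_t n,
                                           const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* Patch each relocation site in image with S + A (absolute 64-bit).
 * Returns 0 on success, or -1 if a symbol is unresolved or out of range. */
int toy_apply_relocs(unsigned char *image, size_t image_len,
                     const struct toy_reloc *relocs, size_t nrelocs,
                     const struct toy_symbol *symtab, size_t nsyms)
{
    assert(image != NULL);
    for (size_t i = 0; i < nrelocs; i++) {
        const struct toy_symbol *s = toy_lookup(symtab, nsyms, relocs[i].symbol);
        if (s == NULL || relocs[i].offset + sizeof(uint64_t) > image_len)
            return -1;  /* an unknown symbol makes the whole load fail */
        uint64_t v = s->value + (uint64_t)relocs[i].addend;
        memcpy(image + relocs[i].offset, &v, sizeof v);  /* unaligned-safe store */
    }
    return 0;
}
```

The real loader handles many architecture-specific relocation types and reads symbols and relocations from the module's ELF sections. The shape is the same, though: look up, compute, store, and refuse to load if a symbol cannot be resolved.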

Kernel data structures

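Each kernel module corresponds to a struct module in the kernel, and all modules are kept on a single linked list. This is why some malicious modules try to hide by unlinking their own structure from that list. Some of the members are listed below: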

struct module
{
    enum module_state state;

    /* Member of list of modules */
    struct list_head list; // all modules form a doubly linked list; the list head is the global variable modules

    /* Unique handle for this module */
    char name[MODULE_NAME_LEN]; // module name, unique; usually the file name without the .ko suffix

    /* Sysfs stuff. */
    struct module_kobject mkobj;
    struct module_attribute *modinfo_attrs;
    const char *version;
    const char *srcversion;
    struct kobject *holders_dir;

    /* Exported symbols */
    const struct kernel_symbol *syms; // exported symbols: points to an array of num_syms kernel_symbol entries
    const unsigned long *crcs; // also num_syms entries, holding each symbol's checksum (CRC)
    unsigned int num_syms;

    /* Kernel parameters. */
    struct kernel_param *kp;
    unsigned int num_kp;

    /* GPL-only exported symbols. */
    unsigned int num_gpl_syms; // same meaning as above, but these symbols may only be used by GPL-compatible modules
    const struct kernel_symbol *gpl_syms;
    const unsigned long *gpl_crcs;

#ifdef CONFIG_UNUSED_SYMBOLS
    /* unused exported symbols. */
    const struct kernel_symbol *unused_syms;
    const unsigned long *unused_crcs;
    unsigned int num_unused_syms;

    /* GPL-only, unused exported symbols. */
    unsigned int num_unused_gpl_syms;
    const struct kernel_symbol *unused_gpl_syms;
    const unsigned long *unused_gpl_crcs;
#endif

#ifdef CONFIG_MODULE_SIG
    /* Signature was verified. */
    bool sig_ok;
#endif

    /* symbols that will be GPL-only in the near future. */
    const struct kernel_symbol *gpl_future_syms;
    const unsigned long *gpl_future_crcs;
    unsigned int num_gpl_future_syms;

    /* Exception table */
    unsigned int num_exentries;
    struct exception_table_entry *extable;

    /* Startup function. */
    int (*init)(void); // the module's initialization function

    /* If this is non-NULL, vfree after init() returns */
    void *module_init; // memory holding init-only code and data; freed once init() has returned

    /* Here is the actual code + data, vfree'd on unload. */
    void *module_core; // the core code and data; freed with vfree when the module is unloaded

    /* Here are the sizes of the init and core sections */
    unsigned int init_size, core_size; // sizes of the init and core regions above

    /* The size of the executable code in each section.  */
    unsigned int init_text_size, core_text_size;

    /* Size of RO sections of the module (text+rodata) */
    unsigned int init_ro_size, core_ro_size;
    /* ... */

#ifdef CONFIG_MODULE_UNLOAD
    /* Dependencies between modules */
    /* What modules depend on me? */
    struct list_head source_list;
    /* What modules do I depend on? */
    struct list_head target_list;

    /* Who is waiting for us to be unloaded */
    struct task_struct *waiter; // the task (if any) waiting for this module to be unloaded

    /* Destruction function. */
    void (*exit)(void); // the exit function defined in the module, called on unload

    /* ... */
};
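Every struct module is threaded onto the global modules list through its embedded list member. The kernel therefore finds a module by walking that list and recovering the enclosing structure with container_of. The sketch below is a user-space model of that pattern. It assumes a cut-down toy_module with only a name and a list node; the names toy_module, toy_register and toy_find_module are hypothetical. The kernel's own find_module() performs a similar walk.

The model also shows why unlinking a node hides a module from such walks: anything not on the list is simply never visited.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal user-space copy of the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };
#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_add_tail(struct list_head *new, struct list_head *head)
{
    new->prev = head->prev;
    new->next = head;
    head->prev->next = new;
    head->prev = new;
}

/* Hypothetical cut-down module record: only the fields used here. */
struct toy_module {
    struct list_head list;   /* links into the global toy_modules list */
    char name[56];
};

static struct list_head toy_modules = LIST_HEAD_INIT(toy_modules);

void toy_register(struct toy_module *mod)
{
    assert(mod != NULL);
    list_add_tail(&mod->list, &toy_modules);
}

/* Walk the list and recover each enclosing struct with container_of. */
struct toy_module *toy_find_module(const char *name)
{
    for (struct list_head *p = toy_modules.next; p != &toy_modules; p = p->next) {
        struct toy_module *mod = container_of(p, struct toy_module, list);
        if (strcmp(mod->name, name) == 0)
            return mod;
    }
    return NULL;
}
```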

Module dependencies

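Dependencies between modules are recorded through two list heads, source_list and target_list. The former records which modules depend on this module; the latter records which modules this module depends on. Each entry on these lists is a module_use structure: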

struct module_use {
    struct list_head source_list;
    struct list_head target_list;
    struct module *source, *target;
};

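Each module_use records one dependency. Note that source and target are kept in a single structure, because the relationship has to be recorded on both the source module and the target module.

Suppose module A depends on module B. A module_use is created, and its source_list field is linked into the source_list of B's struct module, with its source pointer pointing to A's struct module. Its target_list field is linked into A's target_list, with its target pointer pointing to B's struct module. See the code below: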

static int add_module_usage(struct module *a, struct module *b)
{
    struct module_use *use;

    pr_debug("Allocating new usage for %s.\n", a->name);
    use = kmalloc(sizeof(*use), GFP_ATOMIC);
    if (!use) {
        printk(KERN_WARNING "%s: out of memory loading\n", a->name);
        return -ENOMEM;
    }

    use->source = a;
    use->target = b;
    list_add(&use->source_list, &b->source_list);
    list_add(&use->target_list, &a->target_list);
    return 0;
}
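The "one node on two lists" arrangement in add_module_usage can be reproduced in user space with a minimal copy of the kernel's list primitives. In this sketch, toy_module and the helper names are hypothetical stand-ins for struct module, and malloc replaces kmalloc. The list_add here inserts right after the head, as the kernel's does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

/* Insert new right after head, like the kernel's list_add(). */
static void list_add(struct list_head *new, struct list_head *head)
{
    new->next = head->next;
    new->prev = head;
    head->next->prev = new;
    head->next = new;
}

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_module {
    const char *name;
    struct list_head source_list;  /* modules that depend on me */
    struct list_head target_list;  /* modules I depend on */
};

struct module_use {
    struct list_head source_list, target_list;
    struct toy_module *source, *target;
};

void toy_module_init(struct toy_module *m, const char *name)
{
    m->name = name;
    INIT_LIST_HEAD(&m->source_list);
    INIT_LIST_HEAD(&m->target_list);
}

/* Record "a uses b": a single module_use node sits on both modules' lists. */
int toy_add_module_usage(struct toy_module *a, struct toy_module *b)
{
    assert(a != NULL && b != NULL);
    struct module_use *use = malloc(sizeof(*use));
    if (!use)
        return -1;
    use->source = a;
    use->target = b;
    list_add(&use->source_list, &b->source_list);
    list_add(&use->target_list, &a->target_list);
    return 0;
}

/* Count the modules that depend on m. */
size_t toy_count_users(const struct toy_module *m)
{
    size_t n = 0;
    for (const struct list_head *p = m->source_list.next; p != &m->source_list; p = p->next)
        n++;
    return n;
}

/* Most recently added user of m, recovered from the list node, or NULL. */
struct toy_module *toy_first_user(struct toy_module *m)
{
    if (m->source_list.next == &m->source_list)
        return NULL;  /* nobody depends on m */
    struct module_use *use = container_of(m->source_list.next, struct module_use, source_list);
    return use->source;
}
```

Walking B's source_list answers "who still depends on B?", which the unload path has to ask before B can be removed. Following A's target_list answers the reverse question.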

Symbols

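A kernel module is almost never completely self-contained. It needs to call functions in other modules, and the symbol mechanism is what makes this possible. Recall these fields of struct module: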

const struct kernel_symbol *syms; // exported symbols: points to an array of num_syms kernel_symbol entries

const unsigned long *crcs; // also num_syms entries, holding each symbol's checksum (CRC)

unsigned int num_syms;

syms points to an array of symbols. This is a symbol table, though one local to the module. Here is struct kernel_symbol:

struct kernel_symbol
{
    unsigned long value;
    const char *name;
};

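The structure is simple: value holds the symbol's address and name is the symbol's name. The principle is just as simple. Let's use find_symbol to see how the kernel resolves an undefined symbol: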

const struct kernel_symbol *find_symbol(const char *name,
                    struct module **owner,
                    const unsigned long **crc,
                    bool gplok,
                    bool warn)
{
    struct find_symbol_arg fsa;

    fsa.name = name;
    fsa.gplok = gplok;
    fsa.warn = warn;

    if (each_symbol_section(find_symbol_in_section, &fsa)) {
        if (owner)
            *owner = fsa.owner;
        if (crc)
            *crc = fsa.crc;
        return fsa.sym;
    }

    pr_debug("Failed to find symbol %s\n", name);
    return NULL;
}

It first packs its arguments into a find_symbol_arg structure. It then calls each_symbol_section, passing find_symbol_in_section, the function that searches one section for the symbol:

bool each_symbol_section(bool (*fn)(const struct symsearch *arr,
                    struct module *owner,
                    void *data),
             void *data)
{
    struct module *mod;
    static const struct symsearch arr[] = {
        { __start___ksymtab, __stop___ksymtab, __start___kcrctab,
          NOT_GPL_ONLY, false },
        { __start___ksymtab_gpl, __stop___ksymtab_gpl,
          __start___kcrctab_gpl,
          GPL_ONLY, false },
        { __start___ksymtab_gpl_future, __stop___ksymtab_gpl_future,
          __start___kcrctab_gpl_future,
          WILL_BE_GPL_ONLY, false },
#ifdef CONFIG_UNUSED_SYMBOLS
        { __start___ksymtab_unused, __stop___ksymtab_unused,
          __start___kcrctab_unused,
          NOT_GPL_ONLY, true },
        { __start___ksymtab_unused_gpl, __stop___ksymtab_unused_gpl,
          __start___kcrctab_unused_gpl,
          GPL_ONLY, true },
#endif
    };

    if (each_symbol_in_section(arr, ARRAY_SIZE(arr), NULL, fn, data))
        return true;

    list_for_each_entry_rcu(mod, &modules, list) {
        struct symsearch arr[] = {
            { mod->syms, mod->syms + mod->num_syms, mod->crcs,
              NOT_GPL_ONLY, false },
            { mod->gpl_syms, mod->gpl_syms + mod->num_gpl_syms,
              mod->gpl_crcs,
              GPL_ONLY, false },
            { mod->gpl_future_syms,
              mod->gpl_future_syms + mod->num_gpl_future_syms,
              mod->gpl_future_crcs,
              WILL_BE_GPL_ONLY, false },
#ifdef CONFIG_UNUSED_SYMBOLS
            { mod->unused_syms,
              mod->unused_syms + mod->num_unused_syms,
              mod->unused_crcs,
              NOT_GPL_ONLY, true },
            { mod->unused_gpl_syms,
              mod->unused_gpl_syms + mod->num_unused_gpl_syms,
              mod->unused_gpl_crcs,
              GPL_ONLY, true },
#endif
        };

        if (mod->state == MODULE_STATE_UNFORMED)
            continue;

        if (each_symbol_in_section(arr, ARRAY_SIZE(arr), mod, fn, data))
            return true;
    }
    return false;
}
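The search order encoded in each_symbol_section can be modeled with a flat array of tables: the kernel's own export tables first, then each loaded module's tables, stopping at the first acceptable match. This is a simplified user-space sketch. toy_table and toy_find_symbol are hypothetical, and the per-table search is linear for brevity. The licence test imitates what check_symbol does: a GPL_ONLY symbol is rejected for a non-GPL caller, and the search moves on to the next table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum licence { NOT_GPL_ONLY, GPL_ONLY, WILL_BE_GPL_ONLY };

struct kernel_symbol { unsigned long value; const char *name; };

/* Hypothetical simplified table: one export table plus its owner and licence class. */
struct toy_table {
    const char *owner;                 /* "vmlinux" or a module name */
    const struct kernel_symbol *syms;
    size_t num;
    enum licence licence;
};

/* Walk the tables in priority order and return the first match that passes
 * the licence check; *owner (if non-NULL) receives the owning table's name. */
const struct kernel_symbol *toy_find_symbol(const struct toy_table *tabs, size_t ntabs,
                                            const char *name, bool gplok,
                                            const char **owner)
{
    assert(tabs != NULL && name != NULL);
    for (size_t t = 0; t < ntabs; t++) {
        for (size_t i = 0; i < tabs[t].num; i++) {
            if (strcmp(tabs[t].syms[i].name, name) != 0)
                continue;
            /* Like check_symbol(): non-GPL callers may not use GPL_ONLY symbols. */
            if (!gplok && tabs[t].licence == GPL_ONLY)
                break;  /* rejected in this table; keep searching later tables */
            if (owner)
                *owner = tabs[t].owner;
            return &tabs[t].syms[i];
        }
    }
    return NULL;
}
```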

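The kernel's own symbols naturally come first. A static array lists the kernel's global export tables in priority order: __ksymtab, __ksymtab_gpl, __ksymtab_gpl_future and, with CONFIG_UNUSED_SYMBOLS, __ksymtab_unused and __ksymtab_unused_gpl. Each table is delimited by its __start___ksymtab* and __stop___ksymtab* markers. each_symbol_in_section then walks this array and calls find_symbol_in_section on each entry.

If the kernel does not export the requested symbol, the symbol tables of the loaded modules (the local symbol tables) are searched the same way. The only difference is that the table pointers are stored in each struct module rather than in global variables; modules still in the MODULE_STATE_UNFORMED state are skipped. I won't repeat the details. Here is find_symbol_in_section: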

static bool find_symbol_in_section(const struct symsearch *syms,
                   struct module *owner,
                   void *data)
{
    struct find_symbol_arg *fsa = data;
    struct kernel_symbol *sym;

    sym = bsearch(fsa->name, syms->start, syms->stop - syms->start,
            sizeof(struct kernel_symbol), cmp_name);

    if (sym != NULL && check_symbol(syms, owner, sym - syms->start, data))
        return true;

    return false;
}

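This function searches for the symbol between the start and end of one symbol table. The actual search is done by bsearch, a binary search on the key, which is the symbol name. The algorithm is simple; here it is: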

void *bsearch(const void *key, const void *base, size_t num, size_t size,
          int (*cmp)(const void *key, const void *elt))
{
    size_t start = 0, end = num;
    int result;

    while (start < end) {
        size_t mid = start + (end - start) / 2;

        result = cmp(key, base + mid * size);
        if (result < 0)
            end = mid;
        else if (result > 0)
            start = mid + 1;
        else
            return (void *)base + mid * size;
    }

    return NULL;
}

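At each probe, cmp compares the key with the middle element. cmp is the comparison function passed in by the caller (cmp_name here), which ultimately calls strcmp. This shows that each symbol table must be sorted: by first character, then by second character when the first ones match, and so on (that is, in strcmp order). Once a symbol is found, check_symbol validates it, for example checking whether a GPL-only symbol may be used by the requesting module. If the symbol is not found, the function simply returns false.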
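The kernel's bsearch calls the comparison function with the key first and the array element second. The C library's bsearch uses the same argument order, so the cmp_name idea carries over unchanged. The sketch below is a user-space illustration; the lookup helper is hypothetical. The table must already be in strcmp order, exactly as the kernel requires of its ksymtab sections.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct kernel_symbol { unsigned long value; const char *name; };

/* Same contract as the kernel's cmp_name(): key is a bare string,
 * element is a kernel_symbol. */
static int cmp_name(const void *key, const void *elt)
{
    const struct kernel_symbol *sym = elt;
    return strcmp(key, sym->name);
}

/* Binary-search a table that is already sorted by strcmp() order. */
const struct kernel_symbol *lookup(const struct kernel_symbol *tab, size_t n,
                                   const char *name)
{
    assert(tab != NULL && name != NULL);
    return bsearch(name, tab, n, sizeof(*tab), cmp_name);
}
```

If the table were not sorted, bsearch could miss symbols that are present. That is why the ordering is a precondition rather than an optimization.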

Using unexported functions

1. Define a function pointer type

2. Declare the function pointer

3. Look up the symbol table

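Take the kernel's lookup_swap_cache function as an example. It is not exported, so a module cannot call it directly. By looking it up in the symbol table and casting its address to a function pointer, we can still use it.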

1. Define a function pointer type

The original definition is: struct page *lookup_swap_cache(swp_entry_t swp)

typedef struct page* (*LOOKUPSWAPCACHE)(swp_entry_t);

2. Declare the function pointer

LOOKUPSWAPCACHE lookup_swap_cache_chen;

3. Assign it

lookup_swap_cache_chen = (LOOKUPSWAPCACHE)kallsyms_lookup_name("lookup_swap_cache");

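You can now call lookup_swap_cache_chen from your own module. You need to include the header: #include <linux/kallsyms.h>. kallsyms_lookup_name returns 0 when the name is not found, so check the pointer before calling it. Note also that starting with Linux 5.7, kallsyms_lookup_name is no longer exported to modules, so this technique only works on older kernels.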
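The three steps above translate directly into code. Kernel code cannot run outside the kernel, so the sketch below models the pattern in user space. fake_kallsyms_lookup_name is a hypothetical stand-in for kallsyms_lookup_name, and secret_add plays the role of an unexported function such as lookup_swap_cache. The sketch keeps the typedef, the pointer variable and the cast, and adds the zero check that the original snippet omits.

Like the kernel, it assumes an LP64 platform where unsigned long can hold an address. It also relies on converting a function pointer to an integer and back, which is implementation-defined but works with GCC and Clang on Linux.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Plays the role of an unexported kernel function (hypothetical). */
static int secret_add(int a, int b) { return a + b; }

/* Hypothetical stand-in for kallsyms_lookup_name(): maps a symbol name
 * to its address, returning 0 when the name is unknown. */
static unsigned long fake_kallsyms_lookup_name(const char *name)
{
    if (strcmp(name, "secret_add") == 0)
        return (unsigned long)(uintptr_t)secret_add;
    return 0;
}

/* Steps 1 and 2: a typedef for the prototype and a pointer variable. */
typedef int (*secret_add_t)(int, int);
static secret_add_t secret_add_ptr;

/* Step 3: resolve by name, refusing to proceed if the lookup failed. */
int resolve_secret_add(void)
{
    unsigned long addr = fake_kallsyms_lookup_name("secret_add");
    if (addr == 0)
        return -1;
    secret_add_ptr = (secret_add_t)(uintptr_t)addr;
    return 0;
}

int call_secret_add(int a, int b)
{
    assert(secret_add_ptr != NULL);  /* resolve_secret_add() must succeed first */
    return secret_add_ptr(a, b);
}
```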

Symbol tables seen from user space

The following is excerpted from: http://www.blogbus.com/wanderer-zjhit-logs/172382425.html

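The kernel symbol table consists of variable and function names. Each entry pairs a symbol with an address, much like a domain name and an IP address. The format is: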

[root@rx6600 boot]# head System.map
000000000479c4a0 A phys_start
a000000000000600 A __start_gate_mckinley_e9_patchlist
a000000000000604 A __end_gate_mckinley_e9_patchlist
a000000000000604 A __end_gate_vtop_patchlist
a000000000000604 A __start_gate_fsyscall_patchlist
a000000000000604 A __start_gate_vtop_patchlist
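Oops messages and gdb debugging output both depend on this table. It translates the addresses the kernel works with into function names that users recognize, which helps with fault diagnosis and runtime monitoring.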
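Translating a raw address into the "function + offset" form seen in oops output amounts to finding the last symbol whose address is not greater than the target. The sketch below parses System.map-style lines with sscanf, then performs that search with a binary search. It assumes the entries are sorted by address, as they are in System.map. The struct map_entry type and the function names are hypothetical, and the test input reuses the sample lines shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct map_entry { unsigned long long addr; char type; char name[128]; };

/* Parse one System.map line of the form "<hex address> <type> <name>". */
int parse_map_line(const char *line, struct map_entry *e)
{
    assert(line != NULL && e != NULL);
    return sscanf(line, "%llx %c %127s", &e->addr, &e->type, e->name) == 3 ? 0 : -1;
}

/* In a table sorted by address, return the last entry whose address is
 * <= addr and store the distance in *offset; NULL if addr is below all. */
const struct map_entry *symbolize(const struct map_entry *tab, size_t n,
                                  unsigned long long addr, unsigned long long *offset)
{
    size_t lo = 0, hi = n;  /* find the first entry with address > addr */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;
    *offset = addr - tab[lo - 1].addr;
    return &tab[lo - 1];
}
```

System.map records only start addresses, not sizes. An address past the end of the last function is therefore still attributed to the nearest symbol below it, so the offset must be read with that in mind.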
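Where the kernel symbol table is stored
System.map
A real file on disk that stores the addresses of the functions and variables statically compiled into the kernel. Every kernel build produces its own matching System.map. When klogd outputs kernel messages, it uses /boot/System.map to translate function and variable addresses into names that users can understand.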
/proc/kallsyms
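Created when the kernel boots and used to locate errors in an oops. The file always reports a size of 0. It lists the functions and variables of the running kernel and of the currently loaded modules.
Similarities: both are symbol tables of kernel functions and variables with the same layout. For a kernel symbol that appears in both, the address listed is the same.
Differences:
The two have different emphases. System.map describes the kernel image. It also gives addresses for kernel variables and functions that are not exported, such as the kthread_create_list list head. It is generally read-only and of fixed size, and it does not include variables or functions from dynamically loaded modules. /proc/kallsyms, by contrast, is created during boot and updated as the system runs, reflecting the system's current state, including names exported by the kernel and by loaded modules. It therefore differs from System.map: its contents change dynamically and its size is not fixed.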

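Note: /proc/kmsg exposes the kernel's log messages, from early boot through normal operation, which the kernel emits at run time via printk.
If klogd is running, it reads /proc/kmsg and passes the messages to syslogd, which writes them to /var/log/messages; syslogd can be configured through syslogd.conf. dmesg reads the same kernel log buffer (through the syslog(2) system call, or /dev/kmsg in newer versions) and prints it to the terminal.
klogd can use System.map to translate kernel addresses into function names for debugging; a kernel built with kallsyms already prints symbol names in its oops messages.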
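When the running kernel hits a problem, it is usually an oops caused by dereferencing an invalid pointer. In user space, an application generally cannot recover from a segmentation fault (an invalid memory reference). The kernel, being built for robustness, usually just kills the offending process and keeps the system in a stable state. In more serious cases the kernel panics, and the machine halts or reboots.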

That's all for now……

Immanuel

References:

1. Linux kernel 3.10.1 source code

2. Professional Linux Kernel Architecture (Chinese edition: 深入Linux内核架构)