Redis Memory Allocation Analysis

Date: 2019-11-08

Why analyze this

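The business side once reported that after importing data into Redis, memory usage was several times the size of the original file. This article therefore walks through how Redis allocates memory. The same reasoning is also very useful when estimating the Redis memory footprint of a new service before it goes live.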

What you need to know

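The simple Redis String type is used as the example here. Other data types can be reasoned about the same way, as long as they are not stored with a compressed encoding. The article covers every place where memory is allocated, uses that to calculate the memory Redis consumes, and finally compares the result with the memory usage reported by Redis INFO.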

Before that, a brief look at how Redis stores keys and values is needed.

Redis does not simply place the key and the value in memory; it relies on several structures to manage them.

As shown in the figure above, each DB in Redis corresponds to one dict structure (the green box in the figure):

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
    int iterators; /* number of iterators currently running */
} dict;

This structure contains two dictht structures, defined as follows:

typedef struct dictht {
    dictEntry **table;
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;

A dictht in turn holds pointers to many dictEntry structures, defined as follows:

typedef struct dictEntry {
    void *key;             
    union {               
        void *val;         
        uint64_t u64;     
        int64_t s64;       
        double d;         
    } v;                   
    struct dictEntry *next;
} dictEntry;

So the key and value ultimately live in a dictEntry (more precisely, its key and val fields point to the corresponding key and value objects).

Starting the calculation

Why Redis is designed this way is not the focus here. The focus is on which of these structures require memory to be allocated when Redis stores one key/value pair.

Let's first look at how Redis allocates memory when we run the command set jingbo test.

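When the Redis server receives set jingbo test, processMultibulkBuffer calls createStringObject to create three string objects, one each for set, jingbo and test.
createStringObject chooses between EMBSTR and RAW encoding. The choice is not examined in detail here: either way the same number of bytes, sizeof(robj)+sizeof(struct sdshdr)+len+1, is requested. EMBSTR makes one allocation instead of two, which mainly affects efficiency.
Since none of set, jingbo and test exceeds 39 characters, Redis uses EMBSTR encoding, and the object is created by the following function: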

/* Create a string object with encoding REDIS_ENCODING_EMBSTR, that is
 * an object where the sds string is actually an unmodifiable string
 * allocated in the same chunk as the object itself. */
robj *createEmbeddedStringObject(char *ptr, size_t len) {
    robj *o = zmalloc(sizeof(robj)+sizeof(struct sdshdr)+len+1);
    struct sdshdr *sh = (void*)(o+1);
    o->type = REDIS_STRING;
    o->encoding = REDIS_ENCODING_EMBSTR;
    o->ptr = sh+1;
    o->refcount = 1;
    o->lru = LRU_CLOCK();
 
    sh->len = len;
    sh->free = 0;
    if (ptr) {
        memcpy(sh->buf,ptr,len);
        sh->buf[len] = '\0';
    } else {
        memset(sh->buf,0,len+1);
    }   
    return o;
}

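The first line of the function shows that Redis allocates sizeof(robj)+sizeof(struct sdshdr)+len+1 bytes for the object.
Here sizeof(robj) = 16 and sizeof(struct sdshdr) = 8, so the request is 16 + 8 + len (the length of the string itself) + 1.
The string jingbo therefore needs 16+8+6+1 = 31 bytes, but the memory allocator Redis uses actually hands out 32 bytes. Likewise, test needs 16+8+4+1 = 29 bytes and also gets 32 bytes.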

Because this is a set command, execution then reaches dbAdd, shown below:

/* Add the key to the DB. It's up to the caller to increment the reference
 * counter of the value if needed.
 *
 * The program is aborted if the key already exists. */
void dbAdd(redisDb *db, robj *key, robj *val) {
    sds copy = sdsdup(key->ptr);
    int retval = dictAdd(db->dict, copy, val);
 
    redisAssertWithInfo(NULL,key,retval == REDIS_OK);
    if (val->type == REDIS_LIST) signalListAsReady(db, key);
    if (server.cluster_enabled) slotToKeyAdd(key);
}

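The robj *key and robj *val passed in are the string objects Redis just created. The first line of the function ends up creating a new sds string, because sdsdup calls the following function: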

sds sdsnewlen(const void *init, size_t initlen) {
    struct sdshdr *sh;
 
    if (init) {
        sh = zmalloc(sizeof(struct sdshdr)+initlen+1);
    } else {
        sh = zcalloc(sizeof(struct sdshdr)+initlen+1);
    }
    if (sh == NULL) return NULL;
    sh->len = initlen;
    sh->free = 0;
    if (initlen && init)
        memcpy(sh->buf, init, initlen);
    sh->buf[initlen] = '\0';
    return (char*)sh->buf;
}

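So the memory needed here is sizeof(struct sdshdr)+initlen+1, that is 8 + initlen (the length of the string itself) + 1. The key needs 8+6+1 = 15 bytes, and the allocator actually hands out 16 bytes.
The key Redis finally stores is therefore the sds string created above, while the value is the string object created earlier. The key and val fields of the dictEntry point to them. The other string objects created while parsing the command are released later, when the client frees them or in other situations.
One more allocation is involved: the dictEntry itself. A dictEntry needs 24 bytes, and the allocator actually hands out 32 bytes.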

So far the memory Redis has allocated is 16 bytes (jingbo) + 32 bytes (test) + 32 bytes (dictEntry) = 80 bytes.

Verifying the result

Is that what actually happens?

We use redis-benchmark, the load-testing tool that ships with Redis, to check.

/usr/local/rc-redis-3.0.7/src/redis-benchmark -p 6379 -t set -n 1000000 -r 1000000

Results:

[jingbo8@poseidon54 ~]$ redis-cli -p 6379 info|grep keys
expired_keys:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
db0:keys=632147,expires=0,avg_ttl=0
[jingbo8@poseidon54 ~]$ redis-cli -p 6379 info|grep mem
used_memory:69423144
used_memory_human:66.21M
used_memory_rss:72884224
used_memory_peak:72148656
used_memory_peak_human:68.81M
used_memory_lua:36864
mem_fragmentation_ratio:1.05
mem_allocator:jemalloc-3.6.0

There are 632147 keys in total, occupying 66.21M of memory. Every command issued by the benchmark has the following form:

set key:000000166802 xxx

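The key is 16 characters and the value is 3 characters. The key needs 8+16+1 = 25 bytes and actually gets 32 bytes; the value needs 16+8+3+1 = 28 bytes and actually gets 32 bytes; the dictEntry needs 24 bytes and actually gets 32 bytes.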

So a single key needs 32 bytes (key) + 32 bytes (value) + 32 bytes (dictEntry) = 96 bytes.

The total memory is then:

96 bytes * 632147 = 60686112 bytes
60686112/1024/1024 = 57.87MB

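This still falls short of 66.21M. Leaving aside for now the memory Redis allocates at startup for its own metadata (for example the shared objects it creates), we are 66.21-57.87 = 8.34MB short of the memory actually in use.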

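What we left out is the memory used by the dictht hash tables. ht[0] and ht[1] in the figure above are the two dictht structures. ht[1] exists mainly for when ht[0] needs to grow and normally uses no memory, so what matters here is the memory used by ht[0].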

/* Expand or create the hash table */
int dictExpand(dict *d, unsigned long size)
{
    dictht n; /* the new hash table */
    unsigned long realsize = _dictNextPower(size);
 
    /* the size is invalid if it is smaller than the number of
     * elements already inside the hash table */
    if (dictIsRehashing(d) || d->ht[0].used > size)
        return DICT_ERR;
 
    /* Rehashing to the same table size is not useful. */
    if (realsize == d->ht[0].size) return DICT_ERR;
 
    /* Allocate the new hash table and initialize all pointers to NULL */
    n.size = realsize;
    n.sizemask = realsize-1;
    n.table = zcalloc(realsize*sizeof(dictEntry*));
    n.used = 0;
 
    /* Is this the first initialization? If so it's not really a rehashing
     * we just set the first hash table so that it can accept keys. */
    if (d->ht[0].table == NULL) {
        d->ht[0] = n;
        return DICT_OK;
    }   
 
    /* Prepare a second hash table for incremental rehashing */
    d->ht[1] = n;
    d->rehashidx = 0;
    return DICT_OK;
}

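When ht[0] is first initialized, realsize is 4, so the memory allocated is realsize*sizeof(dictEntry*) = 4*8 = 32 bytes. That is only the starting point: the table can hold just 4 dictEntry pointers, which corresponds to 4 keys. As the number of keys grows, ht[0] is expanded: ht[1] is first allocated at twice the size of ht[0], all dictEntry items in ht[0] are migrated to ht[1], and then the two are swapped, so ht[1] becomes the new ht[0], and the old ht[0] becomes ht[1] and has its memory freed. Each time ht[0] fills up, it therefore grows to twice its previous size.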

With 632147 keys, what is realsize? It doubles from its initial value:

realsize=4
realsize=8
realsize=16
realsize=32
realsize=64
realsize=128
realsize=256
realsize=512
realsize=1024
realsize=2048
realsize=4096
realsize=8192
realsize=16384
realsize=32768
realsize=65536
realsize=131072
realsize=262144
realsize=524288
realsize=1048576

524288 < 632147 < 1048576, so realsize is currently 1048576, and the memory to allocate is 1048576*8 = 8388608 bytes, 8388608/1024/1024 = 8MB.

Of the earlier gap, 8.34-8 = 0.34MB remains, so we are now only 0.34MB off. Let's clear all keys and see how much memory Redis itself uses.

[jingbo8@poseidon54 ~]$ redis-cli -p 6379 info|grep mem
used_memory:349256
used_memory_human:341.07K
used_memory_rss:2015232
used_memory_peak:349256
used_memory_peak_human:341.07K
used_memory_lua:36864
mem_fragmentation_ratio:5.77
mem_allocator:jemalloc-3.6.0

0.34*1024 = 348.16K, which is very close to the current memory usage.

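This analysis shows that Redis's own structures account for most of the memory. That is why the memory used after importing data into Redis differs so much from the space the data occupied in the original file. Memory is a precious resource, so design your data layout carefully before storing data in Redis.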

That completes the analysis of how memory is allocated. Feel free to get in touch with any questions.