Redis Hash (2)

Date: 2020-03-12

Tags: redis, hash. Category: Redis

Storage type

An unordered hash table of key-value pairs. A value can only be a string; other types cannot be nested inside it.

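Both store strings, so what are the main differences between Hash and String?

1. A hash gathers all related values under a single key, which saves memory.

2. Only one key is used, which reduces key collisions.

3. When values need to be fetched in bulk, a single command is enough, which reduces memory, I/O, and CPU consumption.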

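Cases where Hash is a poor fit:

1. A field cannot be given its own expiration time.

2. There are no bit operations.

3. Data distribution has to be considered: when the value is very large, it cannot be spread across multiple nodes.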

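Storage (implementation) principles

A Redis Hash is itself a KV structure, similar to HashMap in Java.

The outer hash (the implementation of Redis KV itself) uses only hashtable. When a hash data type is stored, we call it the inner hash. The inner hash can be implemented with two underlying data structures:

ziplist: OBJ_ENCODING_ZIPLIST (compressed list)

hashtable: OBJ_ENCODING_HT (hash table)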

Running commands

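ziplist (compressed list)

A ziplist is a specially encoded doubly linked list. It does not store pointers to the previous and next nodes; instead, each node stores the length of the previous node and its own length. It gives up some read/write performance in exchange for efficient memory use, trading time for space. It is used only when a hash has few fields and the field values are small.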
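To make the "lengths instead of pointers" idea concrete, here is a minimal sketch. It assumes a toy layout that is deliberately simpler than the real Redis encoding: every entry is one byte holding the size of the previous entry, one byte holding the payload length, then the payload. All names (toy_ziplist, toy_push, toy_prev, toy_next) are hypothetical and do not come from ziplist.c.

```c
#include <stddef.h>
#include <string.h>

/* Toy layout, NOT the real Redis encoding. Each entry is:
 * [prev_size: 1 byte][len: 1 byte][payload: len bytes] */
typedef struct {
    unsigned char buf[256];
    size_t used;   /* bytes currently in use */
    size_t tail;   /* offset of the last entry */
    size_t count;  /* number of entries */
} toy_ziplist;

static void toy_init(toy_ziplist *zl) {
    zl->used = 0;
    zl->tail = 0;
    zl->count = 0;
}

/* Total size in bytes of the entry that starts at offset off. */
static size_t toy_entry_size(const toy_ziplist *zl, size_t off) {
    return 2u + zl->buf[off + 1];
}

/* Append a payload. Returns 0 on success, -1 if it does not fit. */
static int toy_push(toy_ziplist *zl, const void *data, unsigned char len) {
    size_t need = 2u + len;
    if (len > 253) return -1;  /* entry size must fit in one byte */
    if (zl->used + need > sizeof zl->buf) return -1;
    size_t off = zl->used;
    /* Record the size of the previous entry instead of a pointer to it. */
    zl->buf[off] = zl->count ? (unsigned char)toy_entry_size(zl, zl->tail) : 0;
    zl->buf[off + 1] = len;
    memcpy(zl->buf + off + 2, data, len);
    zl->tail = off;
    zl->used += need;
    zl->count++;
    return 0;
}

/* Offset of the next entry, or (size_t)-1 at the end of the list. */
static size_t toy_next(const toy_ziplist *zl, size_t off) {
    size_t n = off + toy_entry_size(zl, off);
    return n < zl->used ? n : (size_t)-1;
}

/* Offset of the previous entry, or (size_t)-1 at the head. */
static size_t toy_prev(const toy_ziplist *zl, size_t off) {
    unsigned char prev = zl->buf[off];
    return prev ? off - prev : (size_t)-1;
}
```

Moving in either direction is plain offset arithmetic over one contiguous buffer, which is where the memory saving comes from. The cost is that finding a field means walking the entries one by one, so the approach only pays off for small hashes.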

Internal structure of ziplist

From the ziplist.c source (see the comment at line 16):

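typedef struct zlentry {
    unsigned int prevrawlensize; /* bytes needed to store the length of the previous node */
    unsigned int prevrawlen;     /* length of the previous node */
    unsigned int lensize;        /* bytes needed to store the length of the current node */
    unsigned int len;            /* length occupied by the current node */
    unsigned int headersize;     /* header size of the current node (prevrawlensize + lensize), i.e. the non-data part */
    unsigned char encoding;      /* encoding */
    unsigned char *p;            /* the ziplist is stored as a byte string; this points to the start of the current node */
} zlentry;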

Encoding (line 204 of the ziplist.c source)
	
#define ZIP_STR_06B (0 << 6)
#define ZIP_STR_14B (1 << 6)
#define ZIP_STR_32B (2 << 6)
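As a rough illustration of what these three constants are for: the top two bits of the first header byte select how many bits hold the string length (6, 14, or 32). The sketch below reflects my understanding of the ziplist format rather than code copied from ziplist.c, and the function name str_header is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ZIP_STR_06B (0 << 6)
#define ZIP_STR_14B (1 << 6)
#define ZIP_STR_32B (2 << 6)

/* Write the length header for a string of rawlen bytes into out
 * (which must have room for 5 bytes). Returns the header size. */
static size_t str_header(uint32_t rawlen, unsigned char *out) {
    if (rawlen <= 0x3f) {
        /* 00pppppp: length fits in the low 6 bits of one byte */
        out[0] = (unsigned char)(ZIP_STR_06B | rawlen);
        return 1;
    }
    if (rawlen <= 0x3fff) {
        /* 01pppppp qqqqqqqq: 14-bit length, high bits first */
        out[0] = (unsigned char)(ZIP_STR_14B | ((rawlen >> 8) & 0x3f));
        out[1] = (unsigned char)(rawlen & 0xff);
        return 2;
    }
    /* 10000000 followed by a 4-byte length, high byte first */
    out[0] = (unsigned char)ZIP_STR_32B;
    out[1] = (unsigned char)((rawlen >> 24) & 0xff);
    out[2] = (unsigned char)((rawlen >> 16) & 0xff);
    out[3] = (unsigned char)((rawlen >> 8) & 0xff);
    out[4] = (unsigned char)(rawlen & 0xff);
    return 5;
}
```

Short strings, the common case for small hashes, therefore cost a single header byte for their length.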

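When is ziplist used for storage?

A hash object uses the ziplist encoding when both of the following conditions hold:

1. The string length of every key and every value is at most 64 bytes (one English letter is one byte).

2. The hash object holds at most 512 key-value pairs.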

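# redis.conf settings
# maximum length of a key or value that can be stored in a ziplist
hash-max-ziplist-value 64
# maximum number of entries (key-value pairs) that can be stored in a ziplist
hash-max-ziplist-entries 512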

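When a hash object exceeds either configured threshold (some key or value is longer than 64 bytes, or there are more than 512 key-value pairs), it is converted to a hash table (hashtable).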
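The rule above can be sketched as a small decision function. This is an illustration only: the types and names (toy_pair, choose_encoding) are hypothetical, and the two limits are passed in as parameters standing for hash-max-ziplist-entries and hash-max-ziplist-value.

```c
#include <stddef.h>
#include <string.h>

enum toy_encoding { TOY_ZIPLIST, TOY_HASHTABLE };

typedef struct {
    const char *field;
    const char *value;
} toy_pair;

/* Pick an encoding for n field/value pairs given the two limits. */
static enum toy_encoding choose_encoding(const toy_pair *pairs, size_t n,
                                         size_t max_entries,
                                         size_t max_value) {
    /* Too many pairs: fall back to the hash table. */
    if (n > max_entries) return TOY_HASHTABLE;
    /* Any field or value that is too long also forces the hash table. */
    for (size_t i = 0; i < n; i++) {
        if (strlen(pairs[i].field) > max_value ||
            strlen(pairs[i].value) > max_value)
            return TOY_HASHTABLE;
    }
    return TOY_ZIPLIST;
}
```

With the default limits, a hash whose longest value is exactly 64 bytes stays a ziplist, while one with a 65-byte value is converted.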

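hashtable (source location: dict.h)

In Redis, the hashtable is called a dictionary (dict). It is an array-plus-linked-list structure.

As we saw earlier, the Redis KV structure is implemented through a dictEntry.

Redis then wraps dictEntry in several more layers.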

typedef struct dictEntry {
    void *key;              /* key */
    union {
        void *val;          /* value stored as a pointer */
        uint64_t u64;
        int64_t s64;
        double d;
    } v;                    /* value */
    struct dictEntry *next; /* next entry in the same bucket (chaining) */
} dictEntry;

dictEntry is placed inside dictht (the hashtable):

/* This is our hash table structure. Every dictionary has two of this as we
 * implement incremental rehashing, for the old to the new table. */
typedef struct dictht {
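    dictEntry **table;     /* hash table array (buckets) */
    unsigned long size;    /* hash table size (number of buckets) */
    unsigned long sizemask;/* size mask used to compute the index; equals size-1 */
    unsigned long used;    /* number of existing nodes */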
} dictht;

ht is placed inside dict:

typedef struct dict {
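    dictType *type; /* dictionary type */
    void *privdata; /* private data */
    dictht ht[2];   /* each dictionary has two hash tables */
    long rehashidx; /* rehash index; -1 when no rehash is in progress */
    unsigned long iterators; /* number of iterators currently in use */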
} dict;

From the lowest layer to the highest: dictEntry, dictht, dict, OBJ_ENCODING_HT.

Storage structure of the hash
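The array-plus-linked-list layout can be sketched in a few lines. This is a simplified stand-in rather than the dict.c code: the toy_* names are hypothetical, keys and values are plain C strings, and the hash function is a generic djb2-style hash chosen only for illustration.

```c
#include <stddef.h>
#include <string.h>

typedef struct toy_entry {
    const char *key;
    const char *val;
    struct toy_entry *next; /* chain of entries that share a bucket */
} toy_entry;

typedef struct {
    toy_entry **table;      /* bucket array */
    unsigned long size;     /* number of buckets, a power of two */
    unsigned long sizemask; /* size - 1 */
    unsigned long used;     /* number of stored entries */
} toy_ht;

/* Stand-in hash function (djb2 style), an assumption for this sketch. */
static unsigned long toy_hash(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Insert at the head of the bucket's chain. */
static void toy_insert(toy_ht *ht, toy_entry *e) {
    unsigned long idx = toy_hash(e->key) & ht->sizemask;
    e->next = ht->table[idx];
    ht->table[idx] = e;
    ht->used++;
}

/* Walk the chain of the key's bucket; NULL if the key is absent. */
static const char *toy_find(const toy_ht *ht, const char *key) {
    unsigned long idx = toy_hash(key) & ht->sizemask;
    for (toy_entry *e = ht->table[idx]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0) return e->val;
    return NULL;
}
```

Because size is a power of two, hash & sizemask gives the same result as hash % size but with a single bitwise operation, which is why sizemask is kept alongside size.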

Why define two hash tables, ht[2]?

By default a Redis hash uses ht[0]; ht[1] is not initialized and has no space allocated.

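The dictht hash table resolves collisions by separate chaining. Its performance therefore depends on the ratio between its size (the size attribute) and the number of nodes it holds (the used attribute):

When the ratio is 1:1 (each bucket holds one entry on average), the hash table performs best;
If the number of nodes is much larger than the table size (this proportion is expressed as ratio; 5 means each bucket holds 5 entries on average), the hash table degenerates into a set of linked lists and its performance advantage disappears.

In that case the table needs to be expanded. In Redis this operation is called rehash.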

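Steps of a rehash:

1. Allocate space for the dictionary's ht[1] hash table. Its size depends on the operation being performed and on the number of key-value pairs currently in ht[0].

Expansion: the size of ht[1] is the first power of two greater than or equal to ht[0].used*2.

2. Rehash every node on ht[0] onto ht[1]: recompute its hash value and index, then place it at the resulting position.

3. Once everything in ht[0] has been migrated to ht[1], free the space of ht[0], make ht[1] the new ht[0], and create a new empty ht[1] ready for the next rehash.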
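The three steps can be sketched as follows. This is a simplified illustration, not the dict.c implementation: it migrates everything in a single pass, whereas the Redis source comment describes the rehash as incremental. The r_* names, the djb2-style hash, and the starting size of 4 are assumptions made for the sketch.

```c
#include <stdlib.h>
#include <string.h>

typedef struct r_entry {
    const char *key;
    int val;
    struct r_entry *next;
} r_entry;

typedef struct {
    r_entry **table;
    unsigned long size, sizemask, used;
} r_ht;

typedef struct { r_ht ht[2]; } r_dict;

/* Stand-in hash function, an assumption for this sketch. */
static unsigned long r_hash(const char *s) {
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Link an entry at the head of its bucket in the given table. */
static void r_link(r_ht *ht, r_entry *e) {
    unsigned long idx = r_hash(e->key) & ht->sizemask;
    e->next = ht->table[idx];
    ht->table[idx] = e;
    ht->used++;
}

/* Look a key up in ht[0]; NULL if absent. */
static r_entry *r_get(const r_dict *d, const char *key) {
    const r_ht *ht = &d->ht[0];
    if (ht->size == 0) return NULL;
    for (r_entry *e = ht->table[r_hash(key) & ht->sizemask]; e; e = e->next)
        if (strcmp(e->key, key) == 0) return e;
    return NULL;
}

/* Rehash everything at once. Returns 0 on success, -1 on failure. */
static int r_rehash_all(r_dict *d) {
    /* Step 1: size ht[1] as the first power of two >= ht[0].used * 2
     * (starting from an assumed minimum of 4 buckets). */
    unsigned long size = 4;
    while (size < d->ht[0].used * 2) size *= 2;
    r_entry **table = calloc(size, sizeof *table);
    if (table == NULL) return -1;
    d->ht[1].table = table;
    d->ht[1].size = size;
    d->ht[1].sizemask = size - 1;
    d->ht[1].used = 0;

    /* Step 2: recompute the index of every node and move it to ht[1]. */
    for (unsigned long i = 0; i < d->ht[0].size; i++) {
        r_entry *e = d->ht[0].table[i];
        while (e != NULL) {
            r_entry *next = e->next; /* save before relinking */
            r_link(&d->ht[1], e);
            e = next;
        }
    }

    /* Step 3: free the old array, promote ht[1], reset ht[1]. */
    free(d->ht[0].table);
    d->ht[0] = d->ht[1];
    memset(&d->ht[1], 0, sizeof d->ht[1]);
    return 0;
}
```

Doing all of step 2 in one call is the easiest version to read, but on a large table it would block for a long time, which is presumably why the real implementation keeps both tables alive and spreads the migration out.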

When is expansion triggered?

Load factor (source location: dict.c)

static int dict_can_resize = 1;
static unsigned int dict_force_resize_ratio = 5;

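ratio = used / size, the ratio of used nodes to the dictionary size. When dict_can_resize is 1, expansion is triggered as soon as used reaches size (a ratio of 1:1). When dict_can_resize is 0, expansion is triggered only once the ratio of used nodes to dictionary size exceeds dict_force_resize_ratio, that is, more than 5:1.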

Expansion check: _dictExpandIfNeeded (dict.c source)

/* Expand the hash table if needed */
static int _dictExpandIfNeeded(dict *d)
{
    /* Incremental rehashing already in progress. Return. */
    if (dictIsRehashing(d)) return DICT_OK;

    /* If the hash table is empty expand it to the initial size. */
    if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);

    /* If we reached the 1:1 ratio, and we are allowed to resize the hash
     * table (global setting) or we should avoid it but the ratio between
     * elements/buckets is over the "safe" threshold, we resize doubling
     * the number of buckets. */
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used/d->ht[0].size > dict_force_resize_ratio))
    {
        return dictExpand(d, d->ht[0].used*2);
    }
    return DICT_OK;
}
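To see what this condition means in numbers, it can be restated over plain integers. The helper below is a hypothetical sketch (needs_expand is not a Redis function), and it leaves out the "already rehashing" early return.

```c
/* Restates the check in _dictExpandIfNeeded with plain integers.
 * can_resize and force_ratio stand for dict_can_resize and
 * dict_force_resize_ratio. Returns 1 if the table should grow. */
static int needs_expand(unsigned long used, unsigned long size,
                        int can_resize, unsigned long force_ratio) {
    if (size == 0) return 1; /* empty table: grow to the initial size */
    return used >= size && (can_resize || used / size > force_ratio);
}
```

Note that used / size is integer division. With resizing disabled and a force ratio of 5, a table of 4 buckets is not expanded at 20 or even 23 entries (both divide to 5); it is expanded only at 24, when the quotient reaches 6.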

Expansion method: dictExpand (dict.c source)

/* Expand or create the hash table */
int dictExpand(dict *d, unsigned long size)
{
    /* the size is invalid if it is smaller than the number of
     * elements already inside the hash table */
    if (dictIsRehashing(d) || d->ht[0].used > size)
        return DICT_ERR;

    dictht n; /* the new hash table */
    unsigned long realsize = _dictNextPower(size);

    /* Rehashing to the same table size is not useful. */
    if (realsize == d->ht[0].size) return DICT_ERR;

    /* Allocate the new hash table and initialize all pointers to NULL */
    n.size = realsize;
    n.sizemask = realsize-1;
    n.table = zcalloc(realsize*sizeof(dictEntry*));
    n.used = 0;

    /* Is this the first initialization? If so it's not really a rehashing
     * we just set the first hash table so that it can accept keys. */
    if (d->ht[0].table == NULL) {
        d->ht[0] = n;
        return DICT_OK;
    }

    /* Prepare a second hash table for incremental rehashing */
    d->ht[1] = n;
    d->rehashidx = 0;
    return DICT_OK;
}

Shrinking: server.c

int htNeedsResize(dict *dict) {
    long long size, used;

    size = dictSlots(dict);
    used = dictSize(dict);
    return (size > DICT_HT_INITIAL_SIZE &&
            (used*100/size < HASHTABLE_MIN_FILL));
}
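A worked example helps with the shrink condition as well. The sketch below restates it over plain integers, assuming DICT_HT_INITIAL_SIZE is 4 and HASHTABLE_MIN_FILL is 10 (a percentage). Neither value appears in the snippets above, so treat both as assumptions; needs_shrink and the TOY_* names are hypothetical.

```c
#define TOY_INITIAL_SIZE 4 /* assumed value of DICT_HT_INITIAL_SIZE */
#define TOY_MIN_FILL 10    /* assumed value of HASHTABLE_MIN_FILL, in percent */

/* Returns 1 when the table is larger than the initial size and
 * less than TOY_MIN_FILL percent of its buckets' worth is in use. */
static int needs_shrink(long long size, long long used) {
    return size > TOY_INITIAL_SIZE && (used * 100 / size < TOY_MIN_FILL);
}
```

Under these assumptions a table with 128 buckets is a candidate for shrinking at 12 entries (fill of 9%) but not at 13 (fill of 10%), and a table already at the initial size is never shrunk.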