Map底层数据结构分析（HashMap TreeMap）

时间 2019-11-07

标签 map 底层数据结构分析 hashmap treemap 繁體版

原文原文链接

HashMap

结构特色 java

一、table是一个Entry[]数组类型，而Entry实际上就是一个单向链表。哈希表的"key-value键值对"都是存储在Entry数组中的。

static class Entry<K,V> implements Map.Entry<K,V> {
final K key;
V value;
Entry<K,V> next;
int hash;
}
复制代码

二、size是HashMap的大小，它是HashMap保存的键值对的数量。算法

/**
    * The number of key-value mappings contained in this map.
*/
transient int size;

void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}
复制代码

三、threshold是HashMap的阈值，用于判断是否须要调整HashMap的容量。当HashMap中存储数据的数量达到threshold时，就须要将HashMap的容量加倍。数组

/**
 * The next size value at which to resize (capacity * load factor).
 * @serial
 */
// If table == EMPTY_TABLE then this is the initial capacity at which the
// table will be created when inflated.
int threshold;

private void inflateTable(int toSize) {
    // Find a power of 2 >= toSize
    int capacity = roundUpToPowerOf2(toSize);
    // 通常状况下 threshold = capacity * loadFactor
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity];
    initHashSeedAsNeeded(capacity);
}
复制代码

四、上述代码中的loadFactor就是加载因子。缓存

/**
 * The load factor for the hash table.
 *
 * @serial
 */
final float loadFactor;

若是加载因子越大，对空间的利用更充分，可是查找效率会下降（链表长度会愈来愈长）。若是加载因子过小，那么表中的数据将过于稀疏（不少空间还没用，就开始扩容了），对空间形成严重浪费。若是咱们在构造方法中不指定，则系统默认加载因子为0.75，这是一个比较理想的值，通常状况下咱们是无需修改的。
复制代码

扩容安全

一、容量特色：不管咱们指定的容量为多少，构造方法都会将实际容量设为不小于指定容量的2的次方的一个数，且最大值不能超过2的30次方bash

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);

    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    init();
}

在构造方法中，只是进行了简单的初始化操做，容量的真实值并非这里肯定的。

public V put(K key, V value) {
// 若是是首次添加，进行容量的初始化
if (table == EMPTY_TABLE) {
    inflateTable(threshold);
}
if (key == null)
    return putForNullKey(value);
int hash = hash(key);
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
    Object k;
    if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
        V oldValue = e.value;
        e.value = value;
        e.recordAccess(this);
        return oldValue;
    }
}
modCount++;
addEntry(hash, key, value, i);
return null;
}
复制代码

在put()方法中，对首次调用时候进行了判断，进行了数组的初始化操做,调用了初始化数组的方法inflateTable(threshold)。app

private void inflateTable(int toSize) {
// Find a power of 2 >= toSize
int capacity = roundUpToPowerOf2(toSize);
threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
table = new Entry[capacity];
initHashSeedAsNeeded(capacity);
}
复制代码

加载因子：加载因子越大，数组填充的越满，这样能够有效的利用空间，可是有一个弊端就是可能会致使冲突的加大，链表过长，反过来却又会形成内存空间的浪费。因此只能须要在空间和时间中找一个平衡点，那就是设置有效的加载因子。咱们知道，不少时候为了提升查询效率的作法都是牺牲空间换取时间，到底该怎么取舍，那就要具体分析。 hashCode() 和 equals()ide

一、 hashCode的存在主要是用于查找的快捷性，如Hashtable，HashMap等，hashCode是用来在散列存储结构中肯定对象的存储地址的；学习

int hash = hash(key);
int i = indexFor(hash, table.length);
static int indexFor(int h, int length) {
// assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
return h & (length-1);
}
复制代码

因为保证了数组的容量是一个偶数，h & (length-1)可以均匀的分布在散列表。ui

二、若是两个对象相同，就是适用于equals(java.lang.Object) 方法，那么这两个对象的hashCode必定要相同；

// 对key为空值的处理，HashMap中key值能够为空
if (key == null)
   return putForNullKey(value);
// hash值和equals()方法在put()方法中的运用
int hash = hash(key);
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
    V oldValue = e.value;
    e.value = value;
    e.recordAccess(this);
    return oldValue;
}
}
复制代码

三、若是对象的equals方法被重写，那么对象的hashCode也尽可能重写，而且产生hashCode使用的对象，必定要和equals方法中使用的一致，不然就会违反上面提到的第2点；

四、两个对象的hashCode相同，并不必定表示两个对象就相同，也就是不必定适用于equals(java.lang.Object) 方法，只可以说明这两个对象在散列存储结构中相同的索引位，如Hashtable，他们“存放在同一个篮子里”。 HsdhMap 概述

一、LinkedHashMap是HashMap的子类，与HashMap有着一样的存储结构，但它加入了一个双向链表的头结点，将全部put到LinkedHashmap的节点一一串成了一个双向循环链表，所以它保留了节点插入的顺序，可使节点的输出顺序与输入顺序相同。

/**
 * The head of the doubly linked list.
 */
// 记录添加顺序的双向链表
private transient Entry<K,V> header;

// 记录添加顺序的双向链表
private static class Entry<K,V> extends HashMap.Entry<K,V> {
// These fields comprise the doubly linked list used for iteration.
Entry<K,V> before, after;
}

// 添加节点方法
void addEntry(int hash, K key, V value, int bucketIndex) {
super.addEntry(hash, key, value, bucketIndex);

// Remove eldest entry if instructed
// 添加到双向节点节点的头部
Entry<K,V> eldest = header.after;
// 若是removeEldestEntry(eldest)为true，则删除最旧的节点，默认为false
if (removeEldestEntry(eldest)) {
    removeEntryForKey(eldest.key);
}
}
复制代码

二、LinkedHashMap能够用来实现LRU算法。

三、LinkedHashMap一样是非线程安全的，只在单线程环境下使用。 LRU算法

LRU（Least recently used，最近最少使用）算法根据数据的历史访问记录来进行淘汰数据，其核心思想是“若是数据最近被访问过，那么未来被访问的概率也更高”。

新数据插入到链表头部。每当缓存命中（即缓存数据被访问），则将数据移到链表头部。当链表满的时候，将链表尾部的数据丢弃。

上面3条中，LinkedHashMap实现了第一条，可是没有实现二、3两条规定

先看第2条，访问数据（查询，更新）时，在LinkedHashMap中，会作什么事情

首先看看更新操做，LinkedHashMap的更新操做，其实就是使用了HashMap的put()方法

public V put(K key, V value) {
if (table == EMPTY_TABLE) {
    inflateTable(threshold);
}
if (key == null)
    return putForNullKey(value);
int hash = hash(key);
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
    Object k;
    // 若是是已经存在的数据
    if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
        V oldValue = e.value;
        e.value = value;
        // 调用这个方法
        e.recordAccess(this);
        return oldValue;
    }
}
modCount++;
addEntry(hash, key, value, i);
return null;
}
复制代码

咱们能够看到已经存在的数据进行更新时，有调用了recordAccess(this)这个方法，可是，这个方法在HashMap中是一个空实现，这个方法的真正实如今LinkedHashMap中

void recordAccess(HashMap<K,V> m) {
LinkedHashMap<K,V> lm = (LinkedHashMap<K,V>)m;
// accessOrder默认值为false
if (lm.accessOrder) {
    lm.modCount++;
    // remove()和addBefore()组合，移动到链表的头部
    remove();
    addBefore(lm.header);
}
}
复制代码

能够看见，若是accessOrder为true，就把节点移动到链表的头部。可是，默认值为false。把节点移动到链表的头部，是符合LRU的规则第二条的，惋惜默认为false。

而后，咱们来看看LinkedHashMap的查询方法

public V get(Object key) {
Entry<K,V> e = (Entry<K,V>)getEntry(key);
if (e == null)
    return null;
e.recordAccess(this);
return e.value;
}
复制代码

咱们发现，这个方法也调用了recordAccess(this)。可是，和上文描述状况同样，默认accessOrder参数为false，要把accessOrder设置为true，才知足Lru的规则第2条。

其实，在LinkedHashMap中，有设置accessOrder的构造方法。

public LinkedHashMap(int initialCapacity, float loadFactor, boolean accessOrder) {
super(initialCapacity, loadFactor);
//
this.accessOrder = accessOrder;
}
复制代码

咱们在编写Lru的LinkedHashMap的时候经过调用这个构造方法就能够设置accessOrder

最后，咱们来看看LinkedHashMap的添加操做

void addEntry(int hash, K key, V value, int bucketIndex) {
super.addEntry(hash, key, value, bucketIndex);

// Remove eldest entry if instructed
// 找出链表中的尾部节点
Entry<K,V> eldest = header.after;
// 若是removeEldestEntry(eldest)为true就删除节点，默认为false
if (removeEldestEntry(eldest)) {
    removeEntryForKey(eldest.key);
}
}
复制代码

咱们看出来，在removeEldestEntry(eldest)返回的是true的时候，可以实现Lru的第3条规定。

综合起来，咱们编写出了这个实现了Lru算法的LinkedHashMap

public class LRULinkedHashMap<K, V> extends LinkedHashMap<K, V> {
private static final long serialVersionUID = -5933045562735378538L;
// 定义Lru缓存的默认容量
private static final int LRU_MAX_CAPACITY = 1024;
private int capacity;

public LRULinkedHashMap() {
    super();
}

// 经过构造方法设置accessOrder
public LRULinkedHashMap(int initialCapacity, float loadFactor, boolean isLRU) {
    super(initialCapacity, loadFactor, true);
    capacity = LRU_MAX_CAPACITY;
}

// 经过构造方法设置accessOrder
public LRULinkedHashMap(int initialCapacity, float loadFactor,
        boolean isLRU, int lruCapacity) {
    super(initialCapacity, loadFactor, true);
    this.capacity = lruCapacity;
}

// 复写LinkedHashMap的removeEldestEntry()方法
@Override
protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    System.out.println(eldest.getKey() + "=" + eldest.getValue());

    if (size() > capacity) {
        return true;
    }
    return false;
}
}
复制代码

TreeMap

定制比较器和天然比较器

TreeMap不保证元素的前后添加顺序,可是会对集合中的元素作排序操做，底层由有红黑树算法实现(树结构,比较擅长作范围查询)。

TreeSet要么天然排序,要么定制排序。

天然排序: 要求在TreeSet集合中的对象必须实现java.lang.Comparable接口，并覆盖compareTo方法。
定制排序: 要求在构建TreeSet对象的时候，传入一个比较器对象(必须实现java.lang.Comparator接口)。在比较器中覆盖compare方法,并编写比较规则.

TreeSet判断元素对象重复的规则:compareTo/compare方法是否返回0.若是返回0,则视为是同一个对象.

如下咱们来看看TreeMap中使用比较器的代码吧

// 若是传入了定制比较器
Comparator<? super K> cpr = comparator;
if (cpr != null) {
// 红黑树的算法
do {
    parent = t;
    cmp = cpr.compare(key, t.key);
    if (cmp < 0)
        t = t.left;
    else if (cmp > 0)
        t = t.right;
    else
        return t.setValue(value);
} while (t != null);
}
else {
if (key == null)
    throw new NullPointerException();
// 若是实现了天然比较器
Comparable<? super K> k = (Comparable<? super K>) key;
// 红黑树算法
do {
    parent = t;
    cmp = k.compareTo(t.key);
    if (cmp < 0)
        t = t.left;
    else if (cmp > 0)
        t = t.right;
    else
        return t.setValue(value);
} while (t != null);
}
复制代码

从上面的代码中咱们能够看出来，若是TreeMap中先是判判定制比较器，而后判断天然比较器。若是两个比较器都实现了，TreeMap会使用定制比较器进行比较，若是两个比较器都没有实现，TreeMap会在Comparable<? super K> k = (Comparable<? super K>) key一句中抛出强制转型失败异常。

红黑树概述

红黑树又称红-黑二叉树，它首先是一颗二叉树，它具体二叉树全部的特性。同时红黑树更是一颗自平衡的排序二叉树。

基本的二叉树他们都须要知足一个基本性质--即树中的任何节点的值大于它的左子节点，且小于它的右子节点。按照这个基本性质使得树的检索效率大大提升。

生成二叉树的过程是很是容易失衡的，最坏的状况就是一边倒（只有右/左子树），这样势必会致使二叉树的检索效率大大下降，因此为了维持二叉树的平衡，大牛们提出了各类实现的算法，如：AVL，SBT，伸展树，TREAP ，红黑树等等

平衡二叉树必须具有以下特性：它是一棵空树或它的左右两个子树的高度差的绝对值不超过1，而且左右两个子树都是一棵平衡二叉树。也就是说该二叉树的任何一个子节点，其左右子树的高度都相近。

红黑树规则

红黑树顾名思义就是节点是红色或者黑色的平衡二叉树，它经过颜色的约束来维持着二叉树的平衡。对于一棵有效的红黑树二叉树而言咱们必须增长以下规则：

每一个节点都只能是红色或者黑色
根节点是黑色
每一个叶节点（NIL节点，空节点）是黑色的。
若是一个结点是红的，则它两个子节点都是黑的。也就是说在一条路径上不能出现相邻的两个红色结点。
从任一节点到其每一个叶子的全部路径都包含相同数目的黑色节点。

红黑树基本操做

在添加或删除节点后，红黑树就发生了变化，可能再也不知足5个特性，为了保持红黑树的特性，就有了三个动做：左旋、右旋、着色。

红黑树添加操做

咱们先来看看红黑树的添加代码

public V put(K key, V value) {
    Entry<K,V> t = root;
    if (t == null) {
        compare(key, key); // type (and possibly null) check

        root = new Entry<>(key, value, null);
        size = 1;
        modCount++;
        return null;
    }
    int cmp;
    Entry<K,V> parent;
    // split comparator and comparable paths
    Comparator<? super K> cpr = comparator;
    if (cpr != null) {
        // 红黑树的比较操做，while循环比较，直到插入到树的叶子节点
        do {
            parent = t;
            cmp = cpr.compare(key, t.key);
            if (cmp < 0)
                t = t.left;
            else if (cmp > 0)
                t = t.right;
            else
                return t.setValue(value);
        } while (t != null);
    }
    else {
        if (key == null)
            throw new NullPointerException();
        Comparable<? super K> k = (Comparable<? super K>) key;
        // 红黑树的比较操做，while循环比较，直到插入到树的叶子节点
        do {
            parent = t;
            cmp = k.compareTo(t.key);
            if (cmp < 0)
                t = t.left;
            else if (cmp > 0)
                t = t.right;
            else
                return t.setValue(value);
        } while (t != null);
    }
    Entry<K,V> e = new Entry<>(key, value, parent);
    if (cmp < 0)
        parent.left = e;
    else
        parent.right = e;
    // 插入成为树的叶子节点以后，有可能使树不平衡（违反红黑树的5点规则），这个时候，须要调整。调用fixAfterInsertion(e)方法。
    fixAfterInsertion(e);
    size++;
    modCount++;
    return null;
    }
复制代码

咱们首先作的是先将节点插入成为树的叶子节点，而后再对树的平衡进行调整。具体调整的规则，主要输须要树符合红黑树的5点规则。

咱们在来看看红黑树的调整操做

private void fixAfterInsertion(Entry<K,V> x) {
    x.color = RED;
    while (x != null && x != root && x.parent.color == RED) {
        if (parentOf(x) == leftOf(parentOf(parentOf(x)))) {
            Entry<K,V> y = rightOf(parentOf(parentOf(x)));
            // 状况 1
            if (colorOf(y) == RED) {
                setColor(parentOf(x), BLACK);
                setColor(y, BLACK);
                setColor(parentOf(parentOf(x)), RED);
                x = parentOf(parentOf(x));
            } else {
                // 状况2 2小点
                if (x == rightOf(parentOf(x))) {
                    x = parentOf(x);
                    rotateLeft(x);
                }
                // 状况2 1小点
                setColor(parentOf(x), BLACK);
                setColor(parentOf(parentOf(x)), RED);
                rotateRight(parentOf(parentOf(x)));
            }
        } else {
            Entry<K,V> y = leftOf(parentOf(parentOf(x)));
            // 与上面的if语句，成为镜像
            if (colorOf(y) == RED) {
                setColor(parentOf(x), BLACK);
                setColor(y, BLACK);
                setColor(parentOf(parentOf(x)), RED);
                x = parentOf(parentOf(x));
            } else {
                if (x == leftOf(parentOf(x))) {
                    x = parentOf(x);
                    rotateRight(x);
                }
                setColor(parentOf(x), BLACK);
                setColor(parentOf(parentOf(x)), RED);
                rotateLeft(parentOf(parentOf(x)));
            }
        }
    }
    root.color = BLACK;
    }
复制代码

根据以上的代码，画出状况1 和状况2的图解，以下。咱们能够对照着上面的代码进行比较学习。

（1）新插入节点为根节点。这种状况直接将新插入节点设置为根节点便可，无需进行后续的旋转和着色处理。

（2）新插入节点的父节点是黑色。这种状况直接将新节点插入便可，不会违背规则（4）。

（3）新插入节点的父节点是红色。这种状况会违背规则（4），而这种状况又分为了如下几种，下面进行图解：

①新插入节点N的父节点P和叔叔节点U都是红色。方法是：将祖父节点G设置为红色，父节点P和叔叔节点U设置为黑色，这时候就看似平衡了。可是，若是祖父节点G的父节点也是红色，这时候又违背规则（4），调整方法是：将GPUN这一组当作一个新的节点，按照前面的方案递归；又可是根节点为红就违反规则（2），这时调整方法是直接将根节点设置为黑色（两个连续黑色是没问题的）。

![Uploading 3370616-3861bf4f5663f4d9_732975.png . . .]

②新插入节点N的父节点P是红色，叔叔节点U是黑色或者缺乏，且新节点N是P的右孩子。方法是：左旋父节点P。左旋后N和P角色互换，可是P和N仍是连续的两个红色节点，尚未平衡，怎么办，看第三种状况。

③新插入节点N的父节点P是红色，叔叔节点U是黑色或者缺乏，且新节点N是P的左孩子。方法是：右旋祖父节点G，而后将P设置为黑色，G设置为红色，达到平衡。此时父节点P是黑色，全部不用担忧P的父节点是红色。

红黑树删除操做

实际上，删除过程很太复杂，一般咱们都用不一样的方法来回避它。一种方法（和在普通的二叉树中同样）就是为删除的节点作个标记而不实际的删除他。任何找到该节点的查找例程都知道不用报告已找到该节点。不少状况下都应用这种方法，特别是在不常常执行删除操做时。红黑树的效率

红黑树的查找，插入和删除的时间复杂度为O(log2N)。在红黑树种的查找时间和在普通二叉树中的查找时间应该几乎彻底同样。由于在查找的过程当中并无应用红黑树的特征。

了解ConcurrentHashMap底层实现原理(JDK1.7 & 1.8)