An In-Depth Look at the Java Source Code of the Container Classes List, Set, and Map

Date: 2019-11-22


This article takes an in-depth look at the concepts behind Java containers, working from the Java source code.

References:

A comprehensive summary of Java container knowledge
The official Java API documentation

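1 Inheritance Diagram of the Common Containers

First, here is an inheritance diagram found online.

Personally, I think parts of it are inaccurate. For example, Iterator is not a container; it is only an interface for traversing a collection, so it does not belong in the diagram. Map also should not inherit from Collection. So I put together my own diagram of the commonly used inheritance relationships, shown below.

As the diagram above shows, the rest of this article explains the important interfaces and implementation classes from the top down.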

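2 Collection and Map

Java containers define two kinds of collection, whose top-level interfaces are Collection and Map. Both are interfaces, so neither can be instantiated directly; they represent two different kinds of container.

Put simply, a Collection represents a group of individual element objects (ordered or unordered, duplicates allowed or not, depending on the specific subinterface such as Set, List, or Queue). A Map represents a collection of key-value pairs (again ordered or unordered depending on the implementation).

2.1 Collection

The official Java documentation describes Collection as follows: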

The root interface in the collection hierarchy. A collection represents a group of objects, known as its elements. Some collections allow duplicate elements and others do not. Some are ordered and others unordered. The JDK does not provide any direct implementations of this interface: it provides implementations of more specific subinterfaces like Set and List. This interface is typically used to pass collections around and manipulate them where maximum generality is desired.

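In other words:

Collection is the top-level interface of the container hierarchy. It is a group of objects called elements. Some containers allow duplicate elements and some do not; some are ordered and some are unordered. The JDK provides no direct implementation of this interface, but it does provide subinterfaces that extend it, such as List and Set. The interface is designed to abstract the operations on elements as generally as possible.

Interface definition: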

public interface Collection<E> extends Iterable<E> {

    ...

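}

The type parameter E is the type of the elements in the Collection. The inherited Iterable interface defines the traversal contract: its iterator() method returns an Iterator, which is walked with hasNext and next. The actual implementation is left to the concrete classes.

Let's look at a few of the important methods the interface defines.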

add(E e) 
 Ensures that this collection contains the specified element

clear()
 Removes all of the elements from this collection (optional operation).

contains(Object o)
 Returns true if this collection contains the specified element.

isEmpty()
 Returns true if this collection contains no elements.

iterator()
 Returns an iterator over the elements in this collection.

remove(Object o)
 Removes a single instance of the specified element from this collection, if it is present (optional operation).

retainAll(Collection<?> c)
 Retains only the elements in this collection that are contained in the specified collection (optional operation). (**PS: I had never noticed this one before. It looks quite handy: it keeps only the elements that are in the specified collection.**)

size()
 Returns the number of elements in this collection.

toArray()
 Returns an array containing all of the elements in this collection.

toArray(T[] a)
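 Returns an array containing all of the elements in this collection; the runtime type of the returned array is that of the specified array. (**PS: this one is worth bookmarking too.**)

 ...

The methods above are the most basic operations of a Collection-style container: insertion, removal, lookup, and so on. Notice that they all operate on individual elements; a Collection is simply a store of element objects. Two of these methods I had never used before but find very useful:

retainAll(Collection<?> c): keeps only the elements that are also in the specified collection
toArray(T[] a): converts the collection to an array of the given type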
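
To make these two methods concrete, here is a minimal sketch. The class name RetainAllDemo and the helper intersection are made up for illustration; only retainAll and toArray come from the Collection interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RetainAllDemo {
    // Hypothetical helper: returns the elements of a that also occur in b.
    static String[] intersection(List<String> a, List<String> b) {
        List<String> copy = new ArrayList<>(a); // copy so the caller's list is untouched
        copy.retainAll(b);                      // keep only elements also contained in b
        return copy.toArray(new String[0]);     // the result's runtime type is String[]
    }

    public static void main(String[] args) {
        String[] common = intersection(
                Arrays.asList("a", "b", "c", "d"),
                Arrays.asList("b", "d", "x"));
        System.out.println(Arrays.toString(common)); // prints [b, d]
    }
}
```

Passing a zero-length array to toArray lets the collection allocate an array of the right size itself.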

2.2 Map

The official Java documentation describes Map as follows:

An object that maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value.

This interface takes the place of the Dictionary class, which was a totally abstract class rather than an interface.

The Map interface provides three collection views, which allow a map's contents to be viewed as a set of keys, collection of values, or set of key-value mappings. The order of a map is defined as the order in which the iterators on the map's collection views return their elements. Some map implementations, like the TreeMap class, make specific guarantees as to their order; others, like the HashMap class, do not.

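In other words:

A Map is an object that stores key-to-value mappings. It cannot contain duplicate keys, and each key maps to at most one value.

This interface replaces the older abstract class Dictionary.

The Map interface provides three collection views: 1. the set of all keys, through which each value can then be looked up; 2. the collection of values; 3. the set of key-value pairs (each pair is actually an object that holds a key and a value). The order of a Map is defined as the order in which the iterators over these views return their elements. Some maps, such as TreeMap, guarantee an order; others, such as HashMap, do not.

The interface is defined as follows: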

public interface Map<K,V> {

    ...

    interface Entry<K,V> {
        K getKey();
        V getValue();
        ...
    } 
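}

The type parameters K and V are the key type and the value type. Notice the nested interface Entry: every key-value pair is an instance of Entry, so a Map is essentially a Collection of Entry objects, each of which holds a key and a value. Add the rule that keys must not repeat, and you naturally arrive at Map. (This is my personal understanding.)

The three methods for traversing a Map are described below.

Set<K> keySet()

Returns a Set of all the keys. Because keys cannot repeat, the result is a Set rather than a List. (The difference between Set and List is explained later; for now, note that elements in a Set cannot repeat, while elements in a List can.) Since Set is a Collection, you can walk all the keys with an Iterator and then fetch each value with get, as follows: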

Map<String,String> map = new HashMap<String,String>();
map.put("01", "zhangsan");
map.put("02", "lisi");
map.put("03", "wangwu");

Set<String> keySet = map.keySet(); // first get the Set of all keys in the map
Iterator<String> it = keySet.iterator(); // a Set can hand out an iterator

while (it.hasNext()) {
    String key = it.next();
    String value = map.get(key); // look up the value for this key with get
    System.out.println("key: " + key + "-->value: " + value); // print the key and value
}

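Collection<V> values()

Returns the collection of values directly; the keys cannot be recovered from it, so use this method when only the values are needed. The result can be traversed with an Iterator or, as here, printed directly: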

Map<String,String> map = new HashMap<String,String>();
map.put("01", "zhangsan");
map.put("02", "lisi");
map.put("03", "wangwu");

Collection<String> collection = map.values(); // the return value is a Collection of the values
System.out.println(collection);

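Set<Map.Entry<K, V>> entrySet()

Returns all the data with each whole Entry object as an element. Traverse the entries and call getKey and getValue on each one to obtain the key and the value, as follows: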

Map<String,String> map = new HashMap<String,String>();
map.put("01", "zhangsan");
map.put("02", "lisi");
map.put("03", "wangwu");

// entrySet() returns the mappings in the map; each mapping is a Map.Entry
Set<Map.Entry<String, String>> entrySet = map.entrySet();
// get an iterator over the set of mappings
Iterator<Map.Entry<String, String>> it = entrySet.iterator();

while (it.hasNext()) {
    Map.Entry<String, String> me = it.next(); // the current Map.Entry
    String key = me.getKey(); // read the key from the entry
    String value = me.getValue(); // read the value from the entry
}

From these three approaches we can conclude: if you only need the keys, use keySet; if you only need the values, use values; if you need both keys and values, use entrySet. (You can get the keys through keySet and then look up each value, but that is less efficient than entrySet and is not recommended.)
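
The same entrySet traversal is often written with an enhanced for loop, which uses the iterator behind the scenes. A minimal sketch follows; the class name EntrySetDemo and the helper describe are illustrative, and a LinkedHashMap is used in main only so that the printed order is predictable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EntrySetDemo {
    // Hypothetical helper: joins all mappings as "key=value" in one pass over entrySet().
    static String describe(Map<String, String> map) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : map.entrySet()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            // Key and value come from the same entry, so no extra get() lookup is needed.
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("01", "zhangsan");
        map.put("02", "lisi");
        map.put("03", "wangwu");
        System.out.println(describe(map)); // prints 01=zhangsan, 02=lisi, 03=wangwu
    }
}
```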

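3 List, Set, and Queue

Within the Collection inheritance chain we cover List, Set, and Queue. The focus is on List, Set, and a few commonly used implementation classes; I have hardly ever used Queue in practice.

First, a brief overview of List and Set. Both are subinterfaces of Collection, which means both are containers that store individual elements. The main differences are:

A List is an ordered container whose elements can be reached by index, and its elements may repeat.
The biggest difference between Set and List is that the elements of a Set cannot repeat.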
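
The difference is easy to see by adding the same items to both kinds of container. This is a small sketch with made-up names (ListVsSetDemo, sizesAfterAdding), using ArrayList and HashSet as representative implementations.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ListVsSetDemo {
    // Hypothetical helper: adds every item to a List and to a Set and reports both sizes.
    static int[] sizesAfterAdding(String... items) {
        List<String> list = new ArrayList<>();
        Set<String> set = new HashSet<>();
        for (String s : items) {
            list.add(s); // always appended, duplicates included
            set.add(s);  // returns false and changes nothing if s is already present
        }
        return new int[] {list.size(), set.size()};
    }

    public static void main(String[] args) {
        int[] sizes = sizesAfterAdding("a", "b", "a");
        System.out.println(sizes[0] + " " + sizes[1]); // prints 3 2
    }
}
```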

3.1 List

The Java documentation says:

An ordered collection (also known as a sequence). The user of this interface has precise control over where in the list each element is inserted. The user can access elements by their integer index (position in the list), and search for elements in the list.

Unlike sets, lists typically allow duplicate elements. More formally, lists typically allow pairs of elements e1 and e2 such that e1.equals(e2), and they typically allow multiple null elements if they allow null elements at all. It is not inconceivable that someone might wish to implement a list that prohibits duplicates, by throwing runtime exceptions when the user attempts to insert them, but we expect this usage to be rare.

...

The List interface provides a special iterator, called a ListIterator, that allows element insertion and replacement, and bidirectional access in addition to the normal operations that the Iterator interface provides. A method is provided to obtain a list iterator that starts at a specified position in the list.

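In other words:

A List is an ordered Collection (also called a sequence). Through this interface you have precise control over where each element is inserted, and you can fetch the element at a given position by its index.

Unlike Set, a List typically allows duplicate elements. Someone might want to implement a list that prohibits duplicates and throws an exception when a duplicate is inserted, but such usage is expected to be rare.

...

List provides a special iterator called ListIterator. In addition to the normal Iterator operations, it allows insertion, replacement, and bidirectional access during traversal. There is also an overloaded method that starts the traversal from a specified position.

Now look at the methods that the List interface adds. Methods such as add and get gain an index parameter, which shows that, on top of Collection, List is an ordered container that supports positional access. Note the following two new iterator methods in particular.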

ListIterator<E> listIterator();

ListIterator<E> listIterator(int index);

Next, the code of ListIterator:

public interface ListIterator<E> extends Iterator<E> {
    // Query Operations

    boolean hasNext();

    E next();

    boolean hasPrevious();

    E previous();

    int nextIndex();

    int previousIndex();

    void remove();

    void set(E e);

    void add(E e);
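}

Modifying a collection directly while traversing it easily causes errors, and a plain Iterator supports at most remove during traversal. List, however, is an ordered collection, so it provides ListIterator, which can also insert and replace elements during traversal and can move in both directions.

I had never noticed this in my earlier development work. A nice find, worth bookmarking.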
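
As an illustration of those ListIterator operations, here is a minimal sketch. The class ListIteratorDemo and the editing rules (drop empty strings, upper-case the rest, insert "b2" after "b") are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

class ListIteratorDemo {
    // Hypothetical helper: edits a copy of the list in a single forward pass.
    static List<String> edit(List<String> input) {
        List<String> list = new ArrayList<>(input);
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            String s = it.next();
            if (s.isEmpty()) {
                it.remove();              // delete the element just returned
            } else {
                it.set(s.toUpperCase());  // replace the element just returned
                if (s.equals("b")) {
                    it.add("b2");         // insert right after it; the loop will not revisit it
                }
            }
        }
        return list;
    }

    // Hypothetical helper: walks the list backward, starting past the last element.
    static List<String> reversed(List<String> input) {
        List<String> out = new ArrayList<>();
        ListIterator<String> it = input.listIterator(input.size());
        while (it.hasPrevious()) {
            out.add(it.previous());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(edit(Arrays.asList("a", "", "b", "c"))); // prints [A, B, b2, C]
        System.out.println(reversed(Arrays.asList("a", "b", "c"))); // prints [c, b, a]
    }
}
```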

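That covers the basic concepts and rules of List. Next we look at two commonly used List implementations, ArrayList and LinkedList.

3.1.1 ArrayList

From the Java documentation, the key characteristics are:

ArrayList is a resizable-array implementation of the List interface.
null elements can be inserted.
Its size, isEmpty, get, set, and iterator operations run in O(1) time. add runs in amortized constant time, that is, adding n elements takes O(n) time.
ArrayList is not synchronized.

Now a brief look at the ArrayList source. Only part of it is analyzed here.

All elements are stored in an Object array, and size tracks how many slots are in use.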

transient Object[] elementData;

private int size;

Now let's walk through the code of add:

public boolean add(E e) {
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;
}

private void ensureCapacityInternal(int minCapacity) {
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
    }

    ensureExplicitCapacity(minCapacity);
}

private void ensureExplicitCapacity(int minCapacity) {
    modCount++;

    // overflow-conscious code
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}

private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
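}

On every add, the required capacity is checked. If the array is too small, grow computes a new capacity of roughly 1.5 times the old one, and Arrays.copyOf copies the existing data into a longer array.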
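
To see what that growth rule means in numbers, the sketch below simply repeats the newCapacity formula from grow. The class GrowthDemo is made up, and the starting capacity of 10 assumes DEFAULT_CAPACITY is 10, as it is in JDK 8; the overflow handling from the real method is left out.

```java
import java.util.ArrayList;
import java.util.List;

class GrowthDemo {
    // Hypothetical helper: applies the newCapacity formula from grow() repeatedly.
    static List<Integer> capacities(int start, int steps) {
        List<Integer> caps = new ArrayList<>();
        int capacity = start;
        for (int i = 0; i < steps; i++) {
            caps.add(capacity);
            capacity = capacity + (capacity >> 1); // old capacity plus half of it
        }
        return caps;
    }

    public static void main(String[] args) {
        // Assumes an initial capacity of 10 (DEFAULT_CAPACITY in JDK 8).
        System.out.println(capacities(10, 5)); // prints [10, 15, 22, 33, 49]
    }
}
```

Because the capacity grows geometrically, the copies become rarer as the list gets larger, which is why add is amortized constant time.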

Next, let's see how remove is implemented.

public E remove(int index) {
    rangeCheck(index);

    modCount++;
    E oldValue = elementData(index);

    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // clear to let GC do its work

    return oldValue;
}

It simply uses System.arraycopy to shift everything after the removed index forward by one position, and then clears the last slot.
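
The shifting step can be tried on a plain array. This is a sketch under simplifying assumptions: the class ArrayRemoveDemo and the method removeAt are invented, and the range check and modCount bookkeeping of the real method are omitted.

```java
import java.util.Arrays;

class ArrayRemoveDemo {
    // Hypothetical helper: removes data[index] from the first size slots and returns the new size.
    static int removeAt(Object[] data, int size, int index) {
        int numMoved = size - index - 1;      // how many elements sit after the removed one
        if (numMoved > 0) {
            System.arraycopy(data, index + 1, data, index, numMoved); // shift them left by one
        }
        data[--size] = null;                  // clear the now-unused last slot
        return size;
    }

    public static void main(String[] args) {
        Object[] data = {"a", "b", "c", "d", null};
        int size = removeAt(data, 4, 1);
        System.out.println(size + " " + Arrays.toString(data)); // prints 3 [a, c, d, null, null]
    }
}
```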

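PS: the data structures I studied years ago have finally come in handy. O.O

3.1.2 LinkedList

LinkedList is a sequence container maintained as a linked list. Like ArrayList it is a sequence container; one stores its elements in an array, the other in a linked list.

A comparison of the two data structures, array and linked list:

Lookup: an array is more efficient because it can index directly to the element, whereas a linked list must walk node by node from one end.
Insertion and deletion: especially in the middle, a linked list is far more convenient. Once the position is located, it only has to break the link there, insert or remove the element, and reconnect the neighboring links, whereas an array has to shift all subsequent elements backward or forward.
Memory allocation: when an array fills its allocated length, a larger array must be allocated and the data migrated to it. A linked list simply creates nodes on demand.

This is exactly where LinkedList and ArrayList differ. Choose the List that better suits the use case.

Below is a brief walkthrough of part of the LinkedList source.

First, the definition of a list node: a very simple doubly linked list.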

private static class Node<E> {
    E item;
    Node<E> next;
    Node<E> prev;

    Node(Node<E> prev, E element, Node<E> next) {
        this.item = element;
        this.next = next;
        this.prev = prev;
    }
}

Each LinkedList then holds the head and tail pointers of the list:

transient int size = 0;

transient Node<E> first;

transient Node<E> last;

Here are the most basic linked-list operations for insertion and deletion:

private void linkFirst(E e) {
    final Node<E> f = first;
    final Node<E> newNode = new Node<>(null, e, f);
    first = newNode;
    if (f == null)
        last = newNode;
    else
        f.prev = newNode;
    size++;
    modCount++;
}

void linkLast(E e) {
    final Node<E> l = last;
    final Node<E> newNode = new Node<>(l, e, null);
    last = newNode;
    if (l == null)
        first = newNode;
    else
        l.next = newNode;
    size++;
    modCount++;
}

void linkBefore(E e, Node<E> succ) {
    // assert succ != null;
    final Node<E> pred = succ.prev;
    final Node<E> newNode = new Node<>(pred, e, succ);
    succ.prev = newNode;
    if (pred == null)
        first = newNode;
    else
        pred.next = newNode;
    size++;
    modCount++;
}

private E unlinkFirst(Node<E> f) {
    // assert f == first && f != null;
    final E element = f.item;
    final Node<E> next = f.next;
    f.item = null;
    f.next = null; // help GC
    first = next;
    if (next == null)
        last = null;
    else
        next.prev = null;
    size--;
    modCount++;
    return element;
}

private E unlinkLast(Node<E> l) {
    // assert l == last && l != null;
    final E element = l.item;
    final Node<E> prev = l.prev;
    l.item = null;
    l.prev = null; // help GC
    last = prev;
    if (prev == null)
        first = null;
    else
        prev.next = null;
    size--;
    modCount++;
    return element;
}

E unlink(Node<E> x) {
    // assert x != null;
    final E element = x.item;
    final Node<E> next = x.next;
    final Node<E> prev = x.prev;

    if (prev == null) {
        first = next;
    } else {
        prev.next = next;
        x.prev = null;
    }

    if (next == null) {
        last = prev;
    } else {
        next.prev = prev;
        x.next = null;
    }

    x.item = null;
    size--;
    modCount++;
    return element;
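}

These six methods are the core of the linked list: insertion at the head, the tail, and the middle, and removal at the head, the tail, and the middle. All the other public methods are built around them.

LinkedList also implements the Deque interface, which extends Queue. So LinkedList additionally supports stack and queue operations such as pop, push, and peek.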
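
A short sketch of LinkedList used through the Deque interface is shown below. The class DequeDemo and its run method are illustrative; the Deque methods themselves (push, pop, peek, offerLast, pollLast) are part of java.util.Deque.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;

class DequeDemo {
    // Hypothetical helper: performs a few deque operations and records what each returns.
    static List<String> run() {
        Deque<String> deque = new LinkedList<>();
        List<String> log = new ArrayList<>();
        deque.push("a");           // stack style, inserts at the head: [a]
        deque.push("b");           // [b, a]
        deque.offerLast("c");      // queue style, inserts at the tail: [b, a, c]
        log.add(deque.peek());     // reads the head without removing it: b
        log.add(deque.pop());      // removes the head: b, leaving [a, c]
        log.add(deque.pollLast()); // removes the tail: c, leaving [a]
        log.add(deque.peek());     // a
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [b, b, c, a]
    }
}
```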

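Summary

List implementation	Use case	Data structure
ArrayList	Array-backed List; duplicates allowed; fast element access	Array
LinkedList	Linked-list-backed List; duplicates allowed; fast insertion and deletion	Doubly linked list

3.2 Set

The core idea of Set is that no element in the collection repeats. The Set subinterface adds essentially no methods beyond those of Collection; it mainly defines the Set contract. Below we look at several common implementations: HashSet, LinkedHashSet, and TreeSet.

3.2.1 HashSet

The core idea of HashSet, as described in the Java documentation: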

This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element.

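In other words:

HashSet implements the Set interface and is backed by a HashMap. Iteration order is not guaranteed, nor is it guaranteed to stay the same over time. HashSet permits the null element.

Looking into the HashSet source, we find that all the data is stored in: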

private transient HashMap<E,Object> map;

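private static final Object PRESENT = new Object();

That is, the elements of a HashSet are simply the keys of a HashMap, and every value in that HashMap is the shared PRESENT object. A HashMap is by definition a collection whose keys do not repeat, so building on HashMap means HashSet does not have to implement that logic again.

All operations such as add and remove are therefore really the put and remove operations of HashMap, and traversal is really traversal of the HashMap's keySet. For example: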

...
public Iterator<E> iterator() {
    return map.keySet().iterator();
}

public boolean contains(Object o) {
    return map.containsKey(o);
}

public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}

public void clear() {
    map.clear();
}
...
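
The same delegation idea fits in a few lines. The class below, MapBackedSet, is a made-up teaching sketch and not the JDK's HashSet; it only shows how a set falls out of a map once every key is stored with a shared dummy value.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical minimal set that delegates everything to a HashMap, in the spirit of HashSet.
class MapBackedSet<E> implements Iterable<E> {
    private static final Object PRESENT = new Object(); // shared dummy value for every key
    private final Map<E, Object> map = new HashMap<>();

    boolean add(E e) {
        // put returns the previous value, so null means the key was not present before
        return map.put(e, PRESENT) == null;
    }

    boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }

    boolean contains(Object o) {
        return map.containsKey(o);
    }

    int size() {
        return map.size();
    }

    @Override
    public Iterator<E> iterator() {
        return map.keySet().iterator();
    }

    public static void main(String[] args) {
        MapBackedSet<String> set = new MapBackedSet<>();
        System.out.println(set.add("x")); // prints true
        System.out.println(set.add("x")); // prints false, "x" is already a key
        System.out.println(set.size());   // prints 1
    }
}
```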

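3.2.2 LinkedHashSet

Compared with HashSet, the core idea of LinkedHashSet is that it is a Set that keeps a stable order. HashSet is unordered, whereas LinkedHashSet iterates in the order in which elements were inserted. This is not the sorted order of the elements; it means that the iteration order is predictable and the same from one traversal to the next.

Just as HashSet is built on HashMap, LinkedHashSet is built on LinkedHashMap. I won't go into more detail here.

3.2.3 TreeSet

TreeSet is a sorted set. If no Comparator is specified, the elements are sorted by their natural ordering. (Natural ordering means elements are compared with e1.compareTo(e2); two elements for which the result is 0 are treated as equal.)

Note: the elements of a TreeSet must implement the Comparable interface, unless a Comparator is supplied.

The TreeSet source is built on TreeMap; the algorithm itself belongs to the TreeMap discussion.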
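
The two kinds of order can be compared side by side. This is a small sketch with invented names (SetOrderDemo and its three helpers); it feeds the same items into a LinkedHashSet, a TreeSet with natural ordering, and a TreeSet with a reversed Comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetOrderDemo {
    // LinkedHashSet: duplicates dropped, insertion order kept.
    static List<String> insertionOrder(String... items) {
        return new ArrayList<>(new LinkedHashSet<>(Arrays.asList(items)));
    }

    // TreeSet without a Comparator: natural ordering via String.compareTo.
    static List<String> sortedOrder(String... items) {
        return new ArrayList<>(new TreeSet<>(Arrays.asList(items)));
    }

    // TreeSet with an explicit Comparator: reverse of the natural ordering.
    static List<String> descendingOrder(String... items) {
        Set<String> set = new TreeSet<>(Comparator.<String>reverseOrder());
        set.addAll(Arrays.asList(items));
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        String[] items = {"pear", "apple", "fig", "apple"};
        System.out.println(insertionOrder(items));  // prints [pear, apple, fig]
        System.out.println(sortedOrder(items));     // prints [apple, fig, pear]
        System.out.println(descendingOrder(items)); // prints [pear, fig, apple]
    }
}
```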

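Summary

Set implementation	Use case	Data structure
HashSet	Unordered collection without duplicates	Based on HashMap
LinkedHashSet	A HashSet that maintains insertion order	Based on LinkedHashMap
TreeSet	Collection that keeps its elements sorted; elements must implement Comparable, or a Comparator must be supplied	Based on TreeMap

4 HashMap, LinkedHashMap, TreeMap, and WeakHashMap

4.1 HashMap

HashMap is the most basic and most commonly used Map. It is unordered and stores its data in a hash table. As mentioned earlier, HashSet is built on HashMap and uses only the HashMap keys to store its elements.

HashMap is accessed through the three basic views inherited from Map, described above. Here I analyze the implementation of HashMap's underlying data structure.

Before reading the HashMap source, it helps to understand its storage scheme, the hash table. Earlier we saw array-based storage and linked-list-based storage; a hash table combines an array with linked lists. (Look up the hash table concept for details.) The figure below shows the storage scheme HashMap uses.

The hash yields a number that selects a slot in the array; on a collision, the entry is chained beneath that slot as a linked list.

HashMap's storage is defined as: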

transient Node<K,V>[] table;

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
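}

The table array holds the entries; on a collision, the new entry is linked onto the next chain of the entry already in that slot.

Here is the source of the core get and put methods: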

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
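}

The code above first computes the index with a bitwise AND of the hash and the array length minus one. If a node exists at that index, it checks whether the hash and the key both match; if not, it walks the next chain (or the tree, when the first node is a TreeNode) looking for a node whose hash and key match.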
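
The index expression (n - 1) & hash is worth trying by hand. The sketch below uses a made-up class BucketIndexDemo and assumes the table length n is a power of two, which is what makes the bitwise AND equivalent to a non-negative remainder.

```java
class BucketIndexDemo {
    // Same expression as in getNode: valid as a bucket index when n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        System.out.println(indexFor(37, 16)); // prints 5, because 37 = 2 * 16 + 5
        System.out.println(indexFor(-1, 16)); // prints 15, negative hashes still land in range
        // For a power-of-two n, the result matches the non-negative remainder.
        System.out.println(indexFor(37, 16) == Math.floorMod(37, 16)); // prints true
    }
}
```

Masking with n - 1 keeps only the low bits of the hash, so the result is always between 0 and n - 1 without a division.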

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
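}

That is the core of put: it looks up the index for the hash, and if the slot is empty it creates a new Node and places it directly in table. If a Node is already there, it walks that Node's next chain; a node with an equal key gets its value replaced, otherwise the new element is appended at the end of the chain (and treeifyBin is called once the chain reaches TREEIFY_THRESHOLD).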
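
To make the array-plus-chains structure tangible, here is a heavily simplified sketch. TinyHashMap is a made-up class, not the JDK implementation: it assumes a fixed table of 16 buckets and leaves out resizing, tree bins, and the extra hash mixing that the real HashMap performs.

```java
import java.util.Objects;

// Hypothetical teaching sketch of separate chaining; not the JDK's HashMap.
class TinyHashMap<K, V> {
    private static class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value) {
            this.hash = hash;
            this.key = key;
            this.value = value;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16]; // fixed size, never resized
    private int size;

    private static int hash(Object key) {
        return key == null ? 0 : key.hashCode();
    }

    V put(K key, V value) {
        int h = hash(key);
        int i = (table.length - 1) & h;
        Node<K, V> p = table[i];
        if (p == null) {
            table[i] = new Node<>(h, key, value); // empty bucket: store directly
        } else {
            while (true) {
                if (p.hash == h && Objects.equals(p.key, key)) {
                    V old = p.value;              // existing key: replace the value
                    p.value = value;
                    return old;
                }
                if (p.next == null) {
                    p.next = new Node<>(h, key, value); // collision: append to the chain
                    break;
                }
                p = p.next;
            }
        }
        size++;
        return null;
    }

    V get(Object key) {
        int h = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & h]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        TinyHashMap<String, Integer> map = new TinyHashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // "Aa" and "BB" have the same hashCode, so they share a bucket
        System.out.println(map.get("Aa") + " " + map.get("BB") + " " + map.size()); // prints 1 2 2
    }
}
```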

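HashMap traversal starts at the first non-empty slot of the array and then follows next through every Node in that slot before moving on to the next slot. Because an entry's slot is determined by its hash rather than by when it was inserted, HashMap traversal is unordered.

4.2 LinkedHashMap

The difference from HashMap is that LinkedHashMap iterates in a defined order: it preserves insertion order, and it can also be configured for access order, in which the most recently accessed entry is moved to the end of the list (the basis of an LRU cache).

LinkedHashMap still stores its data in a hash table exactly as HashMap does; it additionally maintains a doubly linked list with head and tail references.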

transient LinkedHashMap.Entry<K,V> head;

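transient LinkedHashMap.Entry<K,V> tail;

When a new Node is created it is appended to the end of this list. Traversal then no longer scans the array for the first non-empty element as HashMap does; it simply walks the list from the head, which gives ordered traversal.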
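
The access-order mode is what makes a small LRU cache possible. The sketch below is one common way to do it, not something taken from the article's source listing: the class LruCache and its capacity field are illustrative, while the three-argument constructor and removeEldestEntry are part of LinkedHashMap.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache built on LinkedHashMap's access-order mode.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // third argument true selects access order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true evicts the least recently used entry.
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // touching "a" moves it to the end of the list
        cache.put("c", 3); // exceeds the capacity, so the eldest entry "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```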

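4.3 TreeMap

I don't use TreeMap much. TreeMap implements the SortedMap interface: its keys follow a sort rule (their natural ordering or a supplied Comparator), so traversing a TreeMap returns the entries in that order.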
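
A brief sketch of both cases follows. The class TreeMapDemo and its sample data are invented; it shows natural key ordering and a reversed Comparator.

```java
import java.util.Comparator;
import java.util.TreeMap;

class TreeMapDemo {
    // Keys sorted by their natural ordering (String.compareTo).
    static String naturalOrder() {
        TreeMap<String, Integer> map = new TreeMap<>();
        map.put("pear", 3);
        map.put("apple", 1);
        map.put("fig", 2);
        return map.toString();
    }

    // Keys sorted by a supplied Comparator, here the reverse of the natural ordering.
    static String reverseOrder() {
        TreeMap<String, Integer> map = new TreeMap<>(Comparator.<String>reverseOrder());
        map.put("pear", 3);
        map.put("apple", 1);
        map.put("fig", 2);
        return map.keySet().toString();
    }

    public static void main(String[] args) {
        System.out.println(naturalOrder()); // prints {apple=1, fig=2, pear=3}
        System.out.println(reverseOrder()); // prints [pear, fig, apple]
    }
}
```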

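4.4 WeakHashMap

WeakHashMap has this property: when nothing other than the map itself refers to a key, the map automatically discards that key's entry.

Example: declare two Map objects, one HashMap and one WeakHashMap, and put two objects a and b into both. After the HashMap removes a and the variables a and b are both set to null, the entry for a in the WeakHashMap is reclaimed automatically. The reason is that, once the HashMap has removed a and the variable a points to null, nothing except the WeakHashMap still refers to a, so the WeakHashMap discards it. As for b, although its variable points to null, the HashMap still holds a reference to b, so the WeakHashMap keeps it.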
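
The scenario can be written out as a sketch, with one important caveat: garbage collection is not deterministic, and System.gc() is only a hint. The entry for a usually disappears after a collection, but the code below does not promise it. The class WeakHashMapDemo and the 100 ms pause are illustrative choices.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

class WeakHashMapDemo {
    // Hypothetical helper: returns {size of the HashMap, size of the WeakHashMap} after a GC hint.
    static int[] run() {
        Object a = new Object();
        Object b = new Object();
        Map<Object, String> strong = new HashMap<>();
        Map<Object, String> weak = new WeakHashMap<>();
        strong.put(a, "a");
        strong.put(b, "b");
        weak.put(a, "a");
        weak.put(b, "b");

        strong.remove(a); // now only the WeakHashMap and the local variable refer to a
        a = null;         // drop the last strong reference to a
        b = null;         // b is still strongly referenced as a key of the HashMap

        System.gc();      // only a request; the JVM may or may not collect right away
        try {
            Thread.sleep(100); // give the collector a moment (an arbitrary pause)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // The entry for b must survive; the entry for a is gone only if a collection happened.
        return new int[] {strong.size(), weak.size()};
    }

    public static void main(String[] args) {
        int[] sizes = run();
        System.out.println("HashMap size: " + sizes[0] + ", WeakHashMap size: " + sizes[1]);
    }
}
```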

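WeakHashMap is not used much either, so I only mention it briefly here.

Summary

Map implementation	Use case	Data structure
HashMap	Stores key-value pairs in a hash table; keys do not repeat; unordered	Hash table
LinkedHashMap	A HashMap that can record insertion order or access order	Hash table, plus head and tail pointers that record the order
TreeMap	Keeps its entries sorted by key	Red-black tree
WeakHashMap	Weak-key map; keys with no references outside the map can be garbage collected	Hash table

Closing

That wraps up this analysis of the Java collections and their source code. ArrayList and HashMap are the ones used most often; when efficiency matters, remember the Linked family of collections and WeakHashMap. Over~~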
