Rambling About Java Collections

Date: 2020-07-09

Broad categories: List, Set, Queue, Map

Iterable

The Collection interface extends the Iterable interface. Iterable exists to support the for-each loop, and one of its methods returns an Iterator object.

public interface Iterable<T> {
    Iterator<T> iterator();

    default void forEach(Consumer<? super T> action) {
        Objects.requireNonNull(action);
        for (T t : this) {
            action.accept(t);
        }
    }

    default Spliterator<T> spliterator() {
        return Spliterators.spliteratorUnknownSize(iterator(), 0);
    }
}

Let's look at an example to make this concrete:

LinkedList<Integer> linkedList = new LinkedList<>();
linkedList.add(1);
linkedList.add(2);
linkedList.add(3);

for (Integer integer : linkedList) {
    System.out.println(integer);
}

After decompiling:

LinkedList<Integer> linkedList = new LinkedList();
linkedList.add(1);
linkedList.add(2);
linkedList.add(3);
Iterator var4 = linkedList.iterator();

while(var4.hasNext()) {
    Integer integer = (Integer)var4.next();
    System.out.println(integer);
}
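Since for-each only needs an Iterable, any class that implements iterator() can be used in it. Here is a minimal sketch of that idea. The Countdown class and its collect helper are hypothetical names made up for illustration and do not come from the JDK.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical example class: counts down from start to 1.
class Countdown implements Iterable<Integer> {
    private final int start;

    Countdown(int start) {
        this.start = start;
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int current = start;

            @Override
            public boolean hasNext() {
                return current > 0;
            }

            @Override
            public Integer next() {
                if (current <= 0) {
                    throw new NoSuchElementException();
                }
                return current--;
            }
        };
    }

    // Collects the values produced by a for-each loop over a Countdown.
    static List<Integer> collect(int start) {
        List<Integer> out = new ArrayList<>();
        for (Integer value : new Countdown(start)) { // compiled into iterator()/hasNext()/next() calls
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(3)); // [3, 2, 1]
    }
}
```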

Iterator

The Iterable interface introduces this iterator type:

public interface Iterator<E> {
    boolean hasNext();

    E next();

    default void remove() {
        throw new UnsupportedOperationException("remove");
    }

    default void forEachRemaining(Consumer<? super E> action) {
        Objects.requireNonNull(action);
        while (hasNext())
            action.accept(next());
    }
}

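Its main purpose is to unify traversal and to decouple a collection's data structure from the way it is accessed.

Let's look at the iterator inner class of the most commonly used class, ArrayList: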

private class Itr implements Iterator<E> {
    int cursor;       // index of the next element to return
    int lastRet = -1; // index of the last element returned; -1 if none
    int expectedModCount = modCount; // modification count seen when the iterator was created

    public boolean hasNext() {
        return cursor != size;
    }

    @SuppressWarnings("unchecked")
    public E next() {
        checkForComodification();
        int i = cursor;
        if (i >= size)
            throw new NoSuchElementException();
        Object[] elementData = ArrayList.this.elementData;
        if (i >= elementData.length)
            throw new ConcurrentModificationException();
        cursor = i + 1;
        return (E) elementData[lastRet = i];
    }

    public void remove() {
        if (lastRet < 0)
            throw new IllegalStateException();
        checkForComodification();

        try {
            ArrayList.this.remove(lastRet);
            cursor = lastRet;
            lastRet = -1;
            expectedModCount = modCount;
        } catch (IndexOutOfBoundsException ex) {
            throw new ConcurrentModificationException();
        }
    }

    @Override
    @SuppressWarnings("unchecked")
    public void forEachRemaining(Consumer<? super E> consumer) {
        Objects.requireNonNull(consumer);
        final int size = ArrayList.this.size;
        int i = cursor;
        if (i >= size) {
            return;
        }
        final Object[] elementData = ArrayList.this.elementData;
        if (i >= elementData.length) {
            throw new ConcurrentModificationException();
        }
        while (i != size && modCount == expectedModCount) {
            consumer.accept((E) elementData[i++]);
        }
        // update once at end of iteration to reduce heap write traffic
        cursor = i;
        lastRet = i - 1;
        checkForComodification();
    }

    final void checkForComodification() {
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
    }
}

It is well known that calling the list's own remove inside a for-each loop over an ArrayList typically throws ConcurrentModificationException:

ArrayList<Integer> arrayList = new ArrayList<>();
arrayList.add(1);
arrayList.add(1);
arrayList.add(1);

for (Integer integer : arrayList) {
    arrayList.remove(integer);
}

Exception in thread "main" java.util.ConcurrentModificationException

Removing through the Iterator does not have this problem, because its remove method resets expectedModCount to the new modCount:

public void remove() {
    if (lastRet < 0)
        throw new IllegalStateException();
    checkForComodification();

    try {
        ArrayList.this.remove(lastRet);
        cursor = lastRet;
        lastRet = -1;
        expectedModCount = modCount;
    } catch (IndexOutOfBoundsException ex) {
        throw new ConcurrentModificationException();
    }
}
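As a usage sketch, here are two ways to remove elements during traversal without triggering ConcurrentModificationException: Iterator.remove and Collection.removeIf (Java 8+). The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class SafeRemoval {
    // Removes even numbers through the iterator, which keeps expectedModCount in sync.
    static List<Integer> removeEvensWithIterator(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return list;
    }

    // Same result with Collection.removeIf.
    static List<Integer> removeEvensWithRemoveIf(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        list.removeIf(n -> n % 2 == 0);
        return list;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(removeEvensWithIterator(data)); // [1, 3, 5]
        System.out.println(removeEvensWithRemoveIf(data)); // [1, 3, 5]
    }
}
```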

List

ArrayList

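A dynamic array
Not thread-safe
Allows null elements
Implements List, RandomAccess, Cloneable, Serializable
Backed by contiguous memory
Both adding and removing change modCount
By default grows by half of its current capacity (to about 1.5x)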

Vector

Thread-safe
By default doubles its capacity on each growth
Also has modCount
Every method that operates on the array is marked synchronized

CopyOnWriteArrayList

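Copy-on-write, with a lock on writes
Memory-hungry
Reads may lag behind the latest writes
Never throws ConcurrentModificationException
Best kept to small data sets
Uses ReentrantLock for locking (in JDK 8; newer JDKs use a synchronized block instead)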
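The snapshot behavior can be observed directly. In this minimal sketch (class and method names are made up for illustration), the loop writes to the list while iterating over it. With CopyOnWriteArrayList the iterator keeps reading the array it started with, so no ConcurrentModificationException is thrown and the new elements are not visited.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class CowSnapshot {
    // Appends to the list while iterating over it; returns what the loop saw.
    static List<Integer> iterateWhileAdding(List<Integer> list) {
        List<Integer> seen = new ArrayList<>();
        for (Integer value : list) {
            seen.add(value);
            list.add(value + 100); // a write during iteration
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> cow = new CopyOnWriteArrayList<>(Arrays.asList(1, 2, 3));
        System.out.println(iterateWhileAdding(cow)); // [1, 2, 3]: the iterator reads the old snapshot
        System.out.println(cow);                     // [1, 2, 3, 101, 102, 103]
    }
}
```

Passing an ArrayList to the same method throws ConcurrentModificationException on the second call to next().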

Collections.synchronizedList

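Uses synchronized blocks
The lock object (mutex) can be passed in as a parameter; otherwise the wrapper object itself is used
A List object must be passed in

SynchronizedList(List<E> list) {
    super(list);
    this.list = list;
}

SynchronizedList(List<E> list, Object mutex) {
    super(list, mutex);
    this.list = list;
}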
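A usage sketch through the public factory Collections.synchronizedList, which uses the wrapper itself as the mutex. Individual calls such as add are synchronized, but iteration is a compound operation, so the caller locks the wrapper manually. The class name, thread count, and item count are arbitrary choices for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SyncListDemo {
    // Runs `threads` writers, each adding `perThread` items, then counts the items.
    static int concurrentAdds(int threads, int perThread) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<Integer>());
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(i); // each call locks the wrapper's mutex
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        int count = 0;
        // Iteration is not synchronized by the wrapper, so hold the lock here.
        synchronized (list) {
            for (Integer ignored : list) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(concurrentAdds(4, 1000)); // 4000
    }
}
```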

LinkedList

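ArrayList is slow at inserting and deleting but fast at updating and reading by index; LinkedList is the opposite
Implemented as a linked list
When accessing by index (for example in an indexed for loop), it walks forward from the head or backward from the tail depending on whether the index falls in the first or second half
Adding and removing change modCount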

Map

The four common implementations:

HashMap
Hashtable
LinkedHashMap
TreeMap

HashMap

HashMap is implemented as an array plus linked lists plus red-black trees (the red-black tree part was added in JDK 1.8). Its core fields are:

transient Node<K,V>[] table;
// Number of key-value pairs actually stored
transient int size;
// Threshold: once the number of key-value pairs in the table exceeds this value, the table is resized
int threshold;
// Load factor, since threshold = loadFactor * table.length
final float loadFactor;

The default table length is 16 and the default loadFactor is 0.75.

Next, the Node data structure:

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
}

Determining the index in the hash bucket array

// Method 1:
static final int hash(Object key) { // JDK 1.8; JDK 1.7 follows the same idea with more shift steps
    int h;
    // Step 1: h = key.hashCode() takes the hashCode value
    // Step 2: h ^ (h >>> 16) lets the high bits take part
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

// Method 2:
static int indexFor(int h, int length) { // JDK 1.7 source; JDK 1.8 does not define this method and inlines its body
    return h & (length - 1); // Step 3: modulo
}

// In JDK 1.8:
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // here
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // ...
}

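The hash algorithm here boils down to three steps: take the key's hashCode, mix in the high bits, and take the modulo.

The modulo step is h & (length - 1). Because length is always a power of two, this is equivalent to h % length for non-negative h, and & is cheaper than %.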
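A small sketch to check this equivalence. The indexFor body is the one quoted above; the helper maskMatchesFloorMod and the sample values are made up for illustration. Math.floorMod is used as the reference because Java's % returns a negative result for negative operands, while the mask always yields a valid index.

```java
class IndexDemo {
    // Same expression as the body of JDK 1.7's indexFor.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // True if the mask agrees with floorMod for a sample of hash values.
    static boolean maskMatchesFloorMod(int length) {
        int[] samples = {0, 1, 15, 16, 17, 12345, -1, -17, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int h : samples) {
            if (indexFor(h, length) != Math.floorMod(h, length)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(indexFor(17, 16));        // 1
        System.out.println(17 % 16);                 // 1
        System.out.println(indexFor(-17, 16));       // 15
        System.out.println(-17 % 16);                // -1, not a valid index
        System.out.println(maskMatchesFloorMod(16)); // true: 16 is a power of two
        System.out.println(maskMatchesFloorMod(10)); // false: 10 is not
    }
}
```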

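(h = key.hashCode()) ^ (h >>> 16) XORs the key's hashCode with its own high 16 bits.

Why do this? When the table is small, the index mask simply discards the high bits of the hashCode, which increases collisions in the low bits. XORing the high bits into the low bits keeps that information.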
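A minimal sketch of the effect, using two hand-picked hash codes that differ only in their high 16 bits. The spread method repeats the expression from HashMap.hash; the class name and sample values are made up for illustration.

```java
class SpreadDemo {
    // Same expression as HashMap.hash, applied to a raw hashCode.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    static int index(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        int a = 0x10000; // the two raw hash codes differ only in their high 16 bits
        int b = 0x20000;
        // Without spreading, both land in bucket 0.
        System.out.println(index(a, length) + " " + index(b, length));                 // 0 0
        // With spreading, the high bits reach the index and the collision disappears.
        System.out.println(index(spread(a), length) + " " + index(spread(b), length)); // 1 2
    }
}
```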

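Why XOR, then? Aren't & and | binary operators too?

An answer from Zhihu:

Method 1 is really a perturbation function. XORing the high and low halves of the hashCode mixes the two, which increases the randomness of the low bits. The mixed low bits also carry some features of the high bits, so the high-bit information is indirectly preserved. After this perturbation, hash collisions are effectively reduced.

As for why XOR is used: among the binary operators &, | and ^, XOR mixes best. Each result bit is 0 or 1 with 50% probability, whereas & skews toward 0 and | skews toward 1.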

https://www.zhihu.com/question/20733617/answer/111577937

https://codeday.me/bug/20170909/69679.html

The circular linked list problem caused by resizing in JDK 1.7

Below is the JDK 1.7 resize code:

void resize(int newCapacity) { // the new capacity is passed in
    Entry[] oldTable = table; // reference to the Entry array before resizing
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) { // the old array has already reached the maximum size (2^30)
        threshold = Integer.MAX_VALUE; // set the threshold to the int maximum (2^31-1) so no further resize happens
        return;
    }

    Entry[] newTable = new Entry[newCapacity]; // allocate a new Entry array
    transfer(newTable); // !! move the data into the new Entry array
    table = newTable; // the table field now references the new Entry array
    threshold = (int)(newCapacity * loadFactor); // update the threshold
}

void transfer(Entry[] newTable) {
    Entry[] src = table; // src references the old Entry array
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) { // walk the old Entry array
        Entry<K,V> e = src[j]; // take each bucket of the old Entry array
        if (e != null) {
            src[j] = null; // release the reference (after the loop the old array references nothing)
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity); // !! recompute each element's position in the array
                e.next = newTable[i]; // mark [1]
                newTable[i] = e; // put the element into the array
                e = next; // move on to the next element of the Entry chain
            } while (e != null);
        }
    }
}

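Let's first look at the example from the Meituan blog.

In a single-threaded environment the resize completes normally, but notice that the chain has been reversed: key7 now comes before key3. This is the crucial point.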
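The reversal can be reproduced with a simplified model of the transfer loop. This is only a sketch under assumptions: Entry is a hypothetical stand-in whose key doubles as its hash, and the setup (keys 3, 7, 5 in one bucket of a 2-slot table, resized to 4 slots) is assumed to mirror the referenced example.

```java
import java.util.ArrayList;
import java.util.List;

class HeadInsertDemo {
    // Simplified stand-in for JDK 1.7's Entry; the key doubles as the hash.
    static final class Entry {
        final int key;
        Entry next;

        Entry(int key, Entry next) {
            this.key = key;
            this.next = next;
        }
    }

    // Same loop shape as JDK 1.7's transfer: each entry is inserted at the bucket head.
    static Entry[] transfer(Entry[] src, int newCapacity) {
        Entry[] newTable = new Entry[newCapacity];
        for (int j = 0; j < src.length; j++) {
            Entry e = src[j];
            src[j] = null;
            while (e != null) {
                Entry next = e.next;
                int i = e.key & (newCapacity - 1);
                e.next = newTable[i]; // head insertion
                newTable[i] = e;
                e = next;
            }
        }
        return newTable;
    }

    // Lists the keys of one bucket in chain order.
    static List<Integer> keys(Entry head) {
        List<Integer> out = new ArrayList<>();
        for (Entry e = head; e != null; e = e.next) {
            out.add(e.key);
        }
        return out;
    }

    // Builds a 2-slot table whose bucket 1 holds 3 -> 7 -> 5, then resizes it to 4 slots.
    static Entry[] resizeExample() {
        Entry[] table = new Entry[2];
        table[1] = new Entry(3, new Entry(7, new Entry(5, null)));
        return transfer(table, 4);
    }

    public static void main(String[] args) {
        Entry[] resized = resizeExample();
        System.out.println(keys(resized[3])); // [7, 3]: the order is reversed
        System.out.println(keys(resized[1])); // [5]
    }
}
```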

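Now let's see how a circular linked list arises under multiple threads.

The circular list appears precisely because the chain is reversed during the resize.

JDK 1.8 fixes the reversal, and with it the circular list, by keeping a head and a tail variable for each of the two lists a bucket is split into: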

Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;

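With the head and tail variables, entries are appended at the tail, so the resize no longer reverses the chain and the circular linked list problem goes away.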
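A simplified sketch of the tail-insertion split, modeled on the lo/hi variables above. Node is a hypothetical stand-in with only a hash and a next pointer, and the sample bucket (3, 7, 5 in a 2-slot table) is chosen for illustration. Entries whose hash has the oldCap bit clear stay at the old index, the others move to old index + oldCap, and both lists keep their original relative order.

```java
import java.util.ArrayList;
import java.util.List;

class TailInsertDemo {
    // Hypothetical stand-in for HashMap.Node: only the hash and the next pointer.
    static final class Node {
        final int hash;
        Node next;

        Node(int hash, Node next) {
            this.hash = hash;
            this.next = next;
        }
    }

    // Splits one bucket into {lo, hi} by appending at the tail, so order is kept.
    static Node[] split(Node head, int oldCap) {
        Node loHead = null, loTail = null;
        Node hiHead = null, hiTail = null;
        Node e = head;
        while (e != null) {
            Node next = e.next;
            if ((e.hash & oldCap) == 0) { // stays at the old index
                if (loTail == null) loHead = e; else loTail.next = e;
                loTail = e;
            } else {                      // moves to old index + oldCap
                if (hiTail == null) hiHead = e; else hiTail.next = e;
                hiTail = e;
            }
            e = next;
        }
        if (loTail != null) loTail.next = null;
        if (hiTail != null) hiTail.next = null;
        return new Node[] {loHead, hiHead};
    }

    // Lists the hashes of one chain in order.
    static List<Integer> hashes(Node head) {
        List<Integer> out = new ArrayList<>();
        for (Node e = head; e != null; e = e.next) {
            out.add(e.hash);
        }
        return out;
    }

    // Bucket 1 of a 2-slot table holds 3 -> 7 -> 5; split it for a resize to 4 slots.
    static Node[] splitExample() {
        return split(new Node(3, new Node(7, new Node(5, null))), 2);
    }

    public static void main(String[] args) {
        Node[] parts = splitExample();
        System.out.println(hashes(parts[0])); // [5]: stays in bucket 1
        System.out.println(hashes(parts[1])); // [3, 7]: moves to bucket 3, order kept
    }
}
```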

Even so, data can still be lost under concurrent use; in the example above, key5 was lost.

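Hashtable

A legacy class. Much of its functionality is similar to HashMap, but it is thread-safe. However, only one thread can write to a Hashtable at any given time, so it is less concurrent than ConcurrentHashMap, which uses segment locks (in JDK 1.7; JDK 1.8 replaced them with CAS plus synchronized on individual bins). Not recommended.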

LinkedHashMap

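LinkedHashMap extends HashMap and maintains a doubly linked list on top of it, which solves HashMap's inability to keep iteration order consistent with insertion order.

It overrides HashMap's newNode method.

It also overrides afterNodeInsertion, which is an empty method in HashMap: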

void afterNodeInsertion(boolean evict) { // possibly remove eldest
    LinkedHashMap.Entry<K,V> first;
    if (evict && (first = head) != null && removeEldestEntry(first)) {
        K key = first.key;
        removeNode(hash(key), key, null, false, true);
    }
}

removeEldestEntry returns false in LinkedHashMap; we can override it to implement an LRU queue.

/**
 * The iteration ordering method for this linked hash map: <tt>true</tt>
 * for access-order, <tt>false</tt> for insertion-order.
 *
 * @serial
 */
final boolean accessOrder;

It defaults to false and controls the order used during iteration.
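Putting removeEldestEntry and accessOrder together gives a small LRU cache. This is a minimal sketch: the LruCache class and its capacity field are made up for illustration, while the three-argument constructor and removeEldestEntry are LinkedHashMap's own API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache built on LinkedHashMap.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: iteration runs from least to most recently used
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // consulted after each insertion
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("c", 3); // evicts "b", the least recently used entry
        System.out.println(cache.keySet()); // [a, c]
    }
}
```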

TreeMap

static final class Entry<K,V> implements Map.Entry<K,V> {
    K key;
    V value;
    Entry<K,V> left;
    Entry<K,V> right;
    Entry<K,V> parent;
    boolean color = BLACK;
    // ...
}

TreeMap is implemented on top of a red-black tree.
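The visible consequence of the tree structure is that keys are always kept sorted, which also enables ordered queries. A short usage sketch with arbitrary sample data:

```java
import java.util.TreeMap;

class TreeMapDemo {
    // Inserts keys out of order; the map keeps them sorted.
    static TreeMap<Integer, String> sample() {
        TreeMap<Integer, String> map = new TreeMap<>();
        map.put(30, "c");
        map.put(10, "a");
        map.put(20, "b");
        return map;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> map = sample();
        System.out.println(map);                // {10=a, 20=b, 30=c}
        System.out.println(map.firstKey());     // 10
        System.out.println(map.ceilingKey(15)); // 20
        System.out.println(map.headMap(30));    // {10=a, 20=b}
    }
}
```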

Set

Not much to say here.

Queue

PriorityQueue

A min-heap by default. For the heap sort implementation, see the article "Eight Common Sorting Algorithms".

public boolean offer(E e) {
    if (e == null)
        throw new NullPointerException();
    modCount++;
    int i = size;
    if (i >= queue.length)
        grow(i + 1);
    size = i + 1;
    if (i == 0)
        queue[0] = e;
    else
        siftUp(i, e);
    return true;
}

public boolean add(E e) {
    return offer(e);
}
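A short usage sketch of the default min-heap ordering, plus a max-heap obtained by passing Comparator.reverseOrder() to the constructor. The class name, the drain helper, and the sample values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class HeapDemo {
    // Offers every value, then polls until the queue is empty and returns the poll order.
    static List<Integer> drain(PriorityQueue<Integer> queue, List<Integer> values) {
        for (Integer v : values) {
            queue.offer(v);
        }
        List<Integer> out = new ArrayList<>();
        while (!queue.isEmpty()) {
            out.add(queue.poll());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(5, 1, 4, 2, 3);
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.<Integer>reverseOrder());
        System.out.println(drain(minHeap, values)); // [1, 2, 3, 4, 5]
        System.out.println(drain(maxHeap, values)); // [5, 4, 3, 2, 1]
    }
}
```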

This article draws on the following Meituan article about HashMap, which is highly recommended:

https://tech.meituan.com/2016/06/24/java-hashmap.html