普林斯顿《算法》笔记（三）

时间 2019-12-08

标签普林斯顿算法笔记繁體版

原文原文链接

官方网站 官方代码

第三章查找

3.1 符号表 (Symbol Tables)

符号表是一种存储键值对 (key-value pairs) 的数据结构，其主要目的是将键 (key) 和值 (value) 联系起来。主要支持两种操做：插入 (put) ，即将一组新的键值对存入存入表中；查找 (get) ，即根据给定的键获得相应的值。java

下表列出了符号表的典型应用：算法

符号表是一种典型的抽象数据类型，它表明一组清晰定义的值以及对这些值相应的操做，使得咱们可以将类型的实现和使用区别开来。下图是一种泛型符号表API：数组

键的等价性 数据结构

在Java中全部对象都继承了一个 equals() 方法，Java也为其标准数据类型如Integer、Double和String以及其余一些更加复杂的类型，如File和URL，实现了 equals() 方法。若是是自定义的键则须要重写 equals() 方法 (如1.2.5.8中的Date类型)，这时最好使用不可变 (immutable) 的数据类型做为键，如 Integer, Double, String, java.io.File等。若是键是Comparable对象，则 a.compareTo(b) == 0 和 a.equals(b) 是等价的。函数

有序符号表 (Ordered symbol tables) 性能

典型的应用程序中，键都是Comparable对象，所以能够用 a.compareTo(b) 来比较a和b两个键。这样就能经过Comparable接口带来的键的有序性来更好地实现put() 和 get() 方法，由此也能定义更多实用操做，以下表所示。通常只要见到类的声明中含有泛型变量 Key extends Comparable<Key>，则说明这段程序实在实现这份API。网站

符号表用例一： this

public static void main(String[] args)
{
    ST<String, Integer> st;
    st = new ST<String, Integer>();
    for (int i = 0; !.StdIn.isEmpty(); i++)
    {
        String key = StdIn.readString();
        st.put(key, i);
    }
    for (String s : st.keys())
        StdOut.println(s + " " + st.get(s));
}

符号表用例二： spa

下列FrequencyCounter 用例统计了标准输入中各个单词的出现频率，而后将频率最高的且不小于指定长度的单词打印出来。设计

public class FrequencyCounter
{
    public static void main(String[] args)
    {
        int minlen = Integer.parseInt(args[0]);
        ST<String, Integer> st = new ST<String, Integer>();
        while (!StdIn.isEmpty())
        {
            String word = StdIn.readString();
            if (word.length() < minlen) continue;
            if (!st.contains(word)) st.put(word, 1);
            else                    st.put(word, st.get(word) + 1);
        }
        String max = "";
        st.put(max, 0);
        for (String word : st.keys())
            if (st.get(word) > st.get(max))
                max = word;
        StdOut.println(max + " " + st.get(max));
    }
}

/***************************************
% java FrequancyCounter 1 < tinyTale.txt
it 10
***************************************/

无序链表中的顺序查找 (Sequential Search)

顺序查找经过get()方法会顺序地搜索链表查找给定的键，并返回相应的值；put() 方法一样顺序地搜索给定的键，若是找到则更新相关联的值，不然建立一个新结点并将其插入到链表的开头。

import edu.princeton.cs.algs4.Queue;
import edu.princeton.cs.algs4.StdIn;
import edu.princeton.cs.algs4.StdOut;

public class SequentialSearchST<Key, Value>
{
    private int n;
    private Node first;

    private class Node
    {
        private Key key;
        private Value val;
        private Node next;

        public Node(Key key, Value val, Node next)
        {
            this.key = key;
            this.val = val;
            this.next = next;
        }
    }

    public int size()
    { return n; }

    public boolean isEmpty()
    { return size() == 0; }

    public boolean contains(Key key)
    { return get(key) != null; }

    public Value get(Key key)
    {
        for (Node x = first; x != null; x = x.next)
            if (key.equals(x.key))
                return x.val;
        return null;
    }

    public void put(Key key, Value val)
    {
        if (val == null)
        { delete(key); return; }

        for (Node x = first; x != null; x = x.next)
        {
            if (key.equals(x.key))
            { x.val = val; return; }
        }
        first = new Node(key, val, first);
        n++;
    }
    
    public void delete(Key key)
    { first = delete(first, key); }

    private Node delete(Node x, Key key)
    {
        if (key.equals(x.key))
        { n--; return x.next; }
        x.next = delete(x.next, key);
        return x;
    }

    public Iterable<Key> keys()
    {
        Queue<Key> queue = new Queue<Key>();
        for (Node x = first; x != null; x = x.next)
            queue.enqueue(x.key);
        return queue;
    }

    public static void main(String[] args)
    {
        SequentialSearchST<String, Integer> st = new SequentialSearchST<String, Integer>();
        for (int i = 0; !StdIn.isEmpty(); i++)
        {
            String key = StdIn.readString();
            st.put(key, i);
        }
        for (String s : st.keys())
            StdOut.println(s + " " + st.get(s));
    }
}

有序数组中的二分查找 (Binary Search)

下列实现使用两个数组分别保存key 和 value，这样的数据结构是一对平行的数组。须要建立一个Key类型的Comparable对象的数组和一个Value 类型的Object对象的数组，并在构造函数中将它们转化为Key[] 和 Value[]。

put() 方法能够保证数组中Comparable 类型的键有序，具体是使用rank() 方法获得键的具体位置，而后将全部更大的键向后移动一格来腾出位置并插入新的键值。

这份实现的核心在于rank()方法，它返回表中小于给定键的键的数量。如下是递归版本的代码：

public int rank(Key key, int lo, int hi)
{
    if (hi < lo) return lo;
    int mid = (lo + hi) / 2;
    int cmp = key.compareTo(keys[mid]);
    if (cmp < 0) return rank(key, lo, mid-1);
    else if (cmp > 0) return rank(key, mid+1, hi);
    else return mid;
}

import edu.princeton.cs.algs4.Queue;
import edu.princeton.cs.algs4.StdIn;
import edu.princeton.cs.algs4.StdOut;

public class BinarySearchST<Key extends Comparable<Key>, Value>
{
    private static final int INIT_CAPACITY = 2;
    private Key[] keys;
    private Value[] vals;
    private int n = 0;

    public BinarySearchST()
    { this(INIT_CAPACITY); }

    public BinarySearchST(int capacity)
    {
        keys = (Key[]) new Comparable[capacity];
        vals = (Value[]) new Object[capacity];
    }

    private void resize(int capacity)
    {
        assert capacity >= n;
        Key[]    tempk = (Key[])   new Comparable[capacity];
        Value[]  tempv = (Value[]) new Object[capacity];
        for (int i = 0; i < n; i++)
        {
            tempk[i] = keys[i];
            tempv[i] = vals[i];
        }
        keys = tempk;
        vals = tempv;
    }

    public int size()
    { return n; }

    public boolean isEmpty()
    { return size() == 0; }

    public boolean contains(Key key)
    { return get(key) != null; }

    public Value get(Key key)
    {
        if (isEmpty()) return null;
        int i = rank(key);
        if (i < n && keys[i].compareTo(key) == 0) return vals[i];
        return null;
    }

    public int rank(Key key)
    {
        int lo = 0, hi = n-1;
        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;
            int cmp = key.compareTo(keys[mid]);
            if      (cmp < 0) hi = mid - 1;
            else if (cmp > 0) lo = mid + 1;
            else return mid; 
        }
        return lo;
    }

    public void put(Key key, Value val)
    {
        if (val == null)
        { delete(key); return; }

        int i = rank(key);

        if (i < n && keys[i].compareTo(key) == 0)
        { vals[i] = val; return; }

        if (n == keys.length) resize(2*keys.length);

        for (int j = n; j > i; j--)
        {
            keys[j] = keys[j-1];
            vals[j] = vals[j-1];
        }

        keys[i] = key;
        vals[i] = val;
        n++;
    }

    public void delete(Key key)
    {
        int i = rank(key);

        if (i == n || keys[i].compareTo(key) != 0)
        { return; }

        for (int j = i; j < n-1; j++)
        {
            keys[j] = keys[j+1];
            vals[j] = vals[j+1];
        }

        n--;
        keys[n] = null;
        vals[n] = null;

        if (n > 0 && n == keys.length/4) resize(keys.length/2);  
    }

    public void deleteMin()
    { delete(min()); }

    public void deleteMax()
    { delete(max()); }

    public Key min()
    { return keys[0]; }

    public Key max()
    { return keys[n-1]; }

    public Key select(int k)
    { return keys[k]; }

    public Key floor(Key key)
    {
        int i = rank(key);
        if (i < n && key.compareTo(keys[i]) == 0) return keys[i];
        if (n == 0) return null;
        else return keys[i-1];
    }

    public Key ceiling(Key key)
    {
        int i = rank(key);
        if (i == n) return null;
        else return keys[i];
    }

    public int size(Key lo, Key hi)
    {
        if (lo.compareTo(hi) > 0) return 0;
        if (contains(hi)) return rank(hi) - rank(lo) + 1;
        else              return rank(hi) - rank(lo);
    }

    public Iterable<Key> keys()
    { return keys(min(), max()); }

    public Iterable<Key> keys(Key lo, Key hi)
    {
        Queue<Key> queue = new Queue<Key>();
        if (lo.compareTo(hi) > 0) return queue;
        for (int i = rank(lo); i < rank(hi); i++)
            queue.enqueue(keys[i]);
        if (contains(hi)) queue.enqueue(keys[rank(hi)]);
        return queue;
    }

    public static void main(String[] args)
    { 
        BinarySearchST<String, Integer> st = new BinarySearchST<String, Integer>();
        for (int i = 0; !StdIn.isEmpty(); i++)
        {
            String key = StdIn.readString();
            st.put(key, i);
        }

        System.out.println(st.size());
        st.delete("S");
        System.out.println(st.floor("Z"));
        System.out.println(st.select(2));
        System.out.println(st.size());

        for (String s : st.keys())
            StdOut.println(s + " " + st.get(s));
    }
}

Sequential search 和 binary search 的成本模型，插入都是线性级别的，对于大数组来讲太慢了。咱们须要一种更复杂的数据结构如二叉查找树 (binary search tree)来实现对数级别的插入和查找操做。

要支持高效的插入操做，须要一种链式结构，但单链表是没法使用binary search的，由于binary search的高效性来源于可以经过索引快速取得任何子数组的中间元素 (但获得一条链表的中间元素的惟一方法是沿链表遍历)。下一节的二叉查找树 (binary search tree)能够将binary search的效率和链表的灵活性结合起来。下表是本章各类符号表实现的优缺点概览：

3.2 二叉查找树 (Binary Search Tree)

二叉树 (binary tree)：每一个结点最多有两个子树的树结构。

二叉查找树 (binary search tree)：二叉查找树的每一个结点包含一个Comparable键和一个值，且每一个结点的键 (key)都大于其左子树中的任意结点的键而小于右子树的任意结点的键。是一种结合了链表插入的灵活性和有序数组查找的高效性的符号表实现。

3.2.1 查找

查找分为两种状况：若是含有该键的结点存在表中，则返回相应的值；若不存在则返回null。

3.2.2 插入

插入一样分两种状况：若是该键的结点在表中，则更新值；若不存在则增长新结点。

二叉查找树可以保持键的有序性，所以它能够做为实现有序符号表中众多方法的基础。

3.2.3 floor 和 ceiling

3.2.4 Selection

Selection操做是要找到排名为k的键 (即树中正好有k个小于它的键)。

3.2.5 Rank

rank() 是 select()的逆方法，返回给定键的排名。

3.2.6 删除最大键和删除最小键

3.2.7 删除操做

3.2.8 范围查找 (range queries)

要实现可以返回给定范围内键的keys方法，首先要对二叉树进行中序遍历 (inorder traversal)。以下所示：

private void print(Node x)
{
    if (x == null) return;
    print(x.left);
    StdOut.println(x.key);
    print(x.right);
}

范围查找主要经过将全部给定范围内的键加入队列 Queue 来实现。

下列二叉查找树的实现中树由 Node 对象组成，每一个对象含有一对键值，两条连接 (link) 和一个结点计数器N。每一个Node对象都是一颗含有N个结点的子树的根结点。

import edu.princeton.cs.algs4.StdIn;
import edu.princeton.cs.algs4.StdOut;
import edu.princeton.cs.algs4.Queue;

public class BST<Key extends Comparable<Key>, Value>
{
    private Node root;

    private class Node
    {
        private Key key;
        private Value val;
        private Node left, right;
        private int N;

        public Node(Key key, Value val, int N)
        { this.key = key; this.val = val; this.N = N; }
    }

    public int size()
    { return size(root); }

    private int size(Node x)
    {
        if (x == null) return 0;
        else           return x.N;
    }

    public Value get(Key key)
    { return get(root, key); }

    private Value get(Node x, Key key)
    {
        if (x == null) return null;
        int cmp = key.compareTo(x.key);
        if      (cmp < 0) return get(x.left, key);
        else if (cmp > 0) return get(x.right, key);
        else return x.val;
    }

    public void put(Key key, Value val)
    { root = put(root, key, val); }

    private Node put(Node x, Key key, Value val)
    {
        if (x == null) return new Node(key, val, 1);
        int cmp = key.compareTo(x.key);
        if      (cmp < 0) x.left = put(x.left, key, val);
        else if (cmp > 0) x.right = put(x.right, key, val);
        else x.val = val;
        x.N = size(x.left) + size(x.right) + 1;
        return x;
    }

    public Key min()
    { return min(root).key; }

    private Node min(Node x)
    {
        if (x.left == null) return x;
        return min(x.left);
    }

    public Key max()
    { return max(root).key; }

    private Node max(Node x)
    {
        if (x.right == null) return x;
        return max(x.right);
    }

    public Key floor(Key key)
    {
        Node x = floor(root, key);
        if (x == null) return null;
        return x.key;
    }
    
    private Node floor(Node x, Key key)
    {
        if (x == null) return null;
        int cmp = key.compareTo(x.key);
        if (cmp == 0) return x;
        if (cmp < 0) return floor(x.left, key);
        Node t = floor(x.right, key);
        if (t != null)  return t;
        else            return x;
    }

    public Key select(int k)
    { return select(root, k).key; }

    private Node select(Node x, int k)
    {
        if (x == null) return null;
        int t = size(x.left);
        if      (t > k) return select(x.left, k);
        else if (t < k) return select(x.right, k-t-1);
        else            return x;
    }

    public int rank(Key key)
    { return rank(key, root); }

    private int rank(Key key, Node x)
    {
        if (x == null) return 0;
        int cmp = key.compareTo(x.key);
        if      (cmp < 0) return rank(key, x.left);
        else if (cmp > 0) return 1 + size(x.left) + rank(key, x.right);
        else    return size(x.left);
    }

    public void deleteMin()
    { root = deleteMin(root); }

    private Node deleteMin(Node x)
    {
        if (x.left == null) return x.right;
        x.left = deleteMin(x.left);
        x.N = size(x.left) + size(x.right) + 1;
        return x;
    }

    public void delete(Key key)
    { root = delete(root, key); }

    private Node delete(Node x, Key key)
    {
        if (x == null) return null;
        int cmp = key.compareTo(x.key);
        if      (cmp < 0) x.left = delete(x.left, key);
        else if (cmp > 0) x.right = delete(x.right, key);
        else
        {
            if (x.right == null) return x.left;
            if (x.left == null) return x.right;
            Node t = x;
            x = min(t.right);
            x.right = deleteMin(t.right);
            x.left = t.left;
        }
        x.N = size(x.left) + size(x.right) + 1;
        return x;
    }

    public Iterable<Key> keys()
    { return keys(min(), max()); }

    public Iterable<Key> keys(Key lo, Key hi)
    {
        Queue<Key> queue = new Queue<Key>();
        keys(root, queue, lo, hi);
        return queue;
    }

    private void keys(Node x, Queue<Key> queue, Key lo, Key hi)
    {
        if (x == null) return;
        int cmplo = lo.compareTo(x.key);
        int cmphi = hi.compareTo(x.key);
        if (cmplo < 0) keys(x.left, queue, lo, hi);
        if (cmplo <= 0 && cmphi >= 0) queue.enqueue(x.key);
        if (cmphi >0 ) keys(x.right, queue, lo, hi);
    }

    public static void main(String[] args)
    {
        BST<String, Integer> st = new BST<String, Integer>();
        for (int i = 0; !StdIn.isEmpty(); i++)
        {
            String key = StdIn.readString();
            st.put(key, i);
        }
        for (String s : st.keys())
            StdOut.println(s + " " + st.get(s));
    }
}

各类符号表实现的比较：

3.3 平衡查找树 (Balanced Search Tree)

一、 2-3查找树

一颗2-3查找树或为一颗空树，或由如下结点组成：

2- 结点，含有一个键(key) (及其对应的值) 和两条连接 (link)，左连接指向的的键都小于该结点，右连接指向的键都大于该结点。

3- 结点，含有两个键 (及其对应的值) 和三条连接，左连接指向的键都小于该结点，中连接指向的键位于该结点两个键之间，右连接指向的键都大于该结点。

同时将指向一颗空树的连接成为空连接 (null link)。

查找和各类插入操做

这部分直接看图可比文字描述清晰多了。

下图总结了将一个4- 结点分解为一颗2-3 树可能的6种状况。2-3 树插入算法的根本在于这些变换都是局部的：除了相关的结点和连接外没必要修改或检查树的其余部分。

这些局部变换不会影响树的全局有序性和平衡性：任意空连接到根结点的路径长度都是相等的。如下图举例，变换前根结点到全部空连接的路径长度为h，变换后仍然为h。只有当根结点被分解为3个2- 结点时，路径长度才会加1。

下图显示了2-3树的平衡性，对于升序插入10个键，二叉查找树会变成高度为9，而2-3 树的高度则为2 。

二、红黑二叉查找树 (Red Black BST)

红黑树的红连接(red link) 将两个2- 结点链接起来构成一个3- 结点，黑连接则是2-3 树的普通连接。红黑树知足如下条件：

红连接均为左连接
没有任何一个结点同时和两条红连接相连
树是完美黑色平衡的，即任意空连接到根结点的路径上的黑连接的数量相同

颜色表示

这里约定空连接为黑色。

旋转

各类插入操做

全部插入步骤均可以归结为一下三种操做：

若是右子结点是红色而左子结点是黑色，进行左旋转。
若是左子结点是红色且它的子结点也是红色，进行右旋转。
若是左右子结点均为红色，则进行颜色变换。

import edu.princeton.cs.algs4.StdIn;
import edu.princeton.cs.algs4.StdOut;
import edu.princeton.cs.algs4.Queue;

public class RedBlackBST<Key extends Comparable<Key>, Value>
{
    private static final boolean RED = true;
    private static final boolean BLACK = false;

    private Node root;

    private class Node
    {
        private Key key;
        private Value val;
        private Node left, right;
        private boolean color;
        private int size;

        public Node(Key key, Value val, boolean color, int size)
        {
            this.key = key;
            this.val = val;
            this.color = color;
            this.size = size;
        }
    }

    private boolean isRed(Node x)
    {
        if (x == null) return false;
        return x.color == RED;
    }

    private int size(Node x)
    {
        if (x == null) return 0;
        return x.size;
    }

    public int size()
    { return size(root); }

    public boolean isEmpty()
    { return root == null; }

    public Value get(Key key)
    { return get(root, key); }

    private Value get(Node x, Key key)
    {
        while (x != null)
        {
            int cmp = key.compareTo(x.key);
            if      (cmp < 0) x = x.left;
            else if (cmp > 0) x = x.right;
            else              return x.val;
        }
        return null;
    }

    public void put(Key key, Value val)
    {
        root = put(root, key, val);
        root.color = BLACK;
    }

    private Node put(Node h, Key key, Value val)
    {
        if (h == null) return new Node(key, val, RED, 1);

        int cmp = key.compareTo(h.key);
        if      (cmp < 0) h.left = put(h.left, key, val);
        else if (cmp > 0) h.right = put(h.right, key, val);
        else              h.val = val;

        if (isRed(h.right) && !isRed(h.left))       h = rotateLeft(h);
        if (isRed(h.left)  && isRed(h.left.left))   h = rotateRight(h);
        if (isRed(h.left)  && isRed(h.right))       flipColors(h);
        h.size = size(h.left) + size(h.right) + 1;

        return h;
    }
    
    public void deleteMin()
    {
        if (!isRed(root.left) && !isRed(root.right))
            root.color = RED;

        root = deleteMin(root);
        if (!isEmpty()) root.color = BLACK;
    }

    private Node deleteMin(Node h)
    {
        if (h.left == null) return null;

        if (!isRed(h.left) && !isRed(h.left.left))
            h = moveRedLeft(h);

        h.left = deleteMin(h.left);
        return balance(h);
    }

    private Node rotateLeft(Node h)
    {
        Node x = h.right;
        h.right = x.left;
        x.left = h;
        x.color = x.left.color;
        x.left.color = RED;
        x.size = h.size;
        h.size = size(h.left) + size(h.right) + 1;
        return x;
    }

    private Node rotateRight(Node h) {
        // assert (h != null) && isRed(h.left);
        Node x = h.left;
        h.left = x.right;
        x.right = h;
        x.color = x.right.color;
        x.right.color = RED;
        x.size = h.size;
        h.size = size(h.left) + size(h.right) + 1;
        return x;
    }

    public Key min() {
        return min(root).key;
    } 

    // the smallest key in subtree rooted at x; null if no such key
    private Node min(Node x) { 
        // assert x != null;
        if (x.left == null) return x; 
        else                return min(x.left); 
    } 

 
    public Key max() {
        return max(root).key;
    } 

    private Node max(Node x) { 
        if (x.right == null) return x; 
        else                 return max(x.right); 
    } 


    private void flipColors(Node h)
    {
        h.color = !h.color;
        h.left.color = !h.left.color;
        h.right.color = !h.right.color;
    }

    private Node moveRedLeft(Node h)
    {
        flipColors(h);
        if (isRed(h.right.left))
        {
            h.right = rotateRight(h.right);
            h = rotateLeft(h);
            flipColors(h);
        }
        return h;
    }

    private Node balance(Node h)
    {
        if (isRed(h.right))             h = rotateLeft(h);
        if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);
        if (isRed(h.left) && isRed(h.right)) flipColors(h);

        h.size = size(h.left) + size(h.right) + 1;
        return h;
    }

    public Iterable<Key> keys()
    {
        if (isEmpty()) return new Queue<Key>();
        return keys(min(), max());
    }

    public Iterable<Key> keys(Key lo, Key hi)
    {
        Queue<Key> queue = new Queue<Key>();
        keys(root, queue, lo, hi);
        return queue;
    }

    private void keys(Node x, Queue<Key> queue, Key lo, Key hi)
    {
        if (x == null) return;
        int cmplo = lo.compareTo(x.key);
        int cmphi = hi.compareTo(x.key);
        if (cmplo < 0) keys(x.left, queue, lo, hi);
        if (cmplo <= 0 && cmphi >= 0) queue.enqueue(x.key);
        if (cmphi > 0) keys(x.right, queue, lo, hi);
    }

    public static void main(String[] args)
    {
        RedBlackBST<String, Integer> st = new RedBlackBST<String, Integer>();
        for (int i = 0; !StdIn.isEmpty(); i++)
        {
            String key = StdIn.readString();
            st.put(key, i);
        }    
        for (String s : st.keys())
            StdOut.println(s + " " + st.get(s));
        StdOut.println(); 
    }
}

/*******************************
*  % more tinyST.txt
 *  S E A R C H E X A M P L E
 *  
 *  % java RedBlackBST < tinyST.txt
 *  A 8
 *  C 4
 *  E 12
 *  H 5
 *  L 11
 *  M 9
 *  P 10
 *  R 3
 *  S 0
 *  X 7
 *******************************/

红黑树的性质

全部基于红黑树的符号表实现都能保证操做的运行时间为对数级别。

一颗大小为N的红黑树的高度不会超过 \(2lgN\)。红黑树的最坏状况是它所对应的2-3 树最左边的路径结点均为3-结点而其他均为2-结点，这样最左边的路径长度是只包含2- 结点的路径长度(~ \(lgN\))的两倍。

各符号表实现的比较

3.4 散列表 (Hash Table)

散列表（Hash table，也叫哈希表），是根据经过键(key) 直接进行访问值 (value)的数据结构。也就是说，它经过把键映射到表中一个位置来访问记录，以加快查找的速度。这个映射函数叫作散列函数，存放记录的数组叫作散列表。

给定表M，存在函数f(key)，对任意给定的键 (key)，代入函数后若能获得包含该关键字的记录在表中的地址(即索引)，则称表M为哈希(Hash）表，函数f(key)为哈希(Hash) 函数。

散列函数

若是有一个可以保存M个键值对的数组，就须要一个能将任意键转化为该数组范围内的索引([0, M-1])的散列函数。散列函数应具备一下特色：

一致性 —— 等价的键必然产生相等的散列值
高效性 —— 计算简便
均匀性 —— 对于任意键映射到每一个索引都是等几率的

将整数散列的最经常使用方法是除留余数法(modular hashing)。选择大小为素数M的数组，对于任意正整数k，计算k除以M的余数。若是M不是素数，散列值可能没法均匀分布。

Java中全部数据类型都继承一个 hashCode() 方法，能返回一个32位整数。每一种数据类型的hashCode()方法必须与equals()是一致的。若是 a.equals(b)，那么a.hashCode()的返回值与b.hashCode()的返回值必须是一致的。若是两个对象的hashCode()方法的返回值不一样，那么这两个对象是不一样的。若是两个对象的hashCode()方法的返回值相同，这两个对象也可能不一样，还须要equals()方法判断。

将hashCode() 的返回值转化为一个数组索引: 由于须要的是数组索引而不是一个32位整数，所以在实现中会将默认的hashCode()方法和除留余数法结合起来产生一个0到M-1的整数：

private int hash(Key x)
{ return (x.hashCode() & 0x7fffffff) % M; }

这段代码会将符号位(sign bit)屏蔽(将一个32位整数变为一个31位非负整数)，而后计算它除以M的余数。

基于拉链法(separate chaining) 的散列表

一个散列函数能将键转化为数组索引，散列算法的第二步是碰撞处理(collision resolution)，也就是处理两个或多个键的散列值相同的状况。拉链法将大小为M的数组中的每一个元素指向一条链表，链表的每一个结点存储了一些键值对，同一条链表中键的散列值都同样，为该链表在散列表中的索引，以下图所示：

import edu.princeton.cs.algs4.StdIn;
import edu.princeton.cs.algs4.StdOut;
import edu.princeton.cs.algs4.SequentialSearchST;
import edu.princeton.cs.algs4.Queue;

public class SeparateChainingHashST<Key, Value>
{
    private static final int INIT_CAPACITY = 4;
    private int N;
    private int M;
    private SequentialSearchST<Key, Value>[] st;

    public SeparateChainingHashST()
    { this(INIT_CAPACITY); }

    public SeparateChainingHashST(int M)
    {
        this.M = M;
        st = (SequentialSearchST<Key, Value>[]) new SequentialSearchST[M];
        for (int i = 0; i < M; i++)
            st[i] = new SequentialSearchST<Key, Value>();
    }

    private void resize(int chains)
    {
        SeparateChainingHashST<Key, Value> temp = new SeparateChainingHashST<Key, Value>(chains);
        for (int i = 0; i < M; i++)
        {
            for (Key key : st[i].keys())
            { temp.put(key, st[i].get(key)); }
        }
        this.M = temp.M;
        this.N = temp.N;
        this.st = temp.st;
    }

    private int hash(Key key)
    { return (key.hashCode() & 0x7fffffff) % M; }

    public boolean contains(Key key)
    { return get(key) != null; }

    public Value get(Key key)
    {
        int i = hash(key);
        return st[i].get(key);
    }

    public void put(Key key, Value val)
    {
        if (val == null)
        {
            delete(key);
            return;
        }

        if (n >= 10*M) resize(2*M);

        int i = hash(key);
        if (!st[i].contains(key)) N++;
        st[i].put(key, val);
    }

    public void delete(Key key)
    {
        int i = hash(key);
        if (st[i].contains(key)) N--;
        st[i].delete(key);

        if (M > INIT_CAPACITY && N <= 2*M) resize(M/2);
    }

    public Iterable<Key> keys()
    {
        Queue<Key> queue = new Queue<Key>();
        for (int i = 0; i < M; i++)
        {
            for (Key key : st[i].keys())
                queue.enqueue(key);
        }
        return queue;
    }

    public static void main(String[] args)
    {
        SeparateChainingHashST<String, Integer> st = new SeparateChainingHashST<String, Integer>();
        for (int i = 0; !StdIn.isEmpty(); i++)
        {
            String key = StdIn.readString();
            st.put(key, i);
        }

        for (String s : st.keys())
            StdOut.println(s + " " + st.get(s));
    }
}

拉链法中链表的平均长度为\(N/M\) ，一张含有M条链表和N个键的散列表中，查找和插入操做所需的比较次数为 \(\sim N/M\) ，这样就比 sequential search 快了M倍。
散列表的大小： M若是太大，则有存在许多空链表浪费内存；若是过小则链表太长而浪费查找时间。典型的作法是选择 \(M \sim N / 5\)。
散列最主要的目的是将键均匀地散布开来，所以计算散列后键的顺序信息就消失了，所以不大适合实现有序方法。在键的顺序不是特别重要的应用中，拉链法多是最快的(也是应用最普遍) 的符号表实现。

基于线性探测法 (linear probing) 的散列表

实现散列表的另外一种方式是用大小为M的数组保存N个键值对，其中M > N。咱们须要依靠数组中的空位来解决碰撞冲突，基于这种策略的全部方法统称为开放地址(open addressing) 散列表。

线性探测法是开放地址散列表的一种。先使用散列函数键在数组中的索引，检查其中的键和被查找的键是否相同。若是不一样则继续查找 (将索引扩大，到达数组结尾时返回数组的开头)，知道找到该键或遇到一个空元素。

开放地址类散列表的核心思想是将内存做为散列表中的空元素，而不是将其当作链表，这些空元素能够做为查找结束的标志。下列实现中使用了并行数组，一条保存键，一条保存值。

删除操做：如何线性探测表中删除一个键？直接将该键所在的位置设为null是不行的，由于这会使得以后的元素没法被找到。可行的方法是将簇中被删除键右侧全部的键从新插入散列表。

import edu.princeton.cs.algs4.StdIn;
import edu.princeton.cs.algs4.StdOut;
import edu.princeton.cs.algs4.Queue;

public class LinearProbingHashST<Key, Value>
{
    private static final int INIT_CAPACITY = 4;
    private int N;
    private int M;
    private Key[] keys;
    private Value[] vals;

    public LinearProbingHashST()
    { this(INIT_CAPACITY); }

    public LinearProbingHashST(int capacity)
    {
        M = capacity;
        N = 0;
        keys = (Key[])   new Object[M];
        vals = (Value[]) new Object[M];
    }

    public int size()
    { return n; }

    public boolean isEmpty()
    { return size() == 0; }

    public boolean contains(Key key)
    { return get(key) != null; }

    private int hash(Key key)
    { return (key.hashCode() & 0x7fffffff) % M; }

    private void resize(int capacity)
    {
        LinearProbingHashST<Key, Value> temp = new LinearProbingHashST<Key, Value>(capacity);
        for (int i = 0; i < m; i++)
        {
            if (keys[i] != null)
            { temp.put(keys[i], vals[i]); }
        }
        keys = temp.keys;
        vals = temp.vals;
        M    = temp.M;
    }

    public void put(Key key, Value val)
    {
        if (N >= M/2) resize(2*M);
        int i;
        for (i = hash(key); keys[i] != null; i = (i + 1) % M)
        {
            if (keys[i].equals(key))
            {
                vals[i] = val;
                return;
            }
        }
        keys[i] = key;
        vals[i] = val;
        n++;
    }

    public Value get(Key key)
    {
        for (int i = hash(key); keys[i] != null; i = (i + 1) % M)
        {
            if (keys[i].equals(key)) return vals[i];    
        }
        return null;
    }

    public Iterable<Key> keys() {
        Queue<Key> queue = new Queue<Key>();
        for (int i = 0; i < m; i++)
            if (keys[i] != null) queue.enqueue(keys[i]);
        return queue;
    }

    public static void main(String[] args) { 
        LinearProbingHashST<String, Integer> st = new LinearProbingHashST<String, Integer>();
        for (int i = 0; !StdIn.isEmpty(); i++) {
            String key = StdIn.readString();
            st.put(key, i);
        }

        // print keys
        for (String s : st.keys()) 
            StdOut.println(s + " " + st.get(s)); 
    }
}

散列表和平衡查找树的比较：

Java中的java.util.TreeMap和java.util.HashMap分别是基于红黑树和拉链法的散列表的符号表实现。
相对于二叉查找树，散列表的优势在于代码更简单，且查找时间最优。二叉查找树的优势是抽象结构更简单(不须要设计散列函数)，红黑树能够保证在最坏状况下的性能且可以支持的操做更多(如排名、选择、排序和范围查找)。大多数时候的第一选择是散列表，在其余元素更重要时才会选择红黑树。

各类符号表实现的比较 2：

普林斯顿《算法》笔记（三）

官方网站 官方代码

3.1 符号表 (Symbol Tables)

官方网站官方代码