How a Binary Search Tree Is Implemented

Date: 2019-12-04


Introduction

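A binary search tree (BST) is an important data structure that combines the flexibility of insertion in a linked list with the efficiency of search in an ordered array. It is the foundation for the red-black trees and AVL trees we will study later, so in this article we first look at how a binary search tree is implemented.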

Definition of a Binary Search Tree

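The most important property of a binary search tree is this: every node holds a Comparable key and an associated value, and the node's key is greater than the keys of all nodes in its left subtree and less than the keys of all nodes in its right subtree.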

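The figure below shows a typical binary search tree. Take node E as an example: all nodes in its left subtree, A and C, are smaller than E, and all nodes in its right subtree, R and H, are larger than E.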

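Before implementing the BST operations, we first define the tree itself. Java does not support pointer manipulation, so we use an inner class Node to represent each node of the tree. Every Node object holds a key-value pair (key and val), two links (left and right), and a counter (size) that records the number of nodes in the subtree rooted at that node. We also implement three basic methods up front: size() returns the number of nodes in the tree, isEmpty() tests whether the tree is empty, and contains() tests whether the tree holds a node with the given key. Note that contains() relies on get(), which is implemented in the next section.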

import java.util.NoSuchElementException;

public class BST<Key extends Comparable<Key>, Value> {
    private Node root;             // root of BST

    private class Node {
        private Key key;           // sorted by key
        private Value val;         // associated data
        private Node left, right;  // left and right subtrees
        private int size;          // number of nodes in subtree

        public Node(Key key, Value val, int size) {
            this.key = key;
            this.val = val;
            this.size = size;
        }
    }

    // Returns the number of key-value pairs in this symbol table.
    public int size() {
        return size(root);
    }

    // Returns number of key-value pairs in BST rooted at x.
    private int size(Node x) {
        if(x == null) {
            return 0;
        } else {
            return x.size;
        }
    }

    // Returns true if this symbol table is empty.
    public boolean isEmpty() {
        return size() == 0;
    }

    // Returns true if this symbol table contains key and false otherwise.
    public boolean contains(Key key) {
        if(key == null) {
            throw new IllegalArgumentException("argument to contains() is null");
        } else {
            return get(key) != null;
        }
    }
}
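The ordering rule in the definition applies to entire subtrees, not just to a node and its two children. The checker below is a minimal sketch of that point. It passes the allowed key range down the tree. It assumes a hypothetical immutable Node class with char keys, separate from the generic BST class above, and the names BstInvariant and isBST are also hypothetical.

```java
// Minimal sketch: check the BST ordering rule by passing key bounds down the tree.
// BstInvariant, Node and isBST are hypothetical names; keys are chars for brevity.
class BstInvariant {
    static final class Node {
        final char key;
        final Node left, right;

        Node(char key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // True if every key in x's subtree lies strictly between lo and hi (null means unbounded).
    static boolean isBST(Node x, Character lo, Character hi) {
        if (x == null) return true;
        if (lo != null && x.key <= lo) return false;
        if (hi != null && x.key >= hi) return false;
        // The left subtree must stay below x.key, the right subtree above it.
        return isBST(x.left, lo, x.key) && isBST(x.right, x.key, hi);
    }

    static boolean isBST(Node root) {
        return isBST(root, null, null);
    }
}
```

In a tree where H is the right child of A and A is the left child of E, H is larger than its parent A. A check that only compared each node with its children would therefore accept it. The upper bound inherited from E rejects it, because H sits in E's left subtree.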

Implementing Search and Insertion

Search

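First, let's see how to find the node associated with a given key. A search either succeeds or fails; we will follow a successful search through the whole process. As noted above, the key property of a BST is that nodes in the left subtree are smaller than the root and nodes in the right subtree are larger than the root. Using this property, we start at the root and walk down the tree, and at each node one of three cases occurs: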

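If the search key is less than the current node's key, the key (if present) must be in the left subtree, because all nodes there are smaller; we continue by recursively searching the left subtree.
If the search key is greater than the current node's key, the key (if present) must be in the right subtree, because all nodes there are larger; we continue by recursively searching the right subtree.
If the search key equals the current node's key, we return that node's val directly.
If we reach an empty (null) link, the key is not in the tree and the search fails, returning null.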

These steps are easy to implement with recursion:

/**
 * Returns the value associated with the given key.
 *
 * @param  key the key
 * @return the value associated with the given key if the key is in the symbol table
 *         and null if the key is not in the symbol table
 * @throws IllegalArgumentException if key is null
 */
public Value get(Key key) {
    if(key == null) {
        throw new IllegalArgumentException("argument to get() is null");
    } else {
        return get(root, key);
    }
}

private Value get(Node x, Key key) {
    if(x == null) {
        return null;
    } else {
        int cmp = key.compareTo(x.key);
        if(cmp < 0) {
            return get(x.left, key);
        } else if(cmp > 0) {
            return get(x.right, key);
        } else {
            return x.val;
        }
    }
}
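Each comparison moves the search one level down, so the recursion in get() can also be written as a loop. Below is a minimal sketch under some assumptions. It uses int keys and String values, and the class name IterativeSearch is hypothetical. Size counts are omitted because search does not need them.

```java
// Minimal sketch: iterative BST search; IterativeSearch is a hypothetical name.
// Keys are ints, values are Strings, and subtree sizes are omitted for brevity.
class IterativeSearch {
    private static final class Node {
        final int key;
        String val;
        Node left, right;

        Node(int key, String val) {
            this.key = key;
            this.val = val;
        }
    }

    private Node root;

    // Recursive insert with the same shape as put() in the article (no size counts).
    void put(int key, String val) {
        root = put(root, key, val);
    }

    private Node put(Node x, int key, String val) {
        if (x == null) return new Node(key, val);
        int cmp = Integer.compare(key, x.key);
        if (cmp < 0) x.left = put(x.left, key, val);
        else if (cmp > 0) x.right = put(x.right, key, val);
        else x.val = val; // existing key: overwrite the value
        return x;
    }

    // Walks down from the root; each comparison discards one subtree.
    String get(int key) {
        Node x = root;
        while (x != null) {
            int cmp = Integer.compare(key, x.key);
            if (cmp < 0) x = x.left;
            else if (cmp > 0) x = x.right;
            else return x.val;
        }
        return null; // reached a null link: the key is absent
    }
}
```

The loop makes the same comparisons as the recursive version. It simply avoids one stack frame per level, which can matter when the tree degenerates into a long chain.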

Insertion

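Once search is understood, insertion is easy to follow. We first locate the position for the new node, exactly as in search, and then attach the new node to the tree at that position. There is just one extra step: updating size. After a new node is inserted, the size of its parent, its grandparent, and every other node on the path back up to the root increases by one.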

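Insertion can be implemented in several ways, but the recursive version is probably the clearest. The code below follows the same idea as get(). Its only addition is the statement x.size = size(x.left) + size(x.right) + 1;, which recomputes each node's size on the way back up the recursion.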

/**
 * Inserts the specified key-value pair into the symbol table, overwriting the old
 * value with the new value if the symbol table already contains the specified key.
 * Deletes the specified key (and its associated value) from this symbol table
 * if the specified value is null.
 *
 * @param  key the key
 * @param  val the value
 * @throws IllegalArgumentException if key is null
 */
public void put(Key key, Value val) {
    if(key == null) {
        throw new IllegalArgumentException("first argument to put() is null");
    }
    if(val == null) {
        delete(key);
        return;
    }
    root = put(root, key, val);
    // assert check(); // Check integrity of BST data structure.
}

private Node put(Node x, Key key, Value val) {
    if(x == null) {
        return new Node(key, val, 1);
    } else {
        int cmp = key.compareTo(x.key);
        if(cmp < 0) {
            x.left = put(x.left, key, val);
        } else if(cmp > 0) {
            x.right = put(x.right, key, val);
        } else {
            x.val = val;
        }
        // reset links and increment counts on the way up
        x.size = size(x.left) + size(x.right) + 1;
        return x;
    }
}
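The sketch below shows the size updates in action. It stores only char keys with no values in a hypothetical class SizeCountDemo. The helper subtreeSize is also hypothetical; it reports the size field of the node holding a key. The usage assumes the keys S E A R C H X M are inserted in that order. That order gives a tree where E has A and C in its left subtree and R and H in its right subtree, consistent with the example tree described earlier.

```java
// Minimal sketch: recursive insert that maintains subtree sizes.
// Keys only (no values); SizeCountDemo and subtreeSize are hypothetical names.
class SizeCountDemo {
    private static final class Node {
        final char key;
        Node left, right;
        int size = 1; // number of nodes in the subtree rooted here

        Node(char key) {
            this.key = key;
        }
    }

    private Node root;

    private static int size(Node x) {
        return x == null ? 0 : x.size;
    }

    int size() {
        return size(root);
    }

    void put(char key) {
        root = put(root, key);
    }

    private Node put(Node x, char key) {
        if (x == null) return new Node(key); // new leaf of size 1
        if (key < x.key) x.left = put(x.left, key);
        else if (key > x.key) x.right = put(x.right, key);
        // Recompute on the way back up; an existing key leaves the count unchanged.
        x.size = size(x.left) + size(x.right) + 1;
        return x;
    }

    // Size of the subtree rooted at the node holding key, or 0 if key is absent.
    int subtreeSize(char key) {
        Node x = root;
        while (x != null) {
            if (key < x.key) x = x.left;
            else if (key > x.key) x = x.right;
            else return x.size;
        }
        return 0;
    }
}
```

Size is recomputed from the children rather than blindly incremented. As a result, re-inserting an existing key, which in the article's put() only overwrites val, leaves every count correct.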

Implementing select and rank

Implementing select

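Our get() operation looks up the value associated with a given key. Another advantage of a BST is that it keeps its keys in sorted order: an in-order traversal visits them in ascending order. We can therefore also find the key with a given rank, such as the n-th smallest key, efficiently.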

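First consider a question: within the subtree rooted at a given node, how many keys are smaller than that node's key? Because every node in the left subtree is smaller than its root, the answer is simply the size of the left subtree. Using the figure below, let's walk through select() as it looks for the 4th smallest node.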

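Walking down the tree, we reach node E in Figure 2. E's left subtree contains 2 nodes, so E is the 3rd smallest node in the tree, and the node we want must therefore be in E's right subtree. We are looking for the 4th smallest node and E is the 3rd smallest. So the target is the node in E's right subtree that has 0 smaller nodes within that subtree, namely the smallest node of the right subtree, H. In general, when the search moves into a right subtree, the rank being sought drops by the size of the left subtree plus one (for the current node).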

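The implementation of select is shown below. It decides where the current node ranks by looking at the number of nodes in its left subtree. Note that k is 0-based, so the 4th smallest key is select(3).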

/**
 * Return the kth smallest key in the symbol table.
 *
 * @param  k the order statistic
 * @return the kth smallest key in the symbol table
 * @throws IllegalArgumentException unless k is between 0 and n-1
 */
public Key select(int k) {
    if (k < 0 || k >= size()) {
        throw new IllegalArgumentException("called select() with invalid argument: " + k);
    } else {
        Node x = select(root, k);
        return x.key;
    }
}

// Return the node containing the key of rank k in the subtree rooted at x.
private Node select(Node x, int k) {
    if(x == null) {
        return null;
    } else {
        int t = size(x.left);
        if(t > k) {
            return select(x.left, k);
        } else if(t < k) {
            return select(x.right, k - t - 1);
        } else {
            return x;
        }
    }
}
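The key detail in select() is that moving into the right subtree changes the rank being sought. The t nodes of the left subtree and the current node itself are skipped, so k becomes k - t - 1. Below is a minimal sketch of that logic written as a loop. It assumes char keys only and a hypothetical class SelectDemo.

```java
// Minimal sketch: select() written as a loop over a size-annotated BST of char keys.
// Keys only (no values); SelectDemo is a hypothetical name.
class SelectDemo {
    private static final class Node {
        final char key;
        Node left, right;
        int size = 1; // number of nodes in the subtree rooted here

        Node(char key) {
            this.key = key;
        }
    }

    private Node root;

    private static int size(Node x) {
        return x == null ? 0 : x.size;
    }

    void put(char key) {
        root = put(root, key);
    }

    private Node put(Node x, char key) {
        if (x == null) return new Node(key);
        if (key < x.key) x.left = put(x.left, key);
        else if (key > x.key) x.right = put(x.right, key);
        x.size = size(x.left) + size(x.right) + 1;
        return x;
    }

    // Returns the key of rank k, where k = 0 is the smallest key.
    char select(int k) {
        if (k < 0 || k >= size(root)) {
            throw new IllegalArgumentException("invalid rank: " + k);
        }
        Node x = root;
        while (true) {
            int t = size(x.left); // keys in x's subtree that are smaller than x.key
            if (k < t) {
                x = x.left;       // target lies in the left subtree, rank unchanged
            } else if (k > t) {
                k = k - t - 1;    // skip the left subtree and x itself
                x = x.right;
            } else {
                return x.key;     // exactly t smaller keys: x is the answer
            }
        }
    }
}
```

Assume the keys S E A R C H X M are inserted in that order. Then select(3) returns H, the 4th smallest key, which matches the walkthrough above.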

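rank() is the inverse of select(): it returns the rank of a given key, that is, the number of keys in the tree strictly less than it. The idea is the same as above, so I won't repeat the explanation. Note only that when the search moves right, it adds 1 + size(x.left), counting the left subtree and the current node.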

/**
 * Return the number of keys in the symbol table strictly less than key.
 *
 * @param  key the key
 * @return the number of keys in the symbol table strictly less than key
 * @throws IllegalArgumentException if key is null
 */
public int rank(Key key) {
    if (key == null) {
        throw new IllegalArgumentException("argument to rank() is null");
    } else {
        return rank(key, root);
    }
}

private int rank(Key key, Node x) {
    if(x == null) {
        return 0;
    } else {
        int cmp = key.compareTo(x.key);
        if(cmp < 0) {
            return rank(key, x.left);
        } else if(cmp > 0) {
            return 1 + size(x.left) + rank(key, x.right);
        } else {
            return size(x.left);
        }
    }
}
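Below is a minimal sketch of rank() on a keys-only tree. It uses a hypothetical class RankDemo with char keys and assumes the insertion order S E A R C H X M. It shows that the key need not be present in the tree: the result is always the number of stored keys strictly smaller than it.

```java
// Minimal sketch: rank() over a size-annotated BST of char keys.
// Keys only (no values); RankDemo is a hypothetical name.
class RankDemo {
    private static final class Node {
        final char key;
        Node left, right;
        int size = 1; // number of nodes in the subtree rooted here

        Node(char key) {
            this.key = key;
        }
    }

    private Node root;

    private static int size(Node x) {
        return x == null ? 0 : x.size;
    }

    void put(char key) {
        root = put(root, key);
    }

    private Node put(Node x, char key) {
        if (x == null) return new Node(key);
        if (key < x.key) x.left = put(x.left, key);
        else if (key > x.key) x.right = put(x.right, key);
        x.size = size(x.left) + size(x.right) + 1;
        return x;
    }

    // Number of stored keys strictly less than key; key itself need not be stored.
    int rank(char key) {
        return rank(root, key);
    }

    private int rank(Node x, char key) {
        if (x == null) return 0;
        if (key < x.key) return rank(x.left, key);
        // Everything in the left subtree, plus x itself, is smaller than key.
        if (key > x.key) return 1 + size(x.left) + rank(x.right, key);
        return size(x.left);
    }
}
```

For every stored key, rank() returns the key's index in sorted order. This is why rank(select(k)) == k for any valid k.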

Deletion

Deletion is the hardest BST operation to implement. Before tackling it, let's look at how to delete the smallest node in the tree.

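To implement deleteMin(), we first need to find the smallest node, which is clearly the leftmost node in the tree, A. The interesting part is how to delete A. In the figure below, the two nodes in E's left subtree, A and C, are both smaller than E. C is A's right child, so we only need to change E's left link from A to C. A is then no longer referenced and will be reclaimed by the garbage collector. In general, the link that pointed to the minimum node is replaced by the minimum node's right link. The last step is updating size on every node along the path.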

The implementation is as follows:

/**
 * Removes the smallest key and associated value from the symbol table.
 *
 * @throws NoSuchElementException if the symbol table is empty
 */
public void deleteMin() {
    if (isEmpty()) {
        throw new NoSuchElementException("Symbol table underflow");
    } else {
        root = deleteMin(root);
        // assert check(); // Check integrity of BST data structure.
    }
}

private Node deleteMin(Node x) {
    if(x.left == null) {
        return x.right;
    } else {
        x.left = deleteMin(x.left);
        x.size = size(x.left) + size(x.right) + 1;
        return x;
    }
}
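Below is a minimal sketch of deleteMin() on a keys-only tree. It uses a hypothetical class DeleteMinDemo with char keys, plus a hypothetical keys() helper that lists the keys in order. It assumes the insertion order S E A R C H X M. Under that order, the first call removes A and E's left link moves to C.

```java
// Minimal sketch: deleteMin() on a size-annotated BST of char keys.
// Keys only (no values); DeleteMinDemo and keys() are hypothetical names.
class DeleteMinDemo {
    private static final class Node {
        final char key;
        Node left, right;
        int size = 1; // number of nodes in the subtree rooted here

        Node(char key) {
            this.key = key;
        }
    }

    private Node root;

    private static int size(Node x) {
        return x == null ? 0 : x.size;
    }

    int size() {
        return size(root);
    }

    void put(char key) {
        root = put(root, key);
    }

    private Node put(Node x, char key) {
        if (x == null) return new Node(key);
        if (key < x.key) x.left = put(x.left, key);
        else if (key > x.key) x.right = put(x.right, key);
        x.size = size(x.left) + size(x.right) + 1;
        return x;
    }

    void deleteMin() {
        if (root == null) throw new java.util.NoSuchElementException("tree is empty");
        root = deleteMin(root);
    }

    private Node deleteMin(Node x) {
        if (x.left == null) return x.right;        // x is the minimum: its right subtree takes its place
        x.left = deleteMin(x.left);
        x.size = size(x.left) + size(x.right) + 1; // one node fewer below x
        return x;
    }

    // In-order listing of the keys, used only to inspect the result.
    String keys() {
        StringBuilder sb = new StringBuilder();
        inorder(root, sb);
        return sb.toString();
    }

    private void inorder(Node x, StringBuilder sb) {
        if (x == null) return;
        inorder(x.left, sb);
        sb.append(x.key);
        inorder(x.right, sb);
    }
}
```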

Deleting the largest node works the same way with left and right swapped, so I won't repeat the explanation:

/**
 * Removes the largest key and associated value from the symbol table.
 *
 * @throws NoSuchElementException if the symbol table is empty
 */
public void deleteMax() {
    if (isEmpty()) {
        throw new NoSuchElementException("Symbol table underflow");
    } else {
        root = deleteMax(root);
        // assert check(); // Check integrity of BST data structure.
    }
}

private Node deleteMax(Node x) {
    if (x.right == null) {
        return x.left;
    } else {
        x.right = deleteMax(x.right);
        x.size = size(x.left) + size(x.right) + 1;
        return x;
    }
}

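Now let's walk through the full deletion process step by step, using the figure below. As before, we first locate the node to delete, E. We then find the smallest node in E's right subtree, here H, and simply replace E with H. Why can H replace E directly? H is larger than every node in E's left subtree and smaller than every other node in E's right subtree, so the replacement does not break the BST ordering.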

Here is the implementation. Once execution reaches the // find key comment, there are three cases:

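The node's right link is null: return its left link to take the deleted node's place.
The node's left link is null: return its right link to take the deleted node's place.
Both links are non-null: this is the case shown in the figure above. Save the node as t, set x to the minimum node of t.right, set x.right to deleteMin(t.right), and set x.left to t.left.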

/**
 * Removes the specified key and its associated value from this symbol table
 * (if the key is in this symbol table).
 *
 * @param  key the key
 * @throws IllegalArgumentException if key is null
 */
public void delete(Key key) {
    if (key == null) {
        throw new IllegalArgumentException("argument to delete() is null");
    } else {
        root = delete(root, key);
        // assert check(); // Check integrity of BST data structure.
    }
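}

// Returns the node with the smallest key in the subtree rooted at x.
private Node min(Node x) {
    if(x.left == null) {
        return x;
    } else {
        return min(x.left);
    }
}

private Node delete(Node x, Key key) {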
    if(x == null) {
        return null;
    } else {
        int cmp = key.compareTo(x.key);
        if(cmp < 0) {
            x.left = delete(x.left, key);
        } else if(cmp > 0) {
            x.right = delete(x.right, key);
        } else {
            // find key
            if(x.right == null) {
                return x.left;
            } else if(x.left == null) {
                return x.right;
            } else {
                Node t = x;
                x = min(t.right);
                x.right = deleteMin(t.right);
                x.left = t.left;
            }
        }
        // update links and node count after recursive calls
        x.size = size(x.left) + size(x.right) + 1;
        return x;
    }
}
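Below is a minimal sketch of this deletion scheme (Hibbard deletion) on a keys-only tree. It uses a hypothetical class DeleteDemo with char keys and hypothetical inspection helpers keys() and rootLeftKey(). It assumes the insertion order S E A R C H X M, so deleting E should promote its successor H.

```java
// Minimal sketch: Hibbard deletion on a size-annotated BST of char keys.
// Keys only (no values); DeleteDemo, keys() and rootLeftKey() are hypothetical names.
class DeleteDemo {
    private static final class Node {
        final char key;
        Node left, right;
        int size = 1; // number of nodes in the subtree rooted here

        Node(char key) {
            this.key = key;
        }
    }

    private Node root;

    private static int size(Node x) {
        return x == null ? 0 : x.size;
    }

    int size() {
        return size(root);
    }

    void put(char key) {
        root = put(root, key);
    }

    private Node put(Node x, char key) {
        if (x == null) return new Node(key);
        if (key < x.key) x.left = put(x.left, key);
        else if (key > x.key) x.right = put(x.right, key);
        x.size = size(x.left) + size(x.right) + 1;
        return x;
    }

    void delete(char key) {
        root = delete(root, key);
    }

    private Node delete(Node x, char key) {
        if (x == null) return null;                 // key not found: nothing to do
        if (key < x.key) x.left = delete(x.left, key);
        else if (key > x.key) x.right = delete(x.right, key);
        else {
            if (x.right == null) return x.left;     // at most one child: splice it in
            if (x.left == null) return x.right;
            Node t = x;
            x = min(t.right);                       // successor: smallest key on the right
            x.right = deleteMin(t.right);           // detach the successor from that subtree
            x.left = t.left;
        }
        x.size = size(x.left) + size(x.right) + 1;
        return x;
    }

    private Node min(Node x) {
        return x.left == null ? x : min(x.left);
    }

    private Node deleteMin(Node x) {
        if (x.left == null) return x.right;
        x.left = deleteMin(x.left);
        x.size = size(x.left) + size(x.right) + 1;
        return x;
    }

    // Key of the root's left child, or null if there is none (inspection only).
    Character rootLeftKey() {
        return root == null || root.left == null ? null : root.left.key;
    }

    // In-order listing of the keys (inspection only).
    String keys() {
        StringBuilder sb = new StringBuilder();
        inorder(root, sb);
        return sb.toString();
    }

    private void inorder(Node x, StringBuilder sb) {
        if (x == null) return;
        inorder(x.left, sb);
        sb.append(x.key);
        inorder(x.right, sb);
    }
}
```

The order of the two statements in the two-child case matters. min(t.right) must be read before deleteMin(t.right) detaches the successor; otherwise min would return the next key instead.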

Implementing floor and ceiling

Implementing floor

floor() rounds down: it finds the largest key less than or equal to the given key. Here is how it proceeds:

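If the key equals the root's key, the root itself is the floor.
If the key is less than the root's key, the largest node less than or equal to the key must be in the left subtree.
If the key is greater than the root's key, things are more involved and there are two cases: 1> if the right subtree contains a node less than or equal to the key, the largest such node is in the right subtree; 2> otherwise, the root itself is the largest node less than or equal to the key.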

The implementation is as follows:

/**
 * Returns the largest key in the symbol table less than or equal to key.
 *
 * @param  key the key
 * @return the largest key in the symbol table less than or equal to key,
 *         or null if there is no such key
 * @throws NoSuchElementException if the symbol table is empty
 * @throws IllegalArgumentException if key is null
 */
public Key floor(Key key) {
    if (key == null) {
        throw new IllegalArgumentException("argument to floor() is null");
    }
    if (isEmpty()) {
        throw new NoSuchElementException("called floor() with empty symbol table");
    }
    Node x = floor(root, key);
    if (x == null) {
        return null;
    } else {
        return x.key;
    }
}


private Node floor(Node x, Key key) {
    if (x == null) {
        return null;
    } else {
        int cmp = key.compareTo(x.key);
        if(cmp == 0) {
            return x;
        } else if(cmp < 0) {
            return floor(x.left, key);
        } else {
            Node t = floor(x.right, key);
            if(t != null) {
                return t;
            } else {
                return x;
            }
        }
    }
}
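Below is a minimal sketch of floor() on a keys-only tree without size counts, since floor does not need them. It uses a hypothetical class FloorDemo with char keys and assumes the insertion order S E A R C H X M. A null result means no stored key is less than or equal to the argument.

```java
// Minimal sketch: floor() on a BST of char keys (no values, no size counts).
// FloorDemo is a hypothetical name.
class FloorDemo {
    private static final class Node {
        final char key;
        Node left, right;

        Node(char key) {
            this.key = key;
        }
    }

    private Node root;

    void put(char key) {
        root = put(root, key);
    }

    private Node put(Node x, char key) {
        if (x == null) return new Node(key);
        if (key < x.key) x.left = put(x.left, key);
        else if (key > x.key) x.right = put(x.right, key);
        return x;
    }

    // Largest stored key <= key, or null if every stored key is larger.
    Character floor(char key) {
        Node x = floor(root, key);
        return x == null ? null : x.key;
    }

    private Node floor(Node x, char key) {
        if (x == null) return null;
        if (key == x.key) return x;                 // exact match is its own floor
        if (key < x.key) return floor(x.left, key); // floor must be in the left subtree
        Node t = floor(x.right, key);               // a larger candidate may exist on the right
        return t != null ? t : x;                   // otherwise x is the floor
    }
}
```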

Implementing ceiling

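ceiling() is the mirror image of floor(): it rounds up, finding the smallest node greater than or equal to the given key. The approach is the same; just swap left and right, and "less than" and "greater than", in the code above: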

/**
 * Returns the smallest key in the symbol table greater than or equal to {@code key}.
 *
 * @param  key the key
 * @return the smallest key in the symbol table greater than or equal to {@code key},
 *         or {@code null} if there is no such key
 * @throws NoSuchElementException if the symbol table is empty
 * @throws IllegalArgumentException if {@code key} is {@code null}
 */
public Key ceiling(Key key) {
    if(key == null) {
        throw new IllegalArgumentException("argument to ceiling() is null");
    }
    if(isEmpty()) {
        throw new NoSuchElementException("called ceiling() with empty symbol table");
    }
    Node x = ceiling(root, key);
    if(x == null) {
        return null;
    } else {
        return x.key;
    }
}

private Node ceiling(Node x, Key key) {
    if(x == null) {
        return null;
    } else {
        int cmp = key.compareTo(x.key);
        if(cmp == 0) {
            return x;
        } else if(cmp < 0) {
            Node t = ceiling(x.left, key);
            if (t != null) {
                return t;
            } else {
                return x;
            }
        } else {
            return ceiling(x.right, key);
        }
    }
}
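Below is a minimal sketch of ceiling() on the same kind of keys-only tree. It uses a hypothetical class CeilingDemo with char keys and assumes the insertion order S E A R C H X M. Compared with floor(), only the direction of the "look for a better candidate" branch changes.

```java
// Minimal sketch: ceiling() on a BST of char keys (no values, no size counts).
// CeilingDemo is a hypothetical name.
class CeilingDemo {
    private static final class Node {
        final char key;
        Node left, right;

        Node(char key) {
            this.key = key;
        }
    }

    private Node root;

    void put(char key) {
        root = put(root, key);
    }

    private Node put(Node x, char key) {
        if (x == null) return new Node(key);
        if (key < x.key) x.left = put(x.left, key);
        else if (key > x.key) x.right = put(x.right, key);
        return x;
    }

    // Smallest stored key >= key, or null if every stored key is smaller.
    Character ceiling(char key) {
        Node x = ceiling(root, key);
        return x == null ? null : x.key;
    }

    private Node ceiling(Node x, char key) {
        if (x == null) return null;
        if (key == x.key) return x;                   // exact match is its own ceiling
        if (key > x.key) return ceiling(x.right, key); // ceiling must be in the right subtree
        Node t = ceiling(x.left, key);                // a smaller candidate may exist on the left
        return t != null ? t : x;                     // otherwise x is the ceiling
    }
}
```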

References

Algorithms, 4th Edition (Robert Sedgewick and Kevin Wayne)

Contact

GitHub: https://github.com/ziwenxie
Blog: https://www.ziwenxie.site

This is an original article by the author. If you repost it, please credit the blog prominently at the beginning :)