Reading Notes on "Oracle Database Concepts 11g Release 2" (3)...

Date: 2019-12-08


Table Cluster (P50-P55)

1. Table Cluster Definition

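A table cluster is a group of tables that share common columns and store related data in the same blocks. When tables are clustered, a single block can contain rows from different tables. For example, one block can hold rows from both employees and departments rather than from a single table only.

The cluster key is the column (or columns) that the clustered tables have in common, for example the department_id column shared by employees and departments. You specify the cluster key when you create the cluster and when you add a new table to it.

The cluster key value is the value of the cluster key columns for a particular set of rows. All data with the same cluster key value is stored together physically. Each cluster key value is stored only once in the cluster and in the cluster index, no matter how many rows of different tables contain that value.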

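If several tables are frequently queried together (especially in multi-table or join queries), consider clustering them. Because a table cluster stores related rows from different tables in the same blocks, properly used clustered tables offer these benefits:

1) Less disk I/O for joins between the clustered tables

2) Shorter access time for joins between the clustered tables

3) Less storage space, because each cluster key value is stored only once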

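Clustered tables should not be used when:

1) Most access to the tables is through queries on a single table

2) The tables are frequently updated

3) The tables frequently require a full table scan

4) The tables require truncating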

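2. Indexed Cluster Definition

An indexed cluster is a table cluster that uses an index to locate data. The cluster index is a B-tree index on the cluster key, and it must be created before any rows are inserted into the clustered tables.

The following example creates a cluster named employees_departments_cluster with department_id as the cluster key. Because the HASHKEYS clause is not specified, the cluster is an indexed cluster. Next, an index named idx_emp_dept_cluster is created on the cluster key.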

Example:

CREATE CLUSTER employees_departments_cluster
   (department_id NUMBER(4))
SIZE 512;

CREATE INDEX idx_emp_dept_cluster ON CLUSTER employees_departments_cluster;

Next, create the employees and departments tables in the cluster, specifying the department_id column as the cluster key.

Example:

CREATE TABLE employees(…)
   CLUSTER employees_departments_cluster(department_id);

CREATE TABLE departments(…)
   CLUSTER employees_departments_cluster(department_id);

Finally, as you add rows to employees and departments, the database stores all rows for each department from both tables in the same data blocks. The rows are stored in a heap and located through the index.

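The accompanying figure (not shown here) illustrates the storage layout of employees_departments_cluster: the database stores the rows for employees in department 20 together, and likewise the rows for department 110.

The B-tree cluster index associates each cluster key value with the physical address of the block that contains the data. For example, the index entry
20,AADAAAA9d
gives the address of the block that stores the employees in department 20.

The cluster index is managed separately, just like an index on a nonclustered table, and can reside in a different tablespace from the table cluster.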

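If employees and departments were not defined as a table cluster, the database could not guarantee that related rows are stored together, as a second figure (also not shown here) illustrates.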
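To make the layout concrete, here is a minimal Python sketch of an indexed cluster. It is a toy in-memory model, not Oracle's storage format: the sample rows, the `build_indexed_cluster` helper, and the `block-N` addresses are all assumptions made for illustration.

```python
from collections import defaultdict

# Sample rows (illustrative values only).
departments = [(20, "Marketing"), (110, "Accounting")]
employees = [(201, "Hartstein", 20), (202, "Fay", 20),
             (205, "Higgins", 110), (206, "Gietz", 110)]

def build_indexed_cluster(departments, employees):
    """Group rows of both tables by cluster key: one toy 'block' per key value."""
    blocks = defaultdict(lambda: {"departments": [], "employees": []})
    for dept_id, name in departments:
        blocks[dept_id]["departments"].append((dept_id, name))
    for emp_id, last_name, dept_id in employees:
        blocks[dept_id]["employees"].append((emp_id, last_name, dept_id))
    # Stand-in for the B-tree cluster index: cluster key value -> block address.
    cluster_index = {key: f"block-{i}" for i, key in enumerate(sorted(blocks))}
    return dict(blocks), cluster_index

blocks, cluster_index = build_indexed_cluster(departments, employees)
print(cluster_index[20], blocks[20])
```

Each key value appears once in `cluster_index`, and the department row sits next to its employee rows, which is why a join on department_id touches fewer blocks in this model.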

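3. Hash Cluster Definition

A hash cluster is like an indexed cluster, except that the index key is replaced with a hash function. No separate cluster index exists. In a hash cluster, the data is the index.

As with an indexed cluster, the key of a hash cluster is a single column or a composite key. Oracle Database applies a hash function to the cluster key values to produce a range of integers called hash values. Each hash value maps to the physical address of a data block, so rows with the same key value are stored together.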

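With an indexed table or indexed cluster, Oracle locates rows by using key values stored in a separate index. Finding or storing a row in an indexed table or indexed cluster takes at least two I/Os:

1) One or more I/Os to find or store the key value in the index

2) Another I/O to read or write the row in the table or cluster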

To find or store a row in a hash cluster, Oracle applies the hash function to the row's cluster key value. The result corresponds to a data block in the cluster, which the database then reads or writes.
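The lookup path described above can be sketched in Python. This is only a toy model under stated assumptions: the modulo function stands in for Oracle's internal hash function, and `ToyHashCluster` with its `block_reads` counter is a hypothetical helper, not an Oracle API.

```python
HASHKEYS = 100  # number of hash values, mirroring HASHKEYS 100

def hash_value(cluster_key, hashkeys=HASHKEYS):
    """Stand-in hash function (assumption): key modulo the number of hash values."""
    return cluster_key % hashkeys

class ToyHashCluster:
    def __init__(self, hashkeys=HASHKEYS):
        self.hashkeys = hashkeys
        self.blocks = [[] for _ in range(hashkeys)]  # one toy block per hash value
        self.block_reads = 0                         # counts simulated I/Os

    def insert(self, cluster_key, row):
        self.blocks[hash_value(cluster_key, self.hashkeys)].append((cluster_key, row))

    def lookup(self, cluster_key):
        # The hash value leads straight to the block: one simulated read, no index.
        self.block_reads += 1
        block = self.blocks[hash_value(cluster_key, self.hashkeys)]
        return [row for key, row in block if key == cluster_key]

cluster = ToyHashCluster()
cluster.insert(20, "Hartstein")
cluster.insert(20, "Fay")
cluster.insert(110, "Higgins")
rows = cluster.lookup(20)
print(rows, cluster.block_reads)
```

The point of the sketch is the absence of any index structure: the key itself determines the block.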

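Hashing is a way of storing data that improves retrieval performance. Consider a hash cluster when the following conditions hold:

1) The table is queried much more often than it is modified

2) The hash key column is frequently queried with equality conditions, for example WHERE department_id=20. For such a query, the cluster key value is hashed, and the resulting hash value points directly to the block that stores the matching rows.

3) The number of rows in the table can be reasonably estimated in advance (this is needed to size the hash cluster)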

4. Hash Cluster Creation

To create a hash cluster, use the same CREATE CLUSTER statement as for an indexed cluster, with the addition of the HASHKEYS clause, as in the following example:

CREATE CLUSTER employees_departments_cluster
   (department_id NUMBER(4))
SIZE 8192 HASHKEYS 100;

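Here department_id is defined as the hash key. HASHKEYS specifies the number of departments likely to exist (the number of departments can usually be estimated, so the row count is predictable and condition 3 above is satisfied).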

In this scenario, users typically run queries such as the following, supplying different department IDs through the bind variable :pid.

Example:

SELECT * FROM employees
WHERE department_id = :pid;

SELECT * FROM departments
WHERE department_id = :pid;

SELECT * FROM departments d, employees e
WHERE e.department_id = d.department_id
AND d.department_id = :pid;

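Suppose a user runs the first query with department_id 20. Oracle Database passes 20 to the hash function and locates the block that stores all employees in department 20.

The accompanying figure (not shown here) depicts a hash cluster segment as a horizontal row of blocks. As it illustrates, such a lookup can retrieve the data in a single I/O.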

A hash cluster has the following limitation:

1) It cannot be used for range scans on a nonindexed cluster key

Example:

CREATE CLUSTER employees_departments_cluster
   (department_id NUMBER(4))
SIZE 8192 HASHKEYS 100;

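If no separate index exists on the hash cluster created above, a query for department_id values between 20 and 100 cannot use the hashing algorithm, because it cannot hash every possible value between 20 and 100.

Because no index exists, the database must perform a full scan.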
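The difference between an equality lookup and a range query can be sketched as follows. This is a toy model under assumptions: the modulo hash is a stand-in for Oracle's function, and `equality_lookup` and `range_lookup` are hypothetical helpers that simply count the blocks they visit.

```python
HASHKEYS = 100

def hash_value(key):
    """Stand-in hash function (assumption): it does not preserve key order."""
    return key % HASHKEYS

# One toy block per hash value, loaded with a few department IDs.
blocks = [[] for _ in range(HASHKEYS)]
for dept_id in (10, 20, 50, 100, 110):
    blocks[hash_value(dept_id)].append(dept_id)

def equality_lookup(dept_id):
    """Return (matching keys, blocks visited): hashing leads to exactly one block."""
    block = blocks[hash_value(dept_id)]
    return [k for k in block if k == dept_id], 1

def range_lookup(low, high):
    """Return (matching keys, blocks visited): with no index, every block is scanned."""
    matches, visited = [], 0
    for block in blocks:
        visited += 1
        matches.extend(k for k in block if low <= k <= high)
    return sorted(matches), visited

print(equality_lookup(20))
print(range_lookup(20, 100))
```

Because neighbouring key values land in unrelated blocks (100 hashes to block 0 here, 110 to block 10), the range predicate gives the model no way to narrow down which blocks to read.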

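5. Hash Cluster Variations

A single-table hash cluster is an optimized hash cluster that supports only one table. It provides a one-to-one mapping between hash keys and rows, which makes it a good fit when users need fast access to a single table by primary key. For example, users often look up an employee record in the employees table by employee_id.

A sorted hash cluster is a variation of the hash cluster in which all rows that correspond to the same hash value are kept in ascending order of a specified column. A sorted hash cluster lets applications retrieve data quickly in sorted order, because the rows were already ordered at insert time. For example, a hash cluster containing an orders table can be sorted by order_date.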
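The idea behind a sorted hash cluster can be sketched in Python as follows. This is a toy model, not Oracle's implementation: hashing on a customer_id column is an assumption added for illustration (the notes only mention sorting orders by order_date), and `insert_order` and `orders_for` are hypothetical helpers.

```python
import bisect
from collections import defaultdict

HASHKEYS = 100
buckets = defaultdict(list)  # hash value -> rows kept sorted by order_date

def hash_value(customer_id):
    """Stand-in hash function (assumption)."""
    return customer_id % HASHKEYS

def insert_order(customer_id, order_date, order_id):
    # Rows are (order_date, order_id, customer_id) tuples, so insort keeps each
    # bucket ordered by date at insert time. ISO date strings sort chronologically.
    bisect.insort(buckets[hash_value(customer_id)], (order_date, order_id, customer_id))

def orders_for(customer_id):
    """Return (order_date, order_id) pairs in date order, with no sort at query time."""
    bucket = buckets[hash_value(customer_id)]
    return [(d, o) for d, o, c in bucket if c == customer_id]

insert_order(7, "2019-03-15", 1002)
insert_order(7, "2019-01-02", 1001)
insert_order(7, "2019-11-30", 1003)
print(orders_for(7))
```

The sorting cost is paid once per insert, so reads that want rows in date order need no extra work in this model.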

6. Hash Cluster Storage

Oracle Database allocates space for a hash cluster differently from an indexed cluster. The database multiplies the SIZE and HASHKEYS values from the CREATE CLUSTER statement and preallocates that many bytes.

Example:

CREATE CLUSTER employees_departments_cluster
   (department_id NUMBER(4))
SIZE 8192 HASHKEYS 100;

In the example above, HASHKEYS specifies the number of departments likely to exist, and SIZE specifies the space, in bytes, for all the data of one department.
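As a quick check of the arithmetic, the following Python sketch computes the preallocated space for the example. The 8 KB block size is an assumption (a common default, not stated in these notes), and the sketch ignores any adjustment Oracle may make to HASHKEYS, such as the rounding up to a prime number that the SQL reference describes.

```python
def preallocated_bytes(size_bytes, hashkeys):
    """Space reserved for a hash cluster in this simple model: SIZE * HASHKEYS."""
    return size_bytes * hashkeys

SIZE = 8192        # bytes per cluster key value, from the CREATE CLUSTER example
HASHKEYS = 100     # expected number of departments
BLOCK_SIZE = 8192  # assumed database block size in bytes

total = preallocated_bytes(SIZE, HASHKEYS)
blocks_needed = -(-total // BLOCK_SIZE)  # ceiling division
print(total, blocks_needed)
```

Under these assumptions the cluster reserves 819,200 bytes, which is one block per department.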

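In a hash cluster, the value of HASHKEYS is fixed. Oracle Database does not use HASHKEYS to limit the number of key values that can be inserted into the tables, but if the number of distinct values inserted far exceeds HASHKEYS, retrieval from the hash cluster becomes less efficient. In that case, re-create the hash cluster with a new HASHKEYS value.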