Understanding Oracle Parallel Execution in Depth (Repost)

Date: 2019-11-06

Reposted from: http://czmmiao.iteye.com/blog/1487568

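Parallel Execution and OLAP Systems
Parallel execution works as follows. Oracle first creates a process that coordinates message passing among the parallel server processes. This coordinator splits the data set to be processed (for example, the blocks of a table) into many pieces, called parallel processing units, and assigns one unit to each parallel server process. With four parallel server processes, for instance, all four work on their assigned units at the same time; whenever one of them finishes, the coordinator hands it another unit, and so on until all the data in the table has been processed. Finally, the coordinator merges the partial results into one result set and returns it to the user.
In short, parallel processing splits the data set to be scanned into many small data sets, has several parallel server processes work on them simultaneously, and then combines their results into the final result returned to the user.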
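The split, process, and merge flow described above can be imitated outside the database. The following is only a minimal Python sketch of the idea, not Oracle's implementation: the names split_into_units and parallel_sum are made up for illustration, a thread pool stands in for the parallel server processes, and summing numbers stands in for the real work.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_units(rows, unit_size):
    """Coordinator step: cut the data set into fixed-size processing units."""
    return [rows[i:i + unit_size] for i in range(0, len(rows), unit_size)]


def parallel_sum(rows, degree, unit_size):
    """Sum `rows` using `degree` workers that each take one unit at a time."""
    units = split_into_units(list(rows), unit_size)
    # The pool plays the role of the parallel server processes: an idle
    # worker picks up the next pending unit until none are left.
    with ThreadPoolExecutor(max_workers=degree) as pool:
        partial_results = list(pool.map(sum, units))
    # Coordinator step: merge the partial results into the final answer.
    return sum(partial_results)


if __name__ == "__main__":
    print(parallel_sum(range(1, 101), degree=4, unit_size=10))
```

Because the units are handed out one at a time, a worker that finishes early simply takes the next unit, which mirrors the way the coordinator keeps assigning units until the table is exhausted.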
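This kind of data parallelism is very useful in OLAP systems. OLAP tables are usually very large, so when the server has many CPUs, having all of them work on the data together is far more efficient than serial execution.
For OLTP systems, however, parallel execution is generally not appropriate. Almost all SQL in an OLTP system accesses data through indexes and returns very small result sets, so these statements already run very fast and do not need parallelism.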
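How Parallel Processing Works
When the Oracle database starts, the instance pre-starts n parallel server processes according to the initialization parameter PARALLEL_MIN_SERVERS=n. When the CBO decides that a SQL statement should run in parallel, the session process that issued the statement becomes the parallel coordinator, and it allocates parallel server processes according to the degree of parallelism.
The coordinator first uses the parallel server processes that were started at instance startup according to parallel_min_servers=n. If these are not enough to satisfy the requested degree of parallelism, the coordinator starts additional parallel server processes. It then divides the object to be processed into small pieces and hands them to the parallel server processes. Each parallel server process sends its result back to the coordinator, which combines the results and returns them to the user.
That is the basic flow of parallel processing. In practice, a parallel execution also involves communication among the parallel server processes themselves.
When a parallel operation involves two steps, a second set of processes is started to work together with the first. Consider this SQL statement: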
Select * from employees order by last_name;
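Suppose there is no index on the last_name column of the employees table and the degree of parallelism is 4. The coordinator assigns 4 parallel server processes to perform the full table scan of employees. Because the result set must also be sorted, the coordinator starts 4 more parallel server processes to handle the rows sent by the first 4. The newly started processes, which consume the incoming rows, are called parent processes; the original 4 processes, which produce the rows, are called child processes. The whole operation therefore uses 8 parallel server processes. The work done inside each individual parallel server process is called intra-operation parallelism, and the data exchange between the sets of parallel server processes is called inter-operation parallelism.
This is why the number of parallel server processes is sometimes twice the degree of parallelism: a second (parent) set of parallel server processes has been started.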
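The arithmetic behind the "twice the degree of parallelism" observation is small enough to write down. This is a sketch under the assumption, taken from the example above, that a statement uses either one set of parallel server processes (scan only) or two sets (scan plus a consuming step such as a sort); px_servers_needed is a hypothetical helper name.

```python
def px_servers_needed(dop, server_sets):
    """Return the number of parallel server processes for one statement.

    dop         -- degree of parallelism
    server_sets -- 1 for a scan only, 2 when a second set consumes the
                   rows produced by the first (for example to sort them)
    """
    if dop < 1:
        raise ValueError("dop must be at least 1")
    if server_sets not in (1, 2):
        raise ValueError("this sketch models one or two server sets")
    return dop * server_sets


if __name__ == "__main__":
    # The ORDER BY example: DOP 4, scanning set plus sorting set.
    print(px_servers_needed(4, 2))
```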
Reading a Parallel Execution Plan
CREATE TABLE emp2 AS SELECT * FROM employees;
ALTER TABLE emp2 PARALLEL 2;
EXPLAIN PLAN FOR SELECT SUM(salary) FROM emp2 GROUP BY department_id;
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------
Plan hash value: 3939201228
------------------------------------------------------------------------------------------------------------------
| Id | Operation                | Name     | Rows | Bytes | Cost (%CPU)| Time     |    TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |   107 | 2782 |     3 (34)| 00:00:01 |        |      |            |
|   1 | PX COORDINATOR          |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)    | :TQ10001 |   107 | 2782 |     3 (34)| 00:00:01 | Q1,01 | P->S | QC (RAND) |
|   3 |    HASH GROUP BY         |          |   107 | 2782 |     3 (34)| 00:00:01 | Q1,01 | PCWP |            |
|   4 |     PX RECEIVE           |          |   107 | 2782 |     3 (34)| 00:00:01 | Q1,01 | PCWP |            |
|   5 |      PX SEND HASH        | :TQ10000 |   107 | 2782 |     3 (34)| 00:00:01 | Q1,00 | P->P | HASH       |
|   6 |       HASH GROUP BY      |          |   107 | 2782 |     3 (34)| 00:00:01 | Q1,00 | PCWP |            |
|   7 |        PX BLOCK ITERATOR |          |   107 | 2782 |     2   (0)| 00:00:01 | Q1,00 | PCWC |            |
|   8 |         TABLE ACCESS FULL| EMP2     |   107 | 2782 |     2   (0)| 00:00:01 | Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------
Note
-----
   - dynamic sampling used for this statement
19 rows selected.
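Reading the plan from the bottom up, the execution steps are:
1. The parallel server processes perform a full table scan of EMP2 (Id 8).
2. They access the blocks through an ITERATOR (Id 7): the coordinator gives each parallel server process a piece of the table, and the process reads the blocks in that piece one after another.
3. The same set of parallel server processes (the child processes, Q1,00) performs a first HASH GROUP BY on the rows it has scanned (Id 6).
4. The child processes send the grouped rows, distributed by hash, to a second set of parallel server processes (Id 5).
5. The second set (the parent processes, Q1,01) receives the rows (Id 4).
6. The parent processes perform the final HASH GROUP BY (Id 3) and send the results in random order to the parallel coordinator (QC: Query Coordinator) (Id 2).
7. The coordinator returns the result to the user (Id 1 and Id 0).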
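When parallel execution is used, the execution plan contains an additional column, IN-OUT, which helps us understand how data flows between operations. Its values mean the following:
Parallel to Serial (P->S): a parallel operation sends data to a serial operation.
Parallel to Parallel (P->P): a parallel operation sends data to another parallel operation.
Parallel Combined with Parent (PCWP): a parallel operation executed by the same slave process as its parent operation, which is also parallel.
Parallel Combined with Child (PCWC): a parallel operation executed by the same slave process as its child operation, which is also parallel.
Serial to Parallel (S->P): a serial operation sends data to a parallel operation. This appears when the select part of the statement runs serially.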
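Parallel Execution Wait Events
When tuning parallel execution, you may run into the following wait event:
PX Deq Credit: send blkd
In a database that uses parallel execution, this wait event shows up frequently in Statspack or AWR reports. In Oracle 9i it was classified as an idle wait.
Idle waits can usually be ignored, but they still deserve attention, because an idle wait can indicate that some other resource is overloaded. For this reason, Oracle 10g no longer treats PX Deq Credit: send blkd as an idle wait and lists it under the Other wait class instead.
PX Deq Credit: send blkd means the following: when parallel server processes send messages to the coordinator QC (or to the parallel server processes one level up), only one of them can send to the upper-level process at a time. Any other parallel server process that wants to send must wait until it obtains a credit for sending, and that wait is recorded as this event. The timeout of the wait is 2 seconds.
If too many parallel processes are started and the system (CPU) or the QC cannot keep up with the data they send, waiting is unavoidable. In that case the degree of parallelism should be reduced.
When a lot of time is spent on PX Deq Credit: send blkd, the average wait time tells us whether the waits come from idle lower-level parallel server processes. The timeout is 2 seconds, so an average wait close to 2 seconds means the lower-level parallel processes have nothing to do and are idle. An average wait far below 2 seconds means the waits are not idle timeouts but contention among the parallel server processes: the wait is short because each process obtains the resource and resumes work after only a brief delay.
For waits that are not caused by idle lower-level processes, the fix is to lower the degree of parallelism of each parallel execution, for example the degree preset on the object (table or index) or the degree specified in a query hint.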
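The rule of thumb above (compare the average wait with the 2-second timeout) can be expressed as a small check on figures read from a Statspack or AWR report. This is a sketch only: classify_send_blkd is a hypothetical helper, and tolerance_s, the cut-off for "close to 2 seconds", is an assumption that the text does not specify.

```python
PX_SEND_BLKD_TIMEOUT_S = 2.0  # timeout of the wait event, as stated in the text


def classify_send_blkd(total_wait_s, wait_count, tolerance_s=0.2):
    """Classify PX Deq Credit: send blkd waits from their average duration.

    tolerance_s is an assumed cut-off for "close to 2 seconds".
    """
    if wait_count <= 0:
        raise ValueError("wait_count must be positive")
    average_s = total_wait_s / wait_count
    if average_s >= PX_SEND_BLKD_TIMEOUT_S - tolerance_s:
        return "idle"        # waits mostly run into the timeout
    return "contention"      # short waits: servers compete to send


if __name__ == "__main__":
    print(classify_send_blkd(total_wait_s=200.0, wait_count=100))
    print(classify_send_blkd(total_wait_s=5.0, wait_count=100))
```

A result of "contention" is the case in which the text recommends lowering the degree of parallelism.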
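Where Parallel Execution Can Be Used
Parallel Query
Parallel query can be used in queries and subqueries, but not on remotely referenced objects (for example, through a DBLINK).
For a query to run in parallel, the following conditions must be met (condition 1 or 2, together with condition 3):
1. The SQL statement contains a hint such as PARALLEL or PARALLEL_INDEX.
2. The objects referenced in the SQL statement have a parallel attribute set.
3. In a multi-table join, at least one table is accessed by a full table scan or by an index range scan that spans partitions. For example:
select /*+ parallel(t 4) */ * from t;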
Parallel DDL (for example, creating tables and indexes)
Parallel operations on tables
CREATE TABLE table_name parallel 4 AS SELECT ....
ALTER TABLE table_name move partition partition_name parallel 4;
Alter table table_name split partition partition_name ...... parallel 4;
Alter table table_name coalesce partition parallel 4;
For DDL operations, we can follow the execution through a trace file:
alter session set events '10046 trace name context forever,level 12';
alter session set events '10046 trace name context off' ;
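Parallel index creation
Creating an index in parallel can greatly improve performance when system resources are plentiful, especially when indexing very large tables in an OLAP system. The following index creation and maintenance operations can all run in parallel:
Create index index_name on table_name(column_name) parallel 4;
Alter index index_name rebuild parallel 4;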
Alter index index_name rebuild partition partition_name parallel 4;
Alter index index_name split partition partition_name .... parallel 4;
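Note: the degree of parallelism set on an index is used only when the index is accessed.
Summary:
Whether you create a table, alter a table, create an index, or rebuild an index in parallel, the mechanism is the same: Oracle assigns each parallel server process its own piece of space, each process works on the data in its own space, and the processed data is then combined to complete the SQL operation.
Parallel DML (for example, insert, update, and delete)
Oracle can execute DML in parallel, but with many restrictions. To let DML statements run in parallel, you must explicitly run the following command in the session: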
SQL> alter session enable parallel dml;
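Session altered.
Only after this command does Oracle run subsequent eligible DML statements in parallel. Without it, Oracle ignores the parallel request even if the SQL specifies one.
Examples of delete, update, and merge:
delete /*+ parallel(test 4) */ from test;
update /*+ parallel(test 4) */ test set id=100;
merge /*+ parallel(table_name 4) */ into table_name ...
Oracle restricts parallel delete, update, and merge to partitioned tables: the operation runs in parallel only when the target object is a partitioned table. The reason is that for a partitioned table Oracle starts one parallel server process per partition to process the data simultaneously, which has no meaning for a non-partitioned table. The parallel attribute of a partitioned table can be set only at the table level, not at the partition level.
Note: in the author's tests, delete, update, and merge also ran in parallel on non-partitioned tables. Whether this is due to the database version or to something else remains to be verified.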
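Parallel insert
Parallelism is really only meaningful for statements of the form insert into … select …. For insert into .. values… it makes no sense, because such a statement operates on a single row.
The usual syntax for a parallel insert is:
Insert /*+parallel(t 2) */ into t select /*+parallel(t1 2) */ * from t1;
In this statement both operations, the insert and the select, can run in parallel. The two are independent of each other, and either one can also be parallelized on its own, as in the following: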
Insert into t select /*+parallel(t1 2) */ * from t1;
Insert /*+parallel(t 2) */ into t select * from t1;
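Configuring Parallel Execution
Initialization parameters related to parallel execution
parallel_min_servers=n
With this parameter set, Oracle pre-starts n parallel server processes at instance startup. When a SQL statement runs in parallel, the coordinator first takes as many of the already started parallel server processes as the degree of parallelism requires. If the degree of parallelism is greater than n, Oracle starts additional parallel server processes to satisfy it.
parallel_max_servers=n
If the degree of parallelism is greater than parallel_min_servers, or the parallel server processes currently available cannot satisfy the statement, Oracle creates new parallel server processes. The total number of parallel server processes started by the instance cannot exceed this parameter. The default value is PARALLEL_THREADS_PER_CPU * CPU_COUNT * concurrent_parallel_users * 5.
parallel_adaptive_multi_user=true|false
In Oracle 10g R2 parallel execution is enabled by default. This parameter defaults to true; it lets Oracle adjust the degree of parallelism of a SQL statement dynamically according to the system load at execution time, in order to get the best execution performance.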

PARALLEL_DEGREE_POLICY
PARALLEL_DEGREE_POLICY specifies whether or not automatic degree of parallelism, statement queuing, and in-memory parallel execution will be enabled. Values:

MANUAL: Disables automatic degree of parallelism, statement queuing, and in-memory parallel execution. This reverts the behavior of parallel execution to what it was prior to Oracle Database 11g Release 2 (11.2). This is the default.
LIMITED: Enables automatic degree of parallelism for some statements but statement queuing and in-memory Parallel Execution are disabled. Automatic degree of parallelism is only applied to those statements that access tables or indexes decorated explicitly with the DEFAULT degree of parallelism using the PARALLEL clause. Statements that do not access any tables or indexes decorated with the DEFAULT degree of parallelism will retain the MANUAL behavior.
AUTO: Enables automatic degree of parallelism, statement queuing, and in-memory parallel execution.

Note: If the parallel degree of a table has been set to DEFAULT, the statement computes its own DOP regardless of whether parallel_degree_policy is MANUAL or LIMITED. When parallel_degree_policy=AUTO, automatic degree of parallelism is triggered.
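parallel_min_percent
This parameter specifies the minimum share of the requested parallel server processes that a parallel execution must obtain, expressed as a percentage. Suppose the value is 50 and a SQL statement requests 20 parallel processes. If not enough parallel server processes are currently available, the statement must obtain at least 20*50%=10 of them; if it cannot, it fails with error ORA-12827. When the value is Null, every SQL statement that runs in parallel must obtain at least two parallel server processes.
Note: when parallel_degree_policy=AUTO, a parallel SQL statement that cannot obtain the requested processes is placed in a queue.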
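The 20*50%=10 calculation above can be checked with a few lines of Python. This is a sketch of the arithmetic only: the function names are hypothetical, and rounding fractional results up is an assumption, since the text only gives an example that divides evenly.

```python
import math


def min_servers_required(requested, min_percent):
    """Smallest number of parallel servers the statement must obtain."""
    if not 0 <= min_percent <= 100:
        raise ValueError("min_percent must be between 0 and 100")
    # Assumption: a fractional result is rounded up.
    return math.ceil(requested * min_percent / 100)


def can_run_parallel(requested, available, min_percent):
    """True if enough servers can be obtained; False models ORA-12827."""
    obtainable = min(requested, available)
    return obtainable >= min_servers_required(requested, min_percent)


if __name__ == "__main__":
    print(min_servers_required(20, 50))
    print(can_run_parallel(requested=20, available=8, min_percent=50))
```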

PARALLEL_FORCE_LOCAL
This parameter specifies whether a SQL statement executed in parallel is restricted to a single instance in an Oracle RAC environment. By setting this parameter to TRUE, you restrict the scope of the parallel server processes to the single Oracle RAC instance where the query coordinator is running.
The recommended value for the PARALLEL_FORCE_LOCAL parameter is FALSE.
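PARALLEL_DEGREE_POLICY
This parameter is new in 11g and has the following three values:
manual: the default; the degree of parallelism is not adjusted automatically.
auto: the degree of parallelism is adjusted automatically.
limited: the degree of parallelism is adjusted automatically only for queries on tables or indexes that have a degree of parallelism defined; queries on tables or indexes without one are not adjusted.
PARALLEL_THREADS_PER_CPU
The number of processes or threads that one CPU can handle during parallel execution; it is also used to tune the parallel adaptive and load-balancing algorithms. If the machine shows signs of overload while running a typical query, reduce this value to any nonzero number. The default depends on the operating system (usually 2).
parallel_automatic_tuning
If set to TRUE, Oracle determines default values for the parameters that control parallel execution. In addition to setting this parameter, you must also set the parallel attribute on the tables in the system. It was deprecated in 10g and is retained only for backward compatibility.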

PARALLEL_MIN_TIME_THRESHOLD
This parameter specifies the minimum execution time a statement should have before the statement is considered for automatic degree of parallelism. By default, this is set to 10 seconds. Automatic degree of parallelism is only enabled if PARALLEL_DEGREE_POLICY is set to AUTO or LIMITED. The syntax is:
PARALLEL_MIN_TIME_THRESHOLD = { AUTO | integer }
The default is AUTO.
PARALLEL_SERVERS_TARGET
This parameter specifies the number of parallel server processes allowed to run parallel statements before statement queuing is used. The default value is:
PARALLEL_THREADS_PER_CPU * CPU_COUNT * concurrent_parallel_users * 2
In the formula, the value assigned to concurrent_parallel_users running at the default degree of parallelism on an instance is dependent on the memory management setting. If automatic memory management is disabled (manual mode), then the value of concurrent_parallel_users is 1. If PGA automatic memory management is enabled, then the value of concurrent_parallel_users is 2. If global memory management or SGA memory target is used in addition to PGA automatic memory management, then the value of concurrent_parallel_users is 4.
When PARALLEL_DEGREE_POLICY is set to AUTO, statements that require parallel execution are queued if the number of parallel processes currently in use on the system equals or is greater than PARALLEL_SERVERS_TARGET. This is not the maximum number of parallel server processes allowed on a system (that is controlled by PARALLEL_MAX_SERVERS). However, PARALLEL_SERVERS_TARGET and parallel statement queuing is used to ensure that each statement that requires parallel execution is allocated the necessary parallel server resources and the system is not flooded with too many parallel server processes.
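The two default formulas quoted in this section (for parallel_max_servers and PARALLEL_SERVERS_TARGET) differ only in their final multiplier. A short Python sketch makes the numbers concrete; the function names and the keys of CONCURRENT_PARALLEL_USERS are made up for illustration, and any further limits an actual instance may apply are not modelled.

```python
# concurrent_parallel_users by memory management setting, as described above.
# The dictionary keys are hypothetical labels chosen for this sketch.
CONCURRENT_PARALLEL_USERS = {
    "manual": 1,       # automatic memory management disabled
    "pga_auto": 2,     # PGA automatic memory management enabled
    "sga_and_pga": 4,  # SGA target / global memory management as well
}


def default_parallel_max_servers(cpu_count, threads_per_cpu, memory_mode):
    """PARALLEL_THREADS_PER_CPU * CPU_COUNT * concurrent_parallel_users * 5"""
    users = CONCURRENT_PARALLEL_USERS[memory_mode]
    return threads_per_cpu * cpu_count * users * 5


def default_parallel_servers_target(cpu_count, threads_per_cpu, memory_mode):
    """PARALLEL_THREADS_PER_CPU * CPU_COUNT * concurrent_parallel_users * 2"""
    users = CONCURRENT_PARALLEL_USERS[memory_mode]
    return threads_per_cpu * cpu_count * users * 2


if __name__ == "__main__":
    print(default_parallel_max_servers(8, 2, "sga_and_pga"))
    print(default_parallel_servers_target(8, 2, "sga_and_pga"))
```

With these formulas the queuing threshold is always two fifths of the maximum, which leaves headroom between the point where statements start to queue and the hard limit on parallel server processes.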
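parallel_execution_message_size
Specifies the size of messages used for parallel execution (parallel query, PDML, parallel recovery, and replication). A value larger than 2048 or 4096 requires a larger shared pool. If PARALLEL_AUTOMATIC_TUNING=TRUE, message buffers are allocated from the large pool. The typical value is 2148 if PARALLEL_AUTOMATIC_TUNING is FALSE and 4096 if it is TRUE (operating system dependent).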

PARALLEL_DEGREE_LIMIT
With automatic degree of parallelism, Oracle automatically decides whether or not a statement should execute in parallel and what degree of parallelism the statement should use. The optimizer automatically determines the degree of parallelism for a statement based on the resource requirements of the statement. However, the optimizer will limit the degree of parallelism used to ensure parallel server processes do not flood the system. This limit is enforced by PARALLEL_DEGREE_LIMIT.
Values:

CPU: The maximum degree of parallelism is limited by the number of CPUs in the system. The formula used to calculate the limit is PARALLEL_THREADS_PER_CPU * CPU_COUNT * the number of instances available (by default, all the opened instances on the cluster but can be constrained using PARALLEL_INSTANCE_GROUP or service specification). This is the default.
IO: The maximum degree of parallelism the optimizer can use is limited by the I/O capacity of the system. The value is calculated by dividing the total system throughput by the maximum I/O bandwidth per process. You must run the DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure on the system in order to use the IO setting. This procedure will calculate the total system throughput and the maximum I/O bandwidth per process.
integer: A numeric value for this parameter specifies the maximum degree of parallelism the optimizer can choose for a SQL statement when automatic degree of parallelism is active. Automatic degree of parallelism is only enabled if PARALLEL_DEGREE_POLICY is set to AUTO or LIMITED.

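The degree of parallelism can be set in the following three ways:
1. Specify it with a hint.
2. Set it with alter session force parallel.
3. Use the degree set on the tables or indexes referenced in the SQL. In principle, Oracle uses the highest degree among these objects as the degree for the current execution.
Examples:
SQL>Select /*+parallel(t 4) */ count(*) from t;
SQL>Alter table t parallel 4;
SQL>Alter session force parallel query parallel 4;
SQL>alter session force parallel query;
How Oracle computes the default degree of parallelism:
1. Oracle computes a degree from the number of CPUs, the number of RAC instances, and the value of parallel_threads_per_cpu.
2. For parallel access to partitions, the number of partitions to be accessed is used as the degree.
Priority of the degree of parallelism, from highest to lowest:
Hint -> alter session force parallel -> setting on the table or index -> system parameters
In practice, parallel execution performs well only when system resources are plentiful. On a heavily loaded system, an ill-chosen parallel setting can degrade performance badly.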
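The priority order above is easy to get wrong when several settings are present at once, so here is a minimal Python sketch of it. Both function names are hypothetical, the default is computed with the CPU formula quoted earlier in this section (PARALLEL_THREADS_PER_CPU * CPU_COUNT * number of instances), and the sketch ignores adaptive downgrades and partition-based defaults.

```python
def default_dop(cpu_count, threads_per_cpu, instances=1):
    """Default degree from CPUs, threads per CPU and RAC instances."""
    return cpu_count * threads_per_cpu * instances


def resolve_dop(hint=None, session_forced=None, object_degrees=(),
                system_default=1):
    """Pick the degree: hint, then session, then objects, then default."""
    if hint is not None:
        return hint
    if session_forced is not None:
        return session_forced
    if object_degrees:
        # The highest degree among the referenced tables and indexes wins.
        return max(object_degrees)
    return system_default


if __name__ == "__main__":
    fallback = default_dop(cpu_count=4, threads_per_cpu=2)
    print(resolve_dop(hint=4, session_forced=8, object_degrees=[2]))
    print(resolve_dop(object_degrees=[2, 6], system_default=fallback))
    print(resolve_dop(system_default=fallback))
```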
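How to Enable Parallelism in a Session

Parallel execution can be enabled in three ways: with a hint, with alter session, or by setting the parallel attribute on an object. Any one of the three is enough. If several are present at the same time, the priority is: hint -> alter session -> table/index degree.

alter session force parallel query parallel 8;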

alter session enable parallel dml;

alter session force parallel dml parallel 8;

alter session force parallel ddl parallel 8;

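Note: alter session enable only makes the current session capable of parallel execution; the parallelism itself must still come from a hint or from the table/index degree. alter session force, on the other hand, forces parallel execution without needing a hint or anything else.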

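Parallel Execution in 11g
In Oracle 11g R2 and later, the execution of your SQL may be delayed until enough parallel servers are available.

Oracle has an alternate I/O mechanism called direct path I/O, which it uses when it determines that bypassing the buffer cache and performing the I/O directly will be faster. For example, Oracle uses direct I/O when reading and writing temporary segments for sorting or for intermediate result sets. Starting with Oracle 11g, Oracle sometimes also prefers direct path I/O over normal buffered I/O for serial table access.

When executing parallel query operations, Oracle normally uses direct path I/O. This avoids creating contention in the buffer cache and lets the I/O be distributed more evenly among the parallel processes. Moreover, for parallel operations that perform full table scans, the chance of finding matching data in the buffer cache is fairly low, so the cache adds little value.

In Oracle 10g and earlier, parallel query always used direct path I/O and serial query always used buffered I/O. In 11g, Oracle can use buffered I/O for parallel queries (supported from 11g R2 onward), and serial queries may use direct path I/O. Parallel queries are nevertheless still less likely to use buffered I/O and may therefore incur a higher I/O cost than serial queries. That higher I/O cost is of course shared among all the parallel processes, so overall performance may still be better.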

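Guidelines for Optimizing Parallel Execution
Start with SQL that is optimized for serial execution
The ideal parallel plan may differ from the best serial plan. For example, parallel processing usually starts from a table or index scan, whereas the best serial plan may start from an index lookup. Even so, you should make sure the query is optimized for serial execution before optimizing it for parallel execution, for the following reasons:
1. The structures and methods used for serial tuning, mainly indexes and statistics collection, are often critical for good parallel tuning as well.
2. If the resources requested for parallel execution are not available, your query may be serialized (depending on the settings of PARALLEL_DEGREE_POLICY and PARALLEL_MIN_PERCENT). In that case you want the serial plan of your parallel query to be good enough.
A poorly tuned SQL statement can become an even worse one when parallelized, at least in terms of its effect on other users, because it is now allowed to consume more of the database server's CPU and I/O resources.
When optimizing a SQL statement for parallel execution, start by optimizing it for serial execution.
Make sure the SQL is suitable for parallel execution
Not every SQL statement benefits from parallel execution. Here are some examples of statements that probably should not be parallelized:
1. SQL statements that run very quickly when executed serially.
2. SQL statements that are likely to run at high concurrency in many sessions.
3. SQL statements based on index lookups. Non-partitioned index lookups and range scans cannot be parallelized. Index full scans, however, can be, and so can index lookups on partitioned indexes.
Taking these three points together, OLTP-style queries are usually not suited to parallel processing.
Make sure the system is suitably configured for parallel execution
Not all SQL is suited to parallel execution, and not every database server host is suited to it either. Today most physical server hosts meet the minimum requirements: multiple CPUs and data striped across multiple physical drives. Some virtual hosts may not meet them, however, and desktop computers usually have only a single disk device, so they are generally not suited to parallel execution.
Do not try to use parallel execution on computer systems that do not meet the minimum requirements (multiple CPUs and data striped across multiple disk drives).
Make sure all parts of the execution plan are parallelized
In a complex parallel SQL statement, it is important to make sure that all significant steps of the query run in parallel. If one step of a complex query runs serially, the other parallel steps may have to wait for it, and the benefit of parallelism is lost entirely. The OTHER_TAG column of PLAN_TABLE marks such a step with PARALLEL_FROM_SERIAL, and DBMS_XPLAN shows S->P in the IN-OUT column. In the following example, table T1 is parallelized but table T is not. The join of the two tables and the GROUP BY include many parallel operations, but the full table scan of T is not parallel, and the serial-to-parallel (S->P) tag shows that the rows of T are fetched serially and fed into the subsequent parallel operations: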
SQL> ALTER TABLE t1 PARALLEL(DEGREE 4);
SQL> ALTER TABLE t NOPARALLEL;
SQL> explain plan for select /*+ ordered use_hash(t1) */ t.name,sum(t.id) from t,t1 where t.id=t1.object_id group by t.name;
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2575143521
----------------------------------------------------------------------------------------------------------------------
| Id | Operation                    | Name     | Rows | Bytes | Cost (%CPU)| Time     |    TQ |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |    61 | 4758 |    22 (10)| 00:00:01 |        |      |            |
|   1 | PX COORDINATOR              |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)        | :TQ10002 |    61 | 4758 |    22 (10)| 00:00:01 | Q1,02 | P->S | QC (RAND) |
|   3 |    HASH GROUP BY             |          |    61 | 4758 |    22 (10)| 00:00:01 | Q1,02 | PCWP |            |
|   4 |     PX RECEIVE               |          |    61 | 4758 |    22 (10)| 00:00:01 | Q1,02 | PCWP |            |
|   5 |      PX SEND HASH            | :TQ10001 |    61 | 4758 |    22 (10)| 00:00:01 | Q1,01 | P->P | HASH       |
|   6 |       HASH GROUP BY          |          |    61 | 4758 |    22 (10)| 00:00:01 | Q1,01 | PCWP |            |
|* 7 |        HASH JOIN             |          |    61 | 4758 |    21   (5)| 00:00:01 | Q1,01 | PCWP |            |
|   8 |         BUFFER SORT          |          |       |       |            |          | Q1,01 | PCWC |            |
|   9 |          PX RECEIVE          |          |    61 | 3965 |    12   (0)| 00:00:01 | Q1,01 | PCWP |            |
| 10 |           PX SEND BROADCAST | :TQ10000 |    61 | 3965 |    12   (0)| 00:00:01 |        | S->P | BROADCAST |
| 11 |            TABLE ACCESS FULL | T        |    61 | 3965 |    12   (0)| 00:00:01 |        |      |            |
| 12 |         PX BLOCK ITERATOR    |          | 52078 |   661K|     8   (0)| 00:00:01 | Q1,01 | PCWC |            |
| 13 |          INDEX FAST FULL SCAN| T1_IDX   | 52078 |   661K|     8   (0)| 00:00:01 | Q1,01 | PCWP |            |
----------------------------------------------------------------------------------------------------------------------
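In a case like this, a partially parallelized plan can give you the worst of both worlds: the elapsed time does not improve, because the serial operation is the bottleneck of the whole execution, yet the SQL still ties up parallel server processes and may hurt the performance of other concurrently running SQL.
If we set a default degree of parallelism on table T, the serial bottleneck disappears. The full scan of T now runs in parallel, and the serial-to-parallel (S->P) bottleneck is replaced by a fully parallel parallel-to-parallel (P->P) operation.
When optimizing a parallel execution plan, make sure all relevant steps run in parallel: an S->P tag in DBMS_XPLAN or PARALLEL_FROM_SERIAL in PLAN_TABLE usually indicates a serial bottleneck somewhere in the parallel plan.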
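Spotting S->P rows by eye is tedious in long plans. As an illustration, the following Python sketch scans DBMS_XPLAN text output that has been saved to a string and lists the rows whose IN-OUT column is S->P. It is a text-processing convenience, not an Oracle feature: find_serial_feeds is a hypothetical helper, it assumes the pipe-delimited layout shown above, and SAMPLE_PLAN is an abbreviated copy of three rows from that plan.

```python
SAMPLE_PLAN = """
| Id | Operation                    | Name     | Rows | Bytes | Cost (%CPU)| Time     |    TQ |IN-OUT| PQ Distrib |
|   9 |          PX RECEIVE          |          |    61 | 3965 |    12   (0)| 00:00:01 | Q1,01 | PCWP |            |
| 10 |           PX SEND BROADCAST | :TQ10000 |    61 | 3965 |    12   (0)| 00:00:01 |        | S->P | BROADCAST |
| 11 |            TABLE ACCESS FULL | T        |    61 | 3965 |    12   (0)| 00:00:01 |        |      |            |
"""


def find_serial_feeds(plan_text):
    """Return (id, operation) for plan rows whose IN-OUT value is S->P."""
    in_out_index = None
    hits = []
    for line in plan_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip separators, titles and notes
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if "IN-OUT" in cells:
            in_out_index = cells.index("IN-OUT")  # header row
            continue
        if in_out_index is not None and len(cells) > in_out_index:
            if cells[in_out_index] == "S->P":
                hits.append((cells[0], cells[1]))
    return hits


if __name__ == "__main__":
    for step_id, operation in find_serial_feeds(SAMPLE_PLAN):
        print(step_id, operation)
```

Each reported row is a place where a serially executed step feeds a parallel one, which is the bottleneck pattern described above.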
Monitor parallel execution
SQL> select * from v$pq_sesstat;
STATISTIC                      LAST_QUERY SESSION_TOTAL
------------------------------ ---------- -------------
Queries Parallelized                    1            13
DML Parallelized                        0             0
DDL Parallelized                        0             0
DFO Trees                               1            13
Server Threads                          6             0
Allocation Height                       3             0
Allocation Width                        1             0
Local Msgs Sent                       362        303740
Distr Msgs Sent                         0             0
Local Msgs Recv'd                     368        303782
Distr Msgs Recv'd                       0             0
11 rows selected.
SQL> select DFO_NUMBER, TQ_ID, SERVER_TYPE, NUM_ROWS ,BYTES,process from v$pq_tqstat order by dfo_number , tq_id , server_type;
DFO_NUMBER      TQ_ID SERVER_TYPE       NUM_ROWS      BYTES PROCESS
---------- ---------- --------------- ---------- ---------- --------
         1          0 Consumer             17151     100454 P001
         1          0 Consumer             17242     100969 P002
         1          0 Consumer             17257     101058 P000
         1          0 Producer              1971       9955 P004
         1          0 Producer             29565     174989 P005
         1          0 Producer             20114     117537 P003
         1          1 Consumer               893       8107 P002
         1          1 Consumer              2914      26341 P001
         1          1 Consumer                 0         60 P000
         1          1 Producer               611       5494 P003
         1          1 Producer              2593      23493 P005
         1          1 Producer               603       5521 P004
         1          2 Consumer              3807      19040 QC
         1          2 Producer              2914      14545 P001
         1          2 Producer               893       4475 P002
         1          2 Producer                 0         20 P000
16 rows selected.
From v$pq_sesstat we can see that the DOP was downgraded to 3 and that two slave sets were used (Server Threads = 6). From v$pq_tqstat we can see which processes took part and how they cooperated to produce the result. In this sample, P003, P004, and P005 produced the data and fed it to P000, P001, and P002, twice in total. Then P000, P001, and P002 fed the data to the QC, the query coordinator. We can also see that P000 received 17257 rows the first time but 0 rows the second time, and in the end it sent 0 rows to the QC. Why?
Let’s look at the execution plan to investigate further.
SQL> @utlxplp
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1177066807
-----------------------------------------------------------------------------------------------------------------
| Id | Operation               | Name     | Rows | Bytes | Cost (%CPU)| Time     |    TQ |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          | 3867 |   147K|    13   (8)| 00:00:01 |        |      |            |
|   1 | PX COORDINATOR         |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10002 | 3867 |   147K|    13   (8)| 00:00:01 | Q1,02 | P->S | QC (RAND) |
|* 3 |    HASH JOIN BUFFERED   |          | 3867 |   147K|    13   (8)| 00:00:01 | Q1,02 | PCWP |            |
|   4 |     PX RECEIVE          |          | 53332 |   677K|     7   (0)| 00:00:01 | Q1,02 | PCWP |            |
|   5 |      PX SEND HASH       | :TQ10000 | 53332 |   677K|     7   (0)| 00:00:01 | Q1,00 | P->P | HASH       |
|   6 |       PX BLOCK ITERATOR |          | 53332 |   677K|     7   (0)| 00:00:01 | Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS FULL| PRODUCTS | 53332 |   677K|     7   (0)| 00:00:01 | Q1,00 | PCWP |            |
|   8 |     PX RECEIVE          |          | 3867 |    98K|     5   (0)| 00:00:01 | Q1,02 | PCWP |            |
|   9 |      PX SEND HASH       | :TQ10001 | 3867 |    98K|     5   (0)| 00:00:01 | Q1,01 | P->P | HASH       |
| 10 |       PX BLOCK ITERATOR |          | 3867 |    98K|     5   (0)| 00:00:01 | Q1,01 | PCWC |            |
| 11 |        TABLE ACCESS FULL| COSTS    | 3867 |    98K|     5   (0)| 00:00:01 | Q1,01 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("C"."PROD_ID"="P"."PROD_ID")
This is a hash-hash parallel distribution with a hash join. First, slave set (Q1,00) scans the products table and sends the rows to slave set (Q1,02), and slave set (Q1,01) scans the costs table and sends its rows to slave set (Q1,02). Which slave in set (Q1,02) a row is sent to depends on the hash value of the join key. Because the costs table has only two distinct values of prod_id, its rows have only two target slaves, so one slave of set (Q1,02) receives 0 rows. After the slaves in (Q1,02) have received the rows from the other two slave sets, they hash join them and send the result to the QC; one slave therefore has no rows to process.
This is why 0 rows appear in the output above.
You can also enable tracing for parallel execution like this:
alter session set events '10132 trace name context forever';

Compute the variance of bytes grouped by TQ_ID. Large variances indicate workload imbalances. You should investigate large variances to determine whether the producers start out with unequal distributions of data, or whether the distribution itself is skewed. If the data itself is skewed, this might indicate a low cardinality, or low number of distinct values.
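As a rough illustration of that advice, the following Python sketch computes the variance of BYTES per TQ_ID from the v$pq_tqstat rows shown earlier in this section. The rows are copied by hand into TQSTAT_ROWS; bytes_variance_by_tq is a hypothetical helper, and using the population variance over producers only is a choice made for this sketch.

```python
from collections import defaultdict
from statistics import pvariance

# (TQ_ID, SERVER_TYPE, BYTES) copied from the v$pq_tqstat output above
TQSTAT_ROWS = [
    (0, "Consumer", 100454), (0, "Consumer", 100969), (0, "Consumer", 101058),
    (0, "Producer", 9955), (0, "Producer", 174989), (0, "Producer", 117537),
    (1, "Consumer", 8107), (1, "Consumer", 26341), (1, "Consumer", 60),
    (1, "Producer", 5494), (1, "Producer", 23493), (1, "Producer", 5521),
    (2, "Consumer", 19040),
    (2, "Producer", 14545), (2, "Producer", 4475), (2, "Producer", 20),
]


def bytes_variance_by_tq(rows, server_type="Producer"):
    """Population variance of BYTES per TQ_ID for one server type."""
    grouped = defaultdict(list)
    for tq_id, kind, nbytes in rows:
        if kind == server_type:
            grouped[tq_id].append(nbytes)
    return {tq_id: pvariance(values)
            for tq_id, values in sorted(grouped.items())}


if __name__ == "__main__":
    for tq_id, variance in bytes_variance_by_tq(TQSTAT_ROWS).items():
        print(tq_id, round(variance))
```

A table queue whose variance is large relative to the others is the one to investigate first for skewed producers or a skewed distribution.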

Some Common Sense

One of my favourite descriptions of performance tuning, although I can’t remember where I first heard it, is that it is based on ‘informed common sense’. That really captures my own experiences of performance tuning. Yes, you need to use proper analysis techniques and often a great deal of technical knowledge, but that’s all devalued if you’re completely missing the point. So let’s take a step away from the technical and consider the big picture.
Don’t even think about implementing Parallel Execution unless you are prepared to invest some time in initial testing, followed by ongoing performance monitoring. If you don’t, you might one day hit performance problems either server-wide or on an individual user session that you’d never believe (until it happens to you).
Parallel Execution is designed to utilise hardware as heavily as possible. If you are running on a single-CPU server with two hard disk drives and 512Mb RAM, don’t expect significant performance improvements just because you switch PX on. The more CPUs, disk drives, controllers and RAM you have installed on your server, the better the results are going to be.
Although you may be able to use Parallel Execution to make an inefficient SQL statement run many times faster, that would be incredibly stupid. It’s essential that you tune the SQL first. In the end, doing more work than you should be, but more quickly, is still doing more work than you should be! To put it another way, don’t use PX as a dressing for a poorly designed application. Reduce the workload to the minimum needed to achieve the task and then start using the server facilities to make it run as quickly as possible. Seems obvious, doesn’t it?
If you try to use PX to benefit a large number of users performing online queries you may eventually bring the server to its knees. Well, maybe not if you use the Adaptive Multi-User algorithm, but then it’s essential that both you and, more important, your users understand that response time is going to be very variable when the machine gets busy.
Using PX for a query that runs in a few seconds is pointless. You’re just going to use more resources on the server for very little improvement in the run time of the query. It might well run more slowly!
Sometimes when faced with a slow i/o subsystem you might find that higher degrees of parallelism are useful because the CPUs are spending more time waiting for i/o to complete. Therefore they are more likely to be available for another PX slave (that isn’t waiting on i/o) to use. This was certainly my experience at one site. However, it’s also true that using PX will usually lead to a busier i/o subsystem because the server is likely to favour full scans over indexed retrieval. There are no easy answers here - you really need to carry out some analysis of overall system resource usage to identify where the bottlenecks are and adjust the configuration accordingly.
Consider whether PX is the correct parallel solution for overnight batch operations. It may be that you can achieve better performance using multiple streams of jobs, each single-threaded, or maybe you would be better with one stream of jobs which uses PX. It depends on your application so the only sure way to find out is to try the different approaches.

By the way, here are two useful links about parallel execution.

Oracle Database Parallel Execution Fundamentals

http://www.oracle.com/technetwork/articles/datawarehouse/twp-parallel-execution-fundamentals-133639.pdf

How Parallel Execution Works

http://docs.oracle.com/cd/E11882_01/server.112/e25523/parallel002.htm#VLDBG0101

References: 《让Oracle跑得更快》 by 谭怀远
           《让Oracle跑得更快2》 by 谭怀远
           http://howe.im/%E6%95%B0%E6%8D%AE%E5%BA%93/137.html
           http://www.cnblogs.com/daduxiong/archive/2010/08/24/1807427.html
           http://www.searchdatabase.com.cn/showcontent_44070.htm
           http://xuguohao.gotoip55.com/?p=142
           http://blog.csdn.net/tianlesoftware/article/details/5854583
           http://docs.oracle.com/cd/E11882_01/server.112/e17110/initparams176.htm
           http://space.itpub.net/17203031/viewspace-696389
           http://os2ora.com/monitor-parallel-execution/
           http://oracledoug.com/px.pdf
           http://docs.oracle.com/cd/E11882_01/server.112/e25523/parallel006.htm#VLDBG1513

http://www.oracle.com/technetwork/articles/datawarehouse/twp-parallel-execution-fundamentals-133639.pdf
http://docs.oracle.com/cd/E11882_01/server.112/e25523/parallel005.htm#CIHHGECI
http://docs.oracle.com/cd/E11882_01/server.112/e40402/initparams181.htm#REFRN10310
http://www.cnblogs.com/luoyx/archive/2011/12/26/2302587.html

This is an original article. When reposting, please credit the source and the author. If you find any errors, corrections are welcome. Email: czmcj@163.com