大表关联走hash?
案例:sql
---- 反正我执行过1个多小时,没有跑完ide
SELECT
a.id AS order_id ,b.s_id AS bill_id,
d.id AS sub_order_id,
d.deal_oper_id
FROM EM_ORDER PARTITION(EM_ORDER_201611) A,
M_101_ID_2_GID B,
ER_ORDER_ORDER C,
EM_ORDER D,
EE_ORDER_PF_WORK E
WHERE A.SPEC_ID = 3010200004
AND A.ID = B.T_ID
AND A.STATUS_ID = 1000007
AND A.COMPLETE_TIME >= TO_DATE('2016-11-14 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
and A.COMPLETE_TIME <= TO_DATE('2016-11-14 23:59:59', 'YYYY-MM-DD HH24:MI:SS')
AND A.ID = C.A_ORDER_ID
AND C.B_ORDER_ID = D.ID
AND D.ID = E.ORDER_ID
AND e.work_type_id = 1001411
AND ( d.deal_oper_id IS NULL
OR (SELECT f_chk_idcard(x.identity_number)
FROM dm_staff x
WHERE x.id = d.deal_oper_id) = 0
);性能
各表表大小:如下这些表几乎都是分区表。 惋惜的是分区表作的不是很合理, 都是没有用到分区裁剪功能,分区全扫描。
优化size_mb segement_name code
4595.0625 EE_ORDER_PF_WORK
40159.0625 EM_ORDER
20059.0625 ER_ORDER_ORDER
20770.0625 M_101_ID_2_GID
dm_staff 小表, 20万条数据。 orm
执行计划:索引
1 Plan hash value: 309883988 2 3 -------------------------------------------------------------------------------------------------------------------------------------------- 4 | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop | 5 -------------------------------------------------------------------------------------------------------------------------------------------- 6 | 0 | SELECT STATEMENT | | 1 | 130 | | 1218K (1)| 04:03:44 | | | 7 |* 1 | FILTER | | | | | | | | | 8 | 2 | NESTED LOOPS | | 21005 | 2666K| | 1166K (1)| 03:53:19 | | | 9 | 3 | NESTED LOOPS | | 21791 | 2666K| | 1166K (1)| 03:53:19 | | | 10|* 4 | HASH JOIN | | 21791 | 2234K| 17M| 1079K (1)| 03:35:53 | | | 11| 5 | NESTED LOOPS | | 183K| 15M| | 907K (1)| 03:01:32 | | | 12| 6 | NESTED LOOPS | | 183K| 15M| | 907K (1)| 03:01:32 | | | 13| 7 | NESTED LOOPS | | 183K| 11M| | 358K (1)| 01:11:37 | | | 14|* 8 | TABLE ACCESS BY GLOBAL INDEX ROWID| EM_ORDER | 106K| 3631K| | 39284 (1)| 00:07:52 | 8 | 8 | 15|* 9 | INDEX RANGE SCAN | IDX_EM_ORDER_COMP_TIME | 44669 | | | 330 (0)| 00:00:04 | | | 16|* 10 | INDEX RANGE SCAN | IDX_ER_ORDER_ORDER_OO | 2 | 56 | | 3 (0)| 00:00:01 | | | 17|* 11 | INDEX UNIQUE SCAN | PK_EM_ORDER | 1 | | | 2 (0)| 00:00:01 | | | 18| 12 | TABLE ACCESS BY GLOBAL INDEX ROWID | EM_ORDER | 1 | 26 | | 3 (0)| 00:00:01 | ROWID | ROWID | 19| 13 | PARTITION RANGE ALL | | 10M| 161M| | 156K (1)| 00:31:22 | 1 | 10 | 20|* 14 | TABLE ACCESS FULL | EE_ORDER_PF_WORK | 10M| 161M| | 156K (1)| 00:31:22 | 1 | 10 | 21|* 15 | INDEX RANGE SCAN | IDX_101_T_ID | 1 | | | 3 (0)| 00:00:01 | | | 22| 16 | TABLE ACCESS BY GLOBAL INDEX ROWID | M_101_ID_2_GID | 1 | 25 | | 4 (0)| 00:00:01 | ROWID | ROWID | 23| 17 | TABLE ACCESS BY INDEX ROWID | DM_STAFF | 1 | 20 | | 3 (0)| 00:00:01 | | | 24|* 18 | INDEX UNIQUE SCAN | PK_DM_STAFF | 1 | | | 2 (0)| 00:00:01 | | | 25-------------------------------------------------------------------------------------------------------------------------------------------- 26 27 Predicate Information (identified by operation id): 28 --------------------------------------------------- 29 30 1 - filter( (SELECT "F_CHK_IDCARD"("X"."IDENTITY_NUMBER") FROM "QWZW_ER"."DM_STAFF" "X" WHERE "X"."ID"=:B1)=0) 31 4 - access("D"."ID"="E"."ORDER_ID") 32 8 - filter("A"."SPEC_ID"=3010200004 AND "A"."STATUS_ID"=1000007) 33 9 - access("A"."COMPLETE_TIME">=TO_DATE(' 2016-11-16 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "A"."COMPLETE_TIME"<=TO_DATE(' 34 2016-11-16 23:59:59', 'syyyy-mm-dd hh24:mi:ss')) 35 filter(TBL$OR$IDX$PART$NUM("QWZW_ER"."EM_ORDER",0,1,0,ROWID)=8) 36 10 - access("A"."ID"="C"."A_ORDER_ID") 37 11 - access("C"."B_ORDER_ID"="D"."ID") 38 14 - filter("E"."WORK_TYPE_ID"=1001411) 39 15 - access("A"."ID"="B"."T_ID") 40 filter("B"."T_ID" IS NOT NULL) 41 18 - access("X"."ID"=:B1)
--------------------------------------------------------------------------------------------------------------------------------------------
问题分析:
1,看执行计划:
第一眼看见这个sql我会认为id=7到id=9有问题,id=14有问题。hash
id=9,表40G,虽然一个分区可是全局索引,索引很大的;大表回表再过滤是有问题的,看9的过滤条件,
查询的是一天的数据,这个表是按月分区的,存储一年的数据,
40/365=0.1G,也就是说天天0.1g,因此这里问题不大。
但仍是建本地索引速度会更快一点。建的索引不合理,何况这个影响不是很大,暂且先忽略。it
看id=14,E表,4G的表,全表扫描很慢, 不少人想到能不能建索引? 答案是不能! "E"."WORK_TYPE_ID"=1001411数据不少,建索引也不走,就算走了索引估计还没全表快。
id=14和id=15走hash,有问题,E表4g的表,全表扫描 , 确定须要大量时间。 io
我第一眼看到id=1 的地方 filter,谓词部分 后面有一个小表的查询, 这个地方确定有性能问题的,因而在整个SQL优化完后, 我把这个filter去掉,竟然发现查询时间差很少
因而我就没有管它。
2,看sql:这是5个表关联,看where条件部分,E表4g比较大,而过滤条件只有一个 e.work_type_id = 1001411,
你看这个SQL发现 E表的数据再也不 select 后面的列出现,能够改为半链接的,并且半链接的效率要高一点。 因而我改为exists
改写为
exists( select 1 from EE_ORDER_PF_WORK E where D.ID = E.ORDER_ID AND e.work_type_id = 1001411 ) .
其实更多的时候我会改为in。
3 加hint 使执行计划整体走NL,这里让小表做为驱动表,走NL,一路驱动下去。为何select这里要全走NL?由于看上面的执行计划已经分析
过了id=5到id=12虽然都是大表, 可是走的都是索引, 结合A表时间条件, 以及其余条件, 驱动表的数据量不是不少。 而且最终数据量也不是不少。
改SQL:为
SELECT /*+ use_nl(a,b) use_nl(a,c) use_nl(c,d) leading(a) */
'开通单' AS spec_name,
a.id AS order_id ,b.s_id AS bill_id,
d.id AS sub_order_id,
---x.s_id task_id,
d.deal_oper_idFROM EM_ORDER PARTITION(EM_ORDER_201611) A,
M_101_ID_2_GID B,
ER_ORDER_ORDER C,
EM_ORDER D
WHERE A.SPEC_ID = 3010200004
AND A.ID = B.T_ID
AND A.STATUS_ID = 1000007
AND A.COMPLETE_TIME >= TO_DATE('2016-11-18 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
and A.COMPLETE_TIME <= TO_DATE('2016-11-18 23:59:59', 'YYYY-MM-DD HH24:MI:SS')
AND A.ID = C.A_ORDER_ID
AND C.B_ORDER_ID = D.ID
AND exists( select /*+ nl_sj */ 1 from EE_ORDER_PF_WORK E where D.ID = E.ORDER_ID AND e.work_type_id = 1001411 )
AND ( d.deal_oper_id IS NULL
OR (SELECT f_chk_idcard(x.identity_number)
FROM dm_staff x
WHERE x.id = d.deal_oper_id) = 0
) ;
4,改写以后的执行计划:此时id=14已经走了NESTED LOOPS SEMI,而id=13已经自动走了索引。
1 Plan hash value: 1576200999 2 3 ----------------------------------------------------------------------------------------------------------------------------------- 4 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | 5 ----------------------------------------------------------------------------------------------------------------------------------- 6 | 0 | SELECT STATEMENT | | 3537 | 449K| 1372K (1)| 04:34:26 | | | 7 |* 1 | FILTER | | | | | | | | 8 | 2 | NESTED LOOPS | | 19483 | 2473K| 1372K (1)| 04:34:26 | | | 9 | 3 | NESTED LOOPS | | 20212 | 2473K| 1372K (1)| 04:34:26 | | | 10 | 4 | NESTED LOOPS SEMI | | 20212 | 2072K| 1291K (1)| 04:18:16 | | | 11 | 5 | NESTED LOOPS | | 169K| 14M| 844K (1)| 02:48:57 | | | 12 | 6 | NESTED LOOPS | | 169K| 10M| 334K (1)| 01:07:00 | | | 13 |* 7 | TABLE ACCESS BY GLOBAL INDEX ROWID| EM_ORDER | 98540 | 3368K| 39284 (1)| 00:07:52 | 8 | 8 | 14 |* 8 | INDEX RANGE SCAN | IDX_EM_ORDER_COMP_TIME | 44669 | | 330 (0)| 00:00:04 | | | 15 |* 9 | INDEX RANGE SCAN | IDX_ER_ORDER_ORDER_OO | 2 | 56 | 3 (0)| 00:00:01 | | | 16 | 10 | TABLE ACCESS BY GLOBAL INDEX ROWID | EM_ORDER | 1 | 26 | 3 (0)| 00:00:01 | ROWID | ROWID | 17 |* 11 | INDEX UNIQUE SCAN | PK_EM_ORDER | 1 | | 2 (0)| 00:00:01 | | | 18 |* 12 | TABLE ACCESS BY GLOBAL INDEX ROWID | EE_ORDER_PF_WORK | 1257K| 19M| 3 (0)| 00:00:01 | ROWID | ROWID | 19 |* 13 | INDEX UNIQUE SCAN | PK_EE_ORDER_PF_WORK | 1 | | 2 (0)| 00:00:01 | | | 20 |* 14 | INDEX RANGE SCAN | IDX_101_T_ID | 1 | | 3 (0)| 00:00:01 | | | 21 | 15 | TABLE ACCESS BY GLOBAL INDEX ROWID | M_101_ID_2_GID | 1 | 25 | 4 (0)| 00:00:01 | ROWID | ROWID | 22 | 16 | TABLE ACCESS BY INDEX ROWID | DM_STAFF | 1 | 20 | 3 (0)| 00:00:01 | | | 23 |* 17 | INDEX UNIQUE SCAN | PK_DM_STAFF | 1 | | 2 (0)| 00:00:01 | | | 24 ----------------------------------------------------------------------------------------------------------------------------------- 25 26 Predicate Information (identified by operation id): 27 --------------------------------------------------- 28 29 1 - filter("D"."DEAL_OPER_ID" IS NULL OR (SELECT "F_CHK_IDCARD"("X"."IDENTITY_NUMBER") FROM "QWZW_ER"."DM_STAFF" "X" 30 WHERE "X"."ID"=:B1)=0) 31 7 - filter("A"."SPEC_ID"=3010200004 AND "A"."STATUS_ID"=1000007) 32 8 - access("A"."COMPLETE_TIME">=TO_DATE(' 2016-11-18 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND 33 "A"."COMPLETE_TIME"<=TO_DATE(' 2016-11-18 23:59:59', 'syyyy-mm-dd hh24:mi:ss')) 34 filter(TBL$OR$IDX$PART$NUM("QWZW_ER"."EM_ORDER",0,1,0,ROWID)=8) 35 9 - access("A"."ID"="C"."A_ORDER_ID") 36 11 - access("C"."B_ORDER_ID"="D"."ID") 37 12 - filter("E"."WORK_TYPE_ID"=1001411) 38 13 - access("D"."ID"="E"."ORDER_ID") 39 14 - access("A"."ID"="B"."T_ID") 40 filter("B"."T_ID" IS NOT NULL) 41 17 - access("X"."ID"=:B1)
5,优化完以后, 首次执行大概在 4分钟吧, 再次执行只须要 47S 左右。对比以前的执行计划,反正木有发如今47S出结果的。 结果大概3千多条数据。 总结:大表很大表关联,常理思路是走hash, 可是此处场景,驱动表经过过滤以后返回几千条数据,被驱动表走主键惟一扫描,走NL效率更高。