[Oracle] Resolving High CPU Utilization in an Oracle Database

Oracle databases often run into very high CPU utilization. In most cases the cause is one or more badly performing SQL statements that consume a large share of the CPU and drag down the whole system. Such statements can perform badly for many reasons, and each case has to be analyzed on its own merits. The real-world case below shows how to diagnose and resolve this kind of problem.

Database: Oracle 9.2.0.4

Problem description: the on-site engineer reported that the database was extremely slow and that almost no application operation could complete normally.

First, log in to the host and run top. The CPU was almost exhausted and many processes were each using a lot of CPU, while memory and I/O load were low:

    last pid: 26136; load averages: 8.89, 8.91, 8.12
    216 processes: 204 sleeping, 8 running, 4 on cpu
    CPU states: 0.6% idle, 97.3% user, 1.8% kernel, 0.2% iowait, 0.0% swap
    Memory: 8192M real, 1166M free, 14M swap in use, 8179M swap free

      PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
    25725 oracle     1  50    0 4550M 4508M cpu2   12:23 11.23% oracle
    25774 oracle     1  41    0 4550M 4508M run    14:25 10.66% oracle
    26016 oracle     1  31    0 4550M 4508M run     5:41 10.37% oracle
    26010 oracle     1  41    0 4550M 4508M run     4:40  9.81% oracle
    26014 oracle     1  51    0 4550M 4506M cpu6    4:19  9.76% oracle
    25873 oracle     1  41    0 4550M 4508M run    12:10  9.45% oracle
    25723 oracle     1  50    0 4550M 4508M run    15:09  9.40% oracle
    26121 oracle     1  41    0 4550M 4506M cpu0    1:13  9.28% oracle

Next, check the database alert log (the ALERT file). It contained no errors and showed the database running normally, which rules out a problem with the database itself.

Then find out what the high-CPU Oracle processes are actually doing, using the following SQL:

    select sql_text,spid,v$session.program,process from
    v$sqlarea,v$session,v$process
    where v$sqlarea.address=v$session.sql_address
    and v$sqlarea.hash_value=v$session.sql_hash_value
    and v$session.paddr=v$process.addr
    and v$process.spid in (PID);

Replace PID in the script with the PIDs of the high-CPU processes shown by top to get the SQL statement each of those Oracle processes is executing. All of the high-CPU processes turned out to be executing the same statement:

    SELECT d.domainname,d.mswitchdomainid, a.SERVICEID,a.SERVICECODE,a.USERTYPE,a.STATUS,a.NOTIFYSTATUS,
           to_char(a.DATECREATED,'yyyy-mm-dd hh24:mi:ss') DATECREATED,VIPFLAG,STATUS2,CUSTOMERTYPE,CUSTOMERID
    FROM service a, gatewayloc b, subbureaunumber c, mswitchdomain d
    WHERE b.mswitchdomainid = d.mswitchdomainid and b.gatewaysn = c.gatewaysn
    AND a.ServiceCode like c.code||'%' and a.serviceSpecID=1 and a.status!='4' and a.status!='10'
    and a.servicecode like '010987654321%' and SubsidiaryID=999999999
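As an illustration of the PID substitution step, here is a minimal sketch of the lookup with four of the PIDs from the top listing filled in. It assumes a session with SELECT access to the V$ views. The table aliases, the extra sid column and the ORDER BY are additions for readability and are not part of the original script. SPID is a character column in V$PROCESS, so the values are quoted here.

```sql
-- Sketch only: map OS process IDs (from top) to the SQL their sessions are running.
-- The PIDs below are taken from the top listing; substitute your own.
SELECT p.spid,          -- OS process ID, as shown by top
       s.sid,           -- Oracle session ID (added for later lookups)
       s.program,
       s.process,
       q.sql_text
  FROM v$sqlarea q,
       v$session s,
       v$process p
 WHERE q.address    = s.sql_address
   AND q.hash_value = s.sql_hash_value   -- the V$SQLAREA column is HASH_VALUE
   AND s.paddr      = p.addr
   AND p.spid IN ('25725', '25774', '26016', '26010')
 ORDER BY p.spid;
```

Each returned row pairs one OS PID with the statement its session is currently executing. In the case described here, every such row showed the same SELECT against service, gatewayloc, subbureaunumber and mswitchdomain.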
It is almost certain that this SELECT statement is what is consuming so much CPU. But why does it need so much? First, look at the wait events of the database sessions:

    SQL> select sid,event,p1,p1text from v$session_wait;

           SID EVENT                P1 P1TEXT
    ---------- ------------ ---------- --------
            12 latch free   4.3982E+12 address
            36 latch free   4.3982E+12 address
            37 latch free   4.3982E+12 address
            84 latch free   4.3982E+12 address
           102 latch free   4.3982E+12 address
           101 latch free   4.3982E+12 address
            85 latch free   4.3982E+12 address
           106 latch free   4.3982E+12 address
           155 latch free   4.3982E+12 address
           151 latch free   4.3982E+12 address
           149 latch free   4.3982E+12 address
           147 latch free   4.3982E+12 address
             1 pmon timer          300 duration

Almost all of the waits are latch free. Next, find out which processes these latch waits belong to:

    SQL> select spid from v$process where addr in
         (select paddr from v$session where sid in(84,102,101,106,155,151));

    SPID
    ------------
    25774
    26010
    25873
    25725

All four of these PIDs appear among the high-CPU processes in the top listing. So the sessions running the SQL statement above are all waiting on latch free, and that wait is what consumes so much CPU. To see which type of latch is mainly involved, run:

    SQL> SELECT latch#, name, gets, misses, sleeps
         FROM v$latch
         WHERE sleeps>0
         ORDER BY sleeps;

        LATCH# NAME                               GETS     MISSES     SLEEPS
    ---------- ---------------------------- ---------- ---------- ----------
            15 messages                          96876         20          1
           159 library cache pin allocation     407322         43          1
           132 dml lock allocation              194533        213          2
             4 session allocation               304897         48          3
           115 redo allocation                  238031        286          4
            17 enqueue hash chains              277510         85          5
             7 session idle bit                2727264        314         16
           158 library cache pin               3881788       5586         58
           156 shared pool                     2771629       6184        662
           157 library cache                   5637573      25246        801
            98 cache buffers chains         1722750424     758400     109837

The dominant latch wait by far is cache buffers chains. Waits on this latch indicate that sessions are contending for individual hot blocks in the buffer cache. Now look at the child latches of this latch:

    SQL> SELECT addr, latch#, gets, misses, sleeps
         FROM v$latch_children
         WHERE sleeps>0
         and latch# = 98
         ORDER BY sleeps desc;

    ADDR                 LATCH#       GETS     MISSES     SLEEPS
    ---------------- ---------- ---------- ---------- ----------
    000004000A3DFD10         98   10840661      82891        389
    000004000A698C70         98     159510          2        244
    0000040009B21738         98  104269771      34926        209
    0000040009B227A8         98  107604659      35697        185
    000004000A3E0D70         98    5447601      18922        156
    000004000A6C2BD0         98     853375          7        134
    0000040009B24888         98   85538409      25752        106
    ...

Next, check which objects sit behind the child latches with the most sleeps:

    SQL> select distinct a.owner,a.segment_name,a.segment_type from
         dba_extents a,
         (select dbarfil,dbablk
          from x$bh
          where hladdr in
                (select addr
                 from (select addr
                       from v$latch_children
                       order by sleeps desc)
                 where rownum < 5)) b
         where a.RELATIVE_FNO = b.dbarfil
         and a.BLOCK_ID <= b.dbablk and a.block_id + a.blocks > b.dbablk;

    OWNER   SEGMENT_NAME              SEGMENT_TYPE
    ------- ------------------------- ------------
    TEST    I_SERVICE_SERVICESPECID   INDEX
    TEST    I_SERVICE_SUBSIDIARYID    INDEX
    TEST    SERVICE                   TABLE
    TEST    MSWITCHDOMAIN             TABLE
    TEST    I_SERVICE_SC_S            INDEX
    ...

Several of the objects used by the SQL statement found earlier appear in this list, so look at the execution plan of that statement:

    SQL> set autotrace trace explain
    SQL> SELECT d.domainname,d.mswitchdomainid, a.SERVICEID,a.SERVICECODE,a.USERTYPE,a.STATUS,a.NOTIFYSTATUS,
                to_char(a.DATECREATED,'yyyy-mm-dd hh24:mi:ss') DATECREATED,VIPFLAG,STATUS2,CUSTOMERTYPE,CUSTOMERID
         FROM service a, gatewayloc b, subbureaunumber c, mswitchdomain d
         WHERE b.mswitchdomainid = d.mswitchdomainid and b.gatewaysn = c.gatewaysn
         AND a.ServiceCode like c.code||'%' and a.serviceSpecID=1 and a.status!='4' and a.status!='10'
         and a.servicecode like '010987654321%' and SubsidiaryID=999999999;

    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE
       1    0   NESTED LOOPS
       2    1     NESTED LOOPS
       3    2       NESTED LOOPS
       4    3         TABLE ACCESS (FULL) OF 'SUBBUREAUNUMBER'
       5    3         TABLE ACCESS (BY INDEX ROWID) OF 'GATEWAYLOC'
       6    5           INDEX (UNIQUE SCAN) OF 'PK_GATEWAYLOC' (UNIQUE)
       7    2       TABLE ACCESS (BY INDEX ROWID) OF 'MSWITCHDOMAIN'
       8    7         INDEX (UNIQUE SCAN) OF 'PK_MSWITCHDOMAIN' (UNIQUE)
       9    1     TABLE ACCESS (BY INDEX ROWID) OF 'SERVICE'
      10    9       AND-EQUAL
      11   10         INDEX (RANGE SCAN) OF 'I_SERVICE_SERVICESPECID' (NON-UNIQUE)
      12   10         INDEX (RANGE SCAN) OF 'I_SERVICE_SUBSIDIARYID' (NON-UNIQUE)

Given the objects found behind the latch free waits and this execution plan, the indexes on the SERVICE table looked suspect: they appeared to be scanned far too much. So the same SQL statement was run on the equivalent database of another city, to compare execution plans:

    SQL> set autotrace trace explain
    SQL> SELECT d.domainname,d.mswitchdomainid, a.SERVICEID,a.SERVICECODE,a.USERTYPE,a.STATUS,a.NOTIFYSTATUS,
                to_char(a.DATECREATED,'yyyy-mm-dd hh24:mi:ss') DATECREATED,VIPFLAG,STATUS2,CUSTOMERTYPE,CUSTOMERID
         FROM service a, gatewayloc b, subbureaunumber c, mswitchdomain d
         WHERE b.mswitchdomainid = d.mswitchdomainid and b.gatewaysn = c.gatewaysn
         AND a.ServiceCode like c.code||'%' and a.serviceSpecID=1 and a.status!='4' and a.status!='10'
         and a.servicecode like '010987654321%' and SubsidiaryID=999999999;

    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE
       1    0   TABLE ACCESS (BY INDEX ROWID) OF 'SERVICE'
       2    1     NESTED LOOPS
       3    2       NESTED LOOPS
       4    3         NESTED LOOPS
       5    4           TABLE ACCESS (FULL) OF 'SUBBUREAUNUMBER'
       6    4           TABLE ACCESS (BY INDEX ROWID) OF 'GATEWAYLOC'
       7    6             INDEX (UNIQUE SCAN) OF 'PK_GATEWAYLOC' (UNIQUE)
       8    3         TABLE ACCESS (BY INDEX ROWID) OF 'MSWITCHDOMAIN'
       9    8           INDEX (UNIQUE SCAN) OF 'PK_MSWITCHDOMAIN' (UNIQUE)
      10    2       INDEX (RANGE SCAN) OF 'I_SERVICE_SC_S' (NON-UNIQUE)

Comparing the two plans shows that the indexes I_SERVICE_SERVICESPECID and I_SERVICE_SUBSIDIARYID should not be used for this statement. The next step was to compare the indexes on the SERVICE table in the two databases. The first listing is from the problem database, the second from the other city's database:

    SQL> select index_name from user_indexes where table_name='SERVICE';

    INDEX_NAME
    ------------------------------
    I_SERVICE_ACCOUNTNUM
    I_SERVICE_CID
    I_SERVICE_DATEACTIVATED
    I_SERVICE_PRICEPLANID
    I_SERVICE_SC_S
    I_SERVICE_SERVICECODE
    I_SERVICE_SERVICESPECID
    I_SERVICE_SUBSIDIARYID
    PK_SERVICE_SID

    SQL> select index_name from user_indexes where table_name='SERVICE';

    INDEX_NAME
    ------------------------------
    I_SERVICE_ACCOUNTNUM
    I_SERVICE_CID
    I_SERVICE_DATEACTIVATED
    I_SERVICE_SC_S
    I_SERVICE_SERVICECODE
    PK_SERVICE_SID

The SERVICE table in the problem database had somehow gained three extra indexes: I_SERVICE_PRICEPLANID, I_SERVICE_SERVICESPECID and I_SERVICE_SUBSIDIARYID. These indexes made the SQL statement use indexes it should not have used, and they were the root cause of the latch free waits and the high CPU usage. After the three indexes were dropped, the same SQL statement returned its result quickly and CPU utilization immediately fell back to normal:

    last pid: 26387; load averages: 1.61, 1.38, 1.21
    195 processes: 194 sleeping, 1 on cpu
    CPU states: 96.2% idle, 1.6% user, 1.7% kernel, 0.5% iowait, 0.0% swap
    Memory: 8192M real, 1183M free, 14M swap in use, 8179M swap free

      PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
    26383 oracle     1  59    0 4550M 4506M sleep   0:12  4.52% oracle
      409 root      15  59    0 7168K 7008K sleep 173.1H  0.53% picld
    25653 oracle     1  59    0 4550M 4508M sleep   2:12  0.48% oracle
    26384 root       1  59    0 2800K 1912K cpu2    0:00  0.21% top-3.5b8-sun4u
    25569 oracle     1  59    0 4550M 4508M sleep   0:12  0.09% oracle
    25717 oracle     1  59    0 4550M 4507M sleep   0:07  0.05% oracle
    25571 oracle     1  59    0 4550M 4507M sleep   0:10  0.04% oracle
    25681 oracle     1  59    0 4550M 4508M sleep   0:10  0.04% oracle
    25544 oracle     1  58    0 4554M 4501M sleep   0:14  0.03% oracle
    25703 oracle     1  59    0 4550M 4506M sleep   0:23  0.03% oracle
    ...

When CPU utilization is too high because of poorly performing SQL, the problem can generally be diagnosed and resolved along these lines. Every case still needs its own analysis, and there are many ways to fix such problems. This article is only meant as a starting point; what matters in the end is that the problem gets solved.
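The article does not show the statements used to remove the three indexes. The following is a minimal sketch of how that step might look, not the author's actual commands. It assumes the indexes belong to the TEST schema, as the dba_extents output suggests for two of them. It also assumes the session has the privileges to read their DDL and to drop them, and that no other workload depends on them. Saving the DDL first is an added precaution that the original does not mention.

```sql
-- Sketch only. Assumptions: owner is TEST; indexes are safe to drop.

-- Optional precaution: capture the DDL so an index can be recreated later.
-- SET LONG is a SQL*Plus setting that lets the full CLOB be displayed.
SET LONG 20000
SELECT dbms_metadata.get_ddl('INDEX', 'I_SERVICE_PRICEPLANID',   'TEST') FROM dual;
SELECT dbms_metadata.get_ddl('INDEX', 'I_SERVICE_SERVICESPECID', 'TEST') FROM dual;
SELECT dbms_metadata.get_ddl('INDEX', 'I_SERVICE_SUBSIDIARYID',  'TEST') FROM dual;

-- Drop the three indexes that exist only on the problem database.
DROP INDEX test.i_service_priceplanid;
DROP INDEX test.i_service_servicespecid;
DROP INDEX test.i_service_subsidiaryid;

-- Confirm that the remaining indexes match the healthy database.
SELECT index_name
  FROM dba_indexes
 WHERE owner = 'TEST'
   AND table_name = 'SERVICE'
 ORDER BY index_name;
```

After this, rerunning the statement under set autotrace trace explain should show whether the plan now uses I_SERVICE_SC_S, as it does on the other database.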