2018-04-21 08:47 by AlfredZhao, 阅读, 评论, 收藏, 编辑

之前在《Oracle RAC环境下定位并杀掉最终阻塞的会话》中,最终使用一个SQL查询出RAC实例之间的所有阻塞关系。但是实际在某些极端的生产环境,是不允许执行复杂的SQL语句,即使允许执行可能现场也不方便复制SQL,手敲的话效率低下,那么本文就介绍另一种简单的方法来快速定位最终阻塞会话,也就是DBA常用的oradebug hanganalyze。

直接根据《Oracle RAC环境下定位并杀掉最终阻塞的会话》中的来模拟,不再赘述。
如果按照之前的SQL查询,结果如下:

  1. SYS@jyzhao1 >--cascade blocking@gv$session
  2. SYS@jyzhao1 >select *
  3. 2 from (select a.inst_id, a.sid, a.serial#,
  4. 3 a.sql_id,
  5. 4 a.event,
  6. 5 a.status,
  7. 6 connect_by_isleaf as isleaf,
  8. 7 sys_connect_by_path(a.SID||'@'||a.inst_id, ' <- ') tree,
  9. 8 level as tree_level
  10. 9 from gv$session a
  11. 10 start with a.blocking_session is not null
  12. 11 connect by (a.sid||'@'||a.inst_id) = prior (a.blocking_session||'@'||a.blocking_instance))
  13. 12 where isleaf = 1
  14. 13 order by tree_level asc;
  15. INST_ID SID SERIAL# SQL_ID EVENT STATUS ISLEAF TREE TREE_LEVEL
  16. ---------- ---------- ---------- ------------- ----------------------------------- -------- ---------- ------------------------------ ----------
  17. 1 26 3479 SQL*Net message from client INACTIVE 1 <- 148@2 <- 26@1 2
  18. 1 26 3479 SQL*Net message from client INACTIVE 1 <- 29@1 <- 148@2 <- 26@1 3
  19. 1 26 3479 SQL*Net message from client INACTIVE 1 <- 23@2 <- 148@2 <- 26@1 3
  20. SYS@jyzhao1 >

如今假设我们的环境的确不方便使用SQL查询,那么就需要用到oradebug hanganalyze来分析。
oradebug hanganalyze 3

因为我这里是RAC环境,需要分析所有实例:
oradebug -g all hanganalyze 3

  1. SYS@jyzhao1 >oradebug -g all hanganalyze 3
  2. Hang Analysis in /opt/app/oracle/diag/rdbms/jyzhao/jyzhao1/trace/jyzhao1_diag_1919.trc

我们去分析这个生成的trc文件,可以很清楚的看到HANG分析部分,存在两个chain,比如我这个实验的情况就是:
Chain 1: 可以看到实例1的会话29被实例2的会话148阻塞,实例2的会话148又被实例1的会话26阻塞;
Chain 2: 可以看到实例2的会话23被实例2的会话148阻塞,而实例2的会话148又在第一个chain中。

可以发现这与我之前用SQL查询的结果是一样的意思,都可以做到快速定位最终阻塞会话是实例1的会话26,与客户确认后杀掉即可。

附:oradebug hanganalyze 3分析的trace文件中的核心信息

  1. *** 2018-04-21 07:51:44.975
  2. ===============================================================================
  3. HANG ANALYSIS:
  4. instances (db_name.oracle_sid): jyzhao.jyzhao1, jyzhao.jyzhao2
  5. oradebug_node_dump_level: 3
  6. analysis initiated by oradebug
  7. os thread scheduling delay history: (sampling every 1.000000 secs)
  8. 0.000000 secs at [ 07:51:44 ]
  9. NOTE: scheduling delay has not been sampled for 0.305046 secs 0.000000 secs from [ 07:51:40 - 07:51:45 ], 5 sec avg
  10. 0.000000 secs from [ 07:50:45 - 07:51:45 ], 1 min avg
  11. 0.000000 secs from [ 07:46:45 - 07:51:45 ], 5 min avg
  12. vktm time drift history
  13. ===============================================================================
  14. Chains most likely to have caused the hang:
  15. [a] Chain 1 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'<='enq: TX - row lock contention'
  16. Chain 1 Signature Hash: 0x42598823
  17. [b] Chain 2 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'<='enq: TX - row lock contention'
  18. Chain 2 Signature Hash: 0x42598823
  19. ===============================================================================
  20. Non-intersecting chains:
  21. -------------------------------------------------------------------------------
  22. Chain 1:
  23. -------------------------------------------------------------------------------
  24. Oracle session identified by:
  25. {
  26. instance: 1 (jyzhao.jyzhao1)
  27. os id: 2712
  28. process id: 42, oracle@jyrac1 (TNS V1-V3)
  29. session id: 29
  30. session serial #: 81
  31. }
  32. is waiting for 'enq: TX - row lock contention' with wait info:
  33. {
  34. p1: 'name|mode'=0x54580006
  35. p2: 'usn<<16 | slot'=0xe0002
  36. p3: 'sequence'=0x3a3
  37. time in wait: 8 min 21 sec
  38. timeout after: never
  39. wait id: 37
  40. blocking: 0 sessions
  41. current sql: update emp set job = 'CEO' where empno = 7839
  42. short stack: ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+178<-ksliwat()+2022<-kslwaitctx()+163<-k
  43. jusuc()+3400<-ksipgetctxi()+1759<-ksqcmi()+20798<-ksqgtlctx()+3501<-ksqgelctx()+557<-ktuGetTxForXid()+131<-ktcwit1()+336<-kdddgb()+8587<-kdusru()+460<-kauupd()+412<-updrow
  44. ()+2167<-qerupFetch()+860<-updaul()+1378<-updThreePhaseExe()+318<-updexe()+638<-opiexe()+10378<-kpoal8()+2380<-opiodr()+917<-ttcpip()+2183<-opitsk()+1710<-opiino()+969<-op
  45. iodr()+917<-opidrv()+570<-sou2o(
  46. wait history:
  47. * time between current wait and wait #1: 0.010123 sec
  48. 1. event: 'gc current block 2-way'
  49. time waited: 0.000622 sec
  50. wait id: 36 p1: ''=0x7
  51. p2: ''=0x6483
  52. p3: ''=0x2000001
  53. * time between wait #1 and #2: 0.007260 sec
  54. 2. event: 'gc cr block 2-way'
  55. time waited: 0.000501 sec
  56. wait id: 35 p1: ''=0x6
  57. p2: ''=0x523
  58. p3: ''=0x2c
  59. * time between wait #2 and #3: 0.000462 sec
  60. 3. event: 'gc cr block 2-way'
  61. time waited: 0.000689 sec
  62. wait id: 34 p1: ''=0x6
  63. p2: ''=0xb0
  64. p3: ''=0x2b
  65. }
  66. and is blocked by
  67. => Oracle session identified by:
  68. {
  69. instance: 2 (jyzhao.jyzhao2)
  70. os id: 23427
  71. process id: 41, oracle@jyrac2 (TNS V1-V3)
  72. session id: 148
  73. session serial #: 17715
  74. }
  75. which is waiting for 'enq: TX - row lock contention' with wait info:
  76. {
  77. p1: 'name|mode'=0x54580006
  78. p2: 'usn<<16 | slot'=0x50008
  79. p3: 'sequence'=0x9e6
  80. time in wait: 9 min 9 sec
  81. timeout after: never
  82. wait id: 152
  83. blocking: 2 sessions
  84. current sql: update emp set job = 'MANAGER' where empno = 7788
  85. short stack: ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+178<-ksliwat()+2022<-kslwaitctx()+163<-k
  86. jusuc()+3400<-ksipgetctxi()+1759<-ksqcmi()+20798<-ksqgtlctx()+3501<-ksqgelctx()+557<-ktuGetTxForXid()+131<-ktcwit1()+336<-kdddgb()+8587<-kdusru()+460<-kauupd()+412<-updrow
  87. ()+2167<-qerupFetch()+860<-updaul()+1378<-updThreePhaseExe()+318<-updexe()+638<-opiexe()+10378<-kpoal8()+2380<-opiodr()+917<-ttcpip()+2183<-opitsk()+1710<-opiino()+969<-op
  88. iodr()+917<-opidrv()+570<-sou2o(
  89. wait history:
  90. * time between current wait and wait #1: 0.005023 sec
  91. 1. event: 'gc cr block 2-way'
  92. time waited: 0.000478 sec
  93. wait id: 151 p1: ''=0x3
  94. p2: ''=0xc0
  95. p3: ''=0x19
  96. * time between wait #1 and #2: 0.000295 sec
  97. 2. event: 'gc cr block 2-way'
  98. time waited: 0.000821 sec
  99. wait id: 150 p1: ''=0x3
  100. p2: ''=0x362
  101. p3: ''=0x1a
  102. * time between wait #2 and #3: 0.000294 sec
  103. 3. event: 'gc cr block 2-way'
  104. time waited: 0.000431 sec
  105. wait id: 149 p1: ''=0x3
  106. p2: ''=0xc0
  107. p3: ''=0x19
  108. }
  109. and is blocked by
  110. => Oracle session identified by:
  111. {
  112. instance: 1 (jyzhao.jyzhao1)
  113. os id: 1648
  114. process id: 32, oracle@jyrac1 (TNS V1-V3)
  115. session id: 26
  116. session serial #: 3479
  117. }
  118. which is waiting for 'SQL*Net message from client' with wait info:
  119. {
  120. p1: 'driver id'=0x62657100
  121. p2: '#bytes'=0x1
  122. time in wait: 9 min 28 sec
  123. timeout after: never
  124. wait id: 168
  125. blocking: 3 sessions
  126. current sql: <none>
  127. short stack: ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-read()+14<-ntpfprd()+117<-nsbasic_brc()+376<-nsbrecv()+69<-nioqrc()+4
  128. 95<-opikndf2()+978<-opitsk()+831<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253
  129. wait history:
  130. * time between current wait and wait #1: 0.000010 sec
  131. 1. event: 'SQL*Net message to client'
  132. time waited: 0.000003 sec
  133. wait id: 167 p1: 'driver id'=0x62657100
  134. p2: '#bytes'=0x1
  135. * time between wait #1 and #2: 0.000239 sec
  136. 2. event: 'gc current grant 2-way'
  137. time waited: 0.000337 sec
  138. wait id: 166 p1: ''=0x7
  139. p2: ''=0x6483
  140. p3: ''=0x2010001
  141. * time between wait #2 and #3: 0.000196 sec
  142. 3. event: 'db file sequential read'
  143. time waited: 0.000824 sec
  144. wait id: 165 p1: 'file#'=0x7
  145. p2: 'block#'=0x6483
  146. p3: 'blocks'=0x1
  147. }
  148. Chain 1 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'<='enq: TX - row lock contention'
  149. Chain 1 Signature Hash: 0x42598823
  150. -------------------------------------------------------------------------------
  151. ===============================================================================
  152. Intersecting chains:
  153. -------------------------------------------------------------------------------
  154. Chain 2:
  155. -------------------------------------------------------------------------------
  156. Oracle session identified by:
  157. {
  158. instance: 2 (jyzhao.jyzhao2)
  159. os id: 23488
  160. process id: 42, oracle@jyrac2 (TNS V1-V3)
  161. session id: 23
  162. session serial #: 12635
  163. }
  164. is waiting for 'enq: TX - row lock contention' with wait info:
  165. {
  166. p1: 'name|mode'=0x54580006
  167. p2: 'usn<<16 | slot'=0xe0002
  168. p3: 'sequence'=0x3a3
  169. time in wait: 8 min 34 sec
  170. timeout after: never
  171. wait id: 39
  172. blocking: 0 sessions
  173. current sql: update emp set sal = 15000 where empno = 7839
  174. short stack: ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+178<-ksliwat()+2022<-kslwaitctx()+163<-k
  175. jusuc()+3400<-ksipgetctxi()+1759<-ksqcmi()+20798<-ksqgtlctx()+3501<-ksqgelctx()+557<-ktuGetTxForXid()+131<-ktcwit1()+336<-kdddgb()+8587<-kdusru()+460<-kauupd()+412<-updrow
  176. ()+2167<-qerupFetch()+860<-updaul()+1378<-updThreePhaseExe()+318<-updexe()+638<-opiexe()+10378<-kpoal8()+2380<-opiodr()+917<-ttcpip()+2183<-opitsk()+1710<-opiino()+969<-op
  177. iodr()+917<-opidrv()+570<-sou2o(
  178. wait history:
  179. * time between current wait and wait #1: 0.000226 sec
  180. 1. event: 'gc cr block 2-way'
  181. time waited: 0.000605 sec
  182. wait id: 38 p1: ''=0x3
  183. p2: ''=0xc0
  184. p3: ''=0x19
  185. * time between wait #1 and #2: 0.011878 sec
  186. 2. event: 'gc cr block 2-way'
  187. time waited: 0.000610 sec
  188. wait id: 37 p1: ''=0x3
  189. p2: ''=0xc0
  190. p3: ''=0x19
  191. * time between wait #2 and #3: 0.000316 sec
  192. 3. event: 'Disk file operations I/O'
  193. time waited: 0.000007 sec
  194. wait id: 36 p1: 'FileOperation'=0x2
  195. p2: 'fileno'=0x3
  196. p3: 'filetype'=0x2
  197. }
  198. and is blocked by 'instance: 2, os id: 23427, session id: 148',
  199. which is a member of 'Chain 1'.
  200. Chain 2 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'<='enq: TX - row lock contention'
  201. Chain 2 Signature Hash: 0x42598823
  202. -------------------------------------------------------------------------------
  203. ===============================================================================

可以看到这种oradebug hanganalyze生成的trc文件并没有那么的难理解,反而这种chain的描述更加清楚,身为一名专业的Oracle DBA,我们是必须要掌握这种方法的。

版权声明:本文为jyzhao原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接和本声明。
本文链接:https://www.cnblogs.com/jyzhao/p/8897989.html