Using hbase hbck to fix data inconsistencies in regions

[HBase version: 1.1.2]

【First check】
# Run: hbase hbck -details "default:test_tony" > 20171227_hbck_test_tony 2>&1 &
# The log 20171227_hbck_test_tony shows three kinds of errors: [1] First region should start with an empty key, [2] Region not deployed on any region server, [3] a hole in the region chain
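To see at a glance which of these error types a given hbck log contains, a small shell function can count the matching lines. This is a minimal sketch: the function name hbck_error_counts is hypothetical, and the matched strings are copied from the hbck 1.1.2 messages quoted in this post, so other HBase versions may word them differently.

```bash
# Count log lines for each of the three hbck error types seen in this post.
hbck_error_counts() {
  local log="$1" first hole undeployed
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps set -e scripts running
  first=$(grep -c 'First region should start with an empty key' "$log" || true)
  hole=$(grep -c 'There is a hole in the region chain' "$log" || true)
  undeployed=$(grep -c 'not deployed on any region server' "$log" || true)
  printf 'empty_start_key=%s hole=%s not_deployed=%s\n' "$first" "$hole" "$undeployed"
}
```

For example, hbck_error_counts 20171227_hbck_test_tony prints one line with the three counts. Running it against each later log gives a quick before/after comparison.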
Log excerpt:
[hbase@kmr-core1-001 ~]$ less 20171227_hbck_test_tony
... ...
ERROR: (region test_tony,P_4013488,1512319359517.c6892d77b1ea148f3e0642d9fdce68af.) First region should start with an empty key.  You need to  create a new region and regioninfo in HDFS to plug the hole.
'ksai:export_import_table_test': There is a hole in the region chain between P_802083 and P_901927D.  You need to create a new .regioninfo and region dir in hdfs to plug the hole.
ERROR: There is a hole in the region chain between P_A011FE2 and P_B00FAE.  You need to create a new .regioninfo and region dir in hdfs to plug the hole.
---- Table 'test_tony': overlap groups
There are 0 overlap groups with 0 overlapping regions
ERROR: Found inconsistency in table test_tony
2017-12-27 16:27:47,925 INFO  [main] util.HBaseFsck: Computing mapping of all store files
... ...
ERROR: Region { meta => test_tony,P_2009AE6,1511517508282.1dc960c2cb30a897ec50bbb656e9faa9., hdfs => hdfs://hdfs-ha/apps/hbase/data/data/default/test_tony/1dc960c2cb30a897ec50bbb656e9faa9, deployed => , replicaId => 0 } not deployed on any region server.
ERROR: Region { meta => test_tony,P_F007E:,1511757663778.5534b6665b97db8d8a7851955ccd8dc5., hdfs => hdfs://hdfs-ha/apps/hbase/data/data/default/test_tony/5534b6665b97db8d8a7851955ccd8dc5, deployed => , replicaId => 0 } not deployed on any region server.
ERROR: Region { meta => test_tony,P_802083,1513409109894.57ff7ad1a791fb0f8c70f751108d55dd., hdfs => hdfs://hdfs-ha/apps/hbase/data/data/default/test_tony/57ff7ad1a791fb0f8c70f751108d55dd, deployed => , replicaId => 0 } not deployed on any region server.
ERROR: Region { meta => test_tony,,1511517508282.d0a9bbf43b36b122b7c7f4256f9cdba4., hdfs => hdfs://hdfs-ha/apps/hbase/data/data/default/test_tony/d0a9bbf43b36b122b7c7f4256f9cdba4, deployed => , replicaId => 0 } not deployed on any region server.
ERROR: Region { meta => test_tony,P_A011FE2,1511537471096.f5777d857532db7ac592e3b621c0372e., hdfs => hdfs://hdfs-ha/apps/hbase/data/data/default/test_tony/f5777d857532db7ac592e3b621c0372e, deployed => , replicaId => 0 } not deployed on any region server.
... ...
2017-12-27 16:27:49,637 INFO  [main] util.HBaseFsck: Finishing hbck
Summary:
Table hbase:meta is okay.
    Number of regions: 1
    Deployed on:  kmr-5b9c18fc-gn-7b3518df-core-1-005.ksc.com,16020,1514187996430
Table test_tony is inconsistent.
    Number of regions: 8
    Deployed on:  kmr-5b9c18fc-gn-7b3518df-core-1-001.ksc.com,16020,1514188220947 kmr-5b9c18fc-gn-7b3518df-core-1-003.ksc.com,16020,1514187978942 kmr-5b9c18fc-gn-7b3518df-core-1-004.ksc.com,16020,1514187984473 kmr-5b9c18fc-gn-7b3518df-core-1-005.ksc.com,16020,1514187996430 kmr-5b9c18fc-gn-7b3518df-core-1-006.ksc.com,16020,1514188010078 kmr-5b9c18fc-gn-7b3518df-core-1-008.ksc.com,16020,1514188032392
9 inconsistencies detected.
Status: INCONSISTENT
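# hbck appears to be blocked while moving a region to another RegionServer. A likely cause is that the source region was never assigned to any RegionServer. A possible fix: after confirming that the region's HFiles exist, force-assign the region manually.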
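The manual force-assign suggested above can be prepared from the hbck log itself. The sketch below pulls the full region name out of each "not deployed on any region server" line and prints one hbase shell assign command per region. The function name undeployed_assign_cmds is hypothetical. It assumes the log format shown above and that no start key contains a single quote. Before running any of the printed commands, review them and check that each region directory under the hdfs => path still exists.

```bash
# Print one hbase shell "assign" command per region that hbck reported as
# "not deployed on any region server". The full region name is the text
# between "meta => " and ", hdfs => " in each such ERROR line.
undeployed_assign_cmds() {
  local log="$1"
  sed -n 's/.*Region { meta => \(.*\), hdfs => .*not deployed on any region server\..*/\1/p' "$log" |
    LC_ALL=C sort -u |   # byte-order sort so the output is deterministic; drop duplicates
    while IFS= read -r region; do
      printf "assign '%s'\n" "$region"
    done
}
```

For example, undeployed_assign_cmds 20171227_hbck_test_tony would print lines such as assign 'test_tony,P_2009AE6,1511517508282.1dc960c2cb30a897ec50bbb656e9faa9.', which can then be run in hbase shell after review.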



【First repair】

# Note: disable the table before repairing it
hbase(main):022:0> disable 'default:test_tony'
0 row(s) in 4.9250 seconds

# Then run: hbase hbck -repair "default:test_tony" > 20171227_tried_repaired_test_tony 2>&1 &

# The log 20171227_tried_repaired_test_tony shows that two of the three errors above are fixed, but the "Region not deployed on any region server" error is not
Log excerpt:
ERROR: Region { meta => test_tony,P_2009AE6,1511517508282.1dc960c2cb30a897ec50bbb656e9faa9., hdfs => hdfs://hdfs-ha/apps/hbase/data/data/default/test_tony/1dc960c2cb30a897ec50bbb656e9faa9, deployed => , replicaId => 0 } not deployed on any region server.
ERROR: Region { meta => test_tony,P_F007E:,1511757663778.5534b6665b97db8d8a7851955ccd8dc5., hdfs => hdfs://hdfs-ha/apps/hbase/data/data/default/test_tony/5534b6665b97db8d8a7851955ccd8dc5, deployed => , replicaId => 0 } not deployed on any region server.
Trying to fix unassigned region...
Trying to fix unassigned region...
ERROR: Region { meta => test_tony,,1511517508282.d0a9bbf43b36b122b7c7f4256f9cdba4., hdfs => hdfs://hdfs-ha/apps/hbase/data/data/default/test_tony/d0a9bbf43b36b122b7c7f4256f9cdba4, deployed => , replicaId => 0 } not deployed on any region server.
Trying to fix unassigned region...
ERROR: Region { meta => test_tony,P_802083,1513409109894.57ff7ad1a791fb0f8c70f751108d55dd., hdfs => hdfs://hdfs-ha/apps/hbase/data/data/default/test_tony/57ff7ad1a791fb0f8c70f751108d55dd, deployed => , replicaId => 0 } not deployed on any region server.
Trying to fix unassigned region...
ERROR: Region { meta => test_tony,P_A011FE2,1511537471096.f5777d857532db7ac592e3b621c0372e., hdfs => hdfs://hdfs-ha/apps/hbase/data/data/default/test_tony/f5777d857532db7ac592e3b621c0372e, deployed => , replicaId => 0 } not deployed on any region server.
Trying to fix unassigned region...
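Comparing two hbck runs shows which regions a repair left untouched. In this sketch, the function names undeployed_encoded and still_undeployed are hypothetical. It takes the encoded region name from the last component of the hdfs => path in each "not deployed" line. It then prints the names that appear in both the earlier and the later log. It needs bash because it uses process substitution.

```bash
# Encoded names (32 hex chars, also the region directory name in the hdfs => path)
# of regions reported as "not deployed on any region server" in one hbck log.
undeployed_encoded() {
  sed -n 's/.*hdfs => [^,]*\/\([0-9a-f]\{32\}\), deployed => .*not deployed on any region server\..*/\1/p' "$1" |
    LC_ALL=C sort -u
}

# Regions still undeployed after a repair: present in both the before and after logs.
still_undeployed() {
  local before="$1" after="$2"
  # comm -12 keeps only lines common to both sorted inputs
  LC_ALL=C comm -12 <(undeployed_encoded "$before") <(undeployed_encoded "$after")
}
```

For example, still_undeployed 20171227_hbck_test_tony 20171227_tried_repaired_test_tony lists the regions that -repair did not bring online.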



【Second repair】
# Run: hbase hbck -fixMeta -fixAssignments "default:test_tony" > 20171227_2nd_tried_repaired_test_tony 2>&1 &
# The log 20171227_2nd_tried_repaired_test_tony shows that errors [1] and [3] are fixed, but [2] is still not fixed, and a new error appears: [4] Region failed to move out of transition within timeout XXXXXXms
Log excerpt:
2017-12-27 17:07:19,032 WARN  [hbasefsck-pool1-t42] util.HBaseFsck: Unable to complete check or repair the region 'test_tony,P_802083,1513409109894.57ff7ad1a791fb0f8c70f751108d55dd.'.
java.io.IOException: Region {ENCODED => 57ff7ad1a791fb0f8c70f751108d55dd, NAME => 'test_tony,P_802083,1513409109894.57ff7ad1a791fb0f8c70f751108d55dd.', STARTKEY => 'P_802083', ENDKEY => 'P_901927D'} failed to move out of transition within timeout 120000ms
        at org.apache.hadoop.hbase.util.HBaseFsckRepair.waitUntilAssigned(HBaseFsckRepair.java:149)
        at org.apache.hadoop.hbase.util.HBaseFsck.tryAssignmentRepair(HBaseFsck.java:2114)
        at org.apache.hadoop.hbase.util.HBaseFsck.checkRegionConsistency(HBaseFsck.java:2315)
        at org.apache.hadoop.hbase.util.HBaseFsck.access$1100(HBaseFsck.java:197)
        at org.apache.hadoop.hbase.util.HBaseFsck$CheckRegionConsistencyWorkItem.call(HBaseFsck.java:1887)
        at org.apache.hadoop.hbase.util.HBaseFsck$CheckRegionConsistencyWorkItem.call(HBaseFsck.java:1875)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2017-12-27 17:05:21,028 INFO  [hbasefsck-pool1-t28] util.HBaseFsckRepair: Region still in transition, waiting for it to become assigned: {ENCODED => f5777d857532db7ac592e3b621c0372e, NAME => 'test_tony,P_A011FE2,1511537471096.f5777d857532db7ac592e3b621c0372e.', STARTKEY => 'P_A011FE2', ENDKEY => 'P_B00FAE'}
2017-12-27 17:05:22,028 INFO  [hbasefsck-pool1-t26] util.HBaseFsckRepair: Region still in transition, waiting for it to become assigned: {ENCODED => d0a9bbf43b36b122b7c7f4256f9cdba4, NAME => 'test_tony,,1511517508282.d0a9bbf43b36b122b7c7f4256f9cdba4.', STARTKEY => '', ENDKEY => 'P_2009AE6'}
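# Error [4] and the "Region still in transition" (RIT) messages appear because the region has been unassigned from its original RegionServer but has not yet been assigned to a new one, so no RegionServer is serving it.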
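The regions stuck in this state can be listed from the hbck log. This is a minimal sketch that assumes the message formats in the excerpt above. The function name rit_regions is hypothetical. It collects the ENCODED => value from two kinds of lines: the "failed to move out of transition" exceptions and the "Region still in transition" INFO lines.

```bash
# Encoded names of regions that hbck reported as stuck in transition:
# either timed out ("failed to move out of transition") or still waiting.
rit_regions() {
  grep -E 'Region still in transition|failed to move out of transition' "$1" |
    sed -n 's/.*{ENCODED => \([0-9a-f]\{32\}\),.*/\1/p' |
    LC_ALL=C sort -u
}
```

For example, rit_regions 20171227_2nd_tried_repaired_test_tony prints one encoded name per line. These names can be matched against the region directories under /apps/hbase/data/data/default/test_tony.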


【Third repair】
# On the Linux command line, run: hbase hbck -fixAssignments 'default:test_tony' > 20171227_toFix-_an_empty_key 2>&1 &
# The log 20171227_toFix-_an_empty_key shows that the repair succeeded:
Summary:
Table hbase:meta is okay.
    Number of regions: 1
    Deployed on:  kmr-5b9c18fc-gn-7b3518df-core-1-005.ksc.com,16020,1514187996430
Table test_tony is okay.
    Number of regions: 0
    Deployed on: 
0 inconsistencies detected.
Status: OK
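The verdicts in this Summary block can also be extracted mechanically, for example to compare runs side by side. This sketch uses a hypothetical function name, hbck_summary. It assumes the "Table <name> is okay." / "Table <name> is inconsistent." and "Status:" line formats shown above.

```bash
# Print per-table verdicts and the overall status from an hbck log's Summary.
hbck_summary() {
  local log="$1"
  # "Table test_tony is okay." -> "test_tony okay"
  sed -En 's/^Table (.*) is (okay|inconsistent)\.$/\1 \2/p' "$log"
  # "Status: OK" -> "status OK"
  awk '/^Status:/ {print "status", $2}' "$log"
}
```

For the first check above, this would report test_tony as inconsistent. For the third repair, it would report both tables as okay with status OK.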

# Check once more that the table is now healthy
hbase hbck -fixAssignments 'default:test_tony' > 20171227_After_fixAssgnmentOf_should_end_with_an_empty_key_FORtest_tony 2>&1 &
# The log 20171227_After_fixAssgnmentOf_should_end_with_an_empty_key_FORtest_tony shows Status: OK
# Once the repair has succeeded, re-enable the table
hbase(main):028:0> enable 'default:test_tony'
0 row(s) in 2.3100 seconds
At this point, the repair is complete.
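Re-enabling the table can be gated on a clean hbck result, so that it only happens once the log reports no inconsistencies. This is a minimal sketch with a hypothetical function name, hbck_is_clean. It only inspects the log text, so it assumes the background hbck run has already finished writing the log.

```bash
# Succeed only if the hbck log reports "0 inconsistencies detected." and "Status: OK".
hbck_is_clean() {
  local log="$1" status
  # Use the last Status: line in case the log contains more than one run
  status=$(awk '/^Status:/ {print $2}' "$log" | tail -n 1)
  if grep -q '^0 inconsistencies detected\.$' "$log" && [ "$status" = "OK" ]; then
    return 0
  fi
  echo "hbck not clean: status='${status:-missing}'" >&2
  return 1
}
```

For example: hbck_is_clean 20171227_After_fixAssgnmentOf_should_end_with_an_empty_key_FORtest_tony && echo "enable 'default:test_tony'" | hbase shell. Here the pipeline runs only when the check succeeds.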