The basic unit of a table in HBase is the Region; whenever you operate on a table through the HBase API, the data you interact with is served in the form of Regions. A table can have any number of Regions. In this post I'd like to share some of the problems around merging Regions, and how to solve them.
Before we analyze Region merging, let's first look at the Region architecture, shown in the figure below:
From the figure, we can summarize the following points: each Region is served by one RegionServer; a Region holds one Store per column family; each Store consists of a single MemStore plus zero or more StoreFiles (HFiles); and the HFiles are what actually sit on HDFS.
If you want to inspect an HFile, HBase provides a command for it:
hbase hfile -p -f /hbase/data/default/ip_login/d0d7d881bb802592c09d305e47ae70a5/_d/7ec738167e9f4d4386316e5e702c8d3d
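The HFile path is built from the table's namespace and name, the encoded Region name, and the column family. If you don't know the path beforehand, you can list it from HDFS first. A minimal sketch, reusing the ip_login table, Region d0d7d881bb802592c09d305e47ae70a5 and family _d from the example above (substitute your own):

# List the HFiles under one column family of one Region
hdfs dfs -ls /hbase/data/default/ip_login/d0d7d881bb802592c09d305e47ae70a5/_d
# -p prints every KeyValue; -m prints only the HFile metadata
# (block index, bloom filter, etc.), which is much cheaper on a big file
hbase hfile -m -f /hbase/data/default/ip_login/d0d7d881bb802592c09d305e47ae70a5/_d/7ec738167e9f4d4386316e5e702c8d3d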
The output looks like the following figure:
So why do Regions need to be merged at all? The answer starts with Region splits. As a Region keeps receiving writes and reaches the split threshold (controlled by the property hbase.hregion.max.filesize, 10GB by default), it is split into two new Regions. As business data keeps growing, Regions keep splitting, and the Region count keeps climbing.
The more Regions a business table has, the more pressure the cluster is under during reads and writes, and especially during Compactions on that table. From my own production statistics: once one business table reached 9000+ Regions, every Compaction on it drove up the cluster load, which in turn hurt application reads and writes. Too many Regions in one table also inflates the Region count of the whole cluster, and after load balancing every RegionServer ends up carrying more Regions.
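To gauge how bad things are, you can count a table's Regions from the shell. A rough sketch, assuming a newer release (the list_regions command and the shell's -n flag are not available on older versions, and the count includes a few header/summary lines; namespace:tablename is a placeholder):

# Rough Region count for one table
echo "list_regions 'namespace:tablename'" | hbase shell -n | wc -l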
This is exactly the situation where merging Regions is well worth doing. For example, if the current split threshold is set to 30GB, we can run one merge pass over all Regions at or below 10GB, reducing the Region count of each business table, and thereby of the whole cluster, and easing the Region pressure on every RegionServer.
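For the 30GB threshold mentioned above, the limit can also be raised per table so that freshly merged Regions don't immediately split again. A sketch, with namespace:tablename as a placeholder:

# 30GB = 30 * 1024^3 bytes; the per-table MAX_FILESIZE attribute
# overrides the cluster-wide hbase.hregion.max.filesize
hbase> alter 'namespace:tablename', MAX_FILESIZE => '32212254720'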
So how do we merge Regions? HBase ships with a command for merging them, used as follows:
# Merge two adjacent Regions
hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME'
# Force-merge two Regions
hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME', true
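ENCODED_REGIONNAME is the hash suffix of the full Region name (the d0d7d881bb802592c09d305e47ae70a5 part of the HFile path earlier); you can read it off the table's page in the HBase web UI, or list it from the shell. A sketch, where the two encoded names below are made up for illustration:

hbase> list_regions 'namespace:tablename'
# then, using two encoded names from that listing (hypothetical values):
hbase> merge_region 'c18d3cbba9e54b1e8a0b0c1f6b6e1fb0', '0a1b2c3d4e5f60718293a4b5c6d7e8f9', true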
The problem with this approach is that it only merges two Regions at a time; if there are thousands of Regions to merge, it is simply not practical.
There is a batch alternative: script it. The implementation (merge_small_regions.rb) is as follows:
# Test Mode:
#
# hbase org.jruby.Main merge_small_regions.rb namespace.tablename <skip_size> <batch_regions> <merge?>
#
# Non Test - ie actually do the merge:
#
# hbase org.jruby.Main merge_small_regions.rb namespace.tablename <skip_size> <batch_regions> merge
#
# Note: Please replace namespace.tablename with your namespace and table,
# eg NS1.MyTable. This value is case sensitive.

require 'digest'
require 'java'
java_import org.apache.hadoop.hbase.HBaseConfiguration
java_import org.apache.hadoop.hbase.client.HBaseAdmin
java_import org.apache.hadoop.hbase.TableName
java_import org.apache.hadoop.hbase.HRegionInfo
java_import org.apache.hadoop.hbase.client.Connection
java_import org.apache.hadoop.hbase.client.ConnectionFactory
java_import org.apache.hadoop.hbase.client.Table
java_import org.apache.hadoop.hbase.util.Bytes

# Collect the encoded names of all regions of `table` whose store files
# exceed `low_size` MB; these regions must NOT be merged.
def list_bigger_regions(admin, table, low_size)
  cluster_status = admin.getClusterStatus()
  master = cluster_status.getMaster()
  biggers = []
  cluster_status.getServers.each do |s|
    cluster_status.getLoad(s).getRegionsLoad.each do |r|
      # getRegionsLoad maps region name to RegionLoad; each entry
      # behaves like a 2-element array here.
      # Filter out any regions that don't belong to the requested table.
      next unless r[1].get_name_as_string =~ /#{table}\,/
      if r[1].getStorefileSizeMB() > low_size
        if r[1].get_name_as_string =~ /\.([^\.]+)\.$/
          biggers.push $1
        else
          raise "Failed to get the encoded name for #{r[1].get_name_as_string}"
        end
      end
    end
  end
  biggers
end

# Handle command line parameters
table_name = ARGV[0]
low_size = 1024
if ARGV[1].to_i >= low_size
  low_size = ARGV[1].to_i
end

limit_batch = 1000
if ARGV[2].to_i <= limit_batch
  limit_batch = ARGV[2].to_i
end

do_merge = false
if ARGV[3] == 'merge'
  do_merge = true
end

config = HBaseConfiguration.create()
connection = ConnectionFactory.createConnection(config)
admin = HBaseAdmin.new(connection)

bigger_regions = list_bigger_regions(admin, table_name, low_size)
regions = admin.getTableRegions(Bytes.toBytes(table_name))

puts "Total Table Regions: #{regions.length}"
puts "Total bigger regions: #{bigger_regions.length}"

filtered_regions = regions.reject do |r|
  bigger_regions.include?(r.get_encoded_name)
end

puts "Total regions to consider for Merge: #{filtered_regions.length}"

filtered_regions_limit = filtered_regions

if filtered_regions.length < 2
  puts "There are not enough regions to merge"
end

if filtered_regions.length > limit_batch
  filtered_regions_limit = filtered_regions[0, limit_batch]
  puts "But we will merge only #{filtered_regions_limit.length} regions because of the limit parameter!"
end

r1, r2 = nil
filtered_regions_limit.each do |r|
  if r1.nil?
    r1 = r
    next
  end
  if r2.nil?
    r2 = r
  end
  # Skip any region that is currently splitting
  if r1.is_split()
    puts "Skip #{r1.get_encoded_name} because it is splitting!"
    r1 = r2
    r2 = nil
    next
  end
  if r2.is_split()
    puts "Skip #{r2.get_encoded_name} because it is splitting!"
    r2 = nil
    next
  end
  if HRegionInfo.are_adjacent(r1, r2)
    # Only merge regions that are adjacent
    puts "#{r1.get_encoded_name} is adjacent to #{r2.get_encoded_name}"
    if do_merge
      admin.mergeRegions(r1.getEncodedNameAsBytes, r2.getEncodedNameAsBytes, false)
      puts "Successfully merged #{r1.get_encoded_name} with #{r2.get_encoded_name}"
      sleep 2
    end
    r1, r2 = nil
  else
    puts "Regions are not adjacent, so drop the first one and iterate again from #{r2.get_encoded_name}"
    r1 = r2
    r2 = nil
  end
end
admin.close
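Per the script's own header, you can dry-run it first and only pass merge once you are happy with what it reports. For example (namespace.tablename is a placeholder):

# Dry run: report which Regions under 10GB would be merged, at most 4000 of them
hbase org.jruby.Main merge_small_regions.rb namespace.tablename 10240 4000
# Actually perform the merges
hbase org.jruby.Main merge_small_regions.rb namespace.tablename 10240 4000 merge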
By default the script merges Regions under 1GB, at most 1000 of them per run. To merge Regions under 10GB, up to 4000 at a time, wrap it in a script (merging-region.sh) like this:
#! /bin/bash

num=$1

echo "[`date "+%Y-%m-%d %H:%M:%S"`] INFO : RegionServer Start Merging..."
if [ -z "$num" ]; then
    echo "[`date "+%Y-%m-%d %H:%M:%S"`] INFO : Default Merging 10 Times."
    num=10
elif [[ $num == *[!0-9]* ]]; then
    echo "[`date "+%Y-%m-%d %H:%M:%S"`] INFO : Input [$num] Must Be A Number."
    exit 1
else
    echo "[`date "+%Y-%m-%d %H:%M:%S"`] INFO : User-Defined Merging [$num] Times."
fi

for (( i=1; i<=$num; i++ ))
do
    echo "[`date "+%Y-%m-%d %H:%M:%S"`] INFO : Merging [$i] Of Total [$num] Times."
    hbase org.jruby.Main merge_small_regions.rb namespace.tablename 10240 4000 merge
    sleep 5
done
merging-region.sh adds parameter control so the batch-merge script can be run in a loop. In practice, a single batch pass often still leaves plenty of Regions behind (new Regions may have been created in the meantime); this is when merging-region.sh lets you run the batch merge several times over, like so:
# Defaults to 10 iterations; here we run it 5 times
sh merging-region.sh 5
What if a permanent RIT (Region-In-Transition) appears while merging Regions? I ran into exactly this in production: during a batch merge, a Region got stuck permanently in the MERGING_NEW state. That state does not hurt the cluster's ability to serve, but if any node restarts, the Regions on that RegionServer can no longer be balanced: while a Region is in transition, HBase will not run the Region balancer, and even invoking the balancer command by hand has no effect.
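Besides the Master web UI, you can confirm the stuck state from the shell or hbck. A sketch; the exact output format varies between versions:

# Regions in transition show up in the shell's detailed status
echo "status 'detailed'" | hbase shell | grep -i -A 10 'transition'
# hbck also reports regions in transition in its summary
hbase hbck | grep -i 'transition'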
If the RIT is left unresolved and more HBase nodes restart over time, Region distribution across the cluster ends up severely unbalanced, which is fatal for performance. Searching the HBase JIRA showed that this permanent MERGING_NEW RIT is the bug tracked as HBASE-17682, and the fix is to apply that patch: in its business logic, the HBase source simply never checks for the MERGING_NEW state and falls straight through to the else branch. The code in question:
for (RegionState state : regionsInTransition.values()) {
  HRegionInfo hri = state.getRegion();
  if (assignedRegions.contains(hri)) {
    // Region is open on this region server, but in transition.
    // This region must be moving away from this server, or splitting/merging.
    // SSH will handle it, either skip assigning, or re-assign.
    LOG.info("Transitioning " + state + " will be handled by ServerCrashProcedure for " + sn);
  } else if (sn.equals(state.getServerName())) {
    // Region is in transition on this region server, and this
    // region is not open on this server. So the region must be
    // moving to this server from another one (i.e. opening or
    // pending open on this server, was open on another one.
    // Offline state is also kind of pending open if the region is in
    // transition. The region could be in failed_close state too if we have
    // tried several times to open it while this region server is not reachable)
    if (state.isPendingOpenOrOpening() || state.isFailedClose() || state.isOffline()) {
      LOG.info("Found region in " + state + " to be reassigned by ServerCrashProcedure for " + sn);
      rits.add(hri);
    } else if (state.isSplittingNew()) {
      regionsToCleanIfNoMetaEntry.add(state.getRegion());
    } else {
      LOG.warn("THIS SHOULD NOT HAPPEN: unexpected " + state);
    }
  }
}
After the fix, the code reads:
for (RegionState state : regionsInTransition.values()) {
  HRegionInfo hri = state.getRegion();
  if (assignedRegions.contains(hri)) {
    // Region is open on this region server, but in transition.
    // This region must be moving away from this server, or splitting/merging.
    // SSH will handle it, either skip assigning, or re-assign.
    LOG.info("Transitioning " + state + " will be handled by ServerCrashProcedure for " + sn);
  } else if (sn.equals(state.getServerName())) {
    // Region is in transition on this region server, and this
    // region is not open on this server. So the region must be
    // moving to this server from another one (i.e. opening or
    // pending open on this server, was open on another one.
    // Offline state is also kind of pending open if the region is in
    // transition. The region could be in failed_close state too if we have
    // tried several times to open it while this region server is not reachable)
    if (state.isPendingOpenOrOpening() || state.isFailedClose() || state.isOffline()) {
      LOG.info("Found region in " + state + " to be reassigned by ServerCrashProcedure for " + sn);
      rits.add(hri);
    } else if (state.isSplittingNew()) {
      regionsToCleanIfNoMetaEntry.add(state.getRegion());
    } else if (isOneOfStates(state, State.SPLITTING_NEW, State.MERGING_NEW)) {
      regionsToCleanIfNoMetaEntry.add(state.getRegion());
    } else {
      LOG.warn("THIS SHOULD NOT HAPPEN: unexpected " + state);
    }
  }
}
There is a catch, though: the JIRA issue only tells you to fix the bug by applying the patch. In real production, faced with this RIT, you cannot take the cluster down for long and block application reads and writes. So is there a temporary workaround that clears the current permanent MERGING_NEW RIT first, leaving the HBase version upgrade for later?
There is. Walking through the MERGE flow shows that when HBase merges Regions, it first creates the new Region in an initial MERGING_NEW state. The whole merge flow looks like this:
As the flow chart shows, MERGING_NEW is an initialization state that lives only in the active Master's memory; a Master in Backup state has no record of the new Region's MERGING_NEW state. So an active/standby switchover of the HBase Master temporarily clears this permanent RIT. Since HBase is a highly available cluster, the switchover is invisible to user applications. A Master failover therefore makes a fine interim fix for a permanent MERGING_NEW RIT; afterwards we can fix the bug properly, apply the patch, and upgrade.
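One way to force the switchover is to bounce the active HMaster and let a backup take over. A sketch, assuming a backup Master is already running (do it in a quiet window if you can):

# On the current active HMaster host: stop it, and the backup becomes active
hbase-daemon.sh stop master
# Then bring this node back up, now as the new backup Master
hbase-daemon.sh start master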
RIT problems in HBase are fairly common. When you hit one, analyze the cause calmly first: read the Master's logs, read the RIT description on the HBase web UI carefully, check the Regions with the hbck command, check the HDFS blocks with fsck, and so on. Once the specific cause is clear, treat it accordingly: bold hypotheses, careful verification.
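The checks above map to commands roughly like these (a sketch; log paths and table names depend on your installation):

# Master log
tail -n 200 $HBASE_HOME/logs/hbase-*-master-*.log
# Region consistency check for one table
hbase hbck 'namespace:tablename'
# HDFS block health under the table's directory
hdfs fsck /hbase/data/namespace/tablename -files -blocks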
That's all I have to share in this post. If you run into questions while studying this, feel free to join the discussion group or send me an email; I'll do my best to answer. Good luck, and keep at it!
Also, I have published a book, 《Hadoop大数据挖掘从入门到进阶实战》 (Hadoop Big Data Mining: From Beginner to Advanced Practice); if you are interested, you can buy it through the purchase link in the announcement bar. Thank you all for your support.