HBase-0.95.1 Source Code Analysis: split
A split divides a large HBase Region into two. Splitting is time-consuming, so it runs in a separate thread. The class responsible is CompactSplitThread.
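CompactSplitThread has two main split methods:
CompactSplitThread.requestSplit(HRegion): checks whether the region needs to be split. If it does, it calls requestSplit(HRegion, byte[]).
CompactSplitThread.requestSplit(HRegion, byte[]): performs the split. A split requested manually by a user also goes through this method.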
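CompactSplitThread.requestSplit(HRegion r) performs two checks:
1> CompactSplitThread.shouldSplitRegion(): governed by hbase.regionserver.regionSplitLimit. Once the number of online regions on this RegionServer reaches this limit, no further splits are performed.
2> HRegion.checkSplit(): calls RegionSplitPolicy.shouldSplit(). If a split is needed, it returns the split point (midKey) obtained from the policy.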
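The two checks above can be sketched in Java as follows. This is a simplified model, not the HBase source. Region, RequestSplitSketch, onlineRegionCount and submittedSplits are hypothetical stand-ins, and any other conditions the real requestSplit(HRegion) may check are omitted.

```java
import java.util.Optional;

/**
 * Hypothetical model of CompactSplitThread.requestSplit(HRegion).
 * Region, the constructor arguments and submittedSplits are stand-ins, not HBase APIs.
 */
final class RequestSplitSketch {

    /** Stand-in for HRegion: only the split check is modeled. */
    interface Region {
        /** Models HRegion.checkSplit(): the split point, or empty if no split is needed. */
        Optional<byte[]> checkSplit();
    }

    private final int regionSplitLimit;   // models hbase.regionserver.regionSplitLimit
    private final int onlineRegionCount;  // regions currently online on this RegionServer
    int submittedSplits = 0;              // how many times the two-argument overload ran

    RequestSplitSketch(int regionSplitLimit, int onlineRegionCount) {
        this.regionSplitLimit = regionSplitLimit;
        this.onlineRegionCount = onlineRegionCount;
    }

    /** Check 1: stop splitting once this server hosts regionSplitLimit regions. */
    boolean shouldSplitRegion() {
        return onlineRegionCount < regionSplitLimit;
    }

    /** Models requestSplit(HRegion): run both checks, then delegate. */
    boolean requestSplit(Region r) {
        if (!shouldSplitRegion()) {
            return false;
        }
        Optional<byte[]> midKey = r.checkSplit(); // check 2: ask the split policy
        if (!midKey.isPresent()) {
            return false;
        }
        requestSplit(r, midKey.get());
        return true;
    }

    /** Models requestSplit(HRegion, byte[]); the real method submits a SplitRequest. */
    void requestSplit(Region r, byte[] midKey) {
        submittedSplits++;
    }
}
```

Modeling checkSplit() as returning an Optional keeps the "no split point means no split" case explicit. The real method returns a nullable byte[] instead.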
The split policy class is set by this configuration property:
hbase.regionserver.region.split.policy: the default is IncreasingToUpperBoundRegionSplitPolicy. Before 0.94 the default was ConstantSizeRegionSplitPolicy.
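In IncreasingToUpperBoundRegionSplitPolicy, shouldSplit() first computes the threshold that actually decides whether to split. The threshold depends on tableRegionsCount, the number of regions of the same table as the current region. The computation is done in getSizeToCheck():
    long sizeToCheck = tableRegionsCount == 0 ? getDesiredMaxFileSize() : Math.min(getDesiredMaxFileSize(), this.flushSize * tableRegionsCount * tableRegionsCount);
The threshold is therefore the smaller of the maximum file size and flushSize × tableRegionsCount².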
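Under ConstantSizeRegionSplitPolicy (the default before 0.94), the threshold is simply the store file size limit hbase.hregion.max.filesize=10G. This is the total size of all HStoreFiles of one column family in a region. A region is split only when it reaches this limit. In the formula above, getDesiredMaxFileSize() returns this same limit, unless the table overrides it.
Under both policies, a region needs a split if any one store (column family) has store files whose total size exceeds the threshold.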
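The following sketch shows how the threshold grows, by evaluating the formula above next to the constant threshold. It takes hbase.hregion.max.filesize = 10G from the text. It assumes flushSize = 128 MB purely as an example value. The class and method names are hypothetical and are not HBase APIs.

```java
/**
 * Hypothetical sketch of the thresholds used by the two split policies.
 * flushSize and maxFileSize are passed in explicitly here; in HBase they come
 * from the table descriptor or from configuration.
 */
final class SplitThresholdSketch {

    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    /** ConstantSizeRegionSplitPolicy: a fixed upper bound. */
    static long constantSizeToCheck(long maxFileSize) {
        return maxFileSize;
    }

    /** IncreasingToUpperBoundRegionSplitPolicy: the formula quoted in the text. */
    static long increasingSizeToCheck(int tableRegionsCount, long flushSize, long maxFileSize) {
        return tableRegionsCount == 0
                ? maxFileSize
                : Math.min(maxFileSize, flushSize * tableRegionsCount * tableRegionsCount);
    }

    /** A region needs a split if any single store exceeds the threshold. */
    static boolean anyStoreTooBig(long[] storeSizes, long sizeToCheck) {
        for (long size : storeSizes) {
            if (size > sizeToCheck) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long flushSize = 128 * MB;  // assumed example value, not a quoted default
        long maxFileSize = 10 * GB; // hbase.hregion.max.filesize = 10G
        for (int n = 0; n <= 10; n++) {
            System.out.printf("regions=%d increasing=%d MB constant=%d MB%n",
                    n,
                    increasingSizeToCheck(n, flushSize, maxFileSize) / MB,
                    constantSizeToCheck(maxFileSize) / MB);
        }
    }
}
```

With these numbers the increasing threshold is 128 MB for 1 region, 512 MB for 2 and 1152 MB for 3. From 9 regions on, flushSize × count² (10368 MB) exceeds 10 GB, so the threshold stays at 10240 MB. A new table therefore splits early. Once the table has enough regions, the threshold converges to the constant policy.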
The shouldSplit() method is as follows:
@Override
protected boolean shouldSplit() {
  if (region.shouldForceSplit()) {
    return true;
  }
  boolean foundABigStore = false;
  // Get count of regions that have the same common table as this.region
  int tableRegionsCount = getCountOfCommonTableRegions();
  // Get size to check
  long sizeToCheck = getSizeToCheck(tableRegionsCount);

  for (Store store : region.getStores().values()) {
    // If any of the stores is unable to split (eg they contain reference files) then don't split
    if (!store.canSplit()) {
      return false;
    }

    // Mark if any store is big enough
    long size = store.getSize();
    if (size > sizeToCheck) {
      IncreasingToUpperBoundRegionSplitPolicy.LOG.debug("ShouldSplit because " + store.getColumnFamilyName()
          + " size=" + size + ", sizeToCheck=" + sizeToCheck
          + ", regionsWithCommonTable=" + tableRegionsCount);
      foundABigStore = true;
      break;
    }
  }
  return foundABigStore;
}
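The call stack of a split is as follows:
1> CompactSplitThread.requestSplit(HRegion r, byte[] midKey):
     this.splits.execute(new SplitRequest(r, midKey, this.server));
   SplitRequest.run() then calls:
     SplitTransaction.prepare()
     SplitTransaction.execute(Server, RegionServerServices)
       createDaughters(Server, RegionServerServices)
         1> Close the parent region and remove it from the list of online regions
         2> Split the parent region's HStoreFiles (this writes reference files that point into the parent's files rather than rewriting the data)
         3> Create the daughter regions
         4> Update the metadata
       openDaughters(Server, RegionServerServices, HRegion, HRegion): open the two daughter regions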
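The call chain above can be modeled with a plain ExecutorService, which also shows why the split does not block the caller. This is a hypothetical sketch. SplitFlowSketch, its nested SplitRequest and every method body are stand-ins that only record the step names. The simplified prepare() check and the omission of failure handling and rollback are assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical model of the split call chain: requestSplit hands a SplitRequest
 * to a thread pool; the request runs prepare(), then createDaughters() and
 * openDaughters(). No real HBase work is done, only step names are recorded.
 */
final class SplitFlowSketch {

    /** Steps recorded by the worker thread; synchronized for cross-thread access. */
    final List<String> steps = Collections.synchronizedList(new ArrayList<>());

    /** Stand-in for the split thread pool in CompactSplitThread. */
    private final ExecutorService splits = Executors.newSingleThreadExecutor();

    /** Models requestSplit(HRegion, byte[]): returns immediately, work runs on the pool. */
    void requestSplit(String region, byte[] midKey) {
        splits.execute(new SplitRequest(region, midKey));
    }

    /** Stops the pool and waits (up to 10 s) for queued splits to finish. */
    void shutdownAndWait() {
        splits.shutdown();
        try {
            splits.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stand-in for SplitRequest, the Runnable that drives one split. */
    private final class SplitRequest implements Runnable {
        private final String region;
        private final byte[] midKey;

        SplitRequest(String region, byte[] midKey) {
            this.region = region;
            this.midKey = midKey;
        }

        @Override
        public void run() {
            if (!prepare()) {
                steps.add("prepare failed, split skipped for " + region);
                return;
            }
            createDaughters(); // first half of execute()
            openDaughters();   // second half of execute()
        }

        /** Assumed simplification: only rejects a missing or empty split key. */
        private boolean prepare() {
            steps.add("prepare");
            return midKey != null && midKey.length > 0;
        }

        private void createDaughters() {
            steps.add("close parent and remove it from online regions");
            steps.add("split parent store files");
            steps.add("create daughter regions");
            steps.add("update metadata");
        }

        private void openDaughters() {
            steps.add("open daughters");
        }
    }
}
```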
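A split is triggered at the following points:
1> After a compaction, CompactSplitThread.requestSplit(HRegion) is called.
2> Before a flush, the flusher checks whether the number of HStoreFiles in the region exceeds hbase.hstore.blockingStoreFiles. If it does and the blocking wait has not yet timed out, CompactSplitThread.requestSplit(HRegion) is called.
3> After a flush, HRegion.checkSplit() is called to check whether a split is needed. If it is, CompactSplitThread.requestSplit(HRegion) is called.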
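The three trigger points can be summarized in a small decision sketch. SplitTriggerSketch, its Region interface and all method names are hypothetical stand-ins, not HBase APIs. The wait-timeout parameter is assumed to correspond to hbase.hstore.blockingWaitTime. What happens to the pending flush after trigger 2 is not modeled.

```java
/**
 * Hypothetical model of the three split trigger points. Each trigger that
 * fires increments splitRequests in place of calling requestSplit(HRegion).
 */
final class SplitTriggerSketch {

    /** Stand-in for the region state the triggers look at. */
    interface Region {
        int maxStoreFileCount(); // largest number of HStoreFiles in any store
        boolean checkSplit();    // models HRegion.checkSplit() != null
    }

    private final int blockingStoreFiles; // models hbase.hstore.blockingStoreFiles
    private final long blockingWaitMs;    // assumed: hbase.hstore.blockingWaitTime
    int splitRequests = 0;                // number of requestSplit(HRegion) calls

    SplitTriggerSketch(int blockingStoreFiles, long blockingWaitMs) {
        this.blockingStoreFiles = blockingStoreFiles;
        this.blockingWaitMs = blockingWaitMs;
    }

    /** Convenience factory for a fixed-state region. */
    static Region region(int files, boolean wantsSplit) {
        return new Region() {
            @Override public int maxStoreFileCount() { return files; }
            @Override public boolean checkSplit() { return wantsSplit; }
        };
    }

    private void requestSplit(Region r) {
        splitRequests++;
    }

    /** Trigger 1: after a compaction. */
    void afterCompaction(Region r) {
        requestSplit(r);
    }

    /** Trigger 2: before a flush; returns true if a split was requested. */
    boolean beforeFlush(Region r, long waitedMs) {
        boolean tooManyFiles = r.maxStoreFileCount() > blockingStoreFiles;
        boolean timedOut = waitedMs >= blockingWaitMs;
        if (tooManyFiles && !timedOut) {
            requestSplit(r);
            return true;
        }
        return false;
    }

    /** Trigger 3: after a flush, only if the split policy says so. */
    void afterFlush(Region r) {
        if (r.checkSplit()) {
            requestSplit(r);
        }
    }
}
```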