Dashu's Troubleshooting Notes (49): HBase Master Fails to Initialize After a Cluster Restart

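After a restart, the HBase cluster was unhealthy. The cause turned out to be a failed master initialization, and the master startup log showed the following: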

2022-05-26 14:06:15,645 WARN org.apache.hadoop.hbase.master.HMaster: hbase:namespace,,1607716627354.56dafb9f3eadaae9e95d5b05f3142a34. is NOT online; state={56dafb9f3eadaae9e95d5b05f3142a34 state=OPEN, ts=1637906648217, server=hadoop-server1,16020,1637905629938}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.
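The value that matters in this warning is the encoded region name, the 32-character hex string inside `state={...}`, because later commands take it as their argument. A minimal sketch for pulling it out of a master log, assuming the warning has the same shape as the line above (the function name `extract_stuck_regions` is made up for illustration):

```shell
# Hypothetical helper: read master log lines on stdin and print the encoded
# region name (32 hex characters) of every "is NOT online" warning, once each.
extract_stuck_regions() {
  grep 'is NOT online' \
    | sed -n 's/.*state={\([0-9a-f]\{32\}\) state=.*/\1/p' \
    | sort -u
}
```

It could be fed the master log, for example `extract_stuck_regions < hbase-master.log`, where the log file name depends on the installation.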

The root cause is that the region of hbase:namespace, a critical HBase system table, cannot come online. The region's details are as follows:

hbase(main):003:0> get 'hbase:meta', 'hbase:namespace,,1607716627354.56dafb9f3eadaae9e95d5b05f3142a34.'
COLUMN                                                 CELL                                                                                                                                                        
 info:regioninfo                                       timestamp=1637905641937, value={ENCODED => 56dafb9f3eadaae9e95d5b05f3142a34, NAME => 'hbase:namespace,,1607716627354.56dafb9f3eadaae9e95d5b05f3142a34.', STARTKEY => '', ENDKEY => ''}                                                                                                                                   
 info:seqnumDuringOpen                                 timestamp=1637905641937, value=\x00\x00\x00\x00\x00\x00\x00o                                                                                                
 info:server                                           timestamp=1637905641937, value=hadoop-server1:16020                                                                                                               
 info:serverstartcode                                  timestamp=1637905641937, value=1637905629938                                                                                                                
 info:sn                                               timestamp=1637905640873, value=hadoop-server1,16020,1637905629938                                                                                                 
 info:state                                            timestamp=1637905641937, value=OPEN                                                                                                                         
1 row(s)
Took 0.0737 seconds
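The last field of `info:sn`, repeated in `info:serverstartcode`, is the region server's start code: the time that server process started, in milliseconds since the epoch. Converting it to a date shows which server instance hbase:meta believes is hosting the region. The sketch below assumes GNU `date` (the `-d @seconds` form), and `startcode_to_utc` is a made-up helper name. If the converted time is well before the restart, one plausible reading is that meta still points at an old server instance, which would be consistent with a region recorded as OPEN yet not online.

```shell
# Hypothetical helper: convert an HBase start code (epoch milliseconds)
# to a UTC timestamp. Assumes GNU date, which accepts "-d @<seconds>".
startcode_to_utc() {
  secs=$(( $1 / 1000 ))          # drop the millisecond part
  date -u -d "@$secs" '+%Y-%m-%d %H:%M:%S'
}

startcode_to_utc 1637905629938   # start code taken from info:serverstartcode
```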

Try to bring the region back online manually:

hbase(main):034:0> assign '56dafb9f3eadaae9e95d5b05f3142a34'
ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
    at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2998)
    at org.apache.hadoop.hbase.master.MasterRpcServices.assignRegion(MasterRpcServices.java:564)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
For usage try 'help "assign"'

This is a deadlock: recovering the region requires the master to be up, but the master cannot finish starting until the region is online.

There are two ways out:

  • Back up the namespace directory /user/hbase/data/hbase/namespace, delete hbase:namespace from hbase:meta, and then restore it. This is fairly difficult to carry out.
  • Use the HBCK2 tool.

Download: https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2

Run the command:

hbase hbck -j hbase-hbck2-1.1.0.jar assigns 56dafb9f3eadaae9e95d5b05f3142a34
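Because `assigns` takes encoded region names, a mistyped or full region name is an easy error to make. A minimal sketch of a wrapper that validates its arguments and only prints the command for review, assuming the same jar file name as above (`build_assigns_cmd` and the `HBCK2_JAR` variable are made up for illustration and are not part of HBCK2):

```shell
# Hypothetical wrapper: HBCK2_JAR defaults to the jar name used above.
HBCK2_JAR="${HBCK2_JAR:-hbase-hbck2-1.1.0.jar}"

# Print (do not run) the HBCK2 assigns command for the given encoded regions.
build_assigns_cmd() {
  [ "$#" -gt 0 ] || { echo "usage: build_assigns_cmd ENCODED_REGION..." >&2; return 1; }
  for region in "$@"; do
    # An encoded region name is exactly 32 lowercase hex characters.
    case "$region" in
      *[!0-9a-f]*) echo "not an encoded region name: $region" >&2; return 1 ;;
    esac
    [ "${#region}" -eq 32 ] || { echo "not an encoded region name: $region" >&2; return 1; }
  done
  echo "hbase hbck -j $HBCK2_JAR assigns $*"
}

build_assigns_cmd 56dafb9f3eadaae9e95d5b05f3142a34
```

Printing rather than executing keeps the operator in the loop: the command is checked by eye and then run by hand on a node where the `hbase` launcher and the jar are available.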

After this, hbase:namespace recovered and the master started successfully.
