微信公众号搜"智元新知"关注
微信扫一扫可直接关注哦!

解决 NN 连接不上 JN 的问题

自动故障转移配置好以后,然后使用 start-dfs.sh 群起脚本启动 hdfs 集群,有可能会遇到 NameNode 起来一会后,进程自动关闭的问题。查看 NameNode 日志,报错信息如下:

2020-08-17 10:11:40,658 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop104/192.168.6.104:8485. Already tried 0 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:40,659 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop102/192.168.6.102:8485. Already tried 0 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:40,659 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop103/192.168.6.103:8485. Already tried 0 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:41,660 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop104/192.168.6.104:8485. Already tried 1 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:41,660 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop102/192.168.6.102:8485. Already tried 1 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:41,665 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop103/192.168.6.103:8485. Already tried 1 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:42,661 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop104/192.168.6.104:8485. Already tried 2 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:42,661 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop102/192.168.6.102:8485. Already tried 2 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:42,667 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop103/192.168.6.103:8485. Already tried 2 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:43,662 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop104/192.168.6.104:8485. Already tried 3 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:43,662 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop102/192.168.6.102:8485. Already tried 3 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:43,668 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop103/192.168.6.103:8485. Already tried 3 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:44,663 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop104/192.168.6.104:8485. Already tried 4 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:44,663 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop102/192.168.6.102:8485. Already tried 4 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:44,670 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop103/192.168.6.103:8485. Already tried 4 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:45,467 INFO 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 
ms (timeout=20000 ms) for a response for selectStreamingInputStreams. No 
responses yet.
2020-08-17 10:11:45,664 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop102/192.168.6.102:8485. Already tried 5 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:45,664 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop104/192.168.6.104:8485. Already tried 5 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:45,672 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop103/192.168.6.103:8485. Already tried 5 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:46,469 INFO 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 7003 
ms (timeout=20000 ms) for a response for selectStreamingInputStreams. No responses yet.
2020-08-17 10:11:46,665 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop102/192.168.6.102:8485. Already tried 6 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:46,665 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop104/192.168.6.104:8485. Already tried 6 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:46,673 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop103/192.168.6.103:8485. Already tried 6 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:47,470 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 8004 
ms (timeout=20000 ms) for a response for selectStreamingInputStreams. No responses yet.
2020-08-17 10:11:47,666 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop102/192.168.6.102:8485. Already tried 7 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:47,667 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop104/192.168.6.104:8485. Already tried 7 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:47,674 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop103/192.168.6.103:8485. Already tried 7 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:48,471 INFO 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 9005 
ms (timeout=20000 ms) for a response for selectStreamingInputStreams. No responses yet.
2020-08-17 10:11:48,668 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop102/192.168.6.102:8485. Already tried 8 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:48,668 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop104/192.168.6.104:8485. Already tried 8 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:48,675 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop103/192.168.6.103:8485. Already tried 8 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:49,669 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop102/192.168.6.102:8485. Already tried 9 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:49,673 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop104/192.168.6.104:8485. Already tried 9 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:49,676 INFO org.apache.hadoop.ipc.Client: retrying connect 
to server: hadoop103/192.168.6.103:8485. Already tried 9 time(s); retry 
policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-08-17 10:11:49,678 WARN org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to determine input 
streams from QJM to [192.168.6.102:8485, 192.168.6.103:8485, 
192.168.6.104:8485]. Skipping.org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many 
exceptions to achieve quorum size 2/3. 3 exceptions thrown:
192.168.6.103:8485: Call From hadoop102/192.168.6.102 to hadoop103:8485 
Failed on connection exception: java.net.ConnectException: 拒绝连接; For more 
details see: http://wiki.apache.org/hadoop/ConnectionRefused
192.168.6.102:8485: Call From hadoop102/192.168.6.102 to hadoop102:8485 
Failed on connection exception: java.net.ConnectException: 拒绝连接; For more 
details see: http://wiki.apache.org/hadoop/ConnectionRefused
192.168.6.104:8485: Call From hadoop102/192.168.6.102 to hadoop104:8485 
Failed on connection exception: java.net.ConnectException: 拒绝连接; For more 
details see: http://wiki.apache.org/hadoop/ConnectionRefused

查看报错日志,可分析出报错原因是因为 NameNode 连接不上 JournalNode,而利用 jps 命令查看到三台 JN 都已经正常启动,为什么 NN 还是无法正常连接到 JN 呢?这是因为 start-dfs.sh 群起脚本认的启动顺序是先启动 NN,再启动 DN,然后再启动 JN,并且认的 rpc 连接参数是重试次数为 10,每次重试的间隔是 1s,也就是说启动完 NN以后的 10s 中内,JN 还启动不起来,NN 就会报错了。

core-default.xml 里面有两个参数如下:

<!-- NN 连接 JN 重试次数认是 10 次 -->
<property>
  <name>ipc.client.connect.max.retries</name>
  <value>10</value>
</property>
<!-- 重试时间间隔,认 1s -->
<property>
  <name>ipc.client.connect.retry.interval</name>
  <value>1000</value>
</property>

解决方案:遇到上述问题后,可以稍等片刻,等 JN 成功启动后,手动启动下三台NN:

[root@hadoop102 ~]$ hdfs --daemon start namenode
[root@hadoop103 ~]$ hdfs --daemon start namenode
[root@hadoop104 ~]$ hdfs --daemon start namenode

也可以在 core-site.xml 里面适当调大上面的两个参数:

<!-- NN 连接 JN 重试次数认是 10 次 -->
<property>
  <name>ipc.client.connect.max.retries</name>
  <value>20</value>
</property>
<!-- 重试时间间隔,认 1s -->
<property>
  <name>ipc.client.connect.retry.interval</name>
  <value>5000</value>
</property>

版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 [email protected] 举报,一经查实,本站将立刻删除。

相关推荐