1. Required software
- The required environment includes Java and SSH. sshd must be kept running so that the Hadoop scripts can manage the remote Hadoop daemons.
- Additional software required on Windows: Cygwin, which provides shell support beyond the software listed above.
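A minimal pre-flight check for the two requirements above can be sketched as follows (assumes a POSIX shell; it only reports what is present and does not install or change anything):

```shell
# report whether java is on PATH and whether an sshd process is running;
# informational only - nothing is installed or modified
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}
check_cmd java
if pgrep -x sshd >/dev/null 2>&1; then
  echo "sshd: running"
else
  echo "sshd: not running"
fi
```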
2. Installing the software
sudo apt-get install ssh
sudo apt-get install rsync
- Since Hadoop is written in Java, a JDK must also be installed.
3. Download and install
Reference: https://www.jianshu.com/p/cdae5bab030f
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/stable/hadoop-3.3.0.tar.gz
tar -xvf hadoop-3.3.0.tar.gz -C /usr/local
cd /usr/local
mv hadoop-3.3.0 hadoop
- Configure environment variables for Hadoop:
vim /etc/profile
export JAVA_HOME=/usr/local/jdk1.8
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile
Test whether the installation succeeded:
hadoop version
root@iZuf63fv674pbylkkxs48qZ:/usr/local# hadoop version
Hadoop 3.3.0
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r aa96f1871bfd858f9bac59cf2a81ec470da649af
Compiled by brahma on 2020-07-06T18:44Z
Compiled with protoc 3.7.1
From source with checksum 5dc29b802d6ccd77b262ef9d04d19c4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.3.0.jar
root@iZuf63fv674pbylkkxs48qZ:/usr/local#
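The PATH line added to /etc/profile simply appends the JDK and Hadoop bin/sbin directories. Its effect can be demonstrated locally (the JAVA_HOME and HADOOP_HOME values are the ones used throughout this guide; this only affects the current shell, unlike editing /etc/profile):

```shell
# compose PATH the same way the /etc/profile line does
JAVA_HOME=/usr/local/jdk1.8
HADOOP_HOME=/usr/local/hadoop
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# show the entries that were appended
echo "$PATH" | tr ':' '\n' | tail -n 4
```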
4. Modify the configuration files
sudo vim /usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
<!-- fs.defaultFS sets the NameNode's communication address; use hdfs://0.0.0.0:9000 instead of localhost to listen on all interfaces -->
<!-- hadoop.tmp.dir sets where Hadoop stores files generated at runtime -->
sudo vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/hdfs/data</value>
    <description>Physical storage location of data blocks on the DataNode</description>
  </property>
  <!-- Set the HDFS replication factor -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
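The directories referenced by hadoop.tmp.dir and dfs.data.dir must exist before HDFS starts. A sketch of creating them, demonstrated against a scratch prefix so it can run without root; on the real machine you would use /usr/local/hadoop with sudo:

```shell
# on the real host: sudo mkdir -p /usr/local/hadoop/tmp /usr/local/hadoop/hdfs/data
# demonstrated here under /tmp so no root privileges are needed
PREFIX=/tmp/hadoop-dir-demo
mkdir -p "$PREFIX/tmp" "$PREFIX/hdfs/data"
ls "$PREFIX"
```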
- In hadoop-env.sh, change JAVA_HOME: comment out export JAVA_HOME=${JAVA_HOME} and set
export JAVA_HOME=/usr/local/jdk1.8
5. Testing and startup
All of the commands below are run from the Hadoop installation directory, /usr/local/hadoop.
- Format the NameNode:
/usr/local/hadoop# ./bin/hdfs namenode -format
- Start HDFS, which launches the NameNode and DataNode daemons.
- If this fails with errors like the following:
root@iZuf63fv674pbylkkxs48qZ:/usr/local/hadoop# ./sbin/start-dfs.sh
Starting namenodes on [localhost]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [iZuf63fv674pbylkkxs48qZ]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
Solution
In the /usr/local/hadoop/sbin directory, add the following at the top of start-dfs.sh and stop-dfs.sh:
HDFS_DATANODE_USER=root
HADOOP_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Add the following at the top of start-yarn.sh and stop-yarn.sh:
#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
Then re-run ./start-all.sh from the sbin directory and the daemons will start.
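The fix above just prepends four variable definitions to each script. A sketch of doing that non-interactively, demonstrated on a scratch stand-in file rather than the real /usr/local/hadoop/sbin/start-dfs.sh:

```shell
# create a stand-in for start-dfs.sh so the demo does not touch a real install
TARGET=/tmp/start-dfs-demo.sh
printf 'echo "original script body"\n' > "$TARGET"
# write the user definitions, then prepend them to the target script
cat > /tmp/hdfs-users.txt <<'EOF'
HDFS_DATANODE_USER=root
HADOOP_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
EOF
cat /tmp/hdfs-users.txt "$TARGET" > /tmp/merged.tmp && mv /tmp/merged.tmp "$TARGET"
head -n 4 "$TARGET"
```

In Hadoop 3 these HDFS_*_USER variables can reportedly also be defined once in etc/hadoop/hadoop-env.sh, which the startup scripts source; that keeps the sbin scripts unmodified.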
- During a restart, I ran into the following problem:
root@iZuf63fv674pbylkkxs48qZ:/usr/local/hadoop/sbin# sudo ./start-dfs.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [localhost]
localhost: root@localhost: Permission denied (publickey,password).
Starting datanodes
localhost: root@localhost: Permission denied (publickey,password).
Starting secondary namenodes [iZuf63fv674pbylkkxs48qZ]
iZuf63fv674pbylkkxs48qZ: root@izuf63fv674pbylkkxs48qz: Permission denied (publickey,password).
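The "Permission denied (publickey,password)" lines mean the account cannot SSH into localhost non-interactively; the key setup walked through in section 5.1 below fixes this for the single-node case too. A guarded sketch (the default key path and the empty passphrase are assumptions suitable for a test box, not a hardened server):

```shell
# generate a passphrase-free key (if none exists) and authorize it for
# logins to this same account; skips cleanly when ssh-keygen is unavailable
if command -v ssh-keygen >/dev/null 2>&1; then
  mkdir -p "$HOME/.ssh"
  [ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa" -q
  cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
  chmod 700 "$HOME/.ssh"
  chmod 600 "$HOME/.ssh/authorized_keys"
  echo "key authorized; try: ssh localhost"
else
  echo "ssh-keygen not available on this system"
fi
```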
5.1 Setting up a Hadoop cluster on virtual machines
- During installation, configure a static IP for each host. My own notes on configuring a static IP: http://note.youdao.com/s/dDpr8UkW
- Change the Ubuntu system's package mirror:
sudo vim /etc/apt/sources.list
Replace http://archive.ubuntu.com/ubuntu/ with http://mirrors.aliyun.com/ubuntu/
- First install a single Ubuntu system to serve as master: configure its static IP and install the JDK and Hadoop, then clone master with the same configuration to get node1 and node2. The hostnames used here are set with the following commands:
sudo vim /etc/hostname
sudo vim /etc/hosts
192.168.8.6 master
192.168.8.7 node1
192.168.8.8 node2
- Run cd ~ to return to the home directory.
- Run ssh-keygen and press Enter at every prompt, which produces output like:
helloful@master:~$ cd ~
helloful@master:~$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/helloful/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/helloful/.ssh/id_rsa
Your public key has been saved in /home/helloful/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:tGRzFOZansyT58ahQZyWOaIxESIVbPSkzYowY7AQyYM helloful@master
The key's randomart image is:
+---[RSA 3072]----+
|*o.+=.+. +. |
|E+ .oB . = + |
|=.... * * % |
|.+ . . B & + |
| . . . S O o |
| B . |
| . + |
| . |
| |
+----[SHA256]-----+
- Run cd .ssh
- Run cat ./id_rsa.pub >> authorized_keys to authorize the key:
helloful@master:~$ cd .ssh
helloful@master:~/.ssh$ cat ./id_rsa.pub >> authorized_keys
helloful@master:~/.ssh$ cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDffnfOM4rgcxtm8lkBzPojolSX1zz26r5+hOd0Iy5lgS7atDZgZqQ7JITShpwaENNJ7N8qumsjwnyulBsP5DSRGa0oXzJTafO+Drj47p5V+bI4Nejl+SjXrB6X5RIFD8VmuIrXNMtRx4bQQ4oZQyAF/qSa4wcnsBz8gmpuY3JAnArlsm9MCHfhvTg/zeVTbJjjbyc+8tGXVsa0AVmL5lcrxOcBPc0bP53/agwzPMHuBtlTbvpX2X57XxvKFov8WngSbMZYRWALsW9EvvBZg1oyPVEXo16WK80hWRlZKWiQANJgdWF3sFIiac22ml12NoH7KzmmDEDigd0pqAPaBOlcLvCzWigOJf22hmW8UDTP68kvjR8M4JPDjkwDC5UjO4mzRQUEukeXqGMOxM7drHlyqKpoVE1/zi9rKFSroCnd59a5HIv+0pobMkjwQATh8ZUBEGeEK7yXNBnQTvxFvA8qmJZ62WzGguaty4AWDDQ9HMTkA1twvmlCqBksFSQOpFM= helloful@master
helloful@master:~/.ssh$
- Run the key-generation steps above on every host, then append node1's and node2's public keys to the authorized_keys file on master; likewise, append master's and node2's keys on node1, and so on for node2.
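The cross-host copying can be scripted. A sketch assuming ssh-copy-id (part of OpenSSH) is installed on each host and the hostnames resolve per the /etc/hosts entries above; here it only builds and prints the commands rather than running them, since each one prompts for the remote password:

```shell
# build the list of ssh-copy-id invocations a host would run for its peers;
# printing instead of executing keeps this sketch side-effect free
CMDS=""
for host in master node1 node2; do
  CMDS="$CMDS
ssh-copy-id helloful@$host"
done
echo "$CMDS"
```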
- Enter:
cd ~
cd /usr/local/hadoop/etc/hadoop
- sudo vim core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
    <description>Abase for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
- sudo vim hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8
- Change the address in mapred-site.xml to your own:
sudo vim mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:49001</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/usr/local/hadoop/var</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
- On each machine, list the other hosts: on master, node1 node2; on node1, master node2; on node2, master node1.
- sudo vim yarn-site.xml
<?xml version="1.0"?>
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
  </property>
  <property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
  </property>
  <property>
    <description>The http address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>
  <property>
    <description>The https address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>${yarn.resourcemanager.hostname}:8090</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
  </property>
  <property>
    <description>The address of the RM admin interface.</description>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>1024</value>
    <description>Maximum allocation per container request, in MB; the default is 8192 MB</description>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>
- Start Hadoop. Format the NameNode:
cd /usr/local/hadoop/bin
sudo ./hadoop namenode -format
Start the cluster:
cd /usr/local/hadoop/sbin
./start-all.sh
On the Windows 10 host, browse to the master VM's IP plus port 9870; for example, with 192.168.8.6 as master's IP: 192.168.8.6:9870
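A quick way to confirm the NameNode web UI is actually listening (9870 is the default NameNode HTTP port in Hadoop 3.x; substitute your own master IP). The trailing || echo keeps the check from aborting a script when the UI is down or curl is absent:

```shell
# probe the NameNode web UI; prints the HTTP status code when reachable
curl -s -o /dev/null --max-time 5 -w "%{http_code}\n" http://192.168.8.6:9870 \
  || echo "NameNode web UI not reachable"
```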
6. Studying Hadoop systematically
Reference: https://www.zhihu.com/question/333417513