Spark[01] Spark Cluster Installation and Configuration
Preparing the Environment
Hadoop 2.0
Prepare several virtual machines configured for Hadoop 2.0: install Hadoop 2.0 and ZooKeeper, then start the DFS and ZKFC services.
See: Hadoop[03] Starting DFS and Zookeeper (Hadoop 2.0)
Set up time synchronization
See: Setting Up Network Time Synchronization on Linux (CentOS 6)
Create a hadoop symlink to make later hadoop commands easier to run
VMs ①, ②, ③
ln -sf /root/hadoop-2.6.5/bin/hadoop /root/hadoop-2.6.5/sbin/hadoop
The three VMs are configured as follows:
No. | Hostname | Domain name | IP address |
---|---|---|---|
① | Toozky | Toozky | 192.168.64.220 |
② | Toozky2 | Toozky2 | 192.168.64.221 |
③ | Toozky3 | Toozky3 | 192.168.64.222 |
Resource list
Software/Tool | Version |
---|---|
VMware | VMware® Workstation 16 Pro |
Xshell | 6 |
filezilla | 3.7.3 |
spark | spark-2.1.1-bin-hadoop2.6.tgz |
1. Spark Cluster Configuration
Unpack Spark
VM ①
cd
tar -zxvf spark-2.1.1-bin-hadoop2.6.tgz
ln -sf /root/spark-2.1.1-bin-hadoop2.6 /home/spark2.1.1
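To confirm the unpack and the link (an optional sanity check):
ls -l /home/spark2.1.1
/home/spark2.1.1/bin/spark-submit --version   # should report Spark 2.1.1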
Edit the configuration
slaves
VM ①
cd /home/spark2.1.1/conf/
cp slaves.template slaves
vi slaves
Press Insert to edit, and change
localhost
to
Toozky
Toozky2
Toozky3
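To double-check the edit (optional; run from the conf directory):
cat slaves   # should list exactly Toozky, Toozky2, Toozky3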
spark-env.sh
VM ①
cp spark-env.sh.template spark-env.sh
vi spark-env.sh
Press G, End, Insert, then Enter, and append:
export JAVA_HOME=/root/jdk1.8.0_192
SPARK_MASTER_HOST=Toozky
SPARK_MASTER_PORT=7077
profile
VM ①
vi /etc/profile
Press G, End, Insert, then Enter, and append:
#Spark
export SPARK_HOME=/root/spark-2.1.1-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/sbin:$SPARK_HOME/bin
Press Esc, then type :wq to save and quit.
Refresh the environment variables
source /etc/profile
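To verify that the new PATH entries took effect (optional):
which spark-submit   # should print /root/spark-2.1.1-bin-hadoop2.6/bin/spark-submit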
Send the files to ② and ③
VM ①
scp /etc/profile root@Toozky2:/etc/profile
scp /etc/profile root@Toozky3:/etc/profile
scp -r /root/spark-2.1.1-bin-hadoop2.6 root@Toozky2:/root/spark-2.1.1-bin-hadoop2.6
scp -r /root/spark-2.1.1-bin-hadoop2.6 root@Toozky3:/root/spark-2.1.1-bin-hadoop2.6
VMs ②, ③
Refresh the environment variables
source /etc/profile
Create spark-start-all.sh and spark-stop-all.sh links. Spark's start-all.sh and stop-all.sh share their names with Hadoop's scripts, and both sbin directories are on PATH, so give the Spark ones unambiguous aliases:
cd /home/spark2.1.1/sbin
ln -sf start-all.sh spark-start-all.sh
ln -sf stop-all.sh spark-stop-all.sh
spark-start-all.sh is now equivalent to running ./start-all.sh from Spark's sbin directory, without colliding with Hadoop's start-all.sh.
Spark web UI
http://Toozky:8080/
Startup order
VMs ①, ②, ③
zkServer.sh start
VM ①
cd
start-all.sh          # Hadoop daemons
spark-start-all.sh    # Spark master and workers
Shutdown order
VM ①
cd
spark-stop-all.sh     # stop Spark first
stop-all.sh           # then stop Hadoop
VMs ①, ②, ③
zkServer.sh stop
Test: run the Pi computation example
Local mode (local)
VM ①
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master local[1] \
/root/spark-2.1.1-bin-hadoop2.6/examples/jars/spark-examples_2.11-2.1.1.jar \
10
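If the job succeeds, the result line is printed among the log output. One way to pick it out (the digits vary from run to run):
spark-submit --class org.apache.spark.examples.SparkPi --master local[1] /root/spark-2.1.1-bin-hadoop2.6/examples/jars/spark-examples_2.11-2.1.1.jar 10 2>/dev/null | grep "Pi is roughly"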
Cluster mode (standalone)
VM ①
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://Toozky:7077 \
/root/spark-2.1.1-bin-hadoop2.6/examples/jars/spark-examples_2.11-2.1.1.jar \
10
2. Configuring the History Server
Prerequisites
Deploy Hadoop 2.0
Synchronize time: ntpdate time.nist.gov
Create the hadoop link
VMs ①, ②, ③
ln -sf /root/hadoop-2.6.5/bin/hadoop /root/hadoop-2.6.5/sbin/hadoop
VM ①
Create the HDFS directory /directory
This HDFS directory will store the Spark event (history) logs.
hdfs dfs -mkdir /directory
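Verify that the directory was created (optional):
hdfs dfs -ls /   # /directory should appear in the listing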
Edit the configuration
spark-defaults.conf
VM ①
cd /home/spark2.1.1/conf/
cp spark-defaults.conf.template spark-defaults.conf
vi spark-defaults.conf
Press G, End, Insert, then Enter, and append:
spark.eventLog.enabled true
spark.eventLog.dir hdfs://Toozky:8020/directory
Press Esc, then type :wq to save and quit.
spark-env.sh
VM ①
vi spark-env.sh
Press G, End, Insert, then Enter, and append:
export SPARK_HISTORY_OPTS="
-Dspark.history.ui.port=18080
-Dspark.history.fs.logDirectory=hdfs://Toozky:8020/directory
-Dspark.history.retainedApplications=30"
Press Esc, then type :wq to save and quit.
Send the files to ② and ③
scp spark-defaults.conf root@Toozky2:/root/spark-2.1.1-bin-hadoop2.6/conf/
scp spark-defaults.conf root@Toozky3:/root/spark-2.1.1-bin-hadoop2.6/conf/
scp spark-env.sh root@Toozky2:/root/spark-2.1.1-bin-hadoop2.6/conf/
scp spark-env.sh root@Toozky3:/root/spark-2.1.1-bin-hadoop2.6/conf/
Spark web UI
http://Toozky:8080/
History server UI
http://Toozky:18080/
Startup order
VMs ①, ②, ③
zkServer.sh start
VM ①
cd
start-all.sh
spark-start-all.sh
start-history-server.sh
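After these commands, jps on ① should show a HistoryServer process alongside the Hadoop and Spark daemons:
jps | grep HistoryServer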
Shutdown order
VM ①
cd
stop-history-server.sh
spark-stop-all.sh
stop-all.sh
VMs ①, ②, ③
zkServer.sh stop
Test: run the Pi computation example
VM ①
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://Toozky:7077 \
/root/spark-2.1.1-bin-hadoop2.6/examples/jars/spark-examples_2.11-2.1.1.jar \
10
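When the job finishes it should appear on the 18080 page, and its event log should have been written to HDFS. To confirm (optional):
hdfs dfs -ls /directory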
3. Configuring the History Server (High Availability)
Edit the configuration
spark-defaults.conf
VM ①
cd /home/spark2.1.1/conf/
cp spark-defaults.conf.template spark-defaults.conf
vi spark-defaults.conf
Press G, End, Insert, then Enter, and append:
spark.eventLog.enabled true
spark.eventLog.dir hdfs://dfbz/directory
Press Esc, then type :wq to save and quit.
Note: dfbz here is the HDFS nameservice, i.e. the value of dfs.nameservices set in the Hadoop configuration (for example in /root/hadoop-2.6.5/etc/hadoop/hdfs-site.xml). The name is up to you; this guide uses dfbz throughout and will not explain it again.
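For reference, the relevant property in hdfs-site.xml looks roughly like this (a sketch; your nameservice value may differ):
<property>
  <name>dfs.nameservices</name>
  <value>dfbz</value>
</property>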
spark-env.sh
VM ①
vi spark-env.sh
Press G, End, Insert, then Enter, and append:
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HISTORY_OPTS="
-Dspark.history.ui.port=18080
-Dspark.history.fs.logDirectory=hdfs://dfbz/directory
-Dspark.history.retainedApplications=30"
Press Esc, then type :wq to save and quit.
Send the files to ② and ③
scp spark-defaults.conf root@Toozky2:/root/spark-2.1.1-bin-hadoop2.6/conf/
scp spark-defaults.conf root@Toozky3:/root/spark-2.1.1-bin-hadoop2.6/conf/
scp spark-env.sh root@Toozky2:/root/spark-2.1.1-bin-hadoop2.6/conf/
scp spark-env.sh root@Toozky3:/root/spark-2.1.1-bin-hadoop2.6/conf/
spark-config.sh
Because the Spark daemons are launched over SSH (non-interactive shells that do not source /etc/profile), set JAVA_HOME explicitly in spark-config.sh.
VM ①
cd /root/spark-2.1.1-bin-hadoop2.6/sbin
vi spark-config.sh
Press G, End, Insert, then Enter, and append:
export JAVA_HOME=/root/jdk1.8.0_192
Press Esc, then type :wq to save and quit.
Send the file to ② and ③
scp spark-config.sh root@Toozky2:/root/spark-2.1.1-bin-hadoop2.6/sbin
scp spark-config.sh root@Toozky3:/root/spark-2.1.1-bin-hadoop2.6/sbin
Spark web UI
http://Toozky:8080/
History server UI
http://Toozky:18080/
Startup order
VMs ①, ②, ③
zkServer.sh start
VM ①
cd
start-all.sh
spark-start-all.sh
start-history-server.sh
Shutdown order
VM ①
cd
stop-history-server.sh
spark-stop-all.sh
stop-all.sh
VMs ①, ②, ③
zkServer.sh stop
Test: run the Pi computation example
VM ①
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://Toozky:7077 \
/root/spark-2.1.1-bin-hadoop2.6/examples/jars/spark-examples_2.11-2.1.1.jar \
10
4. Configuring High Availability (Master)
Edit the configuration
spark-env.sh
VM ①
cd /home/spark2.1.1/conf
vi spark-env.sh
Press Insert to edit. Comment out the SPARK_MASTER_HOST and SPARK_MASTER_PORT lines, then add the following (recoveryMode=ZOOKEEPER enables ZooKeeper-based leader election, zookeeper.url lists the quorum, and zookeeper.dir is the znode where Spark keeps recovery state):
#SPARK_MASTER_HOST=Toozky
#SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8989
export SPARK_DAEMON_JAVA_OPTS="
-Dspark.deploy.recoveryMode=ZOOKEEPER
-Dspark.deploy.zookeeper.url=Toozky,Toozky2,Toozky3
-Dspark.deploy.zookeeper.dir=/spark"
Press Esc, then type :wq to save and quit.
Send the file to ② and ③
scp spark-env.sh root@Toozky2:/root/spark-2.1.1-bin-hadoop2.6/conf/
scp spark-env.sh root@Toozky3:/root/spark-2.1.1-bin-hadoop2.6/conf/
Add links for starting/stopping a standby master
VM ②
cd /home/spark2.1.1/sbin
ln -sf start-master.sh spark-start-master.sh
ln -sf stop-master.sh spark-stop-master.sh
Spark web UIs (on the port set by SPARK_MASTER_WEBUI_PORT)
http://Toozky:8989/
http://Toozky2:8989/
History server UI
http://Toozky:18080/
Startup order
VMs ①, ②, ③
zkServer.sh start
VM ①
cd
start-all.sh
spark-start-all.sh
VM ②
spark-start-master.sh
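To confirm the election worked, open both web UIs: one master should show status ALIVE and the other STANDBY. You can also inspect the znode Spark created (assuming ZooKeeper's zkCli.sh is on PATH):
zkCli.sh -server Toozky:2181 ls /spark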
Shutdown order
VM ②
spark-stop-master.sh
VM ①
cd
spark-stop-all.sh
stop-all.sh
VMs ①, ②, ③
zkServer.sh stop
Test: run the Pi computation example
VM ①
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://Toozky:7077,Toozky2:7077 \
/root/spark-2.1.1-bin-hadoop2.6/examples/jars/spark-examples_2.11-2.1.1.jar \
10
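To exercise the failover (optional): stop the ALIVE master and watch the STANDBY take over; because the submit URL lists both masters, a running application keeps going. A sketch, assuming ① is currently the ALIVE master:
/home/spark2.1.1/sbin/stop-master.sh    # kill the active master on ①
# refresh http://Toozky2:8989/ until it shows ALIVE (this can take up to a minute or so)
/home/spark2.1.1/sbin/start-master.sh   # bring ① back; it rejoins as STANDBY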
5. YARN Mode Configuration
Edit the configuration
yarn-site.xml
VM ①
cd /home/hadoop2.6/etc/hadoop
vi yarn-site.xml
<!-- Whether to run a thread that checks each task's physical memory usage and kills any task that exceeds its allocation; default is true -->
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<!-- Whether to run a thread that checks each task's virtual memory usage and kills any task that exceeds its allocation; default is true -->
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
Press Esc, then type :wq to save and quit.
Send the file to ② and ③
scp yarn-site.xml root@Toozky2:/root/hadoop-2.6.5/etc/hadoop
scp yarn-site.xml root@Toozky3:/root/hadoop-2.6.5/etc/hadoop
spark-env.sh
VM ①
cd /home/spark2.1.1/conf
vi spark-env.sh
Press Insert to edit, then append:
export JAVA_HOME=/root/jdk1.8.0_192
YARN_CONF_DIR=/root/hadoop-2.6.5/etc/hadoop
Press Esc, then type :wq to save and quit.
spark-defaults.conf
VM ①
vi spark-defaults.conf
Press Insert to edit, then append:
spark.yarn.historyServer.address=Toozky:18080
spark.history.ui.port=18080
Press Esc, then type :wq to save and quit.
Send the files to ② and ③
scp spark-env.sh root@Toozky2:/root/spark-2.1.1-bin-hadoop2.6/conf/
scp spark-env.sh root@Toozky3:/root/spark-2.1.1-bin-hadoop2.6/conf/
scp spark-defaults.conf root@Toozky2:/root/spark-2.1.1-bin-hadoop2.6/conf/
scp spark-defaults.conf root@Toozky3:/root/spark-2.1.1-bin-hadoop2.6/conf/
YARN ResourceManager UI (applications)
http://Toozky:8088/
History server UI
http://Toozky:18080/
Startup order
VMs ①, ②, ③
zkServer.sh start
VM ①
cd
start-all.sh
start-history-server.sh
Shutdown order
VM ①
cd
stop-history-server.sh
stop-all.sh
VMs ①, ②, ③
zkServer.sh stop
Test: run the Pi computation example
VM ①
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
--num-executors 2 \
/root/spark-2.1.1-bin-hadoop2.6/examples/jars/spark-examples_2.11-2.1.1.jar \
10
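While the job runs (or shortly afterwards), it should be visible to YARN; the bundled example registers under the name "Spark Pi". A quick check from ①:
yarn application -list -appStates ALL | grep "Spark Pi"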
Appendix: comparison of the processes required by local, standalone, and YARN modes
Mode | Machines with Spark installed | Processes to start | Owner | Use case |
---|---|---|---|---|
Local | 1 | None | Spark | Testing |
Standalone | n | Master, Worker | Spark | Standalone deployment |
YARN | 1 | YARN, HDFS | Hadoop | Mixed deployment |
That's all for this installment. May we all learn from one another and make progress together!