
Spark [01] Spark Cluster Installation and Configuration



Preparing the environment

Hadoop 2.0

Prepare several virtual machines with a Hadoop 2.0 environment: install Hadoop 2.0 and ZooKeeper, then start the DFS and ZKFC services.
For details, see the earlier post Hadoop [03] Starting DFS and ZooKeeper (Hadoop 2.0).

Set up time synchronization.
For details, see Linux (CentOS 6) Setting Up Network Time Synchronization.

Create a hadoop link so later hadoop commands are convenient to run:

Virtual machines ①, ②, ③

ln -sf /root/hadoop-2.6.5/bin/hadoop /root/hadoop-2.6.5/sbin/hadoop

The virtual machines are configured as follows:

No.  Hostname  Domain name  IP address
①   Toozky    Toozky       192.168.64.220
②   Toozky2   Toozky2      192.168.64.221
③   Toozky3   Toozky3      192.168.64.222
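
If the hostnames above are not yet resolvable from each VM, an /etc/hosts mapping along these lines makes them so (a sketch assuming the IPs from the table; adjust to your network):

# append to /etc/hosts on all three VMs
192.168.64.220  Toozky
192.168.64.221  Toozky2
192.168.64.222  Toozky3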

Resource list

Software/tool  Version
VMware         VMware® Workstation 16 Pro
Xshell         6
FileZilla      3.7.3
Spark          spark-2.1.1-bin-hadoop2.6.tgz

1. Spark Cluster Configuration

Extract Spark

Virtual machine ①

cd
tar -zxvf spark-2.1.1-bin-hadoop2.6.tgz
ln -sf /root/spark-2.1.1-bin-hadoop2.6 /home/spark2.1.1

Edit the configuration

slaves

Virtual machine ①

cd /home/spark2.1.1/conf/
cp slaves.template slaves
vi slaves

Press i to enter insert mode and change localhost to:

Toozky
Toozky2
Toozky3

Press Esc, then type :wq to save and quit.
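
Equivalently, the slaves file can be written non-interactively; a minimal sketch using a heredoc:

cat > /home/spark2.1.1/conf/slaves <<'EOF'
Toozky
Toozky2
Toozky3
EOF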

spark-env.sh

Virtual machine ①

cp spark-env.sh.template spark-env.sh
vi spark-env.sh

Press G to jump to the end of the file, enter insert mode, and append:

export JAVA_HOME=/root/jdk1.8.0_192
SPARK_MASTER_HOST=Toozky
SPARK_MASTER_PORT=7077

Press Esc, then type :wq to save and quit.

profile

Virtual machine ①

vi /etc/profile

Press G to jump to the end of the file, enter insert mode, and append:

#Spark
export SPARK_HOME=/root/spark-2.1.1-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/sbin:$SPARK_HOME/bin

Press Esc, then type :wq to save and quit.
Reload the environment variables:

source /etc/profile
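
A quick sanity check that the variables took effect (assuming the paths used above):

echo $SPARK_HOME     # should print /root/spark-2.1.1-bin-hadoop2.6
which spark-submit   # should resolve to a path under the Spark bin directory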

Send the relevant files to ② and ③

Virtual machine ①

scp /etc/profile root@Toozky2:/etc/profile
scp /etc/profile root@Toozky3:/etc/profile
scp -r /root/spark-2.1.1-bin-hadoop2.6 root@Toozky2:/root/spark-2.1.1-bin-hadoop2.6
scp -r /root/spark-2.1.1-bin-hadoop2.6 root@Toozky3:/root/spark-2.1.1-bin-hadoop2.6

Virtual machines ② and ③

Reload the environment variables:

source /etc/profile

Create spark-start-all.sh and spark-stop-all.sh links (distinct names, so they do not collide with Hadoop's start-all.sh and stop-all.sh on the PATH):

cd /home/spark2.1.1/sbin
ln -sf start-all.sh spark-start-all.sh
ln -sf stop-all.sh spark-stop-all.sh

Running spark-start-all.sh is now equivalent to running ./start-all.sh from Spark's sbin directory.

Spark web UI
http://Toozky:8080/

Startup order

Virtual machines ①, ②, ③

zkServer.sh start

Virtual machine ①

cd
start-all.sh
spark-start-all.sh
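
To confirm everything came up, jps should list the expected daemons on each node; a rough check (the exact set depends on your Hadoop/ZooKeeper layout):

jps
# On ①, expect Master and Worker among the Java processes,
# alongside the HDFS and ZooKeeper daemons started earlier.
# On ② and ③, expect Worker plus their Hadoop/ZooKeeper daemons.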

Shutdown order

Virtual machine ①

cd
spark-stop-all.sh
stop-all.sh

Virtual machines ①, ②, ③

zkServer.sh stop

Test run: the SparkPi example

Local mode (local)

Virtual machine ①

spark-submit \
--class org.apache.spark.examples.SparkPi \
--master local[1] \
/root/spark-2.1.1-bin-hadoop2.6/examples/jars/spark-examples_2.11-2.1.1.jar \
10
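
If the job succeeds, the driver prints a line like Pi is roughly 3.14... (the digits vary per run). Since Spark's own logging goes to stderr, the result can be filtered out of the noise, for example:

spark-submit \
--class org.apache.spark.examples.SparkPi \
--master local[1] \
/root/spark-2.1.1-bin-hadoop2.6/examples/jars/spark-examples_2.11-2.1.1.jar \
10 2>/dev/null | grep "Pi is roughly"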

Cluster mode (standalone)

Virtual machine ①

spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://Toozky:7077 \
/root/spark-2.1.1-bin-hadoop2.6/examples/jars/spark-examples_2.11-2.1.1.jar \
10

2. Configuring the History Server

Prepare the environment

Deploy Hadoop 2.0

Time synchronization: ntpdate time.nist.gov

Create the hadoop link

Virtual machines ①, ②, ③

ln -sf /root/hadoop-2.6.5/bin/hadoop /root/hadoop-2.6.5/sbin/hadoop

Virtual machine ①

Create the HDFS directory /directory, which will store the history event logs:

hadoop dfs -mkdir /directory
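
To confirm the directory exists before enabling event logging:

hadoop fs -ls /    # the listing should include /directory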

Edit the configuration

spark-defaults.conf

Virtual machine ①

cd /home/spark2.1.1/conf/
cp spark-defaults.conf.template spark-defaults.conf
vi spark-defaults.conf

Press G to jump to the end of the file, enter insert mode, and append:

spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://Toozky:8020/directory

Press Esc, then type :wq to save and quit.

spark-env.sh

Virtual machine ①

vi spark-env.sh

Press G to jump to the end of the file, enter insert mode, and append:

export SPARK_HISTORY_OPTS="
-Dspark.history.ui.port=18080
-Dspark.history.fs.logDirectory=hdfs://Toozky:8020/directory
-Dspark.history.retainedApplications=30"

Press Esc, then type :wq to save and quit.

Send the relevant files to ② and ③

scp spark-defaults.conf root@Toozky2:/root/spark-2.1.1-bin-hadoop2.6/conf/
scp spark-defaults.conf root@Toozky3:/root/spark-2.1.1-bin-hadoop2.6/conf/
scp spark-env.sh root@Toozky2:/root/spark-2.1.1-bin-hadoop2.6/conf/
scp spark-env.sh root@Toozky3:/root/spark-2.1.1-bin-hadoop2.6/conf/

Spark web UI
http://Toozky:8080/
History server UI
http://Toozky:18080/

Startup order

Virtual machines ①, ②, ③

zkServer.sh start

Virtual machine ①

cd
start-all.sh
spark-start-all.sh
start-history-server.sh

Shutdown order

Virtual machine ①

cd
stop-history-server.sh
spark-stop-all.sh
stop-all.sh

Virtual machines ①, ②, ③

zkServer.sh stop

Test run: the SparkPi example

Virtual machine ①

spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://Toozky:7077 \
/root/spark-2.1.1-bin-hadoop2.6/examples/jars/spark-examples_2.11-2.1.1.jar \
10

3. Configuring the History Server (High Availability)

Edit the configuration

spark-defaults.conf

Virtual machine ①

cd /home/spark2.1.1/conf/
cp spark-defaults.conf.template spark-defaults.conf
vi spark-defaults.conf

Press G to jump to the end of the file, enter insert mode, and append:

spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://dfbz/directory

Press Esc, then type :wq to save and quit.

Note: dfbz is the HDFS nameservice defined by dfs.nameservices in hdfs-site.xml under the Hadoop install directory (for example /root/hadoop-2.6.5/etc/hadoop/hdfs-site.xml). The name is user-defined; dfbz is used throughout the rest of this post without further comment.
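
For reference, the relevant hdfs-site.xml entry looks roughly like this (the value is whatever nameservice name your HA setup defines):

  <property>
    <name>dfs.nameservices</name>
    <value>dfbz</value>
  </property>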

spark-env.sh

Virtual machine ①

vi spark-env.sh

Press G to jump to the end of the file, enter insert mode, and append:

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export SPARK_HISTORY_OPTS="
-Dspark.history.ui.port=18080
-Dspark.history.fs.logDirectory=hdfs://dfbz/directory
-Dspark.history.retainedApplications=30"

Press Esc, then type :wq to save and quit.

Send the relevant files to ② and ③

scp spark-defaults.conf root@Toozky2:/root/spark-2.1.1-bin-hadoop2.6/conf/
scp spark-defaults.conf root@Toozky3:/root/spark-2.1.1-bin-hadoop2.6/conf/
scp spark-env.sh root@Toozky2:/root/spark-2.1.1-bin-hadoop2.6/conf/
scp spark-env.sh root@Toozky3:/root/spark-2.1.1-bin-hadoop2.6/conf/

spark-config.sh

Virtual machine ①

cd /root/spark-2.1.1-bin-hadoop2.6/sbin
vi spark-config.sh

Press G to jump to the end of the file, enter insert mode, and append:

export JAVA_HOME=/root/jdk1.8.0_192

Press Esc, then type :wq to save and quit.

Send the relevant files to ② and ③

scp spark-config.sh root@Toozky2:/root/spark-2.1.1-bin-hadoop2.6/sbin
scp spark-config.sh root@Toozky3:/root/spark-2.1.1-bin-hadoop2.6/sbin

Spark web UI
http://Toozky:8080/
History server UI
http://Toozky:18080/

Startup order

Virtual machines ①, ②, ③

zkServer.sh start

Virtual machine ①

cd
start-all.sh
spark-start-all.sh
start-history-server.sh

Shutdown order

Virtual machine ①

cd
stop-history-server.sh
spark-stop-all.sh
stop-all.sh

Virtual machines ①, ②, ③

zkServer.sh stop

Test run: the SparkPi example

Virtual machine ①

spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://Toozky:7077 \
/root/spark-2.1.1-bin-hadoop2.6/examples/jars/spark-examples_2.11-2.1.1.jar \
10

4. Configuring High Availability

Edit the configuration

spark-env.sh

Virtual machine ①

cd /home/spark2.1.1/conf
vi spark-env.sh

Enter insert mode, comment out SPARK_MASTER_HOST and SPARK_MASTER_PORT, and add the following:

#SPARK_MASTER_HOST=Toozky
#SPARK_MASTER_PORT=7077

SPARK_MASTER_WEBUI_PORT=8989
export SPARK_DAEMON_JAVA_OPTS="
-Dspark.deploy.recoveryMode=ZOOKEEPER
-Dspark.deploy.zookeeper.url=Toozky,Toozky2,Toozky3
-Dspark.deploy.zookeeper.dir=/spark"

Press Esc, then type :wq to save and quit.
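
Once an HA master has been started, the state it registered can be inspected with the ZooKeeper CLI; a quick check, assuming ZooKeeper's default client port 2181:

zkCli.sh -server Toozky:2181
# inside the zkCli shell:
ls /spark    # should list the znodes Spark created for master recovery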

Send the relevant files to ② and ③

scp spark-env.sh root@Toozky2:/root/spark-2.1.1-bin-hadoop2.6/conf/
scp spark-env.sh root@Toozky3:/root/spark-2.1.1-bin-hadoop2.6/conf/

Add links for starting/stopping the master

Virtual machine ②

cd /home/spark2.1.1/sbin
ln -sf start-master.sh spark-start-master.sh
ln -sf stop-master.sh spark-stop-master.sh

Spark web UIs
http://Toozky:8989/
http://Toozky2:8989/
History server UI
http://Toozky:18080/

Startup order

Virtual machines ①, ②, ③

zkServer.sh start

Virtual machine ①

cd
start-all.sh
spark-start-all.sh

Virtual machine ②

spark-start-master.sh

Shutdown order

Virtual machine ②

spark-stop-master.sh

Virtual machine ①

cd
spark-stop-all.sh
stop-all.sh

Virtual machines ①, ②, ③

zkServer.sh stop

Test run: the SparkPi example

Virtual machine ①

spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://Toozky:7077,Toozky2:7077 \
/root/spark-2.1.1-bin-hadoop2.6/examples/jars/spark-examples_2.11-2.1.1.jar \
10
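
To exercise the failover itself, one rough test (a sketch, using the scripts and ports configured above) is to stop the active master on ① and watch ② take over:

/home/spark2.1.1/sbin/stop-master.sh
# Then open http://Toozky2:8989/ — after a short delay, its status
# should switch from STANDBY to ALIVE once ZooKeeper detects the loss.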

5. YARN Mode Configuration

Edit the configuration

yarn-site.xml

Virtual machine ①

cd /home/hadoop2.6/etc/hadoop
vi yarn-site.xml

Enter insert mode and add the following inside the configuration tag:

  <!-- Whether to run a thread that checks each task's physical memory usage
       and kills the task if it exceeds its allocation. Default is true. -->
  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>
  <!-- Whether to run a thread that checks each task's virtual memory usage
       and kills the task if it exceeds its allocation. Default is true. -->
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>

Press Esc, then type :wq to save and quit.

Send the relevant files to ② and ③

scp yarn-site.xml root@Toozky2:/root/hadoop-2.6.5/etc/hadoop
scp yarn-site.xml root@Toozky3:/root/hadoop-2.6.5/etc/hadoop

spark-env.sh

Virtual machine ①

cd /home/spark2.1.1/conf
vi spark-env.sh

Enter insert mode and append:

export JAVA_HOME=/root/jdk1.8.0_192
YARN_CONF_DIR=/root/hadoop-2.6.5/etc/hadoop

Press Esc, then type :wq to save and quit.

spark-defaults.conf

Virtual machine ①

vi spark-defaults.conf

Enter insert mode and append:

spark.yarn.historyServer.address=dfbz
spark.history.ui.port=18080

Press Esc, then type :wq to save and quit.

Send the relevant files to ② and ③

scp spark-env.sh root@Toozky2:/root/spark-2.1.1-bin-hadoop2.6/conf/
scp spark-env.sh root@Toozky3:/root/spark-2.1.1-bin-hadoop2.6/conf/
scp spark-defaults.conf root@Toozky2:/root/spark-2.1.1-bin-hadoop2.6/conf/
scp spark-defaults.conf root@Toozky3:/root/spark-2.1.1-bin-hadoop2.6/conf/

Add links for starting/stopping the master

Virtual machine ②

cd /home/spark2.1.1/sbin
ln -sf start-master.sh spark-start-master.sh
ln -sf stop-master.sh spark-stop-master.sh

YARN applications page
http://Toozky:8088/
History server UI
http://Toozky:18080/

Startup order

Virtual machines ①, ②, ③

zkServer.sh start

Virtual machine ①

cd
start-all.sh
start-history-server.sh

Shutdown order

Virtual machine ①

cd
stop-history-server.sh
stop-all.sh

Virtual machines ①, ②, ③

zkServer.sh stop

Test run: the SparkPi example

Virtual machine ①

spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
--total-executor-cores 2 \
/root/spark-2.1.1-bin-hadoop2.6/examples/jars/spark-examples_2.11-2.1.1.jar \
10
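
On YARN the application can also be tracked from the command line with the standard YARN CLI:

yarn application -list                        # currently running applications
yarn application -list -appStates FINISHED    # completed runs, including SparkPi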

Appendix: comparison of the daemons each mode depends on (local, standalone, YARN)

Mode        Machines with Spark installed  Processes to start  Owned by  Use case
Local       1                              (none)              Spark     Testing
Standalone  n                              Master, Worker      Spark     Standalone deployment
Yarn        1                              YARN, HDFS          Hadoop    Mixed deployment

That wraps up this installment. May we all learn from each other and keep improving!

