
Setting dynamic partitions, snappy compression, Parquet storage, and backup when starting spark-shell

1. Setting dynamic partitions at spark-shell startup

spark-shell \
  --executor-memory 16G \
  --total-executor-cores 10 \
  --executor-cores 10 \
  --conf spark.hadoop.hive.exec.dynamic.partition=true \
  --conf spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict \
  --conf spark.sql.shuffle.partitions=10 \
  --conf spark.default.parallelism=10
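
The two spark.hadoop.hive.exec.dynamic.partition* settings enable Hive dynamic-partition inserts from Spark SQL, and nonstrict mode lets every partition column be resolved at runtime instead of requiring at least one static value. A minimal sketch of the kind of insert this unlocks (the logs/logs_staging tables and the dt column are illustrative, not part of the original setup):

// Hypothetical tables: `logs` is partitioned by dt and fed from `logs_staging`.
// In nonstrict mode, the dt values produced by the SELECT decide which
// partitions get written; no static PARTITION (dt='...') clause is needed.
spark.sql("""
  INSERT OVERWRITE TABLE logs PARTITION (dt)
  SELECT id, payload, dt FROM logs_staging
""")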

2. Compressing and backing up tables with Spark SQL

// In spark-shell (Spark 2.x), a Hive-enabled SparkSession is already available as `spark`
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path, FileStatus}
import scala.collection.mutable.{ArrayBuffer, ListBuffer}
import scala.io.Source
import java.io.PrintWriter

val tbn = "src_es"
val tbn = Array("middata","decision_info")

for (tb <- tbn) {
    println(dbn + "." + tb)
    // Dump the table as a snappy-compressed Parquet backup
    val df = spark.sql("select * from " + dbn + "." + tb)
    df.write.mode("overwrite").option("compression", "snappy").format("parquet")
      .save("/backupdatafile/" + dbn + ".db/" + tb)
    // Read the backup back and rewrite the original table from it
    val dbtb = spark.read.parquet("/backupdatafile/" + dbn + ".db/" + tb)
    dbtb.createOrReplaceTempView("test_" + tb)
    spark.sql("insert overwrite table " + dbn + "." + tb + " select * from test_" + tb)
}
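
To sanity-check a backup before the insert overwrite step, the Hadoop FileSystem classes imported above can report the on-disk size of the Parquet output. A minimal sketch, assuming the same backup layout as above (getContentSummary sums the lengths of all files under a directory):

val fs = FileSystem.get(sc.hadoopConfiguration)
for (tb <- tbn) {
    val backupPath = new Path("/backupdatafile/" + dbn + ".db/" + tb)
    // Sum the sizes of all files under the backup directory
    val bytes = fs.getContentSummary(backupPath).getLength
    println(dbn + "." + tb + " backup size: " + (bytes / 1024 / 1024) + " MB")
}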

