
Accessing an RDD in a typed way by converting it to a Dataset

1) Create a case class

scala> case class People(name:String,age:Long)
defined class People

2) Create a Dataset

scala> val caseClassDS = Seq(People("Andy",32)).toDS()
caseClassDS: org.apache.spark.sql.Dataset[People] = [name: string, age: bigint]

This way, People not only has a type but also a schema, which makes it more convenient to work with.
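
Although this example builds the Dataset from a Seq, the same typed conversion works on an RDD, which is what the title refers to. Below is a minimal sketch, assuming the spark-shell environment (where spark and spark.implicits._ are already in scope) and a made-up peopleRDD:

// Hypothetical RDD of (name, age) tuples; in spark-shell, spark and
// spark.implicits._ are already available.
val peopleRDD = spark.sparkContext.makeRDD(List(("Andy", 32L), ("Justin", 19L)))

// Map each tuple onto the People case class, then call toDS() to get a
// typed Dataset[People] with the same schema as above.
val peopleDS = peopleRDD.map { case (name, age) => People(name, age) }.toDS()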

3) Type caseClassDS. and press Tab: you will find many methods available here, such as show, limit, and so on.

scala> caseClassDS.
agg describe intersect reduce toDF
alias distinct isLocal registerTempTable toJSON
apply drop isStreaming repartition toJavaRDD
as dropDuplicates javaRDD rollup toLocalIterator
cache dtypes join sample toString
checkpoint except joinWith schema transform
coalesce explain limit select union
col explode map selectExpr unionAll
collect filter mapPartitions show unpersist
collectAsList first na sort where
columns flatMap orderBy sortWithinPartitions withColumn
count foreach persist sparkSession withColumnRenamed
createGlobalTempView foreachPartition printSchema sqlContext withWatermark
createOrReplaceTempView groupBy queryExecution stat write
createTempView groupByKey randomSplit storageLevel writeStream
crossJoin head randomSplitAsList take
cube inputFiles rdd takeAsList

These are much the same as a DataFrame's methods.
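
The difference is that the Dataset keeps the People type, so the lambda-based methods operate on People objects directly rather than on untyped Rows. A small sketch using only standard Dataset methods from the list above:

// Typed filter: p is a People, so p.age is checked at compile time.
caseClassDS.filter(p => p.age > 30).show()

// Typed map: produces a Dataset[String] of upper-cased names.
caseClassDS.map(p => p.name.toUpperCase).show()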

4) Let's try createGlobalTempView

scala> caseClassDS.createGlobalTempView("People")

5) Now we can query it with spark.sql: since the view is global, reference it as global_temp.People in a plain select statement.

scala> spark.sql("select * from global_temp.People").show()
+----+---+
|name|age|
+----+---+
|Andy| 32|
+----+---+
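
Note that a global temp view is registered in the system database global_temp and is shared across all sessions of the same Spark application. As a sketch, the same view can be queried from a brand-new session:

// A new session created from the same application still sees the view.
spark.newSession().sql("select * from global_temp.People").show()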
