
Spark SQL Data Sources: Parquet Files

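Load employee.json into a DataFrame, then save it in Parquet format (converting a .txt file to Parquet should work as well):

scala> val employee = sqlparquet.read.json("employee.json")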
employee: org.apache.spark.sql.DataFrame = [_corrupt_record: string, age: string ... 2 more fields]
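By default, Spark's JSON reader expects JSON Lines input, meaning every line must hold one complete JSON object. The _corrupt_record column in the schema above means some lines of employee.json did not parse. In the default PERMISSIVE mode those lines still become rows, with the raw text in _corrupt_record and nulls in every other column. The sketch below reproduces this with two small temporary files. It assumes Spark 2.x or later in local mode. The object JsonLinesDemo, its helpers writeLines and inferColumns, and the file names are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import org.apache.spark.sql.SparkSession

object JsonLinesDemo {
  // Writes the given lines to dir/name and returns the file path as a string
  def writeLines(dir: Path, name: String, lines: Seq[String]): String = {
    val file = dir.resolve(name)
    Files.write(file, lines.mkString("\n").getBytes(StandardCharsets.UTF_8))
    file.toString
  }

  // Returns the column names Spark infers when reading a JSON file
  def inferColumns(spark: SparkSession, path: String): Seq[String] =
    spark.read.json(path).columns.toSeq

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("JsonLinesDemo")
      .master("local[*]")
      .getOrCreate()
    val dir = Files.createTempDirectory("json-demo")

    // JSON Lines: one complete object per line, so every line parses
    val clean = writeLines(dir, "clean.json", Seq(
      """{"id": "1201", "name": "satish", "age": "25"}""",
      """{"id": "1202", "name": "krishna", "age": "28"}"""))

    // Wrapping the records in an outer pair of braces leaves lone "{" and "}"
    // lines, which are not complete JSON objects and are treated as malformed
    val broken = writeLines(dir, "broken.json", Seq(
      "{",
      """{"id": "1201", "name": "satish", "age": "25"}""",
      "}"))

    println(inferColumns(spark, clean))  // no _corrupt_record column
    println(inferColumns(spark, broken)) // includes _corrupt_record
    spark.stop()
  }
}
```

If the source is a single pretty-printed JSON document spanning many lines, the reader's multiLine option (Spark 2.2+) is the usual alternative to rewriting the file as JSON Lines.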

scala> employee.write.parquet("employee.parquet")
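Converting a plain text file to Parquet follows the same pattern once each line has been parsed into a typed record. Below is a minimal sketch. It assumes a hypothetical comma-separated employee.txt with lines like 1201,satish,25. The case class Employee, the helper parseLine, and the output path are illustrative and do not come from the original. The sketch also sets the save mode, because the default mode (error if exists) makes a second write to the same path fail.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record layout for text lines such as "1201,satish,25"
case class Employee(id: String, name: String, age: Int)

object TextToParquet {
  // Parses one comma-separated line into an Employee, trimming each field
  def parseLine(line: String): Employee = line.split(",").map(_.trim) match {
    case Array(id, name, age) => Employee(id, name, age.toInt)
    case _ => throw new IllegalArgumentException(s"expected 3 fields: $line")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TextToParquet")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // provides the Encoder[Employee] used by map

    val employees = spark.read.textFile("employee.txt") // hypothetical input file
      .filter(line => line.trim.nonEmpty)                // skip blank lines
      .map(line => parseLine(line))                      // Dataset[Employee]
      .toDF()

    // "overwrite" replaces existing output instead of failing
    employees.write.mode("overwrite").parquet("employee_txt.parquet")
    spark.stop()
  }
}
```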
                                                                                
scala> val sqlpar = new org.apache.spark.sql.SQLContext(sc)
warning: one deprecation (since 2.0.0); for details, enable `:setting -deprecation' or `:replay -deprecation'
sqlpar: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@4bdf398a
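The deprecation warning appears because constructing an SQLContext directly has been deprecated since Spark 2.0.0 in favour of SparkSession. spark-shell already exposes a SparkSession as spark. Below is a minimal sketch of the replacement for a standalone application, assuming local mode. SessionSetup and localSession are hypothetical names.

```scala
import org.apache.spark.sql.{SQLContext, SparkSession}

object SessionSetup {
  // Builds a local SparkSession, or returns the one that already exists
  def localSession(appName: String): SparkSession =
    SparkSession.builder()
      .appName(appName)
      .master("local[*]")
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = localSession("SessionSetup")
    // Older code that still expects an SQLContext can take it from the session
    val sqlContext: SQLContext = spark.sqlContext
    // spark.read offers the same readers (json, parquet, ...) as sqlContext.read
    spark.read.parquet("employee.parquet").printSchema()
    spark.stop()
  }
}
```

Because getOrCreate reuses an existing session, calling localSession twice in one JVM returns the same instance.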

scala> val parread = sqlpar.read.parquet("employee.parquet")
parread: org.apache.spark.sql.DataFrame = [_corrupt_record: string, age: string ... 2 more fields]
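The schema read back from employee.parquet is the same as the one inferred from the JSON. This is because Parquet stores column names and types in the file itself, so reading it needs no inference pass. Below is a small round-trip sketch under the same assumptions (Spark 2.x or later, local mode). ParquetRoundTrip, roundTrip, and the temporary path are hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object ParquetRoundTrip {
  // Writes df to path as Parquet (replacing any existing output) and reads it back
  def roundTrip(spark: SparkSession, df: DataFrame, path: String): DataFrame = {
    df.write.mode("overwrite").parquet(path)
    spark.read.parquet(path)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ParquetRoundTrip")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("1201", "satish", "25"), ("1202", "krishna", "28"))
      .toDF("id", "name", "age")
    val path = Files.createTempDirectory("parquet-demo").resolve("employee.parquet").toString
    val back = roundTrip(spark, df, path)

    back.printSchema() // names and types come from the Parquet file metadata
    back.show()
    spark.stop()
  }
}
```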

scala> parread.show()
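This prints the data, but nothing has been registered as a table yet: it is simply a Parquet file read into a DataFrame. To query it with SQL, first register it as a temporary view named Demo:
scala> parread.createOrReplaceTempView("Demo")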
scala> val allcol = sqlpar.sql("SELECT * FROM Demo")
allcol: org.apache.spark.sql.DataFrame = [_corrupt_record: string, age: string ... 2 more fields]
scala> val allcol = sqlpar.sql("SELECT id,age,name FROM Demo")
allcol: org.apache.spark.sql.DataFrame = [id: string, age: string ... 1 more field]

scala> allcol.show()
+----+----+-------+
|  id| age|   name|
+----+----+-------+
|null|null|   null|
|1201|  25| satish|
|1202|  28|krishna|
|1203|  39|  amith|
|1204|  23|  javed|
|1205|  23| prudvi|
|null|null|   null|
+----+----+-------+
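Here the data is read with SQL from the temporary table. The two all-null rows most likely correspond to the lines of employee.json that could not be parsed as JSON, which are the same lines that produced the _corrupt_record column.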
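The same query can be wrapped in a small helper that registers the DataFrame as the temporary view Demo and also drops the null rows. createOrReplaceTempView is the Spark 2.0+ replacement for the deprecated registerTempTable. The sketch below assumes the input has columns named id, age, and name, as in the output above. ParquetSqlQuery and validEmployees are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ParquetSqlQuery {
  // Registers df as the temporary view "Demo" and returns only rows that have an id,
  // which removes the all-null rows left by unparseable input lines
  def validEmployees(spark: SparkSession, df: DataFrame): DataFrame = {
    df.createOrReplaceTempView("Demo")
    spark.sql("SELECT id, age, name FROM Demo WHERE id IS NOT NULL")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ParquetSqlQuery")
      .master("local[*]")
      .getOrCreate()
    val parread = spark.read.parquet("employee.parquet")
    validEmployees(spark, parread).show()
    spark.stop()
  }
}
```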

To be added later: the pros and cons of the three data sources, JSON, Hive, and Parquet.

