There are many ways to create a DataFrame in Spark, and the official API offers quite a few of them; the most common is probably reading a data source with textFile().
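For reference, a minimal sketch of that textFile() route, assuming a comma-delimited file at the hypothetical path data/gci_gri.txt and the GciGri bean shown later in this post:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder().appName("demo").getOrCreate();

// Read each line, parse it into a GciGri bean, then build the DataFrame
JavaRDD<GciGri> rdd = spark.read()
        .textFile("data/gci_gri.txt")   // Dataset<String>, one element per line
        .javaRDD()
        .map(line -> {
            String[] parts = line.split(",");
            GciGri g = new GciGri();
            g.setGci(Integer.parseInt(parts[0].trim()));
            g.setGri(Integer.parseInt(parts[1].trim()));
            return g;
        });
spark.createDataFrame(rdd, GciGri.class).createOrReplaceTempView("gci_gri_from_file");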
But a few business scenarios are not a good fit for textFile(), so the following API can be used instead:
/**
 * Applies a schema to a List of Java Beans.
 *
 * WARNING: Since there is no guaranteed ordering for fields in a Java Bean,
 * SELECT * queries will return the columns in an undefined order.
 * @since 1.6.0
 */
def createDataFrame(data: java.util.List[_], beanClass: Class[_]): DataFrame = {
  val attrSeq = getSchema(beanClass)
  val rows = sqlContext.beansToRows(data.asScala.iterator, beanClass, attrSeq)
  Dataset.ofRows(self, LocalRelation(attrSeq, rows.toSeq))
}
Sample code (excerpt):
// gci and gri are int values taken from the surrounding business logic
ArrayList<GciGri> list = new ArrayList<GciGri>();
GciGri g = new GciGri();
g.setGci(gci);
g.setGri(gri);
list.add(g);
// Build a DataFrame from the bean list and register it as a temp view
spark.createDataFrame(list, GciGri.class).createOrReplaceTempView("testtesttest");
package cn.com.dtmobile.test;

import java.io.Serializable;

public class GciGri implements Serializable {

    private static final long serialVersionUID = 1L;

    private int Gci;
    private int Gri;

    public int getGci() {
        return Gci;
    }
    public void setGci(int gci) {
        Gci = gci;
    }
    public int getGri() {
        return Gri;
    }
    public void setGri(int gri) {
        Gri = gri;
    }
}
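To make the WARNING from the API doc concrete: because bean field order is not guaranteed, SELECT * against the registered view may return the gci and gri columns in either order, so it is safer to name the columns explicitly. A minimal sketch, assuming the spark session and the testtesttest view from the sample above:

// Name the columns explicitly rather than relying on SELECT * ordering
spark.sql("SELECT gci, gri FROM testtesttest").show();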