When querying data through Spark SQL's Thrift JDBC interface, the following error occurred:
Exception in thread "main" java.sql.SQLException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3107 in stage 308.0 failed 4 times, most recent failure: Lost task 3107.3 in stage 308.0 (TID 620318, XXX): org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 1572864, required: 3236381
Serialization trace:
values (org.apache.spark.sql.catalyst.expressions.GenericInternalRow). To avoid this, increase spark.kryoserializer.buffer.max value.
at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:299)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:240)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:275)
at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:355)
at com.peopleyuqing.tool.SparkJDBC.excuteQuery(SparkJDBC.java:64)
at com.peopleyuqing.main.ContentSubThree.main(ContentSubThree.java:24)
The message says the parameter to adjust is spark.kryoserializer.buffer.max, which needs to be at least 3236381 bytes.
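For a sense of scale, the byte counts in the first error convert to mebibytes as follows (a quick awk sketch; the numbers are copied from the trace above):

```shell
# Buffer sizes from the Kryo error, converted from bytes to MiB
awk 'BEGIN {
  printf "available: %.1f MiB\n", 1572864 / (1024 * 1024)   # what the task had
  printf "required:  %.1f MiB\n", 3236381 / (1024 * 1024)   # what the row needed
}'
```

So a single serialized row needed roughly 3.1 MiB, about double the 1.5 MiB available, which is why a larger spark.kryoserializer.buffer.max is called for.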
At first I set the values in the spark-defaults.conf file:
spark.kryoserializer.buffer.max=64m
spark.kryoserializer.buffer=64k
The error persisted, and Available dropped to 0:
Exception in thread "main" java.sql.SQLException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3155 in stage 0.0 failed 4 times, most recent failure: Lost task 3155.3 in stage 0.0 (TID 3317, XXX): org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 615328
Serialization trace:
values (org.apache.spark.sql.catalyst.expressions.GenericInternalRow). To avoid this, increase spark.kryoserializer.buffer.max value.
at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:299)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:240)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I then tested in spark-shell:
sc.getConf.get("spark.kryoserializer.buffer.max") returned 64m, the value I had configured.
So spark-defaults.conf was being read by spark-shell, but the Spark SQL Thrift JDBC server was not affected by it. I therefore switched configuration methods and added these two launch options:
--conf spark.kryoserializer.buffer.max=256m --conf spark.kryoserializer.buffer=64m
The startup command became:
sbin/start-thriftserver.sh --executor-memory 10g --driver-memory 12g --total-executor-cores 288 --executor-cores 2 --conf spark.kryoserializer.buffer.max=256m --conf spark.kryoserializer.buffer=64m
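One way to confirm that the restarted Thrift server actually picked up the new value is to read the setting back over JDBC, for example with beeline (the host and port below are placeholders, not from the original setup):

```shell
# Connect to the Spark Thrift server and echo back the Kryo buffer setting;
# replace localhost:10000 with your own server's address.
beeline -u jdbc:hive2://localhost:10000 -e "SET spark.kryoserializer.buffer.max;"
```

If the server started with the --conf options above, the query should report 256m rather than the default.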