Start ZooKeeper: zkServer.sh start
Start Kafka: kafka-server-start.sh $KAFKA_HOME/config/server.properties
Create a topic: kafka-topics.sh --create --zookeeper node1:2181 --replication-factor 1 --partitions 1 --topic test
Start a console producer: kafka-console-producer.sh --broker-list node1:9092 --topic test
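(Optional sanity check, not part of the original steps: before running the Spark job you can confirm that messages flow end to end by starting a console consumer against the same broker, e.g. kafka-console-consumer.sh --bootstrap-server node1:9092 --topic test --from-beginning, and typing a few lines into the producer.)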
Run the code to test it:
package com.lin.spark

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

/**
  * Created by Administrator on 2019/6/7.
  */
object Halo {
  def main(args: Array[String]): Unit = {
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "node1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "use_a_separate_group_id_for_each_stream",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (true: java.lang.Boolean)
    )

    val conf = new SparkConf().setAppName("Halo").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    val topics = Array("test")
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](topics, kafkaParams)
    )

    stream.foreachRDD(rdd => {
      // Capture the offset ranges while rdd is still the underlying KafkaRDD
      val offsetRange = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      val maped: RDD[(String, String)] = rdd.map(record => (record.key, record.value))
      // Processing logic
      maped.foreach(println)
      // Print the offset range consumed by each partition in this batch
      for (o <- offsetRange) {
        println(s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}")
      }
    })

    ssc.start()
    ssc.awaitTermination()
  }
}
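Note that the code above sets enable.auto.commit to true, so the Kafka consumer commits offsets on its own schedule and the foreachRDD loop merely prints the offset ranges. The Spark 2.2.0 Kafka 0-10 integration guide linked below also describes committing exactly the offsets of each processed batch back to Kafka yourself; a minimal sketch of that variant (reusing the stream from the code above, and assuming enable.auto.commit is switched to false in kafkaParams) looks like this:

    // assumes "enable.auto.commit" -> (false: java.lang.Boolean) in kafkaParams above
    stream.foreachRDD { rdd =>
      // capture the offset ranges while rdd is still the underlying KafkaRDD
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

      // process the batch first (same placeholder logic as above)
      rdd.map(record => (record.key, record.value)).foreach(println)

      // then commit this batch's offsets back to Kafka asynchronously
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }

This gives at-least-once semantics: commitAsync is not transactional, so if the job fails after processing but before the commit, the batch may be replayed, and the output side should tolerate duplicates.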
References:
http://spark.apache.org/docs/2.2.0/streaming-kafka-0-10-integration.html
https://cloud.tencent.com/developer/article/1355430