1. MapReduce Definition
The core function of MapReduce is to combine user-written business-logic code with its built-in default components into a complete distributed computing program that runs in parallel on a Hadoop cluster.
2. Pros and Cons of MapReduce
Pros
1) MapReduce is easy to program
By simply implementing a few interfaces, you can build a distributed program that runs on a large number of inexpensive PC servers. In other words, writing a distributed program feels much the same as writing a simple serial program. This is what made MapReduce programming so popular.
Implement its internal interfaces and components, and you have a distributed program.
2) Good scalability
When compute resources run short, capacity can be extended simply by adding machines to the cluster.
3) High fault tolerance
MapReduce was designed from the start to run on inexpensive PC hardware, which demands high fault tolerance. For example, if one machine goes down, its computing tasks can be transferred to another node so the job does not fail, and this happens entirely inside Hadoop without any human intervention.
This is reflected in the multiple replicas kept across the cluster.
4) Well suited to offline processing of massive data at PB scale and above
Clusters of thousands of servers can work concurrently, providing the data-processing capacity.
Offline processing of massive data is Hadoop's core capability.
Cons
1) Not good at real-time computation
MapReduce cannot return results within milliseconds or seconds the way MySQL can.
Because it constantly spills to disk and performs I/O, it is slow. This does not mean it can never do real-time computation, only that it is not good at it.
2) Not good at streaming computation
The input data of streaming computation arrives dynamically, whereas a MapReduce input dataset is static and cannot change while the job runs; MapReduce's own design requires the data source to be static.
3) Not good at DAG (directed acyclic graph) computation
Here multiple applications depend on one another, with each application's input being the previous one's output. MapReduce can do this, but every MapReduce job writes its output to disk, causing heavy disk I/O and very poor performance.
3. The Core Idea of MapReduce
1) A distributed MapReduce program usually needs to be divided into two phases of execution.
2) The concurrent MapTask instances of the first phase run fully in parallel, without interfering with one another.
3) The concurrent ReduceTask instances of the second phase also do not interfere with one another, but their input depends on the output of all the MapTask instances of the previous phase.
4) The MapReduce programming model can contain only one Map phase and one Reduce phase; if the user's business logic is very complex, the only option is to run multiple MapReduce programs serially, as sketched below.
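To make point 4) concrete, here is a minimal sketch of chaining two jobs serially: the second job's input directory is the first job's output directory. The mapper/reducer setup is elided and the intermediate path is a hypothetical placeholder, not part of any fixed API.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainedJobsDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical intermediate directory handed from job1 to job2
        Path stage1Out = new Path("/tmp/stage1_out");

        Job job1 = Job.getInstance(conf, "stage-1");
        // job1.setJarByClass(...) / setMapperClass(...) / setReducerClass(...) as usual
        FileInputFormat.setInputPaths(job1, new Path(args[0]));
        FileOutputFormat.setOutputPath(job1, stage1Out);
        // Block until job1 finishes; abort the chain if it fails
        if (!job1.waitForCompletion(true)) {
            System.exit(1);
        }

        Job job2 = Job.getInstance(conf, "stage-2");
        // job2's mapper/reducer configuration as usual
        FileInputFormat.setInputPaths(job2, stage1Out); // job2 reads job1's output
        FileOutputFormat.setOutputPath(job2, new Path(args[1]));
        System.exit(job2.waitForCompletion(true) ? 0 : 1);
    }
}

Note that stage1Out is materialized on disk between the two jobs, which is exactly the source of the DAG performance cost described in the cons above.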
4. Which Processes Does MapReduce Include?
When a complete MapReduce program runs in distributed mode, there are three instance processes:
1) MrAppMaster: responsible for scheduling the whole program and coordinating its state
2) MapTask: responsible for the entire data-processing flow of the Map phase
3) ReduceTask: responsible for the entire data-processing flow of the Reduce phase
5. The First Example
The WordCount example consists of three main classes: a custom Mapper class, a custom Reducer class, and a Driver class; the full implementation is shown in the walkthrough below.
6. Java Serialization vs. Hadoop Serialization Types
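Hadoop does not reuse Java's native serialization for its keys and values, because it is heavyweight: a serialized object carries extra class metadata along with the data. Instead, Hadoop defines its own compact Writable types. The common correspondences are:

Java type      Hadoop Writable type
boolean        BooleanWritable
byte           ByteWritable
int            IntWritable
float          FloatWritable
long           LongWritable
double         DoubleWritable
String         Text
map            MapWritable
array          ArrayWritable
null           NullWritable

The WordCount example below uses exactly these mappings: Text for String keys and IntWritable for int counts.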
7. MapReduce Programming Conventions
A user-written program is divided into three parts: Mapper, Reducer, and Driver.
1. Mapper stage
1) The custom Mapper class extends Hadoop's Mapper parent class
2) The Mapper's input data is a KV pair; for reading text data, LongWritable and Text are typically used
3) The Mapper's business logic is written in the map method, which the MapTask calls once for each record it reads
4) When the map method finishes, the processed KV pair is written out through the context.
2. Reduce stage
1) The custom Reducer class extends Hadoop's Reducer parent class
2) The Reducer's input KV types correspond to the Mapper's output KV types
3) The Reducer's business logic is written in the reduce method, which the ReduceTask calls to process the group of values that share the same key
4) When the reduce method finishes, the processed KV pair is written out through the context.
3. Driver stage
This acts as the client of the YARN cluster, used to submit our entire program to the YARN cluster; what is submitted is a Job object that encapsulates the MapReduce program's runtime parameters.
8. WordCount Walkthrough
1) Requirements
Input data
fengxq fengxq
ss ss
cls cls
jiao
banzhang
xue
hadoop
sgg sgg sgg
nihao nihao
bigdata0111
laiba
Expected output
banzhang 1
bigdata0111 1
cls 2
fengxq 2
hadoop 1
jiao 1
laiba 1
nihao 2
sgg 3
ss 2
xue 1
2) Code
Custom WordCountMapper class
package com.fengxq.mr.wc;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * Custom Mapper: extends Hadoop's Mapper class.
 * Input KV: (LongWritable file offset, Text line); output KV: (Text word, IntWritable 1).
 */
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private Text outk = new Text();
    private IntWritable outv = new IntWritable(1); // each occurrence counts as 1

    /**
     * Called once per input record by the MapTask.
     * @param key     byte offset of the line in the input file
     * @param value   one line of text
     * @param context used to write out the resulting KV pairs
     */
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Read one line of input
        String line = value.toString();
        // Split the line on spaces, e.g. "fengxq fengxq"
        String[] datas = line.split(" ");
        // Emit (word, 1) for every word on the line
        for (String data : datas) {
            outk.set(data);
            context.write(outk, outv);
        }
    }
}
Custom WordCountReducer class
package com.fengxq.mr.wc;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

/**
 * Custom Reducer: sums the grouped values for each key and writes (word, total).
 */
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable outv = new IntWritable();

    /**
     * Called once per key group by the ReduceTask.
     * @param key     the word
     * @param values  all counts emitted for this word
     * @param context used to write out the resulting KV pair
     */
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0; // running total for this word
        for (IntWritable value : values) {
            sum += value.get();
        }
        outv.set(sum);
        context.write(key, outv);
    }
}
Custom WordCountDriver class
package com.fengxq.mr.wc;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class WordCountDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // 1. Get the Job instance
        Job job = Job.getInstance();
        // 2. Set the jar and the Mapper class, plus the Mapper output KV types
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // 3. Set the Reducer class and the final output KV types
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // 4. Set the input and output paths (use args[] when submitting to a cluster)
        // FileInputFormat.setInputPaths(job, new Path(args[0]));
        // FileOutputFormat.setOutputPath(job, new Path(args[1]));
        FileInputFormat.setInputPaths(job, new Path("D:\\hadoop_in\\wcinput"));
        FileOutputFormat.setOutputPath(job, new Path("D:\\hadoop_out\\wcinput_out"));
        // 5. Submit the job and wait for completion
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
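With the hard-coded local paths above, the job runs in the local job runner inside the IDE, which is exactly what the log below shows. To submit to a real cluster instead, restore the two commented-out args lines, package the project into a jar, and launch it with the hadoop jar command; the jar name and HDFS paths here are hypothetical:

hadoop jar wordcount.jar com.fengxq.mr.wc.WordCountDriver /wcinput /wcoutput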
Execution log
C:\instal\java\jdk8\bin\java.exe -Dfile.encoding=UTF-8 -classpath <project classes plus Hadoop 3.1.3 and its dependency jars from the local Maven repository> com.fengxq.mr.wc.WordCountDriver
log4j:WARN No appenders could be found for logger (org.apache.htrace.core.Tracer).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[WARN] [2021-06-01 20:09:54][org.apache.hadoop.metrics2.impl.MetricsConfig]Cannot locate configuration: tried hadoop-metrics2-jobtracker.properties,hadoop-metrics2.properties
[INFO] [2021-06-01 20:09:54][org.apache.hadoop.metrics2.impl.MetricsSystemImpl]Scheduled Metric snapshot period at 10 second(s).
[INFO] [2021-06-01 20:09:54][org.apache.hadoop.metrics2.impl.MetricsSystemImpl]JobTracker metrics system started
[WARN] [2021-06-01 20:09:55][org.apache.hadoop.mapreduce.JobResourceUploader]Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
[WARN] [2021-06-01 20:09:55][org.apache.hadoop.mapreduce.JobResourceUploader]No job jar file set. User classes may not be found. See Job or Job#setJar(String).
[INFO] [2021-06-01 20:09:55][org.apache.hadoop.mapreduce.lib.input.FileInputFormat]Total input files to process : 1
[INFO] [2021-06-01 20:09:55][org.apache.hadoop.mapreduce.JobSubmitter]number of splits:1
[INFO] [2021-06-01 20:09:55][org.apache.hadoop.mapreduce.JobSubmitter]Submitting tokens for job: job_local758397422_0001
[INFO] [2021-06-01 20:09:55][org.apache.hadoop.mapreduce.JobSubmitter]Executing with tokens: []
[INFO] [2021-06-01 20:09:55][org.apache.hadoop.mapreduce.Job]The url to track the job: http://localhost:8080/
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapreduce.Job]Running job: job_local758397422_0001
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.LocalJobRunner]OutputCommitter set in config null
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]File Output Committer Algorithm version is 2
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.LocalJobRunner]OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.LocalJobRunner]Waiting for map tasks
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.LocalJobRunner]Starting task: attempt_local758397422_0001_m_000000_0
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]File Output Committer Algorithm version is 2
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.Task] Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@5792442a
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask]Processing split: file:/D:/hadoop_in/wcinput/hello.txt:0+106
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask](EQUATOR) 0 kvi 26214396(104857584)
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask]mapreduce.task.io.sort.mb: 100
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask]soft limit at 83886080
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask]bufstart = 0; bufvoid = 104857600
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask]kvstart = 26214396; length = 6553600
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask]Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.LocalJobRunner]
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask]Starting flush of map output
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask]Spilling map output
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask]bufstart = 0; bufend = 163; bufvoid = 104857600
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask]kvstart = 26214396(104857584); kvend = 26214332(104857328); length = 65/6553600
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.MapTask]Finished spill 0
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.Task]Task:attempt_local758397422_0001_m_000000_0 is done. And is in the process of committing
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.LocalJobRunner]map
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.Task]Task 'attempt_local758397422_0001_m_000000_0' done.
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.Task]Final Counters for attempt_local758397422_0001_m_000000_0: Counters: 17
File System Counters
FILE: Number of bytes read=261
FILE: Number of bytes written=341453
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=11
Map output records=17
Map output bytes=163
Map output materialized bytes=203
Input split bytes=101
Combine input records=0
Spilled Records=17
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=0
Total committed heap usage (bytes)=261619712
File Input Format Counters
Bytes Read=106
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.LocalJobRunner]Finishing task: attempt_local758397422_0001_m_000000_0
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.LocalJobRunner]map task executor complete.
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.LocalJobRunner]Waiting for reduce tasks
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapred.LocalJobRunner]Starting task: attempt_local758397422_0001_r_000000_0
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]File Output Committer Algorithm version is 2
[INFO] [2021-06-01 20:09:56][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.Job]Job job_local758397422_0001 running in uber mode : false
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.Job] map 100% reduce 0%
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.Task] Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@1d0cf332
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.ReduceTask]Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@1346e016
[WARN] [2021-06-01 20:09:57][org.apache.hadoop.metrics2.impl.MetricsSystemImpl]JobTracker metrics system already initialized!
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]MergerManager: memoryLimit=2654155520, maxSingleShuffleLimit=663538880, mergeThreshold=1751742720, ioSortFactor=10, memToMemMergeOutputsThreshold=10
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.task.reduce.EventFetcher]attempt_local758397422_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.task.reduce.LocalFetcher]localfetcher#1 about to shuffle output of map attempt_local758397422_0001_m_000000_0 decomp: 199 len: 203 to MEMORY
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput]Read 199 bytes from map-output for attempt_local758397422_0001_m_000000_0
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]closeInMemoryFile -> map-output of size: 199, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->199
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.task.reduce.EventFetcher]EventFetcher is interrupted.. Returning
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.LocalJobRunner]1 / 1 copied.
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.Merger]Merging 1 sorted segments
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.Merger]Down to the last merge-pass, with 1 segments left of total size: 188 bytes
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]Merged 1 segments, 199 bytes to disk to satisfy reduce memory limit
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]Merging 1 files, 203 bytes from disk
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]Merging 0 segments, 0 bytes from memory into reduce
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.Merger]Merging 1 sorted segments
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.Merger]Down to the last merge-pass, with 1 segments left of total size: 188 bytes
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.LocalJobRunner]1 / 1 copied.
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.conf.Configuration.deprecation]mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.Task]Task:attempt_local758397422_0001_r_000000_0 is done. And is in the process of committing
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.LocalJobRunner]1 / 1 copied.
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.Task]Task attempt_local758397422_0001_r_000000_0 is allowed to commit Now
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]Saved output of task 'attempt_local758397422_0001_r_000000_0' to file:/D:/hadoop_out/wcinput_out
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.LocalJobRunner]reduce > reduce
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.Task]Task 'attempt_local758397422_0001_r_000000_0' done.
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.Task]Final Counters for attempt_local758397422_0001_r_000000_0: Counters: 24
File System Counters
FILE: Number of bytes read=699
FILE: Number of bytes written=341757
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Combine input records=0
Combine output records=0
Reduce input groups=11
Reduce shuffle bytes=203
Reduce input records=17
Reduce output records=11
Spilled Records=17
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=0
Total committed heap usage (bytes)=261619712
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Output Format Counters
Bytes Written=101
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.LocalJobRunner]Finishing task: attempt_local758397422_0001_r_000000_0
[INFO] [2021-06-01 20:09:57][org.apache.hadoop.mapred.LocalJobRunner]reduce task executor complete.
[INFO] [2021-06-01 20:09:58][org.apache.hadoop.mapreduce.Job] map 100% reduce 100%
[INFO] [2021-06-01 20:09:58][org.apache.hadoop.mapreduce.Job]Job job_local758397422_0001 completed successfully
[INFO] [2021-06-01 20:09:58][org.apache.hadoop.mapreduce.Job]Counters: 30
File System Counters
FILE: Number of bytes read=960
FILE: Number of bytes written=683210
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=11
Map output records=17
Map output bytes=163
Map output materialized bytes=203
Input split bytes=101
Combine input records=0
Combine output records=0
Reduce input groups=11
Reduce shuffle bytes=203
Reduce input records=17
Reduce output records=11
Spilled Records=34
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=0
Total committed heap usage (bytes)=523239424
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=106
File Output Format Counters
Bytes Written=101
Process finished with exit code 0
Execution result
banzhang 1
bigdata0111 1
cls 2
fengxq 2
hadoop 1
jiao 1
laiba 1
nihao 2
sgg 3
ss 2
xue 1