
Count word occurrences across two files and split the results into two output files: a-p in one, q-z in the other (local mode)

Requirement

Count the occurrences of each word across two input files and partition the results into two output files: words starting with a-p in one file, words starting with q-z in the other.

Input file 1 (word01.txt)

Hadoop
Spark Hive
Hbase
Hadoop
Spark

Input file 2 (word02.txt)

Java PHP
Android
Html5
Bigdata
python

Expected output:

part-r-00000

Android 1
Bigdata 1
Hadoop 2
Hbase 1
Hive 1
Html5 1
Java 1
PHP 1
python 1

part-r-00001

Spark 2

Approach

1) Four classes are needed: WordCountMapper, WordCountPartitioner, WordCountReducer, and WordCountDriver.
2) WordCountMapper extends Hadoop's Mapper class and overrides the map method.
3) Since the results must be split into two files (a-p in one, q-z in the other), two partitions are needed, and therefore two ReduceTasks; WordCountPartitioner extends Hadoop's Partitioner and assigns partitions according to the requirement.
4) WordCountReducer extends Hadoop's Reducer and overrides the reduce method.
5) WordCountDriver is the program entry point; it defines the Job and sets the required parameters.

Code

WordCountMapper

package com.fengxq.mr.partition2;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * LongWritable, Text   // input key/value types
 * Text, IntWritable    // output key/value types
 */
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    Text kout = new Text();
    IntWritable vout = new IntWritable(1);
    /**
     * Override the map method
     * @param key
     * @param value
     * @param context
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Get the current line of input
        String line = value.toString();
        // Split the line on spaces
        String[] datas = line.split(" ");
        // Emit (word, 1) for each token
        for (String data : datas) {
            kout.set(data); // set the output key
            context.write(kout, vout); // write to the in-memory shuffle buffer
        }
    }
}
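The mapper's split-and-emit step can be sanity-checked outside Hadoop. This is a plain-Java sketch (the class and method names here are ours, not part of the job):

```java
import java.util.ArrayList;
import java.util.List;

public class MapSketch {
    // Same tokenization the map method performs: split each line on a single
    // space, then pair every token with a count of 1.
    public static List<String> mapLine(String line) {
        List<String> pairs = new ArrayList<>();
        for (String data : line.split(" ")) {
            pairs.add(data + "\t1");
        }
        return pairs;
    }

    public static void main(String[] args) {
        // "Spark Hive" from word01.txt yields two (word, 1) pairs
        System.out.println(mapLine("Spark Hive"));
    }
}
```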

WordCountPartitioner (custom partitioner)

package com.fengxq.mr.partition2;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class WordCountPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text text, IntWritable intWritable, int numPartitions) {
        int partitionNum;
        String a2p = "abcdefghijklmnop"; // every letter from a through p
        String key = text.toString();
        // take the first letter of the key, ignoring case
        String first = key.toLowerCase().substring(0, 1);
        if (a2p.contains(first)) {
            partitionNum = 0;
        }else{
            partitionNum = 1;
        }
        return partitionNum;
    }
}
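The partition rule can be verified without Hadoop. The following is a plain-Java sketch of the same logic (the `pickPartition` helper is ours); a range comparison avoids having to type out the alphabet by hand, which is easy to get wrong:

```java
public class PartitionCheck {
    // Mirrors WordCountPartitioner: first letter a-p -> partition 0, q-z -> partition 1
    static int pickPartition(String word) {
        char first = Character.toLowerCase(word.charAt(0));
        return (first >= 'a' && first <= 'p') ? 0 : 1;
    }

    public static void main(String[] args) {
        assert pickPartition("Hadoop") == 0; // 'h' is in a-p
        assert pickPartition("Java") == 0;   // 'j' is in a-p
        assert pickPartition("python") == 0; // lowercase handled the same way
        assert pickPartition("Spark") == 1;  // 's' is in q-z
        System.out.println("all partition checks passed");
    }
}
```

Note that getPartition must return a value in the range 0 to numPartitions - 1, which is why the driver sets the number of ReduceTasks to 2.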

WordCountReducer

package com.fengxq.mr.partition2;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

/**
 * Custom WordCountReducer, extending Hadoop's Reducer class
 *
 * Text, IntWritable   // input key/value types
 * Text, IntWritable   // output key/value types
 */
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    IntWritable vout = new IntWritable();
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0; // running total for this key
        // sum the counts of all values grouped under the same key
        for (IntWritable value : values) {
            sum += value.get();
        }
        vout.set(sum); // set the aggregated count
        context.write(key, vout); // write the result to the output file
    }
}
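The shuffle-plus-reduce step amounts to grouping identical words and summing their 1s. A minimal plain-Java sketch of that aggregation (class and method names are ours):

```java
import java.util.Map;
import java.util.TreeMap;

public class ReduceSketch {
    // Group identical words and sum their counts, as the reduce phase does.
    // A TreeMap keeps keys sorted, mimicking the sorted order of reducer input.
    public static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Words emitted by the map phase for word01.txt's first lines
        System.out.println(countWords(new String[]{"Hadoop", "Spark", "Hadoop"}));
    }
}
```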

WordCountDriver

package com.fengxq.mr.partition2;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

/**
 * Driver
 */
public class WordCountDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf); // get a Job instance
        job.setJarByClass(WordCountDriver.class); // set the jar so Hadoop can run this MR job

        // Mapper settings
        job.setMapperClass(WordCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Reducer settings
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Custom partitioner
        job.setPartitionerClass(WordCountPartitioner.class);
        // Two partitions, so two ReduceTasks
        job.setNumReduceTasks(2);

        // Input and output paths
        FileInputFormat.setInputPaths(job, new Path("D:\\hadoop_in\\partition2"));
        FileOutputFormat.setOutputPath(job, new Path("D:\\hadoop_out\\partition2"));

        // Submit the job and wait for it to finish
        job.waitForCompletion(true);
    }
}

Console output

C:\instal\java\jdk8\bin\java.exe -classpath <JDK, Hadoop 3.1.3, and dependency jars omitted> com.fengxq.mr.partition2.WordCountDriver
log4j:WARN No appenders Could be found for logger (org.apache.htrace.core.Tracer).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[WARN] [2021-05-30 14:27:21][org.apache.hadoop.metrics2.impl.MetricsConfig]Cannot locate configuration: tried hadoop-metrics2-jobtracker.properties,hadoop-metrics2.properties
[INFO] [2021-05-30 14:27:21][org.apache.hadoop.metrics2.impl.MetricsSystemImpl]Scheduled Metric snapshot period at 10 second(s).
[INFO] [2021-05-30 14:27:21][org.apache.hadoop.metrics2.impl.MetricsSystemImpl]JobTracker metrics system started
[WARN] [2021-05-30 14:27:21][org.apache.hadoop.mapreduce.JobResourceUploader]Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
[WARN] [2021-05-30 14:27:21][org.apache.hadoop.mapreduce.JobResourceUploader]No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
[INFO] [2021-05-30 14:27:21][org.apache.hadoop.mapreduce.lib.input.FileInputFormat]Total input files to process : 2
[INFO] [2021-05-30 14:27:21][org.apache.hadoop.mapreduce.JobSubmitter]number of splits:2
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapreduce.JobSubmitter]Submitting tokens for job: job_local25438264_0001
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapreduce.JobSubmitter]Executing with tokens: []
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapreduce.Job]The url to track the job: http://localhost:8080/
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapreduce.Job]Running job: job_local25438264_0001
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.LocalJobRunner]OutputCommitter set in config null
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]File Output Committer Algorithm version is 2
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.LocalJobRunner]OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.LocalJobRunner]Waiting for map tasks
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.LocalJobRunner]Starting task: attempt_local25438264_0001_m_000000_0
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]File Output Committer Algorithm version is 2
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.Task] Using ResourceCalculatorProcesstree : org.apache.hadoop.yarn.util.WindowsBasedProcesstree@7a5281b4
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.MapTask]Processing split: file:/D:/hadoop_in/partition2/word02.txt:0+41
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.MapTask](EQUATOR) 0 kvi 26214396(104857584)
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.MapTask]mapreduce.task.io.sort.mb: 100
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.MapTask]soft limit at 83886080
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.MapTask]bufstart = 0; bufvoid = 104857600
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.MapTask]kvstart = 26214396; length = 6553600
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.MapTask]Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.LocalJobRunner]
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.MapTask]Starting flush of map output
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.MapTask]Spilling map output
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.MapTask]bufstart = 0; bufend = 62; bufvoid = 104857600
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.MapTask]kvstart = 26214396(104857584); kvend = 26214376(104857504); length = 21/6553600
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.MapTask]Finished spill 0
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.Task]Task:attempt_local25438264_0001_m_000000_0 is done. And is in the process of committing
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.LocalJobRunner]map
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.Task]Task 'attempt_local25438264_0001_m_000000_0' done.
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.Task]Final Counters for attempt_local25438264_0001_m_000000_0: Counters: 17
	File System Counters
		FILE: Number of bytes read=318
		FILE: Number of bytes written=339491
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
	Map-Reduce Framework
		Map input records=5
		Map output records=6
		Map output bytes=62
		Map output materialized bytes=86
		Input split bytes=105
		Combine input records=0
		Spilled Records=6
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=198180864
	File Input Format Counters 
		Bytes Read=41
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.LocalJobRunner]Finishing task: attempt_local25438264_0001_m_000000_0
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapred.LocalJobRunner]Starting task: attempt_local25438264_0001_m_000001_0
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]File Output Committer Algorithm version is 2
[INFO] [2021-05-30 14:27:22][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapreduce.Job]Job job_local25438264_0001 running in uber mode : false
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapreduce.Job] map 100% reduce 0%
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.Task] Using ResourceCalculatorProcesstree : org.apache.hadoop.yarn.util.WindowsBasedProcesstree@16b58901
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.MapTask]Processing split: file:/D:/hadoop_in/partition2/word01.txt:0+40
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.MapTask](EQUATOR) 0 kvi 26214396(104857584)
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.MapTask]mapreduce.task.io.sort.mb: 100
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.MapTask]soft limit at 83886080
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.MapTask]bufstart = 0; bufvoid = 104857600
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.MapTask]kvstart = 26214396; length = 6553600
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.MapTask]Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.LocalJobRunner]
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.MapTask]Starting flush of map output
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.MapTask]Spilling map output
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.MapTask]bufstart = 0; bufend = 61; bufvoid = 104857600
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.MapTask]kvstart = 26214396(104857584); kvend = 26214376(104857504); length = 21/6553600
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.MapTask]Finished spill 0
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.Task]Task:attempt_local25438264_0001_m_000001_0 is done. And is in the process of committing
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.LocalJobRunner]map
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.Task]Task 'attempt_local25438264_0001_m_000001_0' done.
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.Task]Final Counters for attempt_local25438264_0001_m_000001_0: Counters: 17
	File System Counters
		FILE: Number of bytes read=587
		FILE: Number of bytes written=339632
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
	Map-Reduce Framework
		Map input records=5
		Map output records=6
		Map output bytes=61
		Map output materialized bytes=85
		Input split bytes=105
		Combine input records=0
		Spilled Records=6
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=303562752
	File Input Format Counters 
		Bytes Read=40
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.LocalJobRunner]Finishing task: attempt_local25438264_0001_m_000001_0
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.LocalJobRunner]map task executor complete.
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.LocalJobRunner]Waiting for reduce tasks
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.LocalJobRunner]Starting task: attempt_local25438264_0001_r_000000_0
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]File Output Committer Algorithm version is 2
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.Task] Using ResourceCalculatorProcesstree : org.apache.hadoop.yarn.util.WindowsBasedProcesstree@7e17ebdf
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.ReduceTask]Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@1a24cabd
[WARN] [2021-05-30 14:27:23][org.apache.hadoop.metrics2.impl.MetricsSystemImpl]JobTracker metrics system already initialized!
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]MergerManager: memoryLimit=2654155520, maxSingleShuffleLimit=663538880, mergeThreshold=1751742720, ioSortFactor=10, memToMemmergeOutputsThreshold=10
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapreduce.task.reduce.EventFetcher]attempt_local25438264_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapreduce.task.reduce.LocalFetcher]localfetcher#1 about to shuffle output of map attempt_local25438264_0001_m_000001_0 decomp: 51 len: 55 to MEMORY
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput]Read 51 bytes from map-output for attempt_local25438264_0001_m_000001_0
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]closeInMemoryFile -> map-output of size: 51, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->51
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapreduce.task.reduce.LocalFetcher]localfetcher#1 about to shuffle output of map attempt_local25438264_0001_m_000000_0 decomp: 65 len: 69 to MEMORY
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput]Read 65 bytes from map-output for attempt_local25438264_0001_m_000000_0
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]closeInMemoryFile -> map-output of size: 65, inMemoryMapOutputs.size() -> 2, commitMemory -> 51, usedMemory ->116
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapreduce.task.reduce.EventFetcher]EventFetcher is interrupted.. Returning
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.LocalJobRunner]2 / 2 copied.
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]finalMerge called with 2 in-memory map-outputs and 0 on-disk map-outputs
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.Merger]Merging 2 sorted segments
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.Merger]Down to the last merge-pass, with 2 segments left of total size: 97 bytes
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]Merged 2 segments, 116 bytes to disk to satisfy reduce memory limit
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]Merging 1 files, 118 bytes from disk
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]Merging 0 segments, 0 bytes from memory into reduce
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.Merger]Merging 1 sorted segments
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.Merger]Down to the last merge-pass, with 1 segments left of total size: 104 bytes
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.LocalJobRunner]2 / 2 copied.
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.conf.Configuration.deprecation]mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.Task]Task:attempt_local25438264_0001_r_000000_0 is done. And is in the process of committing
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.LocalJobRunner]2 / 2 copied.
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.Task]Task attempt_local25438264_0001_r_000000_0 is allowed to commit Now
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]Saved output of task 'attempt_local25438264_0001_r_000000_0' to file:/D:/hadoop_out/partition2
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.LocalJobRunner]reduce > reduce
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.Task]Task 'attempt_local25438264_0001_r_000000_0' done.
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.Task]Final Counters for attempt_local25438264_0001_r_000000_0: Counters: 24
	File System Counters
		FILE: Number of bytes read=988
		FILE: Number of bytes written=339829
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
	Map-Reduce Framework
		Combine input records=0
		Combine output records=0
		Reduce input groups=8
		Reduce shuffle bytes=124
		Reduce input records=9
		Reduce output records=8
		Spilled Records=9
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=303562752
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Output Format Counters 
		Bytes Written=79
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.LocalJobRunner]Finishing task: attempt_local25438264_0001_r_000000_0
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapred.LocalJobRunner]Starting task: attempt_local25438264_0001_r_000001_0
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]File Output Committer Algorithm version is 2
[INFO] [2021-05-30 14:27:23][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapreduce.Job] map 100% reduce 100%
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapred.Task] Using ResourceCalculatorProcesstree : org.apache.hadoop.yarn.util.WindowsBasedProcesstree@19ca8368
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapred.ReduceTask]Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@1ee7fb80
[WARN] [2021-05-30 14:27:24][org.apache.hadoop.metrics2.impl.MetricsSystemImpl]JobTracker metrics system already initialized!
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]MergerManager: memoryLimit=2654155520, maxSingleShuffleLimit=663538880, mergeThreshold=1751742720, ioSortFactor=10, memToMemmergeOutputsThreshold=10
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapreduce.task.reduce.EventFetcher]attempt_local25438264_0001_r_000001_0 Thread started: EventFetcher for fetching Map Completion Events
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapreduce.task.reduce.LocalFetcher]localfetcher#2 about to shuffle output of map attempt_local25438264_0001_m_000001_0 decomp: 26 len: 30 to MEMORY
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput]Read 26 bytes from map-output for attempt_local25438264_0001_m_000001_0
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]closeInMemoryFile -> map-output of size: 26, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->26
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapreduce.task.reduce.LocalFetcher]localfetcher#2 about to shuffle output of map attempt_local25438264_0001_m_000000_0 decomp: 13 len: 17 to MEMORY
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput]Read 13 bytes from map-output for attempt_local25438264_0001_m_000000_0
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]closeInMemoryFile -> map-output of size: 13, inMemoryMapOutputs.size() -> 2, commitMemory -> 26, usedMemory ->39
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapreduce.task.reduce.EventFetcher]EventFetcher is interrupted.. Returning
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapred.LocalJobRunner]2 / 2 copied.
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]finalMerge called with 2 in-memory map-outputs and 0 on-disk map-outputs
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapred.Merger]Merging 2 sorted segments
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapred.Merger]Down to the last merge-pass, with 2 segments left of total size: 24 bytes
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]Merged 2 segments, 39 bytes to disk to satisfy reduce memory limit
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]Merging 1 files, 41 bytes from disk
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl]Merging 0 segments, 0 bytes from memory into reduce
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapred.Merger]Merging 1 sorted segments
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapred.Merger]Down to the last merge-pass, with 1 segments left of total size: 30 bytes
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapred.LocalJobRunner]2 / 2 copied.
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapred.Task]Task:attempt_local25438264_0001_r_000001_0 is done. And is in the process of committing
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapred.LocalJobRunner]2 / 2 copied.
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapred.Task]Task attempt_local25438264_0001_r_000001_0 is allowed to commit Now
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]Saved output of task 'attempt_local25438264_0001_r_000001_0' to file:/D:/hadoop_out/partition2
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapred.LocalJobRunner]reduce > reduce
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapred.Task]Task 'attempt_local25438264_0001_r_000001_0' done.
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapred.Task]Final Counters for attempt_local25438264_0001_r_000001_0: Counters: 24
	File System Counters
		FILE: Number of bytes read=1188
		FILE: Number of bytes written=339897
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
	Map-Reduce Framework
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=47
		Reduce input records=3
		Reduce output records=2
		Spilled Records=3
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=303562752
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Output Format Counters 
		Bytes Written=27
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapred.LocalJobRunner]Finishing task: attempt_local25438264_0001_r_000001_0
[INFO] [2021-05-30 14:27:24][org.apache.hadoop.mapred.LocalJobRunner]reduce task executor complete.
[INFO] [2021-05-30 14:27:25][org.apache.hadoop.mapreduce.Job]Job job_local25438264_0001 completed successfully
[INFO] [2021-05-30 14:27:25][org.apache.hadoop.mapreduce.Job]Counters: 30
	File System Counters
		FILE: Number of bytes read=3081
		FILE: Number of bytes written=1358849
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
	Map-Reduce Framework
		Map input records=10
		Map output records=12
		Map output bytes=123
		Map output materialized bytes=171
		Input split bytes=210
		Combine input records=0
		Combine output records=0
		Reduce input groups=10
		Reduce shuffle bytes=171
		Reduce input records=12
		Reduce output records=10
		Spilled Records=24
		Shuffled Maps =4
		Failed Shuffles=0
		Merged Map outputs=4
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=1108869120
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=81
	File Output Format Counters 
		Bytes Written=106

Process finished with exit code 0

Final output files


Output file 1

part-r-00000

Android	1
Bigdata	1
Hadoop	2
Hbase	1
Hive	1
Html5	1
PHP	1
python	1

Output file 2

part-r-00001

Java	1
Spark	2
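The a–p / q–z split shown above is decided by the custom WordCountPartitioner's getPartition method. The full partitioner body is not reproduced in this section, so here is a minimal standalone sketch of one way such routing could be written, using plain String instead of Hadoop's Text/IntWritable and a hypothetical class name, with words whose lowercased first letter falls in a–p sent to partition 0 and everything else to partition 1:

```java
import java.util.Locale;

// Hypothetical standalone sketch of the a-p / q-z routing rule
// (plain String stands in for Hadoop's Text; class name is illustrative).
public class PartitionSketch {

    // Return 0 for words starting with a-p (case-insensitive), 1 otherwise.
    static int getPartition(String word) {
        char first = word.toLowerCase(Locale.ROOT).charAt(0);
        return (first >= 'a' && first <= 'p') ? 0 : 1;
    }

    public static void main(String[] args) {
        System.out.println(getPartition("Hadoop")); // prints 0 -> part-r-00000
        System.out.println(getPartition("Spark"));  // prints 1 -> part-r-00001
        System.out.println(getPartition("python")); // prints 0 -> part-r-00000
    }
}
```

In the real job, the returned partition number selects which of the two ReduceTasks receives the key, which is why the driver must also set the number of reduce tasks to 2 to match.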

