Hadoop MapReduce whole-file input format

I'm trying to use Hadoop MapReduce, but instead of mapping one line at a time in my Mapper, I want to map an entire file at once.

So I found these two classes
(https://code.google.com/p/hadoop-course/source/browse/HadoopSamples/src/main/java/mr/wholeFile/?r=3)
that should help me do this.
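The linked classes are not reproduced here, but the idea behind a whole-file input format can be sketched in plain Java: a line-oriented reader hands the mapper one record per line, while a whole-file reader hands over exactly one record per file (the file's full contents) and then reports that no records remain. This is a hypothetical stand-in, not Hadoop code: `WholeFileReaderSketch` is an illustrative name. In Hadoop, the same effect is commonly achieved with an input format that marks files as non-splittable plus a record reader that keeps a "processed" flag.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/**
 * Hypothetical plain-Java analogue (NOT the Hadoop API) of a whole-file
 * record reader: it yields exactly one record per file.
 */
final class WholeFileReaderSketch {
    private final Path file;
    // Set once the single record for this file has been emitted.
    private boolean processed = false;

    WholeFileReaderSketch(Path file) {
        this.file = file;
    }

    /** Returns the whole file on the first call, then null (no more records). */
    byte[] next() {
        if (processed) {
            return null;
        }
        processed = true;
        try {
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** For contrast: the records a line-per-record reader would produce. */
    static List<String> linesAsRecords(Path file) {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a three-line file, `linesAsRecords` yields three records, while `WholeFileReaderSketch.next()` yields one record holding all the bytes and then `null`.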

However, I get a compile error that reads:

The method setInputFormat(Class) in the type JobConf is not applicable for the arguments (Class)
Driver.java    /ex2/src    line 33    Java Problem

I changed my driver class to:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

import forma.WholeFileInputFormat;

/*
 * Driver
 * The Driver class is responsible for creating the job and submitting it.
 */
public class Driver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(Driver.class);
        conf.setJobName("Get minimum for each month");

        conf.setOutputKeyClass(IntWritable.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        // Previously it was:
        // conf.setInputFormat(TextInputFormat.class);
        // and it was changed to:
        conf.setInputFormat(WholeFileInputFormat.class);

        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf,new Path("input"));
        FileOutputFormat.setOutputPath(conf, new Path("output"));

        System.out.println("Starting Job...");
        JobClient.runJob(conf);
        System.out.println("Job Done!");
    }

}

What am I doing wrong?

Solution:

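Make sure your WholeFileInputFormat class has the correct imports. Your job driver uses the old MapReduce API (org.apache.hadoop.mapred). I suspect you imported the new API's FileInputFormat in your WholeFileInputFormat class. If I'm right, WholeFileInputFormat should import org.apache.hadoop.mapred.FileInputFormat instead of org.apache.hadoop.mapreduce.lib.input.FileInputFormat.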
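The type mismatch behind the compile error can be modeled in plain Java. In Hadoop, the old API's `JobConf.setInputFormat` takes a `Class<? extends org.apache.hadoop.mapred.InputFormat>`, and `org.apache.hadoop.mapred.InputFormat` is an interface unrelated to the new API's `org.apache.hadoop.mapreduce.InputFormat` abstract class. A `WholeFileInputFormat` built on the new-API `FileInputFormat` is therefore not a subtype of what `setInputFormat` expects, even though both base classes share the simple name `FileInputFormat`. All types below are hypothetical stand-ins named for illustration, not the real Hadoop classes.

```java
// Hypothetical stand-ins (NOT the real Hadoop types):
//   OldApiInputFormat ~ org.apache.hadoop.mapred.InputFormat (an interface)
//   NewApiInputFormat ~ org.apache.hadoop.mapreduce.InputFormat (an abstract class)
interface OldApiInputFormat {}

abstract class NewApiInputFormat {}

// ~ org.apache.hadoop.mapred.FileInputFormat
abstract class OldApiFileInputFormat implements OldApiInputFormat {}

// ~ org.apache.hadoop.mapreduce.lib.input.FileInputFormat
abstract class NewApiFileInputFormat extends NewApiInputFormat {}

// A whole-file format built on the new API: the cause of the error.
class WholeFileInputFormatNewApi extends NewApiFileInputFormat {}

// A whole-file format built on the old API: what the old-API JobConf accepts.
class WholeFileInputFormatOldApi extends OldApiFileInputFormat {}

final class OldApiJobConfSketch {
    private Class<? extends OldApiInputFormat> inputFormat;

    // Same parameter shape as the old API's setInputFormat.
    // setInputFormat(WholeFileInputFormatNewApi.class) would NOT compile.
    void setInputFormat(Class<? extends OldApiInputFormat> cls) {
        this.inputFormat = cls;
    }

    Class<? extends OldApiInputFormat> getInputFormat() {
        return inputFormat;
    }

    // Runtime equivalent of the subtype check javac performs at compile time.
    static boolean acceptedBySetInputFormat(Class<?> cls) {
        return OldApiInputFormat.class.isAssignableFrom(cls);
    }
}
```

Keep in mind that switching the import alone may not be enough in practice: the old-API `FileInputFormat` expects `getRecordReader(InputSplit, JobConf, Reporter)` returning an `org.apache.hadoop.mapred.RecordReader`, so the linked classes' method signatures may need adjusting as well.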

Hope this helps.
