1 Official Grep Example
By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.
The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.
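# Create a directory to hold the source files
mkdir input
# Copy some of the configuration files to use as input data
cp etc/hadoop/*.xml input
# Run the grep program from hadoop-mapreduce-examples-2.7.2.jar on the input data.
# It counts every string made of "dfs" followed by one or more lowercase letters or dots.
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
# Print the output
cat output/*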
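As a rough cross-check on a single machine, the same counting can be approximated with standard Unix tools. This is only a sketch and not what the Hadoop job runs internally: it assumes a grep that supports -o, -h and -E, and the helper name local_grep_count is made up for illustration. The ordering of entries with equal counts may differ from the Hadoop output.

```shell
# Local approximation of the Hadoop grep example (not Hadoop's own implementation).
# local_grep_count is a hypothetical helper name used only in this sketch.
local_grep_count() {
  # -o prints only the matched text, -h omits file names, -E enables extended regex.
  # uniq -c counts each distinct match; sort -rn lists the most frequent first.
  grep -ohE 'dfs[a-z.]+' "$1"/*.xml | sort | uniq -c | sort -rn
}
# Usage: local_grep_count input
```

Comparing its output with cat output/* is a quick way to confirm the job matched what you expected.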
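Issue 1: If the output directory already exists, the job throws an exception (FileAlreadyExistsException) and does not run. Delete or rename the directory before rerunning.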
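One way to avoid this is to remove the stale directory before each rerun. The sketch below assumes Hadoop is running in the default non-distributed mode described above, so that output is an ordinary local directory. The helper name reset_output_dir is hypothetical.

```shell
# Remove a previous local output directory so the job can be rerun.
# reset_output_dir is a hypothetical helper name used only in this sketch.
reset_output_dir() {
  # Only delete when the path exists and is a directory.
  if [ -d "$1" ]; then
    rm -r "$1"
  fi
}
# Usage: reset_output_dir output
# Then rerun the bin/hadoop jar ... grep command shown earlier.
```

If the job writes to HDFS instead of the local file system, the directory would have to be removed with bin/hadoop fs -rm -r output rather than a local rm.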
2 Official WordCount Example
Count the number of occurrences of each word.
mkdir wcinput
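# Create the input file inside wcinput, the directory the job reads from
touch wcinput/word.txt
# Add the following content to the file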
hadoop yarn
hadoop mapreduce
atguigu
atguigu
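The directory, the file and its content can also be created in one step with a here-document. This is a minimal sketch that assumes the file belongs at wcinput/word.txt, since wcinput/ is the input path passed to the job.

```shell
# Create the input directory if it is missing (-p makes this safe to repeat).
mkdir -p wcinput
# Write the four sample lines; the quoted 'EOF' prevents any shell expansion.
cat > wcinput/word.txt <<'EOF'
hadoop yarn
hadoop mapreduce
atguigu
atguigu
EOF
```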
3 Run the WordCount program
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount wcinput/ wcoutput
4 View the results
cat wcoutput/part-r-00000
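To sanity-check the result, the same tally can be computed locally with standard Unix tools. This is a sketch under the assumption that words are separated by spaces or tabs. It is not how the Hadoop job is implemented, and local_word_count is a hypothetical helper name.

```shell
# Local word count for comparison with the job output (not Hadoop's implementation).
# local_word_count is a hypothetical helper name used only in this sketch.
local_word_count() {
  # Put one word per line, group identical words, count them,
  # then print "word<TAB>count" sorted by word.
  tr -s ' \t' '\n' < "$1" | sort | uniq -c | awk '{print $2 "\t" $1}'
}
# Usage: local_word_count wcinput/word.txt
```

For the four sample lines above, it reports atguigu 2, hadoop 2, mapreduce 1 and yarn 1, which is what part-r-00000 should contain.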