
Hadoop merge script

import sys
 
 
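def mapper():
    """Project the needed columns from each tab-separated jianku_data record."""
    #filepath = os.environ["map_input_file"]
    #filename = "zhangpeng66"
    filepath = 'jianku_data'  # hard-coded input tag, so the check below always passes
    for line in sys.stdin:
        if "jianku_data" in filepath:
            line = line.rstrip("\n")
            tokens = line.split('\t')
            # Skip malformed rows: index 12 is read below, so 13 columns are required.
            if len(tokens) < 13:
                continue
            os_key = tokens[0]
            title = tokens[5]
            real_title = tokens[10]
            alt = tokens[7]
            ct0 = tokens[12]
            # Emit os_key first so it serves as the shuffle key.
            print('\t'.join([os_key, title, real_title, alt, ct0]))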
 
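def reducer():
    """Print only the records whose key appears in the query file (sys.argv[2])."""
    # Load the wanted keys once instead of re-reading the file for every input line.
    with open(sys.argv[2], 'r') as query_file:
        os_queries = set(q.strip('\n\r') for q in query_file)

    for line in sys.stdin:
        line = line.strip('\r\n')
        l_info = line.split('\t')
        os_key = l_info[0]
        if os_key in os_queries:
            print(line)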
 
if __name__ == '__main__':
    # Usage: <script> map   or   <script> reduce <query_file>
    mode = sys.argv[1] if len(sys.argv) > 1 else ''
    if mode == 'map':
        mapper()
    elif mode == 'reduce':
        reducer()
    else:
        sys.stderr.write('map or reduce, please.\n')
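
Before submitting the job, the map and reduce logic can be checked in memory. The following is only a minimal sketch and not part of the original script: the helper names `map_line` and `filter_by_keys` and the sample rows are hypothetical, while the column indices (0, 5, 10, 7, 12) and the 13-column minimum are taken from the script above.

```python
def map_line(line):
    """Return the five projected fields, or None for rows with fewer than 13 columns."""
    tokens = line.rstrip("\n").split("\t")
    if len(tokens) < 13:
        return None
    # Same column order as the mapper: os_key, title, real_title, alt, ct0.
    return "\t".join([tokens[0], tokens[5], tokens[10], tokens[7], tokens[12]])


def filter_by_keys(mapped_lines, wanted_keys):
    """Keep mapped lines whose first field (the key) is in wanted_keys."""
    wanted = set(wanted_keys)
    return [l for l in mapped_lines if l.split("\t")[0] in wanted]


# Hypothetical sample input: two well-formed 13-column rows and one short row.
rows = [
    "\t".join(["k1"] + ["c%d" % i for i in range(1, 13)]),
    "\t".join(["k2"] + ["d%d" % i for i in range(1, 13)]),
    "too\tshort",
]
mapped = [m for m in (map_line(r) for r in rows) if m is not None]
# sorted() stands in for the shuffle/sort phase between map and reduce.
kept = filter_by_keys(sorted(mapped), ["k2"])
print(kept)
```

Keeping the per-line logic in plain functions like this makes it testable without stdin or a cluster; the streaming entry points then only need to loop over `sys.stdin` and print.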
