
2. Installing Spark and Python Exercises

# Read the lyrics and normalize them to lowercase.
with open("earth_song.txt", "r") as f:
    text = f.read()
text = text.lower()

# Replace punctuation with spaces so it does not stick to words.
for ch in '!@#$%^&*(_)-+=\\[]}{|;:\'\"`~,<.>?/':
    text = text.replace(ch, " ")

words = text.split()  # split the text on whitespace

# Read the stop-word file, one word per line.
stop_words = set()
with open('stop_words.txt', 'r') as f:
    for line in f:
        stop_words.add(line.strip())

# Keep only the words that are not stop words.
afterwords = [word for word in words if word not in stop_words]

# Count the occurrences of each remaining word.
counts = {}
for word in afterwords:
    counts[word] = counts.get(word, 0) + 1
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)  # most frequent first

# Write one "word count" pair per line to count.txt.
with open('count.txt', 'w') as f1:
    for word, count in items:
        f1.write(word + " " + str(count) + "\n")
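
The same counting steps can also be written with the standard library's collections.Counter. The following is only a minimal sketch, not the original author's code: the names count_words and PUNCTUATION are hypothetical, and the function takes the text and stop words as arguments instead of reading earth_song.txt and stop_words.txt, so it can be tried without those files.

```python
from collections import Counter

# Hypothetical constant: the same punctuation characters the script above replaces.
PUNCTUATION = '!@#$%^&*(_)-+=\\[]}{|;:\'"`~,<.>?/'


def count_words(text, stop_words):
    """Return (word, count) pairs, most frequent first, skipping stop words.

    count_words is an illustrative helper, not part of the original script.
    """
    # Map every punctuation character to a space in a single pass.
    table = str.maketrans(PUNCTUATION, " " * len(PUNCTUATION))
    words = text.lower().translate(table).split()
    stop = set(stop_words)  # set membership tests are fast
    counter = Counter(word for word in words if word not in stop)
    return counter.most_common()


if __name__ == "__main__":
    sample = "What about sunrise? What about rain?"
    for word, count in count_words(sample, ["about"]):
        print(word, count)
```

Using a set for the stop words keeps each membership check cheap, and most_common() returns the pairs already sorted by descending count, so no separate sort step is needed.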

