In Python 3 and pandas, I have a program that creates a word cloud from a column:
import pandas as pd
import numpy as np
from wordcloud import WordCloud
import matplotlib.pyplot as plt
autores_atuais = pd.read_csv("deputados_autores_projetos.csv", sep=',', encoding='utf-8', converters={'IdAutor': str, 'IdDocumento': str, 'Codoriginalidade': str, 'IdNatureza': str, 'NroLegislativo': str})
autores_atuais.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6632 entries, 74057 to 84859
Data columns (total 10 columns):
IdAutor 6632 non-null object
IdDocumento 6632 non-null object
NomeAutor 6632 non-null object
AnoLegislativo 6632 non-null object
Codoriginalidade 5295 non-null object
DtEnTradaSistema 6632 non-null object
DtPublicacao 6632 non-null object
Ementa 6632 non-null object
IdNatureza 6632 non-null object
NroLegislativo 6632 non-null object
dtypes: object(10)
memory usage: 569.9+ KB
wordcloud = WordCloud().generate(' '.join(autores_atuais['Ementa']))
plt.imshow(wordcloud)
plt.axis("off")
plt.show()
How can I ignore some words in the cloud? For example, short words ("de", "ao") and specific words ("Estado").
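Answer:
I think you need boolean indexing: use isin to match a list of words, str.len to test word length,
chain the two conditions with | (OR), and invert the combined mask with ~: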
autores_atuais = pd.DataFrame({'Ementa':['Estado','another','be','de','def','bax']})
print (autores_atuais)
Ementa
0 Estado
1 another
2 be
3 de
4 def
5 bax
m1 = autores_atuais['Ementa'].isin(['Estado','another','next'])
m2 = autores_atuais['Ementa'].str.len() < 3
s = autores_atuais.loc[~(m1 | m2), 'Ementa']
print (s)
4 def
5 bax
Name: Ementa, dtype: object
An equivalent selection chains the conditions with & (AND), inverting the first condition with ~ and switching the second to >=:
m1 = ~autores_atuais['Ementa'].isin(['Estado','another','next'])
m2 = autores_atuais['Ementa'].str.len() >= 3
s = autores_atuais.loc[m1 & m2, 'Ementa']
print (s)
4 def
5 bax
Name: Ementa, dtype: object
wordcloud = WordCloud().generate(' '.join(s))
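The masks above drop whole rows, which works when each Ementa value is a single word, as in the sample DataFrame. If each entry in the real data is a full sentence (an assumption about the CSV, which is not shown here), the same two rules can be applied word by word instead. The following is a minimal sketch of that idea; the helper name filter_words, its parameters, and the sample sentences are hypothetical and not part of pandas or wordcloud:

```python
def filter_words(texts, ignore=(), min_len=3):
    """Join all texts into one string, dropping short and ignored words.

    texts   : any iterable of strings, e.g. autores_atuais['Ementa']
    ignore  : words to drop, compared case-insensitively
    min_len : words shorter than this are dropped
    """
    ignore_lower = {w.lower() for w in ignore}
    kept = []
    for text in texts:
        # Split on whitespace; punctuation stays attached to the word.
        for word in str(text).split():
            if len(word) >= min_len and word.lower() not in ignore_lower:
                kept.append(word)
    return ' '.join(kept)


# Hypothetical sample sentences standing in for the Ementa column.
ementas = ['Institui programa no Estado de Minas', 'Altera a lei ao final']
text = filter_words(ementas, ignore=['Estado'], min_len=3)
print(text)
```

The resulting string can be passed to WordCloud().generate(text) in place of ' '.join(s). Because the split is on whitespace only, a word followed by a comma or period will not match the ignore list unless punctuation is stripped first.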