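I have a CSV file with several columns that contain empty strings. After reading the CSV into a pandas DataFrame, the empty strings are converted to NaN.
Now I want to prepend a string tag to the values already in the columns, but only to cells that contain a value, not to the ones that are NaN.
This is what I'm trying to do: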
import pandas as pd

with open('file1.csv', 'r') as file:
    for chunk in pd.read_csv(file, chunksize=1000, header=0, names=['A', 'B', 'C', 'D']):
        if len(chunk) >= 1:
            if chunk['A'].notna:
                chunk['A'] = "tag-" + chunk['A'].astype(str)
            if chunk['B'].notna:
                chunk['B'] = "tag-" + chunk['B'].astype(str)
            if chunk['C'].notna:
                chunk['C'] = "tag-" + chunk['C'].astype(str)
            if chunk['D'].notna:
                chunk['D'] = "tag-" + chunk['D'].astype(str)
This is the error I get:
AttributeError: 'Series' object has no attribute 'notna'
The final output I want should look like this:
A,B,C,D
tag-a,tag-b,tag-c,
tag-a,tag-b,,
tag-a,,,
,,tag-c,
,,,tag-d
,tag-b,,tag-d
Solution:
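import pandas as pd

for chunk in pd.read_csv('file1.csv', chunksize=2, header=0, names=['A', 'B', 'C', 'D']):
    if len(chunk) >= 1:
        m1 = chunk.notna()  # True where the cell holds a value
        # Replace non-NaN cells with "tag-" + value; NaN cells are left as NaN.
        chunk = chunk.mask(m1, "tag-" + chunk.astype(str))
        # use or write out the tagged chunk here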
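To check the approach end to end, here is a minimal, self-contained sketch that runs the same read_csv / notna / mask steps on sample data reconstructed from the desired output above (the real file1.csv is not shown, so CSV_TEXT is an assumption) and concatenates the tagged chunks. The helper name tag_chunks is hypothetical; in practice you would pass 'file1.csv' instead of the io.StringIO object.

```python
import io

import pandas as pd

# Hypothetical sample input, reconstructed from the desired output in the question.
CSV_TEXT = """A,B,C,D
a,b,c,
a,b,,
a,,,
,,c,
,,,d
,b,,d
"""


def tag_chunks(source, chunksize=2):
    """Hypothetical helper: read `source` in chunks and prefix every non-NaN cell with 'tag-'."""
    tagged = []
    for chunk in pd.read_csv(source, chunksize=chunksize, header=0,
                             names=['A', 'B', 'C', 'D']):
        present = chunk.notna()  # boolean DataFrame: True where a value exists
        # DataFrame.mask swaps in the second argument where `present` is True;
        # cells where it is False (the NaNs) keep their original value.
        tagged.append(chunk.mask(present, "tag-" + chunk.astype(str)))
    return pd.concat(tagged)


result = tag_chunks(io.StringIO(CSV_TEXT))
# NaN cells are written back out as empty fields.
print(result.to_csv(index=False))
```

Working on the whole chunk at once avoids the original per-column if statements, which cannot work element by element: an if needs a single True/False, while notna() returns one boolean per cell.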
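You need to upgrade pandas to 0.21.0 (the latest release at the time of writing) or later, because that is the version that added the notna() method to Series.
You can check the 0.21.0 release notes: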
In order to promote more consistency among the pandas API, we have added additional top-level functions isna() and notna() that are aliases for isnull() and notnull(). The naming scheme is now more consistent with methods like .dropna() and .fillna(). Furthermore in all cases where .isnull() and .notnull() methods are defined, these have additional methods named .isna() and .notna(), these are included for classes Categorical, Index, Series, and DataFrame. (GH15001). The configuration option pd.options.mode.use_inf_as_null is deprecated, and pd.options.mode.use_inf_as_na is added as a replacement.
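If upgrading is not an option, the release notes above indicate that notna() is only a new name for notnull(), which older versions already have. A minimal sketch of a version-tolerant check, assuming you only need the boolean "has a value" mask (the helper name not_missing is hypothetical):

```python
import pandas as pd


def not_missing(obj):
    """Hypothetical helper: not-missing mask for a Series or DataFrame.

    Uses notna() when available (pandas >= 0.21.0) and falls back to the
    older notnull(), which performs the same check under its original name.
    """
    if hasattr(obj, "notna"):
        return obj.notna()
    return obj.notnull()


s = pd.Series(["a", None, "c"])
print(not_missing(s).tolist())  # [True, False, True]
```

The result can be passed to mask() exactly like chunk.notna() in the solution.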