I am trying to read a fixed-width file with pandas.read_fwf. Here is a sample of the file:
0000123456700123
0001234567800045
Say columns 0-11 hold the balance (format ?.2f) and columns 11-16 hold the interest rate (format %6.2f). The DataFrame I expect as output should look like this:
     Balance  Int_Rate
0   12345.67      1.23
1  123456.78      0.45
# What I have so far
import pandas as pd

colspecs = [(0, 11), (11, 16)]
header = ['Balance', 'Int_Rate']
df = pd.read_fwf("dataset", colspecs=colspecs, names=header)
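I have checked the documentation for pandas.read_fwf, but there does not seem to be an option for formatting columns during import. Do I have to fix the format afterwards, or is there a better way?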
Solution:
I ran into the same problem for a while. I parse the file with struct first and then load the result into pandas.
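As a small, self-contained illustration of the struct step (a sketch, not the full script), the snippet below slices the first sample line from the question into an 11-character field and a 5-character field. It then divides both by 100, on the assumption that the last two digits of each field are implied decimals. The names `record_layout` and `line` are made up for this example.

```python
import struct

# Hypothetical layout for the question's sample: 11 bytes of balance, 5 bytes of rate.
record_layout = struct.Struct('11s5s')

line = '0000123456700123'
raw_balance, raw_rate = record_layout.unpack(line.encode())

# Assumption: both fields carry two implied decimal places, so divide by 100.
balance = int(raw_balance.decode()) / 100
int_rate = int(raw_rate.decode()) / 100

print(balance, int_rate)
```

The general version below builds the same kind of format string from a tuple of field widths instead of hard-coding it.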
import struct
import pandas as pd

def parse_data_file(fieldwidths, fn):
    # See https://docs.python.org/3.0/library/struct.html for formatting and other info.
    # A positive width becomes an 's' (bytes) field, a negative width an 'x' (skipped pad) field.
    fmtstring = ' '.join('{}{}'.format(abs(fw), 'x' if fw < 0 else 's')
                         for fw in fieldwidths)
    fieldstruct = struct.Struct(fmtstring)
    unpack = fieldstruct.unpack_from
    # Dissect one line according to fieldwidths. The line is right-padded with
    # spaces so that a line shorter than the full record does not raise struct.error.
    parse = lambda line: tuple(
        s.decode() for s in unpack(line.rstrip('\n').encode().ljust(fieldstruct.size)))
    rows = []
    with open(fn, 'r') as f:
        for line in f:
            rows.append(parse(line))
    return rows

# test.txt file content, per below
# 6332 x102340 Darwin 080007Darwin 1101
# 6332 x102342 Sydney 200001Sydney 1101
file_location = "test.txt"
fieldwidths = (10, 10, 100, 4, 2, 50, 4)  # negative widths represent ignored padding fields
column_names = ['ID', 'LocationID', 'LocationName', 'PostCode', 'StateID', 'Address', 'CountryID']
fields = parse_data_file(fieldwidths=fieldwidths, fn=file_location)

# Pandas display options
pd.options.display.width = 500
pd.options.display.colheader_justify = 'left'

# Load the list of row tuples into a DataFrame
df = pd.DataFrame(fields, columns=column_names)
print(df)
Output:
ID    LocationID  LocationName  PostCode  StateID  Address  CountryID
6332  x102340     Darwin        0800      07       Darwin   1101
6332  x102342     Sydney        2000      01       Sydney   1101
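Coming back to the formatting part of the question: as far as I can tell, read_fwf has no option for an implied decimal point, so one way is to scale the columns right after reading. A minimal sketch, assuming the last two digits of each field are decimals and using an in-memory `StringIO` in place of the real file (`sample` is a made-up name):

```python
import io
import pandas as pd

# Stand-in for the real file: the two sample lines from the question.
sample = io.StringIO("0000123456700123\n0001234567800045\n")

colspecs = [(0, 11), (11, 16)]
header = ['Balance', 'Int_Rate']

# read_fwf parses each slice as an integer; leading zeros are dropped.
df = pd.read_fwf(sample, colspecs=colspecs, names=header)

# Assumption: two implied decimal places in both columns.
df = df / 100
print(df)
```

If the columns had different numbers of implied decimals, each one could be divided by its own power of ten instead of scaling the whole frame at once.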