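I have looked at several other related questions here, here and here, but none of them describes quite the same problem I am having.
I am using Pandas version 0.16.2. I have several columns in a Pandas dataframe with dtype datetime64[ns]: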
In [6]: date_list = ["SubmittedDate","PolicyStartDate", "PaidUpDate", "maturityDate", "DraftDate", "CurrentValuationDate", "dob", "InForceDate"]
In [11]: data[date_list].head()
Out[11]:
SubmittedDate PolicyStartDate PaidUpDate maturityDate DraftDate \
0 NaT 2002-11-18 NaT 2041-03-04 NaT
1 NaT 2015-01-13 NaT NaT NaT
2 NaT 2014-10-15 NaT NaT NaT
3 NaT 2009-08-27 NaT NaT NaT
4 NaT 2007-04-19 NaT 2013-10-01 NaT
CurrentValuationDate dob InForceDate
0 2015-04-30 1976-03-04 2002-11-18
1 NaT 1949-09-27 2015-01-13
2 NaT 1947-06-15 2014-10-15
3 2015-07-30 1960-06-07 2009-08-27
4 2010-04-21 1950-10-01 2007-04-19
These were originally strings (e.g. '1976-03-04'), which I converted to datetime objects using the following:
In [7]: for datecol in date_list:
...: data[datecol] = pd.to_datetime(data[datecol], coerce=True, errors = 'raise')
Here are the dtypes of each column:
In [8]: for datecol in date_list:
print data[datecol].dtypes
Which yields:
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
datetime64[ns]
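So far so good. But what I want to do is create, for each of these columns, a new column giving the age in days (as an integer) relative to a specific date.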
In [13]: current_date = pd.to_datetime("2015-07-31")
I first ran this:
In [14]: for i in date_list:
....: data[i+"InDays"] = data[i].apply(lambda x: current_date - x)
However, when I check the dtypes of the returned columns:
In [15]: for datecol in date_list:
....: print data[datecol + "InDays"].dtypes
I get these:
object
timedelta64[ns]
object
timedelta64[ns]
object
timedelta64[ns]
timedelta64[ns]
timedelta64[ns]
I don't know why three of them are object when they should be timedeltas. What I want to do next is:
In [16]: for i in date_list:
....: data[i+"InDays"] = data[i+"InDays"].dt.days
This works for the timedelta columns. However, because three of the columns are not timedeltas, I get this error:
AttributeError: Can only use .dt accessor with datetimelike values
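I suspect there are some values in these three columns that prevent Pandas from converting them to timedeltas. I can't work out how to find what those values might be.
Solution:
The issue occurs because you have three columns that contain only NaT values, which causes those columns to be treated as object when you apply your function to them.
You should add some kind of condition in the apply part so that it defaults to a timedelta in the NaT case. Example: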
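To confirm which columns are affected, one option is to test each date column for being entirely null. This is a minimal sketch on a small made-up frame (the three columns and their values here are illustrative, borrowed from the question's output, not the full data set):

```python
import pandas as pd

# Hypothetical miniature of the frame in the question.
data = pd.DataFrame({
    "SubmittedDate": pd.to_datetime([None, None, None]),
    "PolicyStartDate": pd.to_datetime(["2002-11-18", "2015-01-13", "2014-10-15"]),
    "maturityDate": pd.to_datetime(["2041-03-04", None, None]),
})
date_list = list(data.columns)

# A column is "all NaT" when every entry is null.
all_nat_cols = [col for col in date_list if data[col].isnull().all()]
print(all_nat_cols)
```

Columns that are only partly NaT (such as maturityDate here) are not reported, which matches the pattern in the question: only the fully empty columns came back as object.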
for i in date_list:
    data[i+"InDays"] = data[i].apply(lambda x: current_date - x if x is not pd.NaT else pd.Timedelta(0))
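Or, if you cannot do the above, you should put a condition around the step you want to perform, data[i+"InDays"] = data[i+"InDays"].dt.days, so that it only runs when the dtype of the series allows it.
Or a simpler way is to change the apply part to directly get what you want: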
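One way to express that guard is sketched below. It assumes `pd.api.types.is_timedelta64_dtype` is available, which is the case in recent pandas releases but may not be in one as old as 0.16.2; the two-column frame is made up for illustration, with an object column standing in for one of the all-NaT results:

```python
import pandas as pd

current_date = pd.to_datetime("2015-07-31")

# Hypothetical frame: one real timedelta column, one object column of NaT.
data = pd.DataFrame({
    "dobInDays": pd.Series(current_date - pd.to_datetime(["1976-03-04", "1949-09-27"])),
    "PaidUpDateInDays": pd.Series([pd.NaT, pd.NaT], dtype=object),
})

for col in ["dobInDays", "PaidUpDateInDays"]:
    # Only use the .dt accessor where the dtype really is timedelta64.
    if pd.api.types.is_timedelta64_dtype(data[col]):
        data[col] = data[col].dt.days

print(data.dtypes)
```

The object column is simply left untouched, so the AttributeError is avoided, though those columns still need separate handling if integers or NaN are wanted there.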
for i in date_list:
    data[i+"InDays"] = data[i].apply(lambda x: (current_date - x).days if x is not pd.NaT else x)
This outputs:
In [110]: data
Out[110]:
SubmittedDate PolicyStartDate PaidUpDate maturityDate DraftDate \
0 NaT 2002-11-18 NaT 2041-03-04 NaT
1 NaT 2015-01-13 NaT NaT NaT
2 NaT 2014-10-15 NaT NaT NaT
3 NaT 2009-08-27 NaT NaT NaT
4 NaT 2007-04-19 NaT 2013-10-01 NaT
CurrentValuationDate dob InForceDate SubmittedDateInDays \
0 2015-04-30 1976-03-04 2002-11-18 NaT
1 NaT 1949-09-27 2015-01-13 NaT
2 NaT 1947-06-15 2014-10-15 NaT
3 2015-07-30 1960-06-07 2009-08-27 NaT
4 2010-04-21 1950-10-01 2007-04-19 NaT
PolicyStartDateInDays PaidUpDateInDays maturityDateInDays DraftDateInDays \
0 4638 NaT -9348 NaT
1 199 NaT NaN NaT
2 289 NaT NaN NaT
3 2164 NaT NaN NaT
4 3025 NaT 668 NaT
CurrentValuationDateInDays dobInDays InForceDateInDays
0 92 14393 4638
1 NaN 24048 199
2 NaN 24883 289
3 1 20142 2164
4 1927 23679 3025
If you want to change the NaT to NaN, you can use:
for i in date_list:
    data[i+"InDays"] = data[i].apply(lambda x: (current_date - x).days if x is not pd.NaT else np.NaN)
Example/Demo:
In [114]: for i in date_list:
.....: data[i+"InDays"] = data[i].apply(lambda x: (current_date - x).days if x is not pd.NaT else np.NaN)
.....:
In [115]: data
Out[115]:
SubmittedDate PolicyStartDate PaidUpDate maturityDate DraftDate \
0 NaT 2002-11-18 NaT 2041-03-04 NaT
1 NaT 2015-01-13 NaT NaT NaT
2 NaT 2014-10-15 NaT NaT NaT
3 NaT 2009-08-27 NaT NaT NaT
4 NaT 2007-04-19 NaT 2013-10-01 NaT
CurrentValuationDate dob InForceDate SubmittedDateInDays \
0 2015-04-30 1976-03-04 2002-11-18 NaN
1 NaT 1949-09-27 2015-01-13 NaN
2 NaT 1947-06-15 2014-10-15 NaN
3 2015-07-30 1960-06-07 2009-08-27 NaN
4 2010-04-21 1950-10-01 2007-04-19 NaN
PolicyStartDateInDays PaidUpDateInDays maturityDateInDays \
0 4638 NaN -9348
1 199 NaN NaN
2 289 NaN NaN
3 2164 NaN NaN
4 3025 NaN 668
DraftDateInDays CurrentValuationDateInDays dobInDays InForceDateInDays
0 NaN 92 14393 4638
1 NaN NaN 24048 199
2 NaN NaN 24883 289
3 NaN 1 20142 2164
4 NaN 1927 23679 3025
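As an alternative that is not part of the answer above, the subtraction can also be done on the whole column at once instead of through apply, which sidesteps the per-element type inference. The sketch below uses a small made-up frame and assumes a reasonably recent pandas; behaviour on 0.16.2 has not been checked here:

```python
import pandas as pd

current_date = pd.to_datetime("2015-07-31")

# Hypothetical miniature: one all-NaT column and one fully populated column.
data = pd.DataFrame({
    "PaidUpDate": pd.to_datetime([None, None]),
    "dob": pd.to_datetime(["1976-03-04", "1949-09-27"]),
})

for col in ["PaidUpDate", "dob"]:
    # Timestamp minus a datetime64 column gives a timedelta64 column,
    # so the .dt accessor is available even when every value is NaT.
    data[col + "InDays"] = (current_date - data[col]).dt.days

print(data)
```

Missing dates come out as NaN in the new columns, so no explicit NaT check is needed.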