I'm new to pandas, so any explanation of the solution is appreciated.
    Company                             Zip State City
1   *CBRE                               San Diego, CA 92101
4   1908 Brands                         Boulder, CO 80301
7   1st Infantry Division Headquarters  Fort Riley, KS
10  21st Century Healthcare, Inc.       Tempe 85282
15  AAA                                 Jefferson City, MO 65101-9564
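I want to split the "Zip State City" column of my data into 3 separate columns. Using the answer from the post "Pandas DataFrame, how do i split a column into two", I could accomplish this if I didn't have the first column. Writing a regular expression to capture all the companies just ends up capturing everything in the data.
I also tried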
foo = lambda x: pandas.Series([i for i in reversed(x.split())])
data_pretty = data['Zip State City'].apply(foo)
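but this makes me lose the Company column and splits multi-word city names into different columns.
How can I split my last column while keeping the data in the Company column?
Solution: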
In [110]: df
Out[110]:
    Company                             Zip State City
1   *CBRE                               San Diego, CA 92101
4   1908 Brands                         Boulder, CO 80301
7   1st Infantry Division Headquarters  Fort Riley, KS
10  21st Century Healthcare, Inc.       Tempe 85282
15  AAA                                 Jefferson City, MO 65101-9564
In [112]: df[['City','State','ZIP']] = df['Zip State City'].str.extract(r'([^,\d]+)?[,]*\s*([A-Z]{2})?\s*([\d\-]{4,11})?', expand=True)
In [113]: df
Out[113]:
    Company                             Zip State City                 City            State  ZIP
1   *CBRE                               San Diego, CA 92101            San Diego       CA     92101
4   1908 Brands                         Boulder, CO 80301              Boulder         CO     80301
7   1st Infantry Division Headquarters  Fort Riley, KS                 Fort Riley      KS     NaN
10  21st Century Healthcare, Inc.       Tempe 85282                    Tempe           NaN    85282
15  AAA                                 Jefferson City, MO 65101-9564  Jefferson City  MO     65101-9564
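To make the pattern easier to follow, here is a minimal sketch that rebuilds the sample frame and uses the same regular expression with named groups. The group names, the variable names `pattern`, `parts` and `result`, and the `str.strip()` and `join` steps are my own additions for illustration, not part of the answer above. Each group is optional (`?`), which is why rows with a missing state or ZIP still match and get NaN in that column.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame(
    {
        "Company": [
            "*CBRE",
            "1908 Brands",
            "1st Infantry Division Headquarters",
            "21st Century Healthcare, Inc.",
            "AAA",
        ],
        "Zip State City": [
            "San Diego, CA 92101",
            "Boulder, CO 80301",
            "Fort Riley, KS",
            "Tempe 85282",
            "Jefferson City, MO 65101-9564",
        ],
    },
    index=[1, 4, 7, 10, 15],
)

# Same pattern as in the answer, split up and with named groups (names are illustrative).
pattern = (
    r"(?P<City>[^,\d]+)?"      # city: a run of characters that are neither commas nor digits
    r",*\s*"                   # optional comma(s) and whitespace after the city
    r"(?P<State>[A-Z]{2})?"    # optional two-letter uppercase state code
    r"\s*"
    r"(?P<ZIP>[\d\-]{4,11})?"  # optional ZIP: 4 to 11 digits or hyphens
)

# Named groups become the column names of the extracted DataFrame.
parts = df["Zip State City"].str.extract(pattern, expand=True)

# When there is no comma (e.g. "Tempe 85282") the city group keeps a trailing space.
parts["City"] = parts["City"].str.strip()

# Joining on the index keeps Company and the original column untouched.
result = df.join(parts)
print(result)
```

One limitation to be aware of: because the city group accepts any character except commas and digits, an input such as "Tempe AZ 85282" (state without a preceding comma) would put "Tempe AZ" into the city and leave the state empty.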
From the docs:
Series.str.extract(pat, flags=0, expand=None)
For each subject string in the Series, extract groups from the first
match of regular expression pat.
New in version 0.13.0.
Parameters:
pat : string
    Regular expression pattern with capturing groups
flags : int, default 0 (no flags)
    re module flags, e.g. re.IGNORECASE
expand : bool, default False
    If True, return DataFrame.
    If False, return Series/Index/DataFrame.
    New in version 0.18.0.
Returns: DataFrame with one row for each subject string, and one
column for each group. Any capture group names in regular expression
pat will be used for column names; otherwise capture group numbers
will be used. The dtype of each result column is always object, even
when no match is found. If expand=False and pat has only one capture
group, then return a Series (if subject is a Series) or Index (if
subject is an Index).
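If you want to check for yourself how `expand` affects the return type, a small sketch like the following can help. The sample Series and the variable names are made up for illustration.

```python
import pandas as pd

s = pd.Series(["a1", "b2", "c3"])  # hypothetical sample data

# One capture group, expand=False: the result is a Series.
one_group_series = s.str.extract(r"([ab])", expand=False)

# One capture group, expand=True: the result is a one-column DataFrame.
one_group_frame = s.str.extract(r"([ab])", expand=True)

# Several capture groups: the result is a DataFrame even with expand=False.
two_groups = s.str.extract(r"([a-z])(\d)", expand=False)

print(type(one_group_series).__name__)
print(type(one_group_frame).__name__)
print(type(two_groups).__name__)
```

Passing `expand=True` explicitly, as the answer does, keeps the result a DataFrame regardless of the number of groups, which is what the multi-column assignment needs.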