How do I pivot a DataFrame column that contains strings?

This question already has an answer here: How to pivot a dataframe (1 answer)
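I'm trying to reshape a pandas DataFrame by turning the values of one column (Metric) into separate columns, either by pivoting or by unstacking.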

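I'm new to this and am probably missing something obvious. I've searched extensively but haven't managed to apply any of the solutions I came across.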

df
    Location    Month       Metric       Value
0   Texas       January     Temperature  10
1   New York    January     Temperature  20
2   California  January     Temperature  30
3   Alaska      January     Temperature  40
4   Texas       January     Color        Red
5   New York    January     Color        Blue
6   California  January     Color        Green
7   Alaska      January     Color        Yellow
8   Texas       February    Temperature  15
9   New York    February    Temperature  25
10  California  February    Temperature  35
11  Alaska      February    Temperature  NaN
12  Texas       February    Color        NaN
13  New York    February    Color        Purple
14  California  February    Color        Orange
15  Alaska      February    Color        brown
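To make the example reproducible, here is one way to build the sample frame. This is a sketch that assumes the NaN entries are real missing values (np.nan); the helper name `locations` is only for illustration.

```python
import numpy as np
import pandas as pd

# Helper list: each block of four rows in the table uses this location order
locations = ['Texas', 'New York', 'California', 'Alaska']

df = pd.DataFrame({
    'Location': locations * 4,
    'Month': ['January'] * 8 + ['February'] * 8,
    'Metric': (['Temperature'] * 4 + ['Color'] * 4) * 2,
    # Mixed ints, strings and NaN, so pandas stores this column as object dtype
    'Value': [10, 20, 30, 40, 'Red', 'Blue', 'Green', 'Yellow',
              15, 25, 35, np.nan, np.nan, 'Purple', 'Orange', 'brown'],
})
print(df)
```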

I'm trying to "pivot" the Metric values into columns. The end goal is a result like this:

Location    Month     Temperature   Color
Texas       January   10            Red
New York    January   20            Blue
California  January   30            Green
Alaska      January   40            Yellow
Texas       February  15    
New York    February  25            Purple
California  February  35            Orange
Alaska      February                brown

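I've tried pivot, pivot_table and unstack, but I'm sure I'm missing something. Much of the complexity seems to come from mixing strings with numbers, and there are also some missing values in the data.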
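Mixing strings and numbers is less of an obstacle than it looks: a reshape only moves values around, and arithmetic only comes into play when pandas has to combine several values into one cell. A quick check (a sketch that rebuilds the sample frame with np.nan for the missing entries; `keys`, `has_duplicates` and `max_per_cell` are illustrative names) shows that every (Location, Month, Metric) combination occurs exactly once, so no aggregation is needed at all.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question
locations = ['Texas', 'New York', 'California', 'Alaska']
df = pd.DataFrame({
    'Location': locations * 4,
    'Month': ['January'] * 8 + ['February'] * 8,
    'Metric': (['Temperature'] * 4 + ['Color'] * 4) * 2,
    'Value': [10, 20, 30, 40, 'Red', 'Blue', 'Green', 'Yellow',
              15, 25, 35, np.nan, np.nan, 'Purple', 'Orange', 'brown'],
})

keys = ['Location', 'Month', 'Metric']

# True if any (Location, Month, Metric) combination appears more than once
has_duplicates = df.duplicated(subset=keys).any()

# Largest number of source rows that would land in a single output cell
max_per_cell = df.groupby(keys).size().max()

print(has_duplicates)  # False
print(max_per_cell)    # 1
```

Because each target cell has exactly one source value, a plain reshape (set_index plus unstack, or pivot) is enough, and the object dtype of Value never matters.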

This is the closest I've gotten so far, but I don't want the extra rows for each month, which create even more blank values:

df.set_index(['Location','Month','Metric'], append=True, inplace=True)
df.unstack()

    Value
    Metric              Color   Temperature
    Location    Month       
0   Texas       January None    10
1   New York    January None    20
2   California  January None    30
3   Alaska      January None    40
4   Texas       January Red     None
5   New York    January Blue    None
6   California  January Green   None
7   Alaska      January Yellow  None
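The extra rows in the attempt above most likely come from `append=True`: it keeps the original 0..15 integer index as an additional row level, so the Temperature row and the Color row for the same place and month stay distinct and can never merge. A minimal sketch of the same idea without `append` (rebuilding the sample frame with np.nan for the missing entries; `wide` is an illustrative name):

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question
locations = ['Texas', 'New York', 'California', 'Alaska']
df = pd.DataFrame({
    'Location': locations * 4,
    'Month': ['January'] * 8 + ['February'] * 8,
    'Metric': (['Temperature'] * 4 + ['Color'] * 4) * 2,
    'Value': [10, 20, 30, 40, 'Red', 'Blue', 'Green', 'Yellow',
              15, 25, 35, np.nan, np.nan, 'Purple', 'Orange', 'brown'],
})

# Without append=True the old integer index is replaced, so the row keys are
# just Location and Month. Selecting ['Value'] gives a Series, which avoids a
# ('Value', Metric) column MultiIndex, and unstack('Metric') moves the Metric
# level into the columns, putting both metrics for a place/month in one row.
wide = df.set_index(['Location', 'Month', 'Metric'])['Value'].unstack('Metric')
print(wide)
```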

Any help here would be greatly appreciated. It seems likely that there is a simple solution.

Solution:

A pivot_table solution that fits your needs. The output is semantically what you want:

Metric                Color Temperature
Location   Month                       
Alaska     February   brown         NaN
           January   Yellow          40
California February  Orange          35
           January    Green          30
New York   February  Purple          25
           January     Blue          20
Texas      February     NaN          15
           January      Red          10

Code:

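# aggfunc='first' takes the single value in each (Location, Month, Metric) cell,
# so it works for strings as well as numbers and leaves missing cells as NaN
df_p = df.pivot_table(index=['Location', 'Month'], columns=['Metric'], values='Value', aggfunc='first')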
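The pivoted result keeps Location and Month in a row MultiIndex and carries the name Metric on its column axis, whereas the requested layout has four ordinary columns. A sketch of flattening it, assuming aggfunc='first' (which simply takes the one value in each cell, whether string or number) and using the illustrative names `wide` and `flat`; rows come back sorted by Location and then Month rather than in the original order:

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (missing entries as np.nan)
locations = ['Texas', 'New York', 'California', 'Alaska']
df = pd.DataFrame({
    'Location': locations * 4,
    'Month': ['January'] * 8 + ['February'] * 8,
    'Metric': (['Temperature'] * 4 + ['Color'] * 4) * 2,
    'Value': [10, 20, 30, 40, 'Red', 'Blue', 'Green', 'Yellow',
              15, 25, 35, np.nan, np.nan, 'Purple', 'Orange', 'brown'],
})

wide = df.pivot_table(index=['Location', 'Month'], columns='Metric',
                      values='Value', aggfunc='first')

# reset_index turns Location/Month back into regular columns, rename_axis
# drops the leftover 'Metric' label on the column axis, and the final
# selection puts the columns in the order shown in the question
flat = (wide.reset_index()
            .rename_axis(columns=None)
            [['Location', 'Month', 'Temperature', 'Color']])
print(flat)
```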
