python - How to pass multiple variables to a pandas DataFrame to use with .map to create a new column

To pass multiple variables to an ordinary Python function, you can write something like this:

import datetime

def a_function(date, string, number):
    # Convert the string to an int, then shift the date by (number * count) days
    count = int(string)
    return date + datetime.timedelta(days=number * count)

When working with a pandas DataFrame, I know you can create a new column like this:

df['new_col'] = df['column_A'].map(a_function)

For example, the function might return the year from a date column:

    return date.year

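What I'd like to know is: just as you can pass multiple pieces of data to a single function (as in the first example above), can you use multiple columns when creating a new pandas DataFrame column?

For example, combining the three separate parts of a date (Y-M-D) into a single field:

df['whole_date'] = df['Year','Month','Day'].map(a_function)

I get a KeyError with the following test.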

def combine(one, two, three):
    return one + two + three

df = pd.DataFrame({'a': [1,2,3], 'b': [2,3,4],'c': [4,5,6]})

df['d'] = df['a','b','b'].map(combine)

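Is there a way, using .map or something else, to create a new column in a pandas DataFrame that takes three columns as input and returns a single column? For example, the input would be 1, 2, 3 and the output would be 1 * 2 * 3.

Likewise, is there a way to have a function take one argument, a date, and return three new pandas DataFrame columns: one each for the year, month, and day?

Answer: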

Is there a way of creating a new column in a pandas dataframe using .MAP or something else which takes as input three columns and returns a single column. For example input would be 1, 2, 3 and output would be 1*2*3

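You can use apply with axis=1 for this. However, instead of being called with three separate arguments (one per column), the specified function is called with a single argument per row, and that argument is a Series containing the row's data. You can account for this in your function: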

def combine(row):
    return row['a'] + row['b'] + row['c']

>>> df.apply(combine, axis=1)
0     7
1    10
2    13
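
To tie this back to the question, the result of apply can be assigned straight to a new column. A minimal sketch, reusing the question's example frame; the column name `d` comes from the question, and `pd` is assumed to be the usual pandas alias:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [4, 5, 6]})

def combine(row):
    # row is a Series holding one row's values, indexed by column name
    return row['a'] + row['b'] + row['c']

# axis=1 calls combine once per row; the result is a Series on df's index
df['d'] = df.apply(combine, axis=1)
print(df['d'].tolist())  # [7, 10, 13]
```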

Or you can pass a lambda that unpacks the Series into separate arguments:

def combine(one,two,three):
    return one + two + three

>>> df.apply(lambda x: combine(*x), axis=1)
0     7
1    10
2    13

If you want to pass only specific columns, select them by indexing the DataFrame with a list:

>>> df[['a', 'b', 'c']].apply(lambda x: combine(*x), axis=1)
0     7
1    10
2    13

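Note the double brackets. (This doesn't really have anything to do with apply; indexing with a list is the normal way to access multiple columns from a DataFrame.)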
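
The difference between the two indexing forms is also what produces the KeyError in the question. A short sketch of both, using the question's example frame (the variable names `subset` and `raised` are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [4, 5, 6]})

# A list inside the brackets selects several columns and returns a DataFrame
subset = df[['a', 'b']]
print(list(subset.columns))  # ['a', 'b']

# Without the inner list, pandas looks up one column labelled ('a', 'b')
try:
    df['a', 'b']
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```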

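However, it's important to note that in many cases you don't need apply at all, because you can use vectorized operations on the columns themselves. The combine function above can simply be called with the DataFrame columns as its arguments: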

>>> combine(df.a, df.b, df.c)
0     7
1    10
2    13

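When the "combining" operation is vectorizable, this is usually more efficient than apply, which calls the function once per row.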
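
The question's own example asks for a product (1 * 2 * 3) rather than a sum. As a sketch of the vectorized route for that case, assuming the same example frame (the column name `product` is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [4, 5, 6]})

# Arithmetic on whole columns is element-wise, so no apply is needed
df['product'] = df['a'] * df['b'] * df['c']
print(df['product'].tolist())  # [8, 30, 72]

# The same result via a row-wise reduction over the selected columns
alt = df[['a', 'b', 'c']].prod(axis=1)
print(alt.tolist())  # [8, 30, 72]
```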

Likewise is there also a way of having a function take in one argument, a date and return three new pandas dataframe columns; one for the year, month and day?

As above, there are two basic ways to do this: a general but non-vectorized way using apply, and a faster vectorized way. Suppose you have a DataFrame like this:

>>> df = pandas.DataFrame({'date': pandas.date_range('2015/05/01', '2015/05/03')})
>>> df
        date
0 2015-05-01
1 2015-05-02
2 2015-05-03

You can define a function that returns a Series for each value, and then apply it to the column:

def dateComponents(date):
    return pandas.Series([date.year, date.month, date.day], index=["Year", "Month", "Day"])

>>> df.date.apply(dateComponents)
   Year  Month  Day
0  2015      5    1
1  2015      5    2
2  2015      5    3
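
The apply call returns a separate DataFrame, so the three columns still need to be attached to the original frame. One possible way, sketched here with `DataFrame.join` (the names `date_components`, `parts`, and `result` are illustrative, not from the answer):

```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2015-05-01', '2015-05-03')})

def date_components(date):
    # Each call returns a Series; apply stacks them into a DataFrame
    return pd.Series([date.year, date.month, date.day],
                     index=['Year', 'Month', 'Day'])

parts = df['date'].apply(date_components)

# join aligns on the shared index and adds the three new columns
result = df.join(parts)
print(list(result.columns))  # ['date', 'Year', 'Month', 'Day']
```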

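For this situation, the answer presents apply as the only option, on the grounds that there is no vectorized way to access the individual date components. (Recent pandas versions do expose these through the `.dt` accessor, e.g. `df.date.dt.year`.) In other cases, however, you can use vectorized operations: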

>>> df = pandas.DataFrame({'a': ["Hello", "There", "Pal"]})
>>> df
       a
0  Hello
1  There
2    Pal

>>> pandas.DataFrame({'FirstChar': df.a.str[0], 'Length': df.a.str.len()})
  FirstChar  Length
0         H       5
1         T       5
2         P       3

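Here again the operation is vectorized by operating on the values directly rather than applying a function element by element. In this case there are two vectorized operations (getting the first character and getting the string length), and the results are wrapped in another DataFrame call to create a separate column for each.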
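
The same wrap-in-a-DataFrame pattern can be carried over to the date example in pandas versions that provide the `.dt` accessor on datetime columns. The sketch below assumes such a version. It also shows the reverse direction asked about in the question, rebuilding one date from three columns, by relying on `pd.to_datetime` accepting a frame with `year`, `month`, and `day` columns:

```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2015-05-01', '2015-05-03')})

# .dt exposes datetime properties for the whole column at once
parts = pd.DataFrame({
    'Year': df['date'].dt.year,
    'Month': df['date'].dt.month,
    'Day': df['date'].dt.day,
})
print(parts['Day'].tolist())  # [1, 2, 3]

# Going the other way: assemble one datetime column from the three parts
rebuilt = pd.to_datetime(parts.rename(columns=str.lower))
print((rebuilt == df['date']).all())  # True
```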
