Python – pandas is slow: getting the first occurrence in a DataFrame

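I have a DataFrame of people. One of the columns in this DataFrame is place_id. I also have a DataFrame of places, where one column is place_id and another is weather. For each person, I am trying to find the corresponding weather. Importantly, many people share the same place_ids.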

Currently, my setup is as follows:

def place_id_to_weather(pid):
    return place_df[place_df['place_id'] == pid]['weather'].item() 

person_df['weather'] = person_df['place_id'].map(place_id_to_weather)

But this is unbearably slow, and I would like to speed it up. I suspect I could get a speedup like this:

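Right now place_df[...].item() searches the entire column for place_id == pid, returns a Series, and then grabs the first item of that Series. What I really want is to cut the search short as soon as the first match for place_df['place_id'] == pid is found in place_df. After that, I don't need to search any further. How can I limit the search to the first occurrence only?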

Are there other methods I could use to get a speedup? Some kind of join-type approach?

Solution:

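I think you need drop_duplicates together with merge. If the only columns the two DataFrames have in common are place_id and weather, you can omit the on parameter (this depends on the data; on='place_id' may be necessary):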

df1 = place_df.drop_duplicates(['place_id'])
print (df1)

print (pd.merge(person_df, df1))
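
If the default inner merge is a concern, for example because it can drop people whose place_id has no match, a left merge with an explicit key is one option. The following is only a minimal sketch: the small frames and the unmatched place_id 'x' are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical sample data; 'x' is deliberately missing from place_df.
person_df = pd.DataFrame({'place_id': ['s', 'd', 'x', 's'],
                          'A': [4, 5, 6, 7]})
place_df = pd.DataFrame({'place_id': ['s', 'd', 's', 'd'],
                         'weather': ['y', 'e', 'h', 'u']})

# Keep only the first row for each place_id (keep='first' is the default).
first_places = place_df.drop_duplicates(subset='place_id')

# on='place_id' names the join key explicitly.
# how='left' keeps every person row; unmatched ids get NaN for weather.
merged = person_df.merge(first_places, on='place_id', how='left')
print(merged)
```

Because the right-hand frame has one row per place_id after drop_duplicates, the left merge returns exactly one row per person.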

Sample data:

person_df = pd.DataFrame({'place_id':['s','d','f','s','d','f'],
                          'A':[4,5,6,7,8,9]})
print (person_df)
   A place_id
0  4        s
1  5        d
2  6        f
3  7        s
4  8        d
5  9        f

place_df = pd.DataFrame({'place_id':['s','d','f', 's','d','f'],
                         'weather':['y','e','r', 'h','u','i']})
print (place_df)
  place_id weather
0        s       y
1        d       e
2        f       r
3        s       h
4        d       u
5        f       i

def place_id_to_weather(pid):
    # for the first occurrence, use iloc[0]
    return place_df[place_df['place_id'] == pid]['weather'].iloc[0]

person_df['weather'] = person_df['place_id'].map(place_id_to_weather)
print (person_df)
   A place_id weather
0  4        s       y
1  5        d       e
2  6        f       r
3  7        s       y
4  8        d       e
5  9        f       r

# keep='first' is the default, so it can be omitted
print (place_df.drop_duplicates(['place_id']))
  place_id weather
0        s       y
1        d       e
2        f       r

print (pd.merge(person_df, place_df.drop_duplicates(['place_id'])))
   A place_id weather
0  4        s       y
1  7        s       y
2  5        d       e
3  8        d       e
4  6        f       r
5  9        f       r
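
If you would rather keep the original .map shape and the original row order, another possibility is to build the lookup once and pass it to map instead of a function. This is a sketch under the assumption that the first weather per place_id is what you want; the name lookup is just an illustrative variable, and the data repeats the sample above.

```python
import pandas as pd

person_df = pd.DataFrame({'place_id': ['s', 'd', 'f', 's', 'd', 'f'],
                          'A': [4, 5, 6, 7, 8, 9]})
place_df = pd.DataFrame({'place_id': ['s', 'd', 'f', 's', 'd', 'f'],
                         'weather': ['y', 'e', 'r', 'h', 'u', 'i']})

# Build the lookup once: first weather per place_id, indexed by place_id.
lookup = (place_df.drop_duplicates(subset='place_id')
                  .set_index('place_id')['weather'])

# Series.map accepts a Series and matches values against its index,
# so place_df is not rescanned for every person.
person_df['weather'] = person_df['place_id'].map(lookup)
print(person_df)
```

Unlike the function version, a place_id that is missing from the lookup gives NaN here instead of raising an error.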
