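merge() combines DataFrames on one or more join keys, which you specify with on. If on is omitted, the column names common to both DataFrames are used as the keys.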
    In [5]: df1=pd.DataFrame({'key':['b','b','a','a','b','a','c'],'data1':range(7)})

    In [6]: df2=pd.DataFrame({'key':['a','b','d'],'data2':range(3)})

    In [7]: df1
    Out[7]:
       data1 key
    0      0   b
    1      1   b
    2      2   a
    3      3   a
    4      4   b
    5      5   a
    6      6   c

    In [8]: df2
    Out[8]:
       data2 key
    0      0   a
    1      1   b
    2      2   d

    In [9]: pd.merge(df1,df2,on='key')
    Out[9]:
       data1 key  data2
    0      0   b      1
    1      1   b      1
    2      4   b      1
    3      2   a      0
    4      3   a      0
    5      5   a      0
To merge on several keys at once, pass a list of column names to on. Only rows whose values match on every listed key are combined.

    In [4]: df7=pd.DataFrame({'key1':['b','b','a','a','b','a','c'],'key2':['i','j','k','k','i','j','k'],'data1':range(7)})

    In [5]: df8=pd.DataFrame({'key1':['a','b','d'],'key2':['k','j','i'],'data2':range(3)})

    In [6]: df7
    Out[6]:
      key1 key2  data1
    0    b    i      0
    1    b    j      1
    2    a    k      2
    3    a    k      3
    4    b    i      4
    5    a    j      5
    6    c    k      6

    In [7]: df8
    Out[7]:
      key1 key2  data2
    0    a    k      0
    1    b    j      1
    2    d    i      2

    In [8]: pd.merge(df7,df8,on=['key1','key2'])
    Out[8]:
      key1 key2  data1  data2
    0    b    j      1      1
    1    a    k      2      0
    2    a    k      3      0
Specify the join keys for the left and right sides separately
    In [11]: df3=pd.DataFrame({'l_key':['b','b','a','a','b','a','c'],'data1':range(7)})

    In [12]: df4=pd.DataFrame({'r_key':['a','b','d'],'data2':range(3)})

    In [13]: pd.merge(df3,df4,left_on='l_key',right_on='r_key')
    Out[13]:
       data1 l_key  data2 r_key
    0      0     b      1     b
    1      1     b      1     b
    2      4     b      1     b
    3      2     a      0     a
    4      3     a      0     a
    5      5     a      0     a
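When the keys are given with left_on and right_on, both key columns are kept in the result. The following is a minimal sketch of one way to tidy that up, assuming an inner join where the two columns necessarily hold the same values. The names merged and tidy, and the choice to rename l_key to key, are illustrative only.

```python
import pandas as pd

df3 = pd.DataFrame({'l_key': ['b', 'b', 'a', 'a', 'b', 'a', 'c'], 'data1': range(7)})
df4 = pd.DataFrame({'r_key': ['a', 'b', 'd'], 'data2': range(3)})

merged = pd.merge(df3, df4, left_on='l_key', right_on='r_key')

# In an inner join the two key columns are identical row by row.
print((merged['l_key'] == merged['r_key']).all())

# Drop the redundant right-hand key and give the remaining one a neutral name.
tidy = merged.drop(columns='r_key').rename(columns={'l_key': 'key'})
print(sorted(tidy.columns))
```

For outer, left, or right joins the two key columns can differ (one side may be NaN), so dropping one of them there would lose information.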
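Passing how='outer' keeps the union of the keys from both DataFrames. Where one side has no matching row, the missing values are filled with NaN.

    In [15]: df2=pd.DataFrame({'key':['a','b','d'],'data2':range(3)})

    In [16]: pd.merge(df1,df2,on='key',how='outer')
    Out[16]:
       data1 key  data2
    0    0.0   b    1.0
    1    1.0   b    1.0
    2    4.0   b    1.0
    3    2.0   a    0.0
    4    3.0   a    0.0
    5    5.0   a    0.0
    6    6.0   c    NaN
    7    NaN   d    2.0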
Use only the keys from the left (or right) DataFrame
    In [17]: pd.merge(df1,df2,on='key',how='left')
    Out[17]:
       data1 key  data2
    0      0   b    1.0
    1      1   b    1.0
    2      2   a    0.0
    3      3   a    0.0
    4      4   b    1.0
    5      5   a    0.0
    6      6   c    NaN

    In [18]: pd.merge(df1,df2,on='key',how='right')
    Out[18]:
       data1 key  data2
    0    0.0   b      1
    1    1.0   b      1
    2    4.0   b      1
    3    2.0   a      0
    4    3.0   a      0
    5    5.0   a      0
    6    NaN   d      2
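To compare the four how options side by side, the sketch below rebuilds df1 and df2 and records which keys survive each join type along with the row count. The loop and the keys_by_how and rows_by_how dictionaries are illustrative helpers, not part of pandas.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'a', 'b', 'a', 'c'], 'data1': range(7)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})

keys_by_how = {}
rows_by_how = {}
for how in ['inner', 'outer', 'left', 'right']:
    merged = pd.merge(df1, df2, on='key', how=how)
    # Distinct keys present in the result, sorted for easy comparison.
    keys_by_how[how] = sorted(merged['key'].unique())
    rows_by_how[how] = len(merged)
    print(how, rows_by_how[how], keys_by_how[how])
```

The inner join keeps only the keys found on both sides (a and b), the outer join keeps all four keys, and the left and right joins keep every key from their respective side.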
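When the join key on one side is the row index rather than a column, pass left_index=True or right_index=True for that side. Here the key column on the left is matched against the index on the right.

    In [24]: df7=pd.DataFrame({'key':['a','b','a','a','b','c'],'value':range(6)})

    In [25]: df8=pd.DataFrame({'group_val':[3.5,7]},index=['a','b'])

    In [26]: df7
    Out[26]:
      key  value
    0   a      0
    1   b      1
    2   a      2
    3   a      3
    4   b      4
    5   c      5

    In [27]: df8
    Out[27]:
       group_val
    a        3.5
    b        7.0

    In [28]: pd.merge(df7,df8,left_on='key',right_index=True)
    Out[28]:
      key  value  group_val
    0   a      0        3.5
    2   a      2        3.5
    3   a      3        3.5
    1   b      1        7.0
    4   b      4        7.0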
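A many-to-many merge produces the Cartesian product of the matching rows: the left DataFrame has three "b" rows and the right has two, so the final result has six "b" rows.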
    In [19]: df5=pd.DataFrame({'key':['b','b','a','c','a','b'],'data1':range(6)})

    In [20]: df6=pd.DataFrame({'key':['a','b','a','b','d'],'data2':range(5)})

    In [21]: df5
    Out[21]:
       data1 key
    0      0   b
    1      1   b
    2      2   a
    3      3   c
    4      4   a
    5      5   b

    In [22]: df6
    Out[22]:
       data2 key
    0      0   a
    1      1   b
    2      2   a
    3      3   b
    4      4   d

    In [23]: pd.merge(df5,df6,how='outer')
    Out[23]:
        data1 key  data2
    0     0.0   b    1.0
    1     0.0   b    3.0
    2     1.0   b    1.0
    3     1.0   b    3.0
    4     5.0   b    1.0
    5     5.0   b    3.0
    6     2.0   a    0.0
    7     2.0   a    2.0
    8     4.0   a    0.0
    9     4.0   a    2.0
    10    3.0   c    NaN
    11    NaN   d    4.0
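The Cartesian-product rule can be checked numerically. The sketch below multiplies the per-key row counts of the two sides and compares them with the counts in the merged result. It uses an inner join so that only keys present on both sides are counted. The variable names expected and actual are illustrative.

```python
import pandas as pd

df5 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'b'], 'data1': range(6)})
df6 = pd.DataFrame({'key': ['a', 'b', 'a', 'b', 'd'], 'data2': range(5)})

# Number of rows per key on each side.
left_counts = df5['key'].value_counts()
right_counts = df6['key'].value_counts()

# Keys missing from one side become NaN after the aligned multiplication;
# dropping them leaves (left count) * (right count) for the shared keys.
expected = (left_counts * right_counts).dropna().astype(int)

merged = pd.merge(df5, df6, on='key', how='inner')
actual = merged['key'].value_counts()

print(expected.sort_index().to_dict())  # {'a': 4, 'b': 6}
print(actual.sort_index().to_dict())    # {'a': 4, 'b': 6}
```

With how='outer', as in the session above, the unmatched keys c and d each contribute one extra row, which gives the 12 rows shown.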
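Parameter | Description |
---|---|
left | The left DataFrame taking part in the merge |
right | The right DataFrame taking part in the merge |
how | One of "inner", "outer", "left", "right"; defaults to "inner" |
on | Column name(s) to join on; must be present in both DataFrames |
left_on | Column(s) in the left DataFrame to use as join keys |
right_on | Column(s) in the right DataFrame to use as join keys |
left_index | Use the row index of the left DataFrame as its join key |
right_index | Use the row index of the right DataFrame as its join key |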
sort | Sort the merged data by the join keys; defaults to False in current pandas releases |
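suffixes | Tuple of strings appended to overlapping column names; defaults to ('_x', '_y'). If both DataFrames have a "data" column, the result contains "data_x" and "data_y" |
copy | Defaults to True. Setting it to False can avoid copying data into the resulting data structure |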
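As a small illustration of the suffixes parameter from the table, the sketch below merges two DataFrames that share a non-key column named data. The frames left and right and the custom suffixes '_left' and '_right' are made up for this example.

```python
import pandas as pd

left = pd.DataFrame({'key': ['a', 'b'], 'data': [1, 2]})
right = pd.DataFrame({'key': ['a', 'b'], 'data': [10, 20]})

# Default suffixes disambiguate the overlapping non-key column.
default = pd.merge(left, right, on='key')
print(list(default.columns))  # ['key', 'data_x', 'data_y']

# Custom suffixes make the origin of each column explicit.
custom = pd.merge(left, right, on='key', suffixes=('_left', '_right'))
print(list(custom.columns))   # ['key', 'data_left', 'data_right']
```

The join key itself is not suffixed, because it is shared between the two sides.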