Pandas | 19 合并/链接

Pandas具备功能全面的高性能内存中链接操做,与SQL等关系数据库很是类似。
Pandas提供了一个单独的merge()函数,做为DataFrame对象之间全部标准数据库链接操做的入口 -python

 
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=True)
  • left - 一个DataFrame对象。
  • right - 另外一个DataFrame对象。
  • on - 列(名称)链接,必须在左和右DataFrame对象中存在(找到)。
  • left_on - 左侧DataFrame中的列用做键,能够是列名或长度等于DataFrame长度的数组。
  • right_on - 来自右的DataFrame的列做为键,能够是列名或长度等于DataFrame长度的数组。
  • left_index - 若是为True,则使用左侧DataFrame中的索引(行标签)做为其链接键。 在具备MultiIndex(分层)的DataFrame的状况下,级别的数量必须与来自右DataFrame的链接键的数量相匹配。
  • right_index - 与右DataFrame的left_index具备相同的用法。
  • how - 它是left, right, outer以及inner之中的一个,默认为内inner。 下面将介绍每种方法的用法。
  • sort - 按照字典顺序经过链接键对结果DataFrame进行排序。默认为True,设置为False时,在不少状况下大大提升性能。

如今建立两个不一样的DataFrame并对其执行合并操做。shell

import pandas as pd
left
= pd.DataFrame({ 'id':[1,2,3,4,5], 'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'], 'subject_id':['sub1','sub2','sub4','sub6','sub5']})
right
= pd.DataFrame( {'id':[1,2,3,4,5], 'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'], 'subject_id':['sub2','sub4','sub3','sub6','sub5']}) print (left) print("========================================") print (right)

输出结果:数据库

Name id subject_id 0 Alex 1 sub1 1 Amy 2 sub2 2 Allen 3 sub4 3 Alice 4 sub6 4 Ayoung 5 sub5 ======================================== Name id subject_id 0 Billy 1 sub2 1 Brian 2 sub4 2 Bran 3 sub3 3 Bryce 4 sub6 4 Betty 5 sub5
 

在一个键上合并两个数据帧数组

import pandas as pd

left = pd.DataFrame({
         'id':[1,2,3,4,5],
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5']})

right = pd.DataFrame(
         {'id':[1,2,3,4,5],
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5']})

rs = pd.merge(left,right,on='id')
print(rs)

输出结果:函数

Name_x id subject_id_x Name_y subject_id_y 0 Alex 1 sub1 Billy sub2 1 Amy 2 sub2 Brian sub4 2 Allen 3 sub4 Bran sub3 3 Alice 4 sub6 Bryce sub6 4 Ayoung 5 sub5 Betty sub5
 

合并多个键上的两个数据框性能

import pandas as pd

left = pd.DataFrame({
         'id':[1,2,3,4,5],
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5']})

right = pd.DataFrame(
         {'id':[1,2,3,4,5],
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5']})

rs = pd.merge(left,right,on=['id','subject_id'])
print(rs)

输出结果:spa

Name_x id subject_id Name_y 0 Alice 4 sub6 Bryce 1 Ayoung 5 sub5 Betty
 

合并使用“how”的参数

如何合并参数指定如何肯定哪些键将被包含在结果表中。若是组合键没有出如今左侧或右侧表中,则链接表中的值将为NAcode

这里是how选项和SQL等效名称的总结 -对象

合并方法 SQL等效 描述
left LEFT OUTER JOIN 使用左侧对象的键
right RIGHT OUTER JOIN 使用右侧对象的键
outer FULL OUTER JOIN 使用键的联合
inner INNER JOIN 使用键的交集

Left Join示例blog

import pandas as pd
left = pd.DataFrame({
         'id':[1,2,3,4,5],
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5']})

right = pd.DataFrame(
         {'id':[1,2,3,4,5],
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5']})

rs = pd.merge(left, right, on='subject_id', how='left')
print (rs)

输出结果:

Name_x id_x subject_id Name_y id_y 0 Alex 1 sub1 NaN NaN 1 Amy 2 sub2 Billy 1.0 2 Allen 3 sub4 Brian 2.0 3 Alice 4 sub6 Bryce 4.0 4 Ayoung 5 sub5 Betty 5.0
 

Right Join示例

import pandas as pd

left = pd.DataFrame({
         'id':[1,2,3,4,5],
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5']})

right = pd.DataFrame(
         {'id':[1,2,3,4,5],
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5']})

rs = pd.merge(left, right, on='subject_id', how='right')
print (rs)

输出结果:

Name_x id_x subject_id Name_y id_y 0 Amy 2.0 sub2 Billy 1 1 Allen 3.0 sub4 Brian 2 2 Alice 4.0 sub6 Bryce 4 3 Ayoung 5.0 sub5 Betty 5 4 NaN NaN sub3 Bran 3
 

Outer Join示例

import pandas as pd

left = pd.DataFrame({
         'id':[1,2,3,4,5],
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5']})

right = pd.DataFrame(
         {'id':[1,2,3,4,5],
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5']})

rs = pd.merge(left, right, how='outer', on='subject_id')
print (rs)

输出结果:

Name_x id_x subject_id Name_y id_y 0 Alex 1.0 sub1 NaN NaN 1 Amy 2.0 sub2 Billy 1.0 2 Allen 3.0 sub4 Brian 2.0 3 Alice 4.0 sub6 Bryce 4.0 4 Ayoung 5.0 sub5 Betty 5.0 5 NaN NaN sub3 Bran 3.0
 

Inner Join示例

链接将在索引上进行。链接(Join)操做将授予它所调用的对象。因此,a.join(b)不等于b.join(a)

import pandas as pd

left = pd.DataFrame({
         'id':[1,2,3,4,5],
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5']})

right = pd.DataFrame(
         {'id':[1,2,3,4,5],
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5']})

rs = pd.merge(left, right, on='subject_id', how='inner')
print (rs)

输出结果:

Name_x id_x subject_id Name_y id_y 0 Amy 2 sub2 Billy 1 1 Allen 3 sub4 Brian 2 2 Alice 4 sub6 Bryce 4 3 Ayoung 5 sub5 Betty 5
相关文章
相关标签/搜索