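merge, join, and concat in pandas

merge mainly performs horizontal joins: it combines multiple DataFrames into one through a shared key.

concat can combine objects either horizontally or vertically.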

I. concat

1) Method signature

pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
          keys=None, levels=None, names=None, verify_integrity=False,
          copy=True)
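objs: the collection of objects to concatenate; it may contain Series and DataFrame objects

axis: {0, 1, ...}, the axis to concatenate along; the default 0 stacks vertically (adds rows), 1 stacks horizontally (adds columns)

join: {'inner', 'outer'}, how to handle the labels on the other axis; the default 'outer' takes their union, 'inner' takes their intersection

join_axes: a list of Index objects to use for the other axes instead of the inner/outer set logic (deprecated in pandas 0.25 and removed in 1.0; reindex the result instead)

ignore_index: {False, True}, whether to discard the original index along the concatenation axis; the default False keeps it

keys: labels, one per input object, that become the outermost level of a hierarchical index; default None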
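The effect of axis, join, and ignore_index can be checked on a small example. This is a minimal sketch; the frames df_a and df_b and their contents are hypothetical and are not the frames used in the figures.

```python
import pandas as pd

# Two hypothetical frames with partly overlapping columns and row labels.
df_a = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[0, 1])
df_b = pd.DataFrame({"B": ["B1", "B2"], "C": ["C1", "C2"]}, index=[1, 2])

# axis=0 (default) stacks rows; join='outer' keeps the union of columns.
rows_outer = pd.concat([df_a, df_b])

# join='inner' keeps only the columns present in every input.
rows_inner = pd.concat([df_a, df_b], join="inner")

# ignore_index=True drops the original row labels and renumbers from 0.
rows_renumbered = pd.concat([df_a, df_b], ignore_index=True)

# axis=1 places the frames side by side, aligning them on the row index.
cols_outer = pd.concat([df_a, df_b], axis=1)

print(list(rows_outer.columns))     # ['A', 'B', 'C']
print(list(rows_inner.columns))     # ['B']
print(list(rows_renumbered.index))  # [0, 1, 2, 3]
print(cols_outer.shape)             # (3, 4)
```

Cells with no source value (for example column C in the rows that come from df_a) are filled with NaN.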

1. result = pd.concat(frames)
_images/merging_concat_basic.png

2. result = pd.concat(frames, keys=['x', 'y', 'z'])
_images/merging_concat_keys.png

result.loc['y']  # .ix was removed in pandas 1.0; .loc selects the same rows
Out[7]: 
    A   B   C   D
4  A4  B4  C4  D4
5  A5  B5  C5  D5
6  A6  B6  C6  D6
7  A7  B7  C7  D7
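Selecting one piece back out through its key can be sketched as follows. The two frames here are hypothetical stand-ins, smaller than the ones in the figure.

```python
import pandas as pd

# Hypothetical pieces; only the index labels and one column matter here.
df_x = pd.DataFrame({"A": ["A0", "A1"]}, index=[0, 1])
df_y = pd.DataFrame({"A": ["A4", "A5"]}, index=[4, 5])

# keys adds an outer index level, so each row is labelled (key, original label).
result = pd.concat([df_x, df_y], keys=["x", "y"])

# Selecting on the outer level returns that piece with its original index.
piece_y = result.loc["y"]

print(result.index.nlevels)   # 2
print(list(piece_y.index))    # [4, 5]
```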
3. result = pd.concat([df1, df4], axis=1)
_images/merging_concat_axis1.png

4. result = pd.concat([df1, df4], axis=1, join='inner')
_images/merging_concat_axis1_inner.png

5. result = pd.concat([df1, df4], axis=1, join_axes=[df1.index])
(requires pandas < 1.0; on later versions: pd.concat([df1, df4], axis=1).reindex(df1.index))
_images/merging_concat_axis1_join_axes.png
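On pandas versions without join_axes, the same "keep only the index of df1" result can be sketched with reindex. The contents of df1 and df4 below are hypothetical and much smaller than the frames in the figure.

```python
import pandas as pd

# Hypothetical frames whose row labels only partly overlap.
df1 = pd.DataFrame({"A": ["A0", "A1", "A2"]}, index=[0, 1, 2])
df4 = pd.DataFrame({"F": ["F2", "F3"]}, index=[2, 3])

# Outer-concatenate side by side, then keep exactly the row labels of df1.
result = pd.concat([df1, df4], axis=1).reindex(df1.index)

print(list(result.index))     # [0, 1, 2]
print(result.loc[2, "F"])     # F2
```

Rows 0 and 1 have no counterpart in df4, so their F values are NaN; row 3 of df4 is dropped because it is not in df1.index.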

6. result = pd.concat([df1, s1], axis=1)
_images/merging_concat_mixed_ndim.png

7. result = pd.concat([df1, s2, s2, s2], axis=1)
_images/merging_concat_unnamed_series.png

8. result = pd.concat([df1, s1], axis=1, ignore_index=True)
_images/merging_concat_series_ignore_index.png

9. result = pd.concat(frames, keys=['x', 'y', 'z'])
_images/merging_concat_group_keys2.png
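Examples 6 to 8 mix a DataFrame with Series. How the resulting column labels are chosen can be sketched as below; the data is hypothetical, and only the names df1, s1, and s2 follow the examples above.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
s1 = pd.Series(["X0", "X1"], name="X")   # named Series
s2 = pd.Series(["_0", "_1"])             # unnamed Series

# A named Series becomes a column labelled with its name.
named = pd.concat([df1, s1], axis=1)

# Unnamed Series are numbered consecutively: 0, 1, ...
unnamed = pd.concat([df1, s2, s2], axis=1)

# With axis=1, ignore_index=True replaces all column labels with 0..n-1.
renumbered = pd.concat([df1, s1], axis=1, ignore_index=True)

print(list(named.columns))       # ['A', 'B', 'X']
print(list(unnamed.columns))     # ['A', 'B', 0, 1]
print(list(renumbered.columns))  # [0, 1, 2]
```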

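II. append

append adds the rows of the other object and takes the union of the columns, so the result can grow both vertically and horizontally; the inputs' columns and index do not need to match. DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, where pd.concat covers the same cases.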

1. result = df1.append(df2)
_images/merging_append1.png

2. result = df1.append(df4)
_images/merging_append2.png

3. result = df1.append([df2, df3])
_images/merging_append3.png

4. result = df1.append(df4, ignore_index=True)
_images/merging_append_ignore_index.png
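On pandas 2.0 and later, where DataFrame.append no longer exists, the append calls above can be approximated with pd.concat. A minimal sketch with hypothetical data (only the names df1, df2, and df4 follow the examples):

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[0, 1])
df2 = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]}, index=[2, 3])
df4 = pd.DataFrame({"B": ["B2", "B3"], "F": ["F2", "F3"]}, index=[1, 2])

# Roughly df1.append(df2): same columns, rows stacked.
stacked = pd.concat([df1, df2])

# Roughly df1.append(df4): rows stacked, columns unioned, gaps filled with NaN.
mixed = pd.concat([df1, df4])

# Roughly df1.append(df4, ignore_index=True): row labels renumbered from 0.
mixed_renumbered = pd.concat([df1, df4], ignore_index=True)

print(stacked.shape)                 # (4, 2)
print(list(mixed.columns))           # ['A', 'B', 'F']
print(list(mixed.index))             # [0, 1, 1, 2]
print(list(mixed_renumbered.index))  # [0, 1, 2, 3]
```

Note that without ignore_index the duplicated row label 1 is kept as-is.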

III. merge

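The merge function joins the rows of two datasets on one or more keys. Its main use case is two tables that share a primary key but hold different features: joining on that key combines them into a single table. When the key is unique in both tables and is a single column, the merged table has no more rows than either input, and its column count is the sum of the two tables' column counts minus one, because the key column appears only once. If the key is duplicated on both sides, each matching pair produces a row, so the row count can grow.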
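The row and column counts described above can be verified on a small case. This is a sketch with hypothetical tables; the second half shows what happens when the key is not unique.

```python
import pandas as pd

# One-to-one: the key is unique in both tables.
left = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
right = pd.DataFrame({"key": ["K0", "K1", "K2"], "B": ["B0", "B1", "B2"]})
one_to_one = pd.merge(left, right, on="key")
print(one_to_one.shape)    # (3, 3): 3 rows, 2 + 2 - 1 columns

# Many-to-many: the same key value appears twice on each side.
left_dup = pd.DataFrame({"key": ["K0", "K0"], "A": ["A0", "A1"]})
right_dup = pd.DataFrame({"key": ["K0", "K0"], "B": ["B0", "B1"]})
many_to_many = pd.merge(left_dup, right_dup, on="key")
print(many_to_many.shape)  # (4, 3): every left row pairs with every right row
```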

pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
         left_index=False, right_index=False, sort=False,
         suffixes=('_x', '_y'), copy=True, indicator=False)
left: the first DataFrame to merge

right: the second DataFrame to merge

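on: the column name(s) to join on. If the key columns are named differently in the two tables, specify them with left_on and right_on instead. When none of these is given, pandas joins on the columns that have the same name in both tables.

how: how to build the result when the key columns of the two tables contain rows that do not match. 'inner' (the default) keeps the intersection of the keys, 'outer' keeps their union, and 'left' and 'right' keep the keys of that side only.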
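The four how options and the left_on/right_on pair can be compared side by side. A minimal sketch, assuming two hypothetical tables whose keys only partly overlap; the column name rkey is made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
right = pd.DataFrame({"key": ["K1", "K2", "K3"], "B": ["B1", "B2", "B3"]})

inner = pd.merge(left, right, on="key")                      # keys in both tables
outer = pd.merge(left, right, on="key", how="outer")         # keys in either table
left_join = pd.merge(left, right, on="key", how="left")      # all keys of left
right_join = pd.merge(left, right, on="key", how="right")    # all keys of right

print(sorted(inner["key"]))       # ['K1', 'K2']
print(sorted(outer["key"]))       # ['K0', 'K1', 'K2', 'K3']
print(sorted(left_join["key"]))   # ['K0', 'K1', 'K2']
print(sorted(right_join["key"]))  # ['K1', 'K2', 'K3']

# When the key columns have different names, name each side explicitly.
right_renamed = right.rename(columns={"key": "rkey"})
by_names = pd.merge(left, right_renamed, left_on="key", right_on="rkey")
print(sorted(by_names.columns))   # ['A', 'B', 'key', 'rkey']
```

With left_on and right_on both key columns are kept in the result, so the "minus one" column count applies only when joining with on.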

1. result = pd.merge(left, right, on='key')
_images/merging_merge_on_key.png

2. result = pd.merge(left, right, on=['key1', 'key2'])
_images/merging_merge_on_key_multiple.png

3. result = pd.merge(left, right, how='left', on=['key1', 'key2'])
_images/merging_merge_on_key_left.png

4. result = pd.merge(left, right, how='right', on=['key1', 'key2'])
_images/merging_merge_on_key_right.png

5. result = pd.merge(left, right, how='outer', on=['key1', 'key2'])
_images/merging_merge_on_key_outer.png

6. result = pd.merge(left, right, how='inner', on=['key1', 'key2'])
_images/merging_merge_on_key_inner.png
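For the multi-key merges above, the indicator parameter from the signature is a convenient way to see which side each row came from. A sketch with hypothetical tables, not the ones shown in the figures:

```python
import pandas as pd

left = pd.DataFrame({
    "key1": ["K0", "K0", "K1"],
    "key2": ["K0", "K1", "K0"],
    "A": ["A0", "A1", "A2"],
})
right = pd.DataFrame({
    "key1": ["K0", "K1", "K2"],
    "key2": ["K0", "K0", "K0"],
    "B": ["B0", "B1", "B2"],
})

# A row matches only when BOTH key1 and key2 are equal.
inner = pd.merge(left, right, how="inner", on=["key1", "key2"])

# indicator=True adds a _merge column: 'both', 'left_only', or 'right_only'.
outer = pd.merge(left, right, how="outer", on=["key1", "key2"], indicator=True)

print(len(inner))                                  # 2
print(len(outer))                                  # 4
print(int((outer["_merge"] == "both").sum()))      # 2
```

Here (K0, K1) exists only in left and (K2, K0) only in right, so each appears once in the outer result with NaN in the other table's columns.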