Pandas学习笔记系列:html
原文:https://morvanzhou.github.io/tutorials/data-manipulation/np-pd/3-6-pd-concat/ 本文有删改python
pandas
处理多组数据的时候每每会要用到数据的合并处理,使用 concat
是一种基本的合并方式.并且concat
中有不少参数能够调整,合并成你想要的数据形式.git
axis=0是预设值,所以未设定任何参数时,函数默认axis=0。github
import pandas as pd import numpy as np #定义资料集 df1 = pd.DataFrame(np.ones((3,4))*0, columns=['a','b','c','d']) df2 = pd.DataFrame(np.ones((3,4))*1, columns=['a','b','c','d']) df3 = pd.DataFrame(np.ones((3,4))*2, columns=['a','b','c','d']) #concat纵向合并 res = pd.concat([df1, df2, df3], axis=0) #打印结果 print(res) """ a b c d 0 0.0 0.0 0.0 0.0 1 0.0 0.0 0.0 0.0 2 0.0 0.0 0.0 0.0 0 1.0 1.0 1.0 1.0 1 1.0 1.0 1.0 1.0 2 1.0 1.0 1.0 1.0 0 2.0 2.0 2.0 2.0 1 2.0 2.0 2.0 2.0 2 2.0 2.0 2.0 2.0 """
仔细观察会发现结果的index
是0, 1, 2, 0, 1, 2, 0, 1, 2,若要将index
重置,请看下面。app
#承上一个例子,并将index_ignore设定为True res = pd.concat([df1, df2, df3], axis=0, ignore_index=True) #打印结果 print(res) """ a b c d 0 0.0 0.0 0.0 0.0 1 0.0 0.0 0.0 0.0 2 0.0 0.0 0.0 0.0 3 1.0 1.0 1.0 1.0 4 1.0 1.0 1.0 1.0 5 1.0 1.0 1.0 1.0 6 2.0 2.0 2.0 2.0 7 2.0 2.0 2.0 2.0 8 2.0 2.0 2.0 2.0 """
结果的index变0, 1, 2, 3, 4, 5, 6, 7, 8。函数
join='outer'
为预设值,所以未设定任何参数时,函数默认join='outer'。此方式是依照column来作纵向合并,有相同的column上下合并在一块儿,其余独自的column个自成列,本来没有值的位置皆以NaN填充。学习
import pandas as pd import numpy as np #定义资料集 df1 = pd.DataFrame(np.ones((3,4))*0, columns=['a','b','c','d'], index=[1,2,3]) df2 = pd.DataFrame(np.ones((3,4))*1, columns=['b','c','d','e'], index=[2,3,4]) #纵向"外"合并df1与df2 res = pd.concat([df1, df2], axis=0, join='outer') print(res) """ a b c d e 1 0.0 0.0 0.0 0.0 NaN 2 0.0 0.0 0.0 0.0 NaN 3 0.0 0.0 0.0 0.0 NaN 2 NaN 1.0 1.0 1.0 1.0 3 NaN 1.0 1.0 1.0 1.0 4 NaN 1.0 1.0 1.0 1.0 """
原理同上个例子的说明,但只有相同的column合并在一块儿,其余的会被抛弃。spa
#承上一个例子 #纵向"内"合并df1与df2 res = pd.concat([df1, df2], axis=0, join='inner') #打印结果 print(res) """ b c d 1 0.0 0.0 0.0 2 0.0 0.0 0.0 3 0.0 0.0 0.0 2 1.0 1.0 1.0 3 1.0 1.0 1.0 4 1.0 1.0 1.0 """ #重置index并打印结果 res = pd.concat([df1, df2], axis=0, join='inner', ignore_index=True) print(res) """ b c d 0 0.0 0.0 0.0 1 0.0 0.0 0.0 2 0.0 0.0 0.0 3 1.0 1.0 1.0 4 1.0 1.0 1.0 5 1.0 1.0 1.0 """
import pandas as pd import numpy as np #定义资料集 df1 = pd.DataFrame(np.ones((3,4))*0, columns=['a','b','c','d'], index=[1,2,3]) df2 = pd.DataFrame(np.ones((3,4))*1, columns=['b','c','d','e'], index=[2,3,4]) #依照`df1.index`进行横向合并 res = pd.concat([df1, df2], axis=1, join_axes=[df1.index]) #打印结果 print(res) """ a b c d b c d e 1 0.0 0.0 0.0 0.0 NaN NaN NaN NaN 2 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0 3 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0 """ #移除join_axes,并打印结果 res = pd.concat([df1, df2], axis=1) print(res) """ a b c d b c d e 1 0.0 0.0 0.0 0.0 NaN NaN NaN NaN 2 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0 3 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0 4 NaN NaN NaN NaN 1.0 1.0 1.0 1.0 """
append只有纵向合并,没有横向合并。code
import pandas as pd import numpy as np #定义资料集 df1 = pd.DataFrame(np.ones((3,4))*0, columns=['a','b','c','d']) df2 = pd.DataFrame(np.ones((3,4))*1, columns=['a','b','c','d']) df3 = pd.DataFrame(np.ones((3,4))*1, columns=['a','b','c','d']) s1 = pd.Series([1,2,3,4], index=['a','b','c','d']) #将df2合并到df1的下面,以及重置index,并打印出结果 res = df1.append(df2, ignore_index=True) print(res) """ a b c d 0 0.0 0.0 0.0 0.0 1 0.0 0.0 0.0 0.0 2 0.0 0.0 0.0 0.0 3 1.0 1.0 1.0 1.0 4 1.0 1.0 1.0 1.0 5 1.0 1.0 1.0 1.0 """ #合并多个df,将df2与df3合并至df1的下面,以及重置index,并打印出结果 res = df1.append([df2, df3], ignore_index=True) print(res) """ a b c d 0 0.0 0.0 0.0 0.0 1 0.0 0.0 0.0 0.0 2 0.0 0.0 0.0 0.0 3 1.0 1.0 1.0 1.0 4 1.0 1.0 1.0 1.0 5 1.0 1.0 1.0 1.0 6 1.0 1.0 1.0 1.0 7 1.0 1.0 1.0 1.0 8 1.0 1.0 1.0 1.0 """ #合并series,将s1合并至df1,以及重置index,并打印出结果 res = df1.append(s1, ignore_index=True) print(res) """ a b c d 0 0.0 0.0 0.0 0.0 1 0.0 0.0 0.0 0.0 2 0.0 0.0 0.0 0.0 3 1.0 2.0 3.0 4.0 """