Pandas | 20 Concatenation

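Pandas provides various facilities for easily combining Series, DataFrame, and Panel objects. (Panel was removed in pandas 0.25, so on current versions only Series and DataFrame apply.)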

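pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None)
  • objs - a sequence or mapping of Series, DataFrame, or Panel objects.
  • axis - {0, 1, ...}, default 0. The axis to concatenate along.
  • join - {'inner', 'outer'}, default 'outer'. How to handle the indexes on the other axis (or axes): 'outer' takes their union, 'inner' their intersection.
  • ignore_index - boolean, default False. If True, the index values along the concatenation axis are not used; the resulting axis is labeled 0, ..., n-1.
  • keys - sequence, default None. Builds a hierarchical index, using the passed keys as the outermost level.
  • join_axes - a list of Index objects. Specific indexes to use for the other (n-1) axes instead of performing inner/outer set logic. This parameter was removed in pandas 1.0; reindex the result instead.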
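The join option is easiest to see with two frames whose columns only partly overlap. The sketch below uses two small made-up DataFrames, a and b (they are not the frames used elsewhere in this article): with join='outer' every column is kept and the gaps are filled with NaN, while join='inner' keeps only the column the two frames share.

```python
import pandas as pd

# Hypothetical frames that share only the column "key".
a = pd.DataFrame({"key": [1, 2], "left_only": ["p", "q"]})
b = pd.DataFrame({"key": [3, 4], "right_only": ["r", "s"]})

# join='outer' (the default): union of the columns; missing cells become NaN.
outer = pd.concat([a, b], join="outer")

# join='inner': intersection of the columns, so only "key" survives.
inner = pd.concat([a, b], join="inner")

print(outer)
print(inner)
```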

Concatenating Objects

The concat() function does all the heavy lifting of concatenating along an axis. The code below creates two DataFrames and concatenates them.

import pandas as pd

one = pd.DataFrame({
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5'],
         'Marks_scored':[98,90,87,69,78]},
         index=[1,2,3,4,5])

two = pd.DataFrame({
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5'],
         'Marks_scored':[89,80,79,97,88]},
         index=[1,2,3,4,5])

rs = pd.concat([one,two])
print(rs)

Output:

   Marks_scored    Name  subject_id
1            98    Alex        sub1
2            90     Amy        sub2
3            87   Allen        sub4
4            69   Alice        sub6
5            78  Ayoung        sub5
1            89   Billy        sub2
2            80   Brian        sub4
3            79    Bran        sub3
4            97   Bryce        sub6
5            88   Betty        sub5

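The result above repeats the index labels 1 to 5. If duplicate labels would indicate a mistake in your data, pd.concat() can check for them: with verify_integrity=True it raises a ValueError instead of returning the result. A minimal sketch, using cut-down versions of one and two that keep only the Marks_scored column:

```python
import pandas as pd

one = pd.DataFrame({"Marks_scored": [98, 90]}, index=[1, 2])
two = pd.DataFrame({"Marks_scored": [89, 80]}, index=[1, 2])

combined = pd.concat([one, two])
print(combined.index.is_unique)  # False: labels 1 and 2 each appear twice

raised = False
try:
    pd.concat([one, two], verify_integrity=True)
except ValueError as err:  # the message names the overlapping labels
    raised = True
    print("refused:", err)
```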
Suppose you want to associate a specific key with each of the pieces being concatenated. You can do this with the keys argument:

import pandas as pd

one = pd.DataFrame({
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5'],
         'Marks_scored':[98,90,87,69,78]},
         index=[1,2,3,4,5])

two = pd.DataFrame({
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5'],
         'Marks_scored':[89,80,79,97,88]},
         index=[1,2,3,4,5])

rs = pd.concat([one,two],keys=['x','y'])
print(rs)

Output:

     Marks_scored    Name  subject_id
x 1            98    Alex        sub1
  2            90     Amy        sub2
  3            87   Allen        sub4
  4            69   Alice        sub6
  5            78  Ayoung        sub5
y 1            89   Billy        sub2
  2            80   Brian        sub4
  3            79    Bran        sub3
  4            97   Bryce        sub6
  5            88   Betty        sub5

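Because keys builds a hierarchical index (a MultiIndex), each original piece can still be pulled back out by its key. A small sketch with trimmed-down versions of one and two:

```python
import pandas as pd

one = pd.DataFrame({"Name": ["Alex", "Amy"], "Marks_scored": [98, 90]}, index=[1, 2])
two = pd.DataFrame({"Name": ["Billy", "Brian"], "Marks_scored": [89, 80]}, index=[1, 2])

rs = pd.concat([one, two], keys=["x", "y"])

print(rs.index.nlevels)          # 2: the key level plus the original index
print(rs.loc["y"])               # only the rows that came from `two`
print(rs.loc[("x", 2), "Name"])  # Amy
```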
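The index of the result is duplicated: every label appears once per input object. If you want the resulting object to get its own fresh index, set ignore_index to True, as in the following example: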

import pandas as pd

one = pd.DataFrame({
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5'],
         'Marks_scored':[98,90,87,69,78]},
         index=[1,2,3,4,5])

two = pd.DataFrame({
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5'],
         'Marks_scored':[89,80,79,97,88]},
         index=[1,2,3,4,5])

rs = pd.concat([one,two],keys=['x','y'],ignore_index=True)
print(rs)
Output:

   Marks_scored    Name  subject_id
0            98    Alex        sub1
1            90     Amy        sub2
2            87   Allen        sub4
3            69   Alice        sub6
4            78  Ayoung        sub5
5            89   Billy        sub2
6            80   Brian        sub4
7            79    Bran        sub3
8            97   Bryce        sub6
9            88   Betty        sub5

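Observe that the index is completely replaced and the keys are discarded as well. If the two objects are instead concatenated along axis=1, the result gains new columns: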

import pandas as pd

one = pd.DataFrame({
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5'],
         'Marks_scored':[98,90,87,69,78]},
         index=[1,2,3,4,5])

two = pd.DataFrame({
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5'],
         'Marks_scored':[89,80,79,97,88]},
         index=[1,2,3,4,5])

rs = pd.concat([one,two],axis=1)
print(rs)

Output:

   Marks_scored    Name  subject_id  Marks_scored   Name  subject_id
1            98    Alex        sub1            89  Billy        sub2
2            90     Amy        sub2            80  Brian        sub4
3            87   Allen        sub4            79   Bran        sub3
4            69   Alice        sub6            97  Bryce        sub6
5            78  Ayoung        sub5            88  Betty        sub5

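Note that the result above contains each column name twice, so a lookup such as rs['Name'] returns two columns. Passing keys together with axis=1 puts the keys on the columns instead. Rows are matched by index label, so a label present in only one input is filled with NaN in the other's columns. A sketch with trimmed-down frames; three is a hypothetical extra frame:

```python
import pandas as pd

one = pd.DataFrame({"Name": ["Alex", "Amy"], "Marks_scored": [98, 90]}, index=[1, 2])
two = pd.DataFrame({"Name": ["Billy", "Brian"], "Marks_scored": [89, 80]}, index=[1, 2])

side_by_side = pd.concat([one, two], axis=1)
print(side_by_side.columns.is_unique)  # False: "Name" and "Marks_scored" appear twice

# keys= becomes the outer level of a column MultiIndex.
labelled = pd.concat([one, two], axis=1, keys=["one", "two"])
print(labelled[("two", "Name")].tolist())  # ['Billy', 'Brian']

# Rows are aligned by index label; label 3 exists only in `three`.
three = pd.DataFrame({"Extra": [10, 20]}, index=[2, 3])  # hypothetical frame
aligned = pd.concat([one, three], axis=1)
print(aligned)
```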
Concatenating with append

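A useful shortcut to concat() is the append() instance method on Series and DataFrame. These methods actually predate concat(), and they concatenate along axis=0, that is, along the index. Note that Series.append() and DataFrame.append() were deprecated in pandas 1.4 and removed in pandas 2.0; on current versions, use pd.concat() instead.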

import pandas as pd

one = pd.DataFrame({
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5'],
         'Marks_scored':[98,90,87,69,78]},
         index=[1,2,3,4,5])

two = pd.DataFrame({
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5'],
         'Marks_scored':[89,80,79,97,88]},
         index=[1,2,3,4,5])

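# DataFrame.append() was removed in pandas 2.0. On older versions this line was:
#     rs = one.append(two)
rs = pd.concat([one, two])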
print(rs)

Output:

   Marks_scored    Name  subject_id
1            98    Alex        sub1
2            90     Amy        sub2
3            87   Allen        sub4
4            69   Alice        sub6
5            78  Ayoung        sub5
1            89   Billy        sub2
2            80   Brian        sub4
3            79    Bran        sub3
4            97   Bryce        sub6
5            88   Betty        sub5

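Since append() no longer exists in pandas 2.0, one job it was often used for, adding a single row, is now usually done by wrapping the row in a one-row DataFrame and passing it to pd.concat(). A minimal sketch; the student "Carl" and his mark are invented for illustration:

```python
import pandas as pd

one = pd.DataFrame({"Name": ["Alex", "Amy"], "Marks_scored": [98, 90]}, index=[1, 2])

# Wrap the new (hypothetical) row in a one-row DataFrame with matching columns.
new_row = pd.DataFrame({"Name": ["Carl"], "Marks_scored": [75]}, index=[3])

rs = pd.concat([one, new_row])
print(rs)
```

Giving the new row an explicit index label (3 here) avoids a duplicate label; alternatively, ignore_index=True renumbers everything.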
append() can also take several objects at once:

import pandas as pd

one = pd.DataFrame({
         'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
         'subject_id':['sub1','sub2','sub4','sub6','sub5'],
         'Marks_scored':[98,90,87,69,78]},
         index=[1,2,3,4,5])

two = pd.DataFrame({
         'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
         'subject_id':['sub2','sub4','sub3','sub6','sub5'],
         'Marks_scored':[89,80,79,97,88]},
         index=[1,2,3,4,5])

# Equivalent of one.append([two, one, two]), which only works on pandas < 2.0.
rs = pd.concat([one, two, one, two])
print(rs)

Output:

   Marks_scored    Name  subject_id
1            98    Alex        sub1
2            90     Amy        sub2
3            87   Allen        sub4
4            69   Alice        sub6
5            78  Ayoung        sub5
1            89   Billy        sub2
2            80   Brian        sub4
3            79    Bran        sub3
4            97   Bryce        sub6
5            88   Betty        sub5
1            98    Alex        sub1
2            90     Amy        sub2
3            87   Allen        sub4
4            69   Alice        sub6
5            78  Ayoung        sub5
1            89   Billy        sub2
2            80   Brian        sub4
3            79    Bran        sub3
4            97   Bryce        sub6
5            88   Betty        sub5

Time Series

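Pandas provides powerful tools for working with time-series data, especially in finance. When working with time-series data, we frequently face the following tasks: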

  • Generating a time series
  • Converting a time series to a different frequency

Pandas provides a relatively compact and self-contained set of tools for performing these tasks.
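As a rough sketch of both tasks: pd.date_range() generates a regular time index, and resample() or asfreq() moves a series to another frequency. resample() aggregates every bin (here with sum()), while asfreq() only picks the values that fall on the new frequency. The daily series is made-up data:

```python
import pandas as pd

# Made-up daily data: six days starting 2018-01-01, values 0..5.
daily = pd.Series(range(6), index=pd.date_range("2018-01-01", periods=6, freq="D"))

# resample(): group into two-day bins and aggregate each bin.
two_day_sum = daily.resample("2D").sum()
print(two_day_sum.tolist())  # [1, 5, 9]

# asfreq(): keep only the values that fall on the new two-day grid.
two_day_pick = daily.asfreq("2D")
print(two_day_pick.tolist())  # [0, 2, 4]
```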

Getting the Current Time

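pd.Timestamp.now() returns the current date and time. Older tutorials call pd.datetime.now(), but pd.datetime was only an alias for the standard library's datetime.datetime; it was deprecated in pandas 1.0 and has since been removed.

import pandas as pd

print(pd.Timestamp.now())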

Output:

2017-11-03 02:17:45.997992
 
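pd.Timestamp.now() returns a pandas Timestamp rather than a plain datetime, so the usual Timestamp methods apply to it. A short sketch of two related calls: passing tz gives a time-zone-aware current time, and normalize() resets the time of day to midnight.

```python
import pandas as pd

now = pd.Timestamp.now()              # local wall-clock time, no time zone attached
now_utc = pd.Timestamp.now(tz="UTC")  # time-zone-aware current time
midnight = now.normalize()            # same date, time of day reset to 00:00:00

print(now)
print(now_utc)
print(midnight)
```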

Creating a Timestamp

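Timestamped data is the most basic type of time-series data: it associates values with points in time. For pandas objects, this means working with points in time. For example: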

import pandas as pd

time = pd.Timestamp('2018-11-01')
print(time)

Output:

2018-11-01 00:00:00
 

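A Timestamp is more than a printed date: its calendar fields are available as attributes, and different string spellings of the same date parse to equal values. A short sketch reusing the date from the example above:

```python
import pandas as pd

time = pd.Timestamp('2018-11-01')

# Calendar fields are available as attributes.
print(time.year, time.month, time.day)  # 2018 11 1
print(time.day_name())                  # Thursday

# Another spelling of the same date parses to an equal Timestamp.
print(time == pd.Timestamp('Nov 1, 2018'))  # True
```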
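You can also convert integer or float epoch times. Their default unit is nanoseconds, because that is how Timestamps are stored internally. Epochs, however, are often stored in another unit, which you can specify with the unit argument. Another example: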

import pandas as pd

time = pd.Timestamp(1588686880,unit='s')
print(time)

Output:

2020-05-05 13:54:40
 
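To see what unit changes, the sketch below builds the same instant from seconds and from milliseconds, then shows what happens when unit is left out and the integer is read as nanoseconds:

```python
import pandas as pd

seconds = pd.Timestamp(1588686880, unit='s')
millis = pd.Timestamp(1588686880000, unit='ms')
print(seconds == millis)  # True: same instant, different units

# Without unit=, the integer is read as nanoseconds since 1970-01-01.
nanos = pd.Timestamp(1588686880)
print(nanos)              # 1970-01-01 00:00:01.588686880

# A float epoch keeps its fractional part.
half = pd.Timestamp(1588686880.5, unit='s')
print(half)
```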

Creating a Time Range

import pandas as pd

time = pd.date_range("12:00", "23:59", freq="30min").time
print(time)
Output:

[datetime.time(12, 0) datetime.time(12, 30) datetime.time(13, 0)
 datetime.time(13, 30) datetime.time(14, 0) datetime.time(14, 30)
 datetime.time(15, 0) datetime.time(15, 30) datetime.time(16, 0)
 datetime.time(16, 30) datetime.time(17, 0) datetime.time(17, 30)
 datetime.time(18, 0) datetime.time(18, 30) datetime.time(19, 0)
 datetime.time(19, 30) datetime.time(20, 0) datetime.time(20, 30)
 datetime.time(21, 0) datetime.time(21, 30) datetime.time(22, 0)
 datetime.time(22, 30) datetime.time(23, 0) datetime.time(23, 30)]
 

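date_range() does not need both ends and a frequency. A short sketch of two other common forms: a start date plus a number of periods, and start and end plus periods (without freq), which spaces the points evenly and includes both ends:

```python
import pandas as pd

# A start date plus periods=: three consecutive days.
days = pd.date_range("2018-11-01", periods=3, freq="D")
print(days)

# start, end and periods without freq: evenly spaced points, both ends included.
evenly = pd.date_range("2018-11-01 00:00", "2018-11-01 06:00", periods=4)
print(evenly.hour.tolist())  # [0, 2, 4, 6]
```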
Changing the Frequency

import pandas as pd

# "h" = hourly. Older pandas documentation spells this alias "H", which pandas 2.2 deprecated.
time = pd.date_range("12:00", "23:59", freq="h").time
print(time)

Output:

[datetime.time(12, 0) datetime.time(13, 0) datetime.time(14, 0)
 datetime.time(15, 0) datetime.time(16, 0) datetime.time(17, 0)
 datetime.time(18, 0) datetime.time(19, 0) datetime.time(20, 0)
 datetime.time(21, 0) datetime.time(22, 0) datetime.time(23, 0)]
 

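Besides minutes and hours, freq accepts calendar-aware aliases. A short sketch of two of them: "B" (business days, which skips weekends) and "MS" (month start, which rolls the first point forward to the next first-of-month):

```python
import pandas as pd

# "B": business days, so the weekend of 3-4 November 2018 is skipped.
bdays = pd.date_range("2018-11-01", periods=5, freq="B")
print(bdays.date.tolist())

# "MS": month start; a mid-January start is rolled forward to 1 February.
month_starts = pd.date_range("2018-01-15", periods=3, freq="MS")
print(month_starts.date.tolist())
```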
Converting to Timestamps

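To convert a Series or a list-like object of date-like values (for example strings, epochs, or a mixture), use the to_datetime function. When passed a Series, it returns a Series (with the same index), while a list-like is converted to a DatetimeIndex. Consider the following example: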

import pandas as pd

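# format='mixed' (pandas 2.0+) parses each string on its own. Without it, pandas 2.x
# applies the format inferred from the first value to every value and raises here.
# Drop format='mixed' on pandas < 2.0, which inferred the format per element.
time = pd.to_datetime(pd.Series(['Jul 31, 2009', '2019-10-10', None]), format='mixed')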
print(time)

Output:

0   2009-07-31
1   2019-10-10
2          NaT
dtype: datetime64[ns]
 

NaT means "Not a Time", the datetime equivalent of NaN.

import pandas as pd

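# The two strings use different formats; format='mixed' (pandas 2.0+) parses each
# separately. Omit it on pandas < 2.0.
time = pd.to_datetime(['2009/11/23', '2019.12.31', None], format='mixed')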
print(time)

Output:

DatetimeIndex(['2009-11-23', '2019-12-31', 'NaT'], dtype='datetime64[ns]', freq=None)
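When the input may contain strings that are not dates at all, errors="coerce" turns them into NaT instead of raising, and an explicit format removes any guessing about the order of day and month. A minimal sketch; the list raw is made-up input:

```python
import pandas as pd

raw = ["2009/11/23", "not a date", None]

# errors="coerce": unparseable entries become NaT instead of raising an error.
parsed = pd.to_datetime(raw, errors="coerce", format="%Y/%m/%d")
print(parsed)

# An explicit format makes "23.11.2009" unambiguously day-first.
day_first = pd.to_datetime("23.11.2009", format="%d.%m.%Y")
print(day_first)
```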