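Time series types:
- Timestamp: a specific moment
- Fixed period: for example, July 2017
- Time interval: a start time and an end time

Tools for working with time objects:
- Python standard library: datetime
- Flexible parsing: dateutil, via dateutil.parser.parse()
- Batch processing: pandas, via pd.to_datetime()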
import datetime
import pandas as pd
import numpy as np
datetime.datetime.strptime('2010-01-01','%Y-%m-%d')
datetime.datetime.strptime('2010/01/01','%Y/%m/%d')
import dateutil.parser  # import the submodule explicitly; a bare `import dateutil` may not load it
dateutil.parser.parse('03/08/2020 14:35')
dateutil.parser.parse('2020-Mar-8')
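# Note: this list mixes two date layouts. pandas >= 2.0 infers one format from the
# first element, so there the call needs format='mixed' to succeed.
pd.to_datetime(['2001-01-01','2020/Mar/08'])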
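The three parsing routes above can be compared side by side. The following is a minimal sketch, not a recommendation: the variable names are illustrative, and the explicit `format` and `dayfirst` arguments are choices made here to keep the results predictable.

```python
import datetime

import pandas as pd
from dateutil import parser

# Standard library: the format string must match the input exactly.
d1 = datetime.datetime.strptime("2010-01-01", "%Y-%m-%d")

# dateutil guesses the layout; an ambiguous numeric date is read month-first
# unless dayfirst=True is passed.
d2 = parser.parse("03/08/2020 14:35")
d3 = parser.parse("03/08/2020 14:35", dayfirst=True)

# pandas parses a whole list at once; an explicit format avoids any guessing.
idx = pd.to_datetime(["2001-01-01", "2020-03-08"], format="%Y-%m-%d")

print(d1, d2, d3, idx, sep="\n")
```

Here `d2` is read as 8 March and `d3` as 3 August, which is why ambiguous numeric dates are worth checking before a batch conversion.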
pandas: working with time objects
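Generate an array of time objects | pd.date_range |
---|---|
start | start time |
end | end time |
periods | number of periods to generate |
freq | frequency; defaults to D (calendar day). Common aliases: H (hour), W (week), B (business day), SM (semi-month), M (month end), T or min (minute), S (second), A (year end) |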
pd.date_range('2019/7/23','2021/7/23')
pd.date_range('2019-7-23',periods=720)
pd.date_range('2019/7/23',periods=30,freq='M')
pd.date_range('2019-7-23',periods=30,freq='W-MON')
Alias | Description |
---|---|
B | business day frequency |
C | custom business day frequency (experimental) |
D | calendar day frequency |
W | weekly frequency |
M | month end frequency |
SM | semi-month end frequency (15th and end of month) |
BM | business month end frequency |
CBM | custom business month end frequency |
MS | month start frequency |
SMS | semi-month start frequency (1st and 15th) |
BMS | business month start frequency |
CBMS | custom business month start frequency |
Q | quarter end frequency |
BQ | business quarter end frequency |
QS | quarter start frequency |
BQS | business quarter start frequency |
A | year end frequency |
BA | business year end frequency |
AS | year start frequency |
BAS | business year start frequency |
BH | business hour frequency |
H | hourly frequency |
T, min | minutely frequency |
S | secondly frequency |
L, ms | milliseconds |
U, us | microseconds |
N | nanoseconds |
pd.date_range('2019-7-23',periods=60,freq='B') #B Business Day
dt = _  # in IPython/Jupyter, `_` holds the result of the previous expression
dt[0]
dt[0].to_pydatetime()
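As a self-contained sketch of the `pd.date_range` parameters described above, the snippet below builds three small ranges and converts one element back to a standard-library `datetime`. The dates and variable names are chosen for illustration only.

```python
import datetime

import pandas as pd

# start + end: every calendar day in the closed interval.
days = pd.date_range("2019-07-23", "2019-07-29")

# start + periods + freq: four Mondays, beginning with the first Monday
# on or after the start date (2019-07-23 is a Tuesday).
mondays = pd.date_range("2019-07-23", periods=4, freq="W-MON")

# Business days skip Saturdays and Sundays.
bdays = pd.date_range("2019-07-23", periods=5, freq="B")

first = bdays[0]               # a pandas Timestamp
as_py = first.to_pydatetime()  # a plain standard-library datetime

print(days, mondays, bdays, as_py, sep="\n")
```

Assigning the result to a name, rather than relying on `_`, keeps the example runnable outside an interactive session.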
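A time series is a Series or DataFrame indexed by time objects. When datetime objects serve as the index, they are stored in a DatetimeIndex.

Special features of time series:
- slicing by passing a year or a year-month
- slicing by passing a date range
- a rich set of supporting functions, such as resample and truncate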
sr = pd.Series(np.arange(100),index=pd.date_range('2020-3-8',periods=100))
sr
sr.index
sr['2020-3']
sr['2020-3':'2020-4']
sr.resample('W').sum()
sr.resample('M').sum()
sr.resample('M').mean()
sr.truncate(before='2020-4-1')
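To make the slicing and resampling behaviour concrete, here is a small sketch on the same 100-day series. Using `.loc` for the string-based selections is a choice made here for explicitness; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# 100 consecutive days starting on Sunday 2020-03-08, with values 0..99.
sr = pd.Series(np.arange(100), index=pd.date_range("2020-03-08", periods=100))

march = sr.loc["2020-03"]                # partial-string selection of one month
mar_apr = sr.loc["2020-03":"2020-04"]    # both endpoints are included
weekly = sr.resample("W").sum()          # weekly bins, each ending on a Sunday
tail = sr.truncate(before="2020-04-01")  # drop everything before 1 April

print(len(march), len(mar_apr))
print(weekly.head())
print(tail.head())
```

Because the series starts on a Sunday, the first weekly bin contains only that single day; resampling regroups the values without changing their total.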
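Data files commonly use the CSV format, in which values are separated by a delimiter. pandas can load data from a file name, a URL, or a file object:
- read_csv: the default separator is a comma
- read_table: the default separator is a tab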
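read_csv, read_table | Main parameters |
---|---|
sep | separator; may be a regular expression such as '\s+' |
header=None | the file has no header row |
names | column names to use |
index_col | column to use as the index |
skiprows | rows to skip |
na_values | strings to treat as missing values |
parse_dates | whether to parse columns as dates; a boolean or a list |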
pd.read_csv('600519.csv')
pd.read_csv('600519.csv',index_col=0)
pd.read_csv('600519.csv',index_col='date')
df = pd.read_csv('600519.csv',index_col='date')
df.index[0]
df.index
pd.read_csv('600519.csv',index_col='date',parse_dates=True).index
pd.read_csv('600519.csv',index_col='date',parse_dates=['date']).index
pd.read_csv('600519.csv',header=None,names=list('abcdefgh'))
pd.read_csv('600519.csv',header=None,skiprows=[1,2,3])
pd.read_csv('600519.csv',header=None,skiprows=[1,2,3],na_values=['None'])
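Since `600519.csv` is not included here, the sketch below reads a few made-up rows from an in-memory string instead. The column names and values are assumptions for illustration and are not taken from the real file.

```python
import io

import pandas as pd

# Made-up sample rows standing in for a price file such as 600519.csv.
csv_text = """date,open,close,volume
2020-03-09,10.0,10.5,1000
2020-03-10,10.5,None,1200
2020-03-11,10.2,10.8,900
"""

df = pd.read_csv(
    io.StringIO(csv_text),  # read_csv also accepts file objects
    index_col="date",       # use the date column as the index
    parse_dates=["date"],   # convert that column to datetime values
    na_values=["None"],     # treat the literal string "None" as missing
)

print(df.index)
print(df)
```

With `parse_dates` set, the index becomes a DatetimeIndex, so the year and year-month slicing shown earlier works on the loaded data.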