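Background: we have 200,000 rows of data that need to be split into chunks of 15,000 rows each.
Core idea: divide each row's index by the chunk size to assign a group number, then extract each group as a sub-dataset. (The code below uses floor division `//`, so the group number is the quotient rather than the remainder.)
Example:
Step 1: compute the group number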

Step 2: extract the sub-dataset

Step 3: save the dataset
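
To make the arithmetic for the 200,000-row case concrete, here is a minimal sketch in plain Python. The helper name `chunk_bounds` is hypothetical (it does not come from the original post); it only lists which row range lands in which group when rows are grouped by floor division.

```python
def chunk_bounds(n_rows, chunk_size):
    """Return (group_number, start, stop) for each chunk; stop is exclusive."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    bounds = []
    for start in range(0, n_rows, chunk_size):
        # The last chunk may be shorter than chunk_size
        stop = min(start + chunk_size, n_rows)
        # start // chunk_size is the same group number that index // chunk_size yields
        bounds.append((start // chunk_size, start, stop))
    return bounds


if __name__ == "__main__":
    bounds = chunk_bounds(200_000, 15_000)
    print(len(bounds))   # 14
    print(bounds[0])     # (0, 0, 15000)
    print(bounds[-1])    # (13, 195000, 200000)
```

So 200,000 rows at 15,000 rows per chunk give 13 full chunks plus a final chunk of 5,000 rows.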

Code for splitting df into smaller files:
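import pandas as pd

# df is assumed to be an existing DataFrame with the default 0, 1, 2, ... index
chunk_size = 2  # rows per chunk (small demo value; use 15000 for the case above)
df['chunk'] = df.index // chunk_size  # floor division gives each row its group number
list_chunk = df['chunk'].unique().tolist()
table = df

# Save each chunk to its own Excel file
path = '/Users/zhoujunqing/Downloads/EXCEL'
for name in list_chunk:
    file_name = str(name) + 'file.xlsx'
    file_path = path + '/' + file_name
    print(file_path)
    df_chunk = table[table['chunk'] == name]
    # The with-block saves and closes the file; writer.save() was removed in pandas 2.0
    with pd.ExcelWriter(file_path, engine='xlsxwriter') as writer:
        df_chunk.to_excel(writer, sheet_name=str(name), index=False)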
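
The listing above relies on `df.index // chunk_size`, which produces the intended groups only when the index is the default 0, 1, 2, ... sequence. As a hedged alternative, the sketch below groups by row position instead, so it also works after filtering or with a non-numeric index. The function name `split_dataframe` and the demo data are assumptions for illustration, not part of the original post.

```python
import numpy as np
import pandas as pd


def split_dataframe(df, chunk_size):
    """Split df into a list of DataFrames of at most chunk_size rows each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Row positions 0..n-1, independent of whatever index df carries
    group_ids = np.arange(len(df)) // chunk_size
    return [chunk for _, chunk in df.groupby(group_ids, sort=True)]


if __name__ == "__main__":
    # Demo frame with a string index, where df.index // chunk_size would fail
    demo = pd.DataFrame({"value": range(5)}, index=list("abcde"))
    for i, chunk in enumerate(split_dataframe(demo, 2)):
        print(i, chunk["value"].tolist())
```

Each returned piece can then be written out the same way as in the loop above, and no helper `chunk` column has to be added to the data.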
Supplementary code for reading an Excel file:
import time
import pandas as pd

def read_xlsx(path, sheet_name):
    """Read one sheet of an Excel workbook into a DataFrame."""
    xlsx_file = pd.ExcelFile(path)  # open the workbook at this path
    table = xlsx_file.parse(sheet_name)  # select the sheet
    return table

if __name__ == "__main__":
    start_time = time.time()  # start time
    #path = '/Users/xxx/Public'
    path = '/Users/xxx/Downloads'
    file_name = 'test.xlsx'
    #file_name = '报名记录汇总v1.xlsx'
    sheet_name_list = {
        'hive': '动态id',
        'mysql': 'Sheet4',
        'excel': '工作表4',
        'xlsx': 'Sheet1'
    }
    path = path + "/" + file_name
    sheet_name = sheet_name_list['excel']
    #sheet_name = sheet_name_list['email']
    df = read_xlsx(path, sheet_name)
    print(df.head())
    end_time = time.time()  # end time
    print("Elapsed time: %f seconds." % (end_time - start_time))
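
The timing logic in the script above can be pulled into a small reusable helper. The sketch below is one possible way to do that; the name `timed` is hypothetical, and it uses `time.perf_counter()`, which is intended for measuring elapsed intervals, in place of `time.time()`.

```python
import time


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Stand-in workload; any loader function could be passed in the same way
    total, seconds = timed(sum, range(1_000_000))
    print("sum=%d, elapsed %f seconds" % (total, seconds))
```

With the reader defined above, a call such as `df, seconds = timed(read_xlsx, path, sheet_name)` would return both the table and the load time.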
This article was shared from the WeChat official account SQL数据分析 (dianwu_dw).