Python任务调度模块 – APScheduler

时间 2019-11-06

标签 python 任务调度模块 apscheduler 栏目 Python 繁體版

原文原文链接

APScheduler是一个Python定时任务框架，使用起来十分方便。提供了基于日期、固定时间间隔以及crontab类型的任务，而且能够持久化任务、并以daemon方式运行应用。目前最新版本为3.0.x。
在APScheduler中有四个组件：
触发器(trigger)包含调度逻辑，每个做业有它本身的触发器，用于决定接下来哪个做业会运行。除了他们本身初始配置意外，触发器彻底是无状态的。
做业存储(job store)存储被调度的做业，默认的做业存储是简单地把做业保存在内存中，其余的做业存储是将做业保存在数据库中。一个做业的数据讲在保存在持久化做业存储时被序列化，并在加载时被反序列化。调度器不能分享同一个做业存储。
执行器(executor)处理做业的运行，他们一般经过在做业中提交制定的可调用对象到一个线程或者进城池来进行。看成业完成时，执行器将会通知调度器。
调度器(scheduler)是其余的组成部分。你一般在应用只有一个调度器，应用的开发者一般不会直接处理做业存储、调度器和触发器，相反，调度器提供了处理这些的合适的接口。配置做业存储和执行器能够在调度器中完成，例如添加、修改和移除做业。
你须要选择合适的调度器，这取决于你的应用环境和你使用APScheduler的目的。一般最经常使用的两个：
– BlockingScheduler: 当调度器是你应用中惟一要运行的东西时使用。
– BackgroundScheduler: 当你不运行任何其余框架时使用，并但愿调度器在你应用的后台执行。
安装APScheduler很是简单：
pip install apscheduler
选择合适的做业存储，你须要决定是否须要做业持久化。若是你老是在应用开始时重建job，你能够直接使用默认的做业存储（MemoryJobStore).可是若是你须要将你的做业持久化，以免应用崩溃和调度器重启时，你能够根据你的应用环境来选择具体的做业存储。例如：使用Mongo或者SQLAlchemyJobStore （用于支持大多数RDBMS）
然而，调度器的选择一般是为你若是你使用上面的框架之一。然而，默认的ThreadPoolExecutor 一般用于大多数用途。若是你的工做负载中有较大的CPU密集型操做，你能够用ProcessPoolExecutor来使用更多的CPU核。你也能够在同一时间使用二者，将进程池调度器做为第二执行器。html

配置调度器

APScheduler提供了许多不一样的方式来配置调度器，你可使用一个配置字典或者做为参数关键字的方式传入。你也能够先建立调度器，再配置和添加做业，这样你能够在不一样的环境中获得更大的灵活性。
下面是一个简单使用BlockingScheduler，并使用默认内存存储和默认执行器。(默认选项分别是MemoryJobStore和ThreadPoolExecutor，其中线程池的最大线程数为10)。配置完成后使用start()方法来启动。git

from apscheduler.schedulers.blocking import BlockingSchedulermongodb

def my_job():数据库

print 'hello world'express

sched = BlockingScheduler()框架

sched.add_job(my_job, 'interval', seconds=5)ide

sched.start()函数

在运行程序5秒后，将会输出第一个Hello world。
下面进行一个更复杂的配置，使用两个做业存储和两个调度器。在这个配置中，做业将使用mongo做业存储，信息写入到MongoDB中。ui

from pymongo import MongoClientspa

from apscheduler.schedulers.blocking import BlockingScheduler

from apscheduler.jobstores.mongodb import MongoDBJobStore

from apscheduler.jobstores.memory import MemoryJobStore

from apscheduler.executors.pool import ThreadPoolExecutor, ProcessPoolExecutor

def my_job():

print 'hello world'

host = '127.0.0.1'

port = 27017

client = MongoClient(host, port)

jobstores = {

'mongo': MongoDBJobStore(collection='job', database='test', client=client),

'default': MemoryJobStore()

}

executors = {

'default': ThreadPoolExecutor(10),

'processpool': ProcessPoolExecutor(3)

}

job_defaults = {

'coalesce': False,

'max_instances': 3

}

scheduler = BlockingScheduler(jobstores=jobstores, executors=executors, job_defaults=job_defaults)

scheduler.add_job(my_job, 'interval', seconds=5)

try:

scheduler.start()

except SystemExit:

client.close()

查询MongoDB能够看到做业的运行状况以下：

{

"_id" : "55ca54ee4bb744f8a5ab08cc4319bc24",

"next_run_time" : 1434017278.797,

"job_state" : new BinData(0,"gAJ9cQEoVQRhcmdzcQIpVQhleGVjdXRvcnEDVQdkZWZhdWx0cQRVDW1heF9pbnN0YW5jZXNxBUsDVQRmdW5jcQZVD19fbWFpbl9fOm15X2pvYnEHVQJpZHEIVSA1NWNhNTRlZTRiYjc0NGY4YTVhYjA4Y2M0MzE5YmMyNHEJVQ1uZXh0X3J1bl90aW1lcQpjZGF0ZXRpbWUKZGF0ZXRpbWUKcQtVCgffBgsSBzoMKUhjcHl0egpfcApxDChVDUFzaWEvU2hhbmdoYWlxDU2AcEsAVQNDU1RxDnRScQ+GUnEQVQRuYW1lcRFVBm15X2pvYnESVRJtaXNmaXJlX2dyYWNlX3RpbWVxE0sBVQd0cmlnZ2VycRRjYXBzY2hlZHVsZXIudHJpZ2dlcnMuaW50ZXJ2YWwKSW50ZXJ2YWxUcmlnZ2VyCnEVKYFxFn1xF1UPaW50ZXJ2YWxfbGVuZ3RocRhHQBQAAAAAAABzfXEZKFUIdGltZXpvbmVxGmgMKGgNTehxSwBVA0xNVHEbdFJxHFUIaW50ZXJ2YWxxHWNkYXRldGltZQp0aW1lZGVsdGEKcR5LAEsFSwCHUnEfVQpzdGFydF9kYXRlcSBoC1UKB98GCxIHIQwpSGgPhlJxIVUIZW5kX2RhdGVxIk51hmJVCGNvYWxlc2NlcSOJVQd2ZXJzaW9ucSRLAVUGa3dhcmdzcSV9cSZ1Lg==")

}

操做做业

1. 添加做业

上面是经过add_job()来添加做业，另外还有一种方式是经过scheduled_job()修饰器来修饰函数。

@sched.scheduled_job('cron', id='my_job_id', day='last sun')

def some_decorated_task():

print("I am printed at 00:00:00 on the last Sunday of every month!")

2. 移除做业

job = scheduler.add_job(myfunc, 'interval', minutes=2)

job.remove()

Same, using an explicit job ID:

scheduler.add_job(myfunc, 'interval', minutes=2, id='my_job_id')

scheduler.remove_job('my_job_id')

3. 暂停和恢复做业

暂停做业:
– apscheduler.job.Job.pause()
– apscheduler.schedulers.base.BaseScheduler.pause_job()
恢复做业:
– apscheduler.job.Job.resume()
– apscheduler.schedulers.base.BaseScheduler.resume_job()

4. 得到job列表

得到调度做业的列表，可使用get_jobs()来完成，它会返回全部的job实例。或者使用print_jobs()来输出全部格式化的做业列表。

5. 修改做业

def some_decorated_task():

print("I am printed at 00:00:00 on the last Sunday of every month!")</pre>

6. 关闭调度器

默认状况下调度器会等待全部正在运行的做业完成后，关闭全部的调度器和做业存储。若是你不想等待，能够将wait选项设置为False。

scheduler.shutdown()

scheduler.shutdown(wait=False)

做业运行的控制

add_job的第二个参数是trigger，它管理着做业的调度方式。它能够为date, interval或者cron。对于不一样的trigger，对应的参数也相同。

(1). cron定时调度

year (int|str) – 4-digit year
month (int|str) – month (1-12)
day (int|str) – day of the (1-31)
week (int|str) – ISO week (1-53)
day_of_week (int|str) – number or name of weekday (0-6 or mon,tue,wed,thu,fri,sat,sun)
hour (int|str) – hour (0-23)
minute (int|str) – minute (0-59)
second (int|str) – second (0-59)
start_date (datetime|str) – earliest possible date/time to trigger on (inclusive)
end_date (datetime|str) – latest possible date/time to trigger on (inclusive)
timezone (datetime.tzinfo|str) – time zone to use for the date/time calculations (defaults to scheduler timezone)
和Linux的Crontab同样，它的值格式为：

Expression Field Description

* any Fire on every value

*/a any Fire every a values, starting from the minimum

a-b any Fire on any value within the a-b range (a must be smaller than b)

a-b/c any Fire every c values within the a-b range

xth y day Fire on the x -th occurrence of weekday y within the month

last x day Fire on the last occurrence of weekday x within the month

last day Fire on the last day within the month

x,y,z any Fire on any matching expression; can combine any number of any of the above expressions

例子：

# Schedules job_function to be run on the third Friday

# of June, July, August, November and December at 00:00, 01:00, 02:00 and 03:00

sched.add_job(job_function, 'cron', month='6-8,11-12', day='3rd fri', hour='0-3')

# Runs from Monday to Friday at 5:30 (am) until 2014-05-30 00:00:00

sched.add_job(job_function, 'cron', day_of_week='mon-fri', hour=5, minute=30, end_date='2014-05-30')

(2). interval 间隔调度

它的参数以下：
weeks (int) – number of weeks to wait
days (int) – number of days to wait
hours (int) – number of hours to wait
minutes (int) – number of minutes to wait
seconds (int) – number of seconds to wait
start_date (datetime|str) – starting point for the interval calculation
end_date (datetime|str) – latest possible date/time to trigger on
timezone (datetime.tzinfo|str) – time zone to use for the date/time calculations
例子：

# Schedule job_function to be called every two hours

sched.add_job(job_function, 'interval', hours=2)

(3). date 定时调度

最基本的一种调度，做业只会执行一次。它的参数以下：
run_date (datetime|str) – the date/time to run the job at
timezone (datetime.tzinfo|str) – time zone for run_date if it doesn’t have one already
例子：

# The job will be executed on November 6th, 2009

sched.add_job(my_job, 'date', run_date=date(2009, 11, 6), args=['text'])

# The job will be executed on November 6th, 2009 at 16:30:05

sched.add_job(my_job, 'date', run_date=datetime(2009, 11, 6, 16, 30, 5), args=['text'])

Ref:
http://apscheduler.readthedocs.org/en/latest/modules/triggers
http://apscheduler.readthedocs.org/en/3.0/userguide.html