如何对你的Python代码进行基准测试

时间 2019-12-04

原文原文链接

啥叫作基准测试(benchmark)代码?其实主要就是测试(benchmarking)和分析(profiling)你的代码执行多快，而且找到代码瓶颈(bottlenecks)在哪里。执行该动做的主要的目的就是优化(optimization)。也许由于业务须要你而且让你的代码执行更快一些。当面临这种状况时，你就须要找出你的代码是在哪些部分慢的。python

本文覆盖如何使用不一样工具进行代码测试。数据库

timeit

Python再带一个模块timeit。你可使用它来对小代码进行计时。timeit模块使用针对平台的时间函数，你能够获得很是准确的一个时间结果。dom

timeit模块有个命令行接口，也能够经过导入的方式。咱们将从命令行中使用开始。打开终端执行以下命令：函数

$ python3 -m timeit -s "[ord(x) for x in 'abcdfghi']" 100000000 loops, best of 3: 0.00907 usec per loop $ python3 -m timeit -s "[chr(int(x)) for x in '123456789']" 100000000 loops, best of 3: 0.00897 usec per loop

发生啥啦？好吧，当你执行Python命令而且传递-m参数时，你是在告诉解释器去找该模块而且做为主要程序执行。 -s告知timeit模块执行一次代码。而后timeit就行执行代码三次，而且返回该三次的平均结果。该例子太过减小，它们之间的区别可能不太明显。工具

你的机器上运行的结果可能跟个人会有一些误差。oop

咱们再写个简单函数，看看是否能够从命令行测试学习

# simple_func.py def my_function(): try: 1 / 0 except ZeroDivisionError: pass

整个函数的功能就是抛出一个异常而后直接忽略。很烂的例子。要经过命令使用timeit来测试，咱们须要在命令空间中导入它，确保工做目录在当前文件目录下，而后执行：测试

$ python3 -m timeit "import simple_func; simple_func.my_function()" 1000000 loops, best of 3: 0.753 usec per loop

这里咱们就导入该模块而且直接my_function()函数。注意咱们使用了;来分来导入和执行代码。如今能够开始学习如何在Python脚本代码内部使用timeit了。优化

导入timeit测试

在代码中使用timeit模块也是至关容易的。咱们仍是使用以前很烂的例子：ui

def my_function(): try: 1 / 0 except ZeroDivisionError: pass if __name__ == "__main__": import timeit setup = "from __main__ import my_function" print(timeit.timeit("my_function()", setup=setup))

咱们执行经过执行该脚原本验证。咱们导入了timeit模块，而后调用timeit函数，参数的setup字符串将导入要测试函数到timeit的做用域中。如今咱们实现本身的装饰器测试函数。

使用装饰器

实现本身的装饰器计时函数是一件颇有趣的事情，虽然不必定有timeit精确。

import random import time def timerfunc(func): """  A timer decorator  """ def function_timer(*args, **kwargs): """  A nested function for timing other functions  """ start = time.time() value = func(*args, **kwargs) end = time.time() runtime = end - start msg = "The runtime for {func} took {time} seconds to complete" print(msg.format(func=func.__name__, time=runtime)) return value return function_timer @timerfunc def long_runner(): for x in range(5): sleep_time = random.choice(range(1, 5)) time.sleep(sleep_time) if __name__ == "__main__": long_runner()

该例子中，咱们导入了Python标准库的random和time模块。而后咱们建立了装饰器函数。你将注意到它接收了一个函数以及包含了另外一函数。嵌套的函数在调用实际函数时将抓取时间，而后等待函数返回并抓取结束时间。如今咱们知道函数所花费的时间，咱们将它打印出来。固然该装饰器函数也须要将实际函数执行返回的结果返回回来。

接下来的函数就是包装了该计时装饰器函数。你能够看到这里使用了随机休眠，该操做只是用来模拟耗时长程序。也许你用来执行链接数据库（或者较大查询），跑线程或其余事情。

每次跑这段代码，结果都会有些不一样。本身试试！

建立计时上下文管理器

一些码农可能更喜欢使用上下文的方式来测试小段代码。咱们来构建看看：

import random import time class MyTimer: def __init__(self): self.start = time.time() def __enter__(self): return self def __exit__(self, exc_type, exc_val, exc_tb): end = time.time() runtime = end - self.start msg = 'The function took {time} seconds to complete' print(msg.format(time=runtime)) def long_runner(): for x in range(5): sleep_time = random.choice(range(1, 5)) time.sleep(sleep_time) if __name__ == "__main__": with MyTimer(): long_runner()

本例中，咱们使用__init__方法来开始咱们计时，__enter__方法不作任何事情，仅仅返回它本身。__exit__方法中抓取告终束时间，计算了总执行时间并打印。

而后咱们就使用上下文管理器来测试咱们执行的代码。

cProfile

Python内建了分析器。profile和cProfile模块。profile模块是纯Python实现，在测试过程当中可能会增长不少额外的开支。因此推荐仍是使用更快的cProfile。

咱们不会过细的讲解该模块，咱们来看看一些例子。

>>> import cProfile >>> cProfile.run("[x for x in range(1500)]") 4 function calls in 0.000 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 0.000 0.000 <string>:1(<listcomp>) 1 0.000 0.000 0.000 0.000 <string>:1(<module>) 1 0.000 0.000 0.000 0.000 {built-in method exec} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}

咱们来了解一下内容。第一行显示了有4个函数调用。下一行告诉结果排序方式。根据文档，standard name指的是最右边的列。这里咱们有好多列数据：

ncalls: 调用了多少次
tottime: 函数所花费的总时间
percall: ncalls / tottime
cumtime: 涵盖了该函数下的全部子函数（甚至地柜）函数所话费的时间
第二个percall: cumtime / 原始钓鱼那个
filename: fileno (function) 每一个函数响应信息

你也能够像timeit模块同样在命令行执行cProfile。主要的区别就是须要传递Python脚本而不是代码做为参数。

python3 -m cProfile test.py

尝试执行下该命令，看看执行结果。

line_profiler

有个第三方提供的项目叫line_profiler，它用来分析每行代码执行时间。它也有一个叫作kenprof的脚本使用line_profiler来分析Python应用和脚本。只需使用pip来安装该包。

 pip install line_profiler

在使用line_profiler以前咱们须要一些代码来分析。不过我向先告诉你命令行中如何调用line_profiler.实际上须要执行kernprof脚原本调用line_profiler。

 kernprof -l silly_functions.py

执行完后将会打印出：Wrote profile results to silly_functions.py.lpro。该文件不能直接查看，是一个二进制文件。当咱们执行kernprof时，实际上是在__builtins__命名空间中注入LineProfiler实例。该实例称为profile，用来做为装饰器函数。让咱们来写实际脚本:

import time @profile def fast_function(): print("I'm fast function!") @profile def slow_function(): time.sleep(2) print("I'm slow function!") if __name__ == "__main__": fast_function() slow_function()

如今咱们拥有使用了profile装饰的函数。若是你直接执行该脚本将会报NameError错误，由于并无定义profile。因此记得再实际使用时移除@profile装饰器函数。

咱们来学习若是查看分析器的结果。有两种办法，第一使用line_profiler模块读结果文件：

$ python -m line_profiler silly_functions.py.lprof

还有种是在执行kernprof时带上-v参数：

$ kernprof -l -v silly_functions.py

不管哪一种办法你都将获得以下结果：

I'm fast function! I'm slow function! Wrote profile results to silly_functions.py.lprof Timer unit: 1e-06 s Total time: 4e-05 s File: silly_functions.py Function: fast_function at line 3 Line # Hits Time Per Hit % Time Line Contents ============================================================== 3 @profile 4 def fast_function(): 5 1 40 40.0 100.0 print("I'm fast function!") Total time: 2.00227 s File: silly_functions.py Function: slow_function at line 7 Line # Hits Time Per Hit % Time Line Contents ============================================================== 7 @profile 8 def slow_function(): 9 1 2002114 2002114.0 100.0 time.sleep(2) 10 1 157 157.0 0.0 print("I'm slow function!")

咱们能够看到结果中显示了每行代码的执行时间相关信息。有6列内容。咱们来看看每列表明的意思。

Line#: 分析的代码行数
Hits: 执行次数
Time: 该行代码执行时间
Per Hit: 每次执行时间
% Time: 函数中所占时间比率
Line Contents: 执行的代码

若是你用IPython，你可能想知道IPython的魔术方法%lprun能够分析制定函数，甚至是具体代码。

memory_profiler

另外一个用来分析的第三方包叫作memory_profiler.它用来分析每行代码在进程中所消耗的内存。咱们也使用pip来安装它。

pip install memory_profiler

装好后，咱们也须要代码来评测。memory_prifiler执行方式和line_profiler很是类似，它将在__builtins__注入profiler装饰器来测试函数。

@profile
def mem_func():
    lots_of_numbers = list(range(1500))
    x = ['letters'] * (5 ** 10)
    del lots_of_numbers
    return None

if __name__ == "__main__":
    mem_func()

本例中咱们建立了1500整形list。而后咱们又建立了9765625(5的10次方)字符串李彪。最后咱们删除了第一个list并返回。memory_profiler没有提供另一个脚原本让你执行该程序，而是直接经过-m参数来调用模块。

$ python -m memory_profiler memo_prof.py 
Filename: memo_prof.py

Line # Mem usage Increment Line Contents ================================================ 1 13.133 MiB 0.000 MiB @profile 2 def mem_func(): 3 13.387 MiB 0.254 MiB lots_of_numbers = list(range(1500)) 4 87.824 MiB 74.438 MiB x = ['letters'] * (5 ** 10) 5 87.938 MiB 0.113 MiB del lots_of_numbers 6 87.938 MiB 0.000 MiB return None

结果的提供的信息就很是直观了。包含代码行数和改行执行完后的内存使用，接下来是内存增长量，最后是执行的代码。

memory_profiler也提供mprof用来建立完整的内存使用报表，而不是每行结果。

$ mprof run memo_prof.py
mprof: Sampling memory every 0.1s
running as a Python program...

mprof也提供来图示来查看应用内存消耗，须要查看图表前咱们须要先安装matplotlib

$ pip install matplotlib

而后执行:

$ mprof plot

咱们获得以下图：

profilehooks

最后个要介绍的第三方包是profilehooks.它拥有一系列包装器来分期函数。安装：

pip install profilehooks

安装完后，咱们修改上次例子替换成profilehooks:

from profilehooks import profile @profile def mem_func(): lots_of_numbers = list(range(1500)) x = ['letters'] * (5 ** 10) del lots_of_numbers return None if __name__ == "__main__": mem_func()

仅仅要作的是导入profilehooks而且包装到要测试的函数中。执行后：

*** PROFILER RESULTS *** mem_func (profhooks.py:3) function called 1 times 2 function calls in 0.036 seconds Ordered by: cumulative time, internal time, call count ncalls tottime percall cumtime percall filename:lineno(function) 1 0.036 0.036 0.036 0.036 profhooks.py:3(mem_func) 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 0 0.000 0.000 profile:0(profiler)

输出的结果和cProfile是同样的，你能够看看我以前对这些列的描述。profilehooks提供了两个装饰器。咱们来看看第一个timecall，它提供了函数执行时间。

from profilehooks import timecall @timecall def mem_func(): lots_of_numbers = list(range(1500)) x = ['letters'] * (5 ** 10) del lots_of_numbers return None if __name__ == "__main__": mem_func()

执行后你将看到以下输出：

  mem_func (profhooks2.py:3):
    0.056 seconds

整个包装其输出整个函数执行时间，和timeit是同样的。

另外一个profilehooks提供的装饰器是coverage。它用来打印函数每行代码涵盖数。不过我看来用处到不是很大，你能够本身试试：

*** COVERAGE RESULTS *** mem_func (profhooks3.py:3) function called 1 times @coverage def mem_func(): 1: lots_of_numbers = list(range(1500)) 1: x = ['letters'] * (5 ** 10) 1: del lots_of_numbers 1: return None

最后要告诉你的是，你也能够经过-m参数来从命令行中执行profilehooks.

python -m profilehooks mymodule.py

profilehooks是一个新包，不过我到以为应该会颇有前途。

总结

本文涵盖了须要信息。分享了如何使用Python内建模块,timeit和cProfile来测试和分期你的代码，我还学会了如何本身包装计时包装器和上下文管理器。而后咱们还学习了一些第三方包：line_profiler, memory_profiler和profilehooks。就目前，你已经学会如何分析你的代码，查找代码瓶颈。