[Python3] Pitfall Notes: Optimization Tips 1

Date: 2020-06-14


Choosing the right data structure

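Different use cases call for different data structures.
For example, when lookups outnumber insertions, consider whether a dict is the better fit.
In Python 3, a dict hashes each key to a position (a bucket) in a hash table,
so a lookup is O(1) on average.
A list is really an array: the same membership test has to scan the list, which is O(n),
so membership tests and keyed lookups are faster on a dict than on a list.
A set is similar to a dict and also offers O(1) average lookup; it is a hash table that stores only keys, with no associated values.
On insertion, both set and dict compare in two steps. The first step compares hash values (via __hash__); if no stored element has the same hash, the new element is written.
If the hash matches, the second step calls __eq__: if the elements are equal the new one is discarded (a dict keeps the existing key and updates its value), otherwise it is written.
In the results below, set comes out marginally slower than dict, but the gap (about 0.025 s over 10,000,000 lookups) is small and may well be measurement noise rather than a real difference.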
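
To make the two-step comparison concrete, here is a minimal sketch. The `Key` class, its `bucket` argument, and the `calls` log are hypothetical names introduced only for this illustration; the class forces hash collisions so that `__eq__` has to be consulted.

```python
class Key:
    """Hypothetical key type that logs calls to __hash__ and __eq__."""
    calls = []

    def __init__(self, name, bucket):
        self.name = name
        self.bucket = bucket

    def __hash__(self):
        Key.calls.append(("hash", self.name))
        return self.bucket  # same bucket -> deliberate hash collision

    def __eq__(self, other):
        Key.calls.append(("eq", self.name))
        return isinstance(other, Key) and self.name == other.name


a = Key("a", bucket=1)
b = Key("b", bucket=1)    # collides with a, but is not equal to it
a2 = Key("a", bucket=1)   # collides with a and is equal to it

seen = set()
seen.add(a)
seen.add(b)     # same hash, __eq__ says different -> stored
seen.add(a2)    # same hash, __eq__ says equal -> discarded

table = {}
table[a] = 1
table[b] = 2
table[a2] = 3   # equal to an existing key -> its value is overwritten

print(len(seen), len(table), table[a])
```

Both containers end up with two entries, and printing `Key.calls` shows that `__eq__` is only reached after a hash match, for the set and the dict alike.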

import string
import time
import random

if __name__ == '__main__':
    # build three containers that each hold the 26 lowercase letters a-z
    array = [i for i in string.ascii_lowercase]  # ['a', 'b', 'c', 'd', 'e', 'f'....
    dictionary = dict.fromkeys(array, 1)  # {'a': 1, 'b': 1, 'c': 1, 'd': 1....
    bag = {i for i in string.ascii_lowercase}  # {'q', 'v', 'u', 'y', 'z'...

    # set random seed
    random.seed(666)

    # generate 10,000,000 random test characters (code points 0-122):
    # some are lowercase letters, the rest are other characters and symbols
    test_data = random.choices([chr(i) for i in range(0, 123)], k=10000000)
    count1, count2, count3 = 0, 0, 0
    start = time.time()

    # add one to the count whenever the value is a lowercase letter
    for val in test_data:
        count1 = count1 + 1 if val in array else count1

    print(count1)
    print("when using List, Execution Time: %.6f s." % (time.time() - start))  # 4.470003 s.
    start = time.time()

    for val in test_data:
        count2 = count2 + 1 if val in dictionary else count2

    print(count2)
    print("when using Dict Execution Time: %.6f s." % (time.time() - start))  # 1.020261 s.
    start = time.time()

    for val in test_data:
        count3 = count3 + 1 if val in bag else count3

    print(count3)
    print("when using Set Execution Time: %.6f s." % (time.time() - start))  # 1.045259 s.
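
As a complementary sketch, the same comparison can be run with the standard `timeit` module, which repeats the measurement and makes it easier to see how much run-to-run noise there is. The names `letters_list`, `letters_dict`, `letters_set`, and `count_hits` are assumptions made for this example, and the sample is deliberately much smaller than in the script above so that it finishes quickly.

```python
import random
import string
import timeit

letters_list = list(string.ascii_lowercase)
letters_dict = dict.fromkeys(letters_list, 1)
letters_set = set(letters_list)

random.seed(666)
# 100,000 random characters with code points 0-122
sample = random.choices([chr(i) for i in range(123)], k=100_000)


def count_hits(container):
    """Count how many sample characters are members of the container."""
    return sum(1 for val in sample if val in container)


containers = [("list", letters_list), ("dict", letters_dict), ("set", letters_set)]
for name, container in containers:
    # repeat() returns one timing per run; the minimum is the least noisy figure
    best = min(timeit.repeat(lambda: count_hits(container), number=1, repeat=3))
    print("%-4s hits=%d best of 3: %.6f s" % (name, count_hits(container), best))
```

All three containers report the same hit count; only the timings differ, and they will vary by machine.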

Optimizing loops

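The basic principle is to reduce the number of iterations and the amount of work done inside each iteration. Beyond such logic-level optimizations,
the implementation matters too. Wherever possible, use list comprehensions, generators,
and map/reduce rather than writing everything as an explicit for loop.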
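
A small sketch of "less work inside the loop": a value that does not change between iterations is computed once, outside the loop, and the loop itself becomes a list comprehension. The function names and the sample points are hypothetical, chosen only for this illustration.

```python
import math


def scaled_lengths_slow(points, scale):
    result = []
    for x, y in points:
        factor = math.sqrt(scale)  # loop-invariant, yet recomputed on every pass
        result.append(factor * math.hypot(x, y))
    return result


def scaled_lengths_fast(points, scale):
    factor = math.sqrt(scale)  # computed once, before the loop
    return [factor * math.hypot(x, y) for x, y in points]


points = [(3, 4), (6, 8), (5, 12)]
print(scaled_lengths_slow(points, 4) == scaled_lengths_fast(points, 4))
```

Both functions return the same list; the second simply does less per element.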

import time
import random

if __name__ == '__main__':

    # set random seed
    random.seed(666)
    start = time.time()

    length = 1000000
    # generate 1,000,000 random characters (code points 0-123):
    # some are letters, the rest are other characters and symbols
    list_exp_result = [chr(random.randint(0, 123)) for _ in range(length)]

    print(len(list_exp_result))
    print("when using list comprehension, Execution Time: %.6f s." % (time.time() - start))  # 1.195765 s.
    start = time.time()

    for_exp_result = list()
    for _ in range(length):
        for_exp_result.append(chr(random.randint(0, 123)))

    print(len(for_exp_result))
    print("when using normal for loop, Execution Time: %.6f s." % (time.time() - start))  # 1.306519 s.
    start = time.time()

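    # note: unlike the two variants above, this lambda does not call chr(),
    # so it does slightly less work per element
    map_exp_result = list(map(lambda v: random.randint(0, 123), range(length)))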
    print(len(map_exp_result))
    print("when using map task, Execution Time: %.6f s." % (time.time() - start))  # 1.153902 s.
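
The text also mentions generators and reduce, which the script above does not exercise. The following is a minimal sketch under that reading, not a benchmark: a generator expression feeds `sum` without ever building a list, and `functools.reduce` with `operator.add` computes the same total. The variable names are assumptions made for this example.

```python
import functools
import operator
import random

length = 100_000

# Generator expression: values are produced one at a time and never stored
random.seed(666)
total = sum(random.randint(0, 123) for _ in range(length))

# Same total via reduce, using operator.add instead of a lambda
random.seed(666)
total_reduce = functools.reduce(
    operator.add,
    (random.randint(0, 123) for _ in range(length)),
    0,
)

print(total == total_reduce)
```

Because the seed is reset before each pass, both expressions consume the same random sequence and agree on the result.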

For a more detailed look, see: [Python3] Why map is faster than a for loop

Other miscellaneous tips
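- Use local variables and avoid the "global" keyword
- "if done is not None" is faster than "if done != None"
- Use chained comparisons, "x < y < z", rather than "x < y and y < z"
- "while 1" was faster than "while True" in Python 2; in Python 3, True is a keyword and the two perform the same
- Built-in functions are generally fast. Note, however, that add(a, b) (presumably operator.add) is not faster than a + b when called directly; it pays off as a replacement for a lambda passed to map or reduce
- To copy a list (a shallow copy), use: new_list = list(old_list)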
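
A few of these tips can be checked directly. The sketch below is illustrative only; `middle`, `calls`, and `count_evens` are hypothetical names, and it demonstrates behavior rather than measuring speed.

```python
calls = []


def middle():
    """Hypothetical helper that records each time it is evaluated."""
    calls.append("y")
    return 5


# Chained comparison: the middle operand is evaluated only once
chained = 1 < middle() < 10
chained_calls = len(calls)

# Spelled out with `and`, the middle expression is evaluated twice
calls.clear()
spelled_out = 1 < middle() and middle() < 10
spelled_out_calls = len(calls)


def count_evens(values):
    """Accumulate into a local variable and return it, rather than updating a global."""
    total = 0
    for v in values:
        if v % 2 == 0:
            total += 1
    return total


done = 0
# `is not None` is an identity test, so a falsy value such as 0 still counts as "not None"
has_result = done is not None

old_list = [[1, 2], [3]]
new_list = list(old_list)  # shallow copy: new outer list, same inner objects

print(chained_calls, spelled_out_calls, count_evens(range(10)), has_result)
print(new_list is old_list, new_list[0] is old_list[0])
```

The last line is a reminder that list(old_list) copies only the outer list; nested objects are still shared between the two.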