Using Django QuerySets Effectively

Date: 2019-12-11


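Object-relational mapping (ORM) makes interacting with SQL databases easier, but it also has a reputation for being inefficient and slower than raw SQL.

Using an ORM effectively means understanding at least a little about how it queries the database. In this article I focus on how to use the Django ORM efficiently to access medium to large data sets.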

Django querysets are lazy

A Django queryset corresponds to a number of rows in the database, optionally narrowed by filters. For example, the following code represents everyone in the database whose first name is "Dave":

    person_set = Person.objects.filter(first_name="Dave")

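The code above does not run any database query. You can take person_set, apply more filters to it, or pass it to a function, and none of this is sent to the database. This is good, because database queries are one of the factors that most noticeably affect web application performance.

To actually fetch data from the database, you need to iterate over the queryset:

    for person in person_set:
        print(person.last_name)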

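When you iterate over a queryset, all matching rows are fetched from the database and converted into Django model instances. This is called evaluation. The instances are stored in the queryset's built-in cache, so if you iterate over the same queryset again, the same query is not run a second time.

For example, the following code executes only one database query: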

    pet_set = Pet.objects.filter(species="Dog")

    # The query is executed and cached.
    for pet in pet_set:
        print(pet.first_name)

    # The cache is used for subsequent iteration.
    for pet in pet_set:
        print(pet.last_name)

The queryset cache is most useful when you want to test whether a queryset contains any data and iterate over it only if it does:

    restaurant_set = Restaurant.objects.filter(cuisine="Indian")

    # The `if` statement evaluates the queryset.
    if restaurant_set:
        # The iteration uses the cached data.
        for restaurant in restaurant_set:
            print(restaurant.name)
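The behaviour described so far (lazy filtering, evaluation on first use, and reuse of cached results) can be imitated with a small toy class. This is only a sketch of the idea, not Django's implementation: `ToyQuerySet`, `query_count`, and the in-memory list standing in for a database table are all assumptions made for illustration.

```python
class ToyQuerySet:
    """Toy stand-in for a lazy, caching queryset (not Django's real class)."""

    def __init__(self, rows, predicates=()):
        self._rows = rows              # pretend this list is the database table
        self._predicates = predicates  # filters collected so far
        self._result_cache = None      # filled on first evaluation
        self.query_count = 0           # how many times the "database" was hit

    def filter(self, **conditions):
        # Build a new, unevaluated queryset; nothing is fetched here.
        def predicate(row):
            return all(row[key] == value for key, value in conditions.items())

        return ToyQuerySet(self._rows, self._predicates + (predicate,))

    def _fetch_all(self):
        # Hit the "database" only if the cache is still empty.
        if self._result_cache is None:
            self.query_count += 1
            self._result_cache = [
                row for row in self._rows
                if all(p(row) for p in self._predicates)
            ]

    def __iter__(self):
        self._fetch_all()
        return iter(self._result_cache)

    def __bool__(self):
        # Truth-testing evaluates the whole queryset, as an `if` does.
        self._fetch_all()
        return bool(self._result_cache)


pets = ToyQuerySet([
    {"name": "Rex", "species": "Dog"},
    {"name": "Tom", "species": "Cat"},
    {"name": "Fido", "species": "Dog"},
])
dog_set = pets.filter(species="Dog")
queries_after_filter = dog_set.query_count  # filter() alone fetched nothing

names = []
if dog_set:                 # first evaluation: one "query", cache filled
    for pet in dog_set:     # served from the cache
        names.append(pet["name"])
for pet in dog_set:         # served from the cache again
    names.append(pet["name"])
queries_after_loops = dog_set.query_count
```

In this toy model the truth test and both loops share a single fetch, which is the pattern the restaurant example relies on.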

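Sometimes you only want to know whether any data exists, without iterating over it. In that case, a plain if statement still evaluates the entire queryset and puts the data into the cache, even though you never use it!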

    city_set = City.objects.filter(name="Cambridge")

    # The `if` statement evaluates the queryset.
    if city_set:
        # We don't need the data, but the ORM still fetched every matching row!
        print("At least one city called Cambridge still stands!")

To avoid this, use the exists() method to check whether any data exists:

    tree_set = Tree.objects.filter(type="deciduous")

    # The `exists()` check avoids populating the queryset cache.
    if tree_set.exists():
        # No rows were fetched from the database, saving bandwidth and memory.
        print("There are still hardwood trees in the world!")
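To make the difference concrete at the SQL level, here is a rough sketch using Python's standard sqlite3 module rather than Django. The `tree` table and the helper names `fetch_all_then_test` and `exists` are hypothetical, and the `SELECT 1 ... LIMIT 1` statement only approximates the kind of query an existence check issues.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (id INTEGER PRIMARY KEY, type TEXT)")
conn.executemany(
    "INSERT INTO tree (type) VALUES (?)",
    [("deciduous",)] * 1000 + [("evergreen",)] * 1000,
)


def fetch_all_then_test(conn, tree_type):
    # What a plain truth test amounts to: load every matching row first.
    rows = conn.execute(
        "SELECT id, type FROM tree WHERE type = ?", (tree_type,)
    ).fetchall()
    return bool(rows), len(rows)


def exists(conn, tree_type):
    # Existence check: ask the database for at most one constant value.
    row = conn.execute(
        "SELECT 1 FROM tree WHERE type = ? LIMIT 1", (tree_type,)
    ).fetchone()
    return row is not None


found, rows_loaded = fetch_all_then_test(conn, "deciduous")
has_deciduous = exists(conn, "deciduous")
has_palm = exists(conn, "palm")
```

Both approaches give the same yes-or-no answer, but the first one transfers a thousand rows to get it, while the second transfers at most one value.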

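When dealing with thousands of records, loading them all into memory at once is wasteful. Worse, a huge queryset can tie up server processes and bring your application close to a halt.

To avoid populating the queryset cache while iterating, use the iterator() method, which fetches the data and discards each record once it has been processed.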

    star_set = Star.objects.all()

    # `iterator()` fetches rows from the database in small batches, which saves memory.
    for star in star_set.iterator():
        print(star.name)
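The idea of fetching in batches and keeping nothing afterwards can be sketched with a plain database cursor. The example below uses the standard sqlite3 module, not Django; `iterate_in_batches`, the `star` table, and the chunk size of 100 are assumptions chosen for illustration.

```python
import sqlite3


def iterate_in_batches(cursor, chunk_size=100):
    """Yield rows one at a time, holding at most chunk_size rows in memory."""
    while True:
        batch = cursor.fetchmany(chunk_size)
        if not batch:
            return
        yield from batch


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE star (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO star (name) VALUES (?)",
    [(f"star-{i}",) for i in range(250)],
)

cursor = conn.execute("SELECT name FROM star ORDER BY id")
count = 0
last_name = None
for (name,) in iterate_in_batches(cursor, chunk_size=100):
    count += 1
    last_name = name

# The cursor is now exhausted and no rows were kept, so a second pass
# yields nothing unless the query is executed again.
leftover = list(iterate_in_batches(cursor))
```

The empty second pass shows the trade-off: nothing is cached, so visiting the rows again means running the query again.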

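Of course, using iterator() to avoid populating the cache means that iterating over the same queryset again runs the query again. So use iterator() with care, and make sure your code does not evaluate the same large queryset repeatedly.

As described above, the queryset cache is powerful for combining an if statement with a for statement, since it allows conditional looping over a queryset. For very large querysets, however, populating the cache is not an option.

The simplest solution is to combine exists() and iterator(), avoiding the queryset cache at the cost of two database queries.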

    molecule_set = Molecule.objects.all()

    # One database query to test if any rows exist.
    if molecule_set.exists():
        # Another database query to start fetching the rows in batches.
        for molecule in molecule_set.iterator():
            print(molecule.velocity)

A more complex solution is to use Python's "advanced iteration methods" to peek at the first item of iterator() before deciding whether to loop at all.

    from itertools import chain

    atom_set = Atom.objects.all()

    # One database query to start fetching the rows in batches.
    atom_iterator = atom_set.iterator()

    # Peek at the first item in the iterator.
    try:
        first_atom = next(atom_iterator)
    except StopIteration:
        # No rows were found, so do nothing.
        pass
    else:
        # At least one row was found, so iterate over all the rows,
        # including the first one. Chain onto atom_iterator, not atom_set,
        # so the query is not run again and the first row is not repeated.
        for atom in chain([first_atom], atom_iterator):
            print(atom.mass)
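The peek-then-chain pattern is independent of Django and can be wrapped in a small helper. The sketch below is one possible way to do it; `peek` and `fake_rows` are hypothetical names, and the generator merely stands in for a one-shot row iterator.

```python
from itertools import chain


def peek(iterable):
    """Return (is_empty, iterator); the iterator still yields every item."""
    iterator = iter(iterable)
    try:
        first = next(iterator)
    except StopIteration:
        return True, iter(())
    # Put the consumed item back in front of the remaining ones.
    return False, chain([first], iterator)


def fake_rows():
    # Stand-in for a one-shot stream of rows from the database.
    yield from [1.008, 4.003, 6.94]


is_empty, atoms = peek(fake_rows())
masses = list(atoms)

no_rows, nothing = peek(iter([]))
remaining = list(nothing)
```

Because the helper chains the first item onto the same iterator it peeked into, the underlying source is consumed exactly once.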

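The queryset cache exists to reduce the number of database queries your program makes, and in normal use it ensures that the database is queried only when necessary.

The exists() and iterator() methods let you optimize memory use. However, because they do not populate the queryset cache, they can lead to extra database queries.

So pay attention as you code: if your program starts to slow down, find out where the bottlenecks are and whether a small optimization could help.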