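Background for the examples
Suppose we have a personal-information system that needs to record each person's hometown, current city of residence, and the cities they have visited. The database is designed as follows.
The contents of models.py: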
    # Written for Python 2 and Django < 2.0; newer Django versions require
    # an on_delete argument on ForeignKey and use __str__ instead of __unicode__.
    from django.db import models

    class Province(models.Model):
        name = models.CharField(max_length=10)

        def __unicode__(self):
            return self.name

    class City(models.Model):
        name = models.CharField(max_length=5)
        province = models.ForeignKey(Province)

        def __unicode__(self):
            return self.name

    class Person(models.Model):
        firstname = models.CharField(max_length=10)
        lastname = models.CharField(max_length=10)
        visitation = models.ManyToManyField(City, related_name="visitor")
        hometown = models.ForeignKey(City, related_name="birth")
        living = models.ForeignKey(City, related_name="citizen")

        def __unicode__(self):
            return self.firstname + self.lastname
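Note 1: the app is named "QSOptimize".
Note 2: to keep things simple, the `qsoptimize_province` table contains only two rows, 湖北省 (Hubei) and 广东省 (Guangdong), and the `qsoptimize_city` table contains only three rows: 武汉市 (Wuhan), 十堰市 (Shiyan), and 广州市 (Guangzhou).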
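prefetch_related()
For many-to-many fields (ManyToManyField) and one-to-many fields, you can use prefetch_related() to optimize queries. You might object that there is no such thing as a OneToManyField. In fact, a ForeignKey is a many-to-one field, and the same relation, seen from the model that the ForeignKey points to, is one-to-many.
Purpose and approach
prefetch_related() and select_related() share a similar design goal: both reduce the number of SQL queries, but they do so in different ways. select_related() solves the problem inside the SQL query itself by using a JOIN. For many-to-many relations that is rarely a good idea, because the joined table can become very long, which increases both the running time of the SQL statement and its memory footprint. With n objects, where the i-th object has Mi related rows through the many-to-many field, the JOIN produces a result table of M1 + M2 + ... + Mn rows (Σ Mi for i = 1..n).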
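The row-count claim can be checked with a small experiment. The following is only a sketch using Python 3 and the standard-library sqlite3 module, not Django itself; the table names `person`, `city` and `person_visitation` and the English names are simplified stand-ins chosen for illustration.

```python
import sqlite3

# In-memory database with simplified, hypothetical table names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_visitation (person_id INTEGER, city_id INTEGER);
    INSERT INTO person VALUES (1, 'Zhang San'), (4, 'Zhang Liu');
    INSERT INTO city VALUES (1, 'Wuhan'), (2, 'Guangzhou'), (3, 'Shiyan');
    INSERT INTO person_visitation VALUES (1, 1), (1, 2), (1, 3), (4, 2);
""")

# One JOIN query: every (person, city) pair becomes its own row,
# so the person columns are repeated once per visited city.
joined = conn.execute("""
    SELECT person.id, person.name, city.name
    FROM person
    JOIN person_visitation ON person.id = person_visitation.person_id
    JOIN city ON city.id = person_visitation.city_id
    ORDER BY person.id, city.id
""").fetchall()

# Mi: the number of visited cities for each person.
visits_per_person = dict(conn.execute(
    "SELECT person_id, COUNT(*) FROM person_visitation GROUP BY person_id"
).fetchall())

print(len(joined))                      # rows in the joined table
print(sum(visits_per_person.values()))  # M1 + ... + Mn
```

Both numbers are equal: the joined table has one row per visit, and each person's own columns are duplicated on every one of those rows.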
prefetch_related() instead queries each table separately and then uses Python to wire up the relationships between the results. Continuing with the example above, to get all the cities that 张三 (Zhang San) has visited, use prefetch_related() like this:

    >>> zhangs = Person.objects.prefetch_related('visitation').get(firstname=u"张", lastname=u"三")
    >>> for city in zhangs.visitation.all():
    ...     print city
    ...
The code above triggers the following SQL queries:

    SELECT `QSOptimize_person`.`id`, `QSOptimize_person`.`firstname`,
    `QSOptimize_person`.`lastname`, `QSOptimize_person`.`hometown_id`, `QSOptimize_person`.`living_id`
    FROM `QSOptimize_person`
    WHERE (`QSOptimize_person`.`lastname` = '三' AND `QSOptimize_person`.`firstname` = '张');

    SELECT (`QSOptimize_person_visitation`.`person_id`) AS `_prefetch_related_val`, `QSOptimize_city`.`id`,
    `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`
    FROM `QSOptimize_city`
    INNER JOIN `QSOptimize_person_visitation` ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)
    WHERE `QSOptimize_person_visitation`.`person_id` IN (1);
The first SQL query simply fetches the Person object for 张三. The second is the important one: it selects the rows of the join table `QSOptimize_person_visitation` whose `person_id` is 张三's and inner-joins them with the `city` table to produce the result table (the INNER JOIN here is an equi-join, since its condition is an equality).

    +----+-----------+----------+-------------+-----------+
    | id | firstname | lastname | hometown_id | living_id |
    +----+-----------+----------+-------------+-----------+
    |  1 | 张        | 三       |           3 |         1 |
    +----+-----------+----------+-------------+-----------+
    1 row in set (0.00 sec)

    +-----------------------+----+-----------+-------------+
    | _prefetch_related_val | id | name      | province_id |
    +-----------------------+----+-----------+-------------+
    |                     1 |  1 | 武汉市    |           1 |
    |                     1 |  2 | 广州市    |           2 |
    |                     1 |  3 | 十堰市    |           1 |
    +-----------------------+----+-----------+-------------+
    3 rows in set (0.00 sec)

Clearly, 张三 has visited 武汉市 (Wuhan), 广州市 (Guangzhou), and 十堰市 (Shiyan).
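To make the "wire up the relationships in Python" step concrete, here is a minimal sketch in plain Python 3. It is not Django's actual implementation: the dictionaries standing in for model instances are an assumption made for illustration, while the row values are copied from the two result tables above.

```python
from collections import defaultdict

# Rows of the first query: (id, firstname, lastname, hometown_id, living_id).
person_rows = [(1, "张", "三", 3, 1)]

# Rows of the second query: (_prefetch_related_val, id, name, province_id).
city_rows = [
    (1, 1, "武汉市", 1),
    (1, 2, "广州市", 2),
    (1, 3, "十堰市", 1),
]

# Group the city rows by the person id they belong to.
cities_by_person = defaultdict(list)
for person_id, city_id, name, province_id in city_rows:
    cities_by_person[person_id].append(
        {"id": city_id, "name": name, "province_id": province_id})

# Attach each group to its person; no further SQL is needed from here on.
people = []
for pid, firstname, lastname, hometown_id, living_id in person_rows:
    people.append({
        "id": pid,
        "name": firstname + lastname,
        "visitation": cities_by_person.get(pid, []),
    })

for person in people:
    print(person["name"], [c["name"] for c in person["visitation"]])
```

The `_prefetch_related_val` column is what makes the grouping possible: it tells the Python side which person each city row belongs to.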
Alternatively, to get the names of all cities in 湖北省 (Hubei), you can do this:

    >>> hb = Province.objects.prefetch_related('city_set').get(name__iexact=u"湖北省")
    >>> for city in hb.city_set.all():
    ...     city.name
    ...
The SQL queries triggered:

    SELECT `QSOptimize_province`.`id`, `QSOptimize_province`.`name`
    FROM `QSOptimize_province`
    WHERE `QSOptimize_province`.`name` LIKE '湖北省';

    SELECT `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`
    FROM `QSOptimize_city`
    WHERE `QSOptimize_city`.`province_id` IN (1);
The resulting tables:

    +----+-----------+
    | id | name      |
    +----+-----------+
    |  1 | 湖北省    |
    +----+-----------+
    1 row in set (0.00 sec)

    +----+-----------+-------------+
    | id | name      | province_id |
    +----+-----------+-------------+
    |  1 | 武汉市    |           1 |
    |  3 | 十堰市    |           1 |
    +----+-----------+-------------+
    2 rows in set (0.00 sec)
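As we can see, prefetch is implemented with an IN clause. Consequently, when the QuerySet contains a very large number of objects, this may cause performance problems, depending on the characteristics of the database.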
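Why the IN clause can become a problem is easier to see when the query is built by hand. The sketch below uses Python 3 with the standard-library sqlite3 module rather than Django or MySQL; the helper `fetch_cities` and the simplified `city` table are hypothetical names introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, province_id INTEGER)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?)",
    [(1, "Wuhan", 1), (2, "Guangzhou", 2), (3, "Shiyan", 1)],
)

def fetch_cities(province_ids):
    """Fetch the cities of all given provinces with a single IN query."""
    placeholders = ", ".join("?" for _ in province_ids)
    sql = ("SELECT id, name, province_id FROM city "
           "WHERE province_id IN (%s) ORDER BY id" % placeholders)
    return sql, conn.execute(sql, list(province_ids)).fetchall()

sql, rows = fetch_cities([1])
print(sql)
print(rows)

# One bound parameter per parent object: 500 provinces mean 500 placeholders.
sql_many, _ = fetch_cities(range(1, 501))
print(sql_many.count("?"))
```

The length of the IN list grows linearly with the number of objects in the outer QuerySet, and how well a very long list is handled differs from one database to another.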
Usage
The *lookups argument
In Django < 1.7 this is the only way to use prefetch_related(). Like select_related(), prefetch_related() supports lookups that span several relations. For example, to get the provinces visited by everyone surnamed 张 (the surname is stored in firstname):

    >>> zhangs = Person.objects.prefetch_related('visitation__province').filter(firstname__iexact=u'张')
    >>> for i in zhangs:
    ...     for city in i.visitation.all():
    ...         print city.province
    ...
The SQL triggered:

    SELECT `QSOptimize_person`.`id`, `QSOptimize_person`.`firstname`,
    `QSOptimize_person`.`lastname`, `QSOptimize_person`.`hometown_id`, `QSOptimize_person`.`living_id`
    FROM `QSOptimize_person`
    WHERE `QSOptimize_person`.`firstname` LIKE '张';

    SELECT (`QSOptimize_person_visitation`.`person_id`) AS `_prefetch_related_val`, `QSOptimize_city`.`id`,
    `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id` FROM `QSOptimize_city`
    INNER JOIN `QSOptimize_person_visitation` ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)
    WHERE `QSOptimize_person_visitation`.`person_id` IN (1, 4);

    SELECT `QSOptimize_province`.`id`, `QSOptimize_province`.`name`
    FROM `QSOptimize_province`
    WHERE `QSOptimize_province`.`id` IN (1, 2);
The results obtained:

    +----+-----------+----------+-------------+-----------+
    | id | firstname | lastname | hometown_id | living_id |
    +----+-----------+----------+-------------+-----------+
    |  1 | 张        | 三       |           3 |         1 |
    |  4 | 张        | 六       |           2 |         2 |
    +----+-----------+----------+-------------+-----------+
    2 rows in set (0.00 sec)

    +-----------------------+----+-----------+-------------+
    | _prefetch_related_val | id | name      | province_id |
    +-----------------------+----+-----------+-------------+
    |                     1 |  1 | 武汉市    |           1 |
    |                     1 |  2 | 广州市    |           2 |
    |                     4 |  2 | 广州市    |           2 |
    |                     1 |  3 | 十堰市    |           1 |
    +-----------------------+----+-----------+-------------+
    4 rows in set (0.00 sec)

    +----+-----------+
    | id | name      |
    +----+-----------+
    |  1 | 湖北省    |
    |  2 | 广东省    |
    +----+-----------+
    2 rows in set (0.00 sec)
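The three result sets show how a lookup such as 'visitation__province' is resolved one level at a time. The following plain Python 3 sketch replays that chaining with the rows from the tables above; it illustrates the idea only and is not Django's internal code.

```python
# Rows returned by the second query: (_prefetch_related_val, id, name, province_id).
city_rows = [
    (1, 1, "武汉市", 1),
    (1, 2, "广州市", 2),
    (4, 2, "广州市", 2),
    (1, 3, "十堰市", 1),
]

# Input of the third query: the distinct province ids of the fetched cities.
province_ids = sorted({row[3] for row in city_rows})
in_clause = "IN (%s)" % ", ".join(str(pid) for pid in province_ids)
print(in_clause)

# Rows returned by the third query: (id, name).
province_rows = [(1, "湖北省"), (2, "广东省")]
province_by_id = dict(province_rows)

# Resolve city.province for every person without touching the database again.
provinces_by_person = {}
for person_id, _city_id, _name, province_id in city_rows:
    provinces_by_person.setdefault(person_id, []).append(
        province_by_id[province_id])

print(provinces_by_person)
```

Each level of the lookup costs exactly one extra query, no matter how many people or cities are involved: three queries in total here.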
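It is worth noting that chained prefetch_related() calls accumulate their lookups, just as select_related() does in 1.7.
Note that when working with a QuerySet, once a chained operation changes the database query, the data previously cached by prefetch_related is ignored. Django then queries the database again to fetch the data, which causes performance problems. "Changing the database query" here means operations such as filter() and exclude() that end up altering the generated SQL. all() does not change the final query, so it does not trigger a new database request.
For example, to get, for every person, the visited cities whose name contains the character "市", doing it this way produces a large number of SQL queries:

    plist = Person.objects.prefetch_related('visitation')
    [p.visitation.filter(name__icontains=u"市") for p in plist]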
Because there are 4 people in the database, this results in 2 + 4 SQL queries:

    SELECT `QSOptimize_person`.`id`, `QSOptimize_person`.`firstname`, `QSOptimize_person`.`lastname`,
    `QSOptimize_person`.`hometown_id`, `QSOptimize_person`.`living_id`
    FROM `QSOptimize_person`;

    SELECT (`QSOptimize_person_visitation`.`person_id`) AS `_prefetch_related_val`, `QSOptimize_city`.`id`,
    `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`
    FROM `QSOptimize_city`
    INNER JOIN `QSOptimize_person_visitation` ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)
    WHERE `QSOptimize_person_visitation`.`person_id` IN (1, 2, 3, 4);

    SELECT `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`
    FROM `QSOptimize_city`
    INNER JOIN `QSOptimize_person_visitation` ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)
    WHERE (`QSOptimize_person_visitation`.`person_id` = 1 AND `QSOptimize_city`.`name` LIKE '%市%');

    SELECT `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`
    FROM `QSOptimize_city`
    INNER JOIN `QSOptimize_person_visitation` ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)
    WHERE (`QSOptimize_person_visitation`.`person_id` = 2 AND `QSOptimize_city`.`name` LIKE '%市%');

    SELECT `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`
    FROM `QSOptimize_city`
    INNER JOIN `QSOptimize_person_visitation` ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)
    WHERE (`QSOptimize_person_visitation`.`person_id` = 3 AND `QSOptimize_city`.`name` LIKE '%市%');

    SELECT `QSOptimize_city`.`id`, `QSOptimize_city`.`name`, `QSOptimize_city`.`province_id`
    FROM `QSOptimize_city`
    INNER JOIN `QSOptimize_person_visitation` ON (`QSOptimize_city`.`id` = `QSOptimize_person_visitation`.`city_id`)
    WHERE (`QSOptimize_person_visitation`.`person_id` = 4 AND `QSOptimize_city`.`name` LIKE '%市%');
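Let us walk through these queries in detail.
As is well known, a QuerySet is lazy: it only hits the database when its results are actually needed. When execution reaches the second line of Python, the for clause iterates over plist, which triggers the database queries. The first two SQL queries are the ones issued by prefetch_related.
Although those results already contain all the city information needed, the loop body calls filter() on Person.visitation, which clearly changes the database query. These calls therefore ignore the previously cached data and issue new SQL queries.
So what should you do if you have this requirement? In Django >= 1.7 you can use the Prefetch object described in the next section. If you are on Django < 1.7, you can do the filtering in Python instead.

    plist = Person.objects.prefetch_related('visitation')
    [[city for city in p.visitation.all() if u"市" in city.name] for p in plist]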
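The difference between the two approaches can be reproduced with a toy model that simply counts queries. Everything in this Python 3 sketch is hypothetical: `FakeDatabase`, `Visitation` and `load_people` are illustrative stand-ins rather than Django classes, and the visits of persons 2 and 3 are invented, since the article only shows the data for persons 1 and 4.

```python
class FakeDatabase:
    """Hypothetical stand-in for a database that counts queries."""

    def __init__(self, visits):
        self.visits = visits        # person id -> list of city names
        self.query_count = 0

    def cities_for(self, person_ids):
        self.query_count += 1       # one SQL query per call
        return {pid: list(self.visits[pid]) for pid in person_ids}


class Visitation:
    """Toy model of a related manager that holds a prefetch cache."""

    def __init__(self, db, person_id, cache):
        self.db, self.person_id, self.cache = db, person_id, cache

    def all(self):
        # Same query as the prefetch: answered from the cache.
        return self.cache

    def filter(self, contains):
        # A different query: the cache is bypassed and the database is hit.
        rows = self.db.cities_for([self.person_id])[self.person_id]
        return [name for name in rows if contains in name]


def load_people(db):
    db.query_count += 1             # query 1: the persons themselves
    ids = sorted(db.visits)
    cache = db.cities_for(ids)      # query 2: the prefetch, one IN query
    return [Visitation(db, pid, cache[pid]) for pid in ids]


# Visits of persons 2 and 3 are made up for this sketch.
visits = {1: ["武汉市", "广州市", "十堰市"], 2: ["武汉市"],
          3: ["十堰市"], 4: ["广州市"]}

db = FakeDatabase(visits)
filtered = [v.filter("市") for v in load_people(db)]
print(db.query_count)    # 2 + 4 = 6

db2 = FakeDatabase(visits)
in_python = [[c for c in v.all() if "市" in c] for v in load_people(db2)]
print(db2.query_count)   # 2
```

Both variants produce the same lists, but only the second one stays at two queries as the number of people grows.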
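Prefetch object
In Django >= 1.7, you can use a Prefetch object to control the behavior of prefetch_related().
Note: because I do not have a Django 1.7 environment installed, this section is based on the Django documentation and has not actually been tested.
Characteristics of the Prefetch object:
Continuing the example above, to get, among the cities each person has visited, those whose names contain the character "武" and those whose names contain "州":

    from django.db.models import Prefetch

    wus = City.objects.filter(name__icontains=u"武")
    zhous = City.objects.filter(name__icontains=u"州")
    plist = Person.objects.prefetch_related(
        Prefetch('visitation', queryset=wus, to_attr="wu_city"),
        Prefetch('visitation', queryset=zhous, to_attr="zhou_city"),)
    [p.wu_city for p in plist]
    [p.zhou_city for p in plist]

Note: this code has not been tested in a real environment; corrections are welcome.
Incidentally, Prefetch objects and string arguments can be mixed.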
None
You can pass None to clear any previously set prefetch_related lookups, like this:

    >>> prefetch_cleared_qset = qset.prefetch_related(None)
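How chaining and None interact can be pictured with a very small model. The `LookupChain` class below is a hypothetical simplification written in Python 3 for illustration, not Django's QuerySet; it only mirrors the two behaviors described in this article, namely that chained calls accumulate lookups and that None clears them.

```python
class LookupChain:
    """Hypothetical, simplified model of lookups accumulating on a QuerySet."""

    def __init__(self, lookups=()):
        self.lookups = tuple(lookups)

    def prefetch_related(self, *lookups):
        if lookups == (None,):
            # Passing None discards everything collected so far.
            return LookupChain()
        # Otherwise the new lookups are appended to the existing ones.
        return LookupChain(self.lookups + lookups)


qset = LookupChain().prefetch_related("visitation").prefetch_related("hometown")
print(qset.lookups)

prefetch_cleared_qset = qset.prefetch_related(None)
print(prefetch_cleared_qset.lookups)
```

Each call returns a new object, so `qset` keeps its two lookups while `prefetch_cleared_qset` starts again with none.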
Summary