An Introduction to Python 3.7 dataclasses

Date: 2019-12-14

Posted on June 28, 2018 by laixintao

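Python 3.7 adds a new module: dataclasses. You can think of it simply as "mutable namedtuples with defaults". There is nothing magical about it: you define an ordinary class, and the @dataclass decorator generates methods such as __repr__ and __init__ for you, so you do not have to write them yourself. The decorator still returns a class, which means it brings no inconvenience: you can still use inheritance, metaclasses, docstrings, your own methods, and so on.

Here is an example from the PEP (PEP 557). Take the following code (Python 3.7):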

    from dataclasses import dataclass

    @dataclass
    class InventoryItem:
        '''Class for keeping track of an item in inventory.'''
        name: str
        unit_price: float
        quantity_on_hand: int = 0

        def total_cost(self) -> float:
            return self.unit_price * self.quantity_on_hand
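As a quick sanity check of what the decorator provides, the sketch below re-declares the class from above and exercises the generated methods. The item name and the numbers are made-up values for illustration only.

```python
from dataclasses import dataclass


@dataclass
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand


# Illustrative values only.
item = InventoryItem("widget", 2.5, quantity_on_hand=4)

print(item)               # uses the generated __repr__
print(item.total_cost())  # ordinary methods still work
print(item == InventoryItem("widget", 2.5, 4))  # generated __eq__ compares fields
```

Nothing here had to be written by hand apart from the field declarations and total_cost.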

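According to the PEP, @dataclass generates the equivalent of the methods listed here. Note that with the default arguments only __init__, __repr__ and __eq__ are actually added: the ordering methods (__lt__, __le__, __gt__, __ge__) are generated only when order=True is passed, and __ne__ is left to Python's default, which inverts __eq__.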

    def __init__(self, name: str, unit_price: float, quantity_on_hand: int = 0) -> None:
        self.name = name
        self.unit_price = unit_price
        self.quantity_on_hand = quantity_on_hand

    def __repr__(self):
        return f'InventoryItem(name={self.name!r}, unit_price={self.unit_price!r}, quantity_on_hand={self.quantity_on_hand!r})'

    def __eq__(self, other):
        if other.__class__ is self.__class__:
            return (self.name, self.unit_price, self.quantity_on_hand) == (other.name, other.unit_price, other.quantity_on_hand)
        return NotImplemented

    def __ne__(self, other):
        if other.__class__ is self.__class__:
            return (self.name, self.unit_price, self.quantity_on_hand) != (other.name, other.unit_price, other.quantity_on_hand)
        return NotImplemented

    def __lt__(self, other):
        if other.__class__ is self.__class__:
            return (self.name, self.unit_price, self.quantity_on_hand) < (other.name, other.unit_price, other.quantity_on_hand)
        return NotImplemented

    def __le__(self, other):
        if other.__class__ is self.__class__:
            return (self.name, self.unit_price, self.quantity_on_hand) <= (other.name, other.unit_price, other.quantity_on_hand)
        return NotImplemented

    def __gt__(self, other):
        if other.__class__ is self.__class__:
            return (self.name, self.unit_price, self.quantity_on_hand) > (other.name, other.unit_price, other.quantity_on_hand)
        return NotImplemented

    def __ge__(self, other):
        if other.__class__ is self.__class__:
            return (self.name, self.unit_price, self.quantity_on_hand) >= (other.name, other.unit_price, other.quantity_on_hand)
        return NotImplemented

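Rationale for dataclass

Python has long wanted a simple way to define a container whose contents are accessed as object attributes. There have already been many attempts at this:

collections.namedtuple in the standard library
typing.NamedTuple in the standard library
The well-known attr library
Assorted snippets, questions and answers, and so on

So why is dataclass still needed? The main benefits are:

It uses neither a base class nor a metaclass, so it does not affect the inheritance hierarchy of your code. The decorated class is still an ordinary class.
It uses type annotations on the class's fields, so type checking is supported natively and nothing intrudes into your code, unlike libraries such as attr, which require you to pass things through the library's own functions.

dataclass is not meant to replace those libraries. As part of the standard library, it simply offers a more convenient way to define a data class. The libraries above have different features and still have their place.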

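Basic usage

The prototype of the dataclass decorator in the dataclasses module is:

    def dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False)

Clearly, these keyword parameters control which magic methods are generated. As the example at the top of this article shows, the decorator can also be applied without parentheses.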
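To make the effect of these parameters concrete, here is a minimal sketch with three toy classes (Plain, Ordered and NoRepr are names invented for this illustration). It shows that ordering methods only appear with order=True and that repr=False leaves the default object repr in place.

```python
from dataclasses import dataclass


@dataclass  # same as @dataclass(): all defaults, no ordering methods
class Plain:
    value: int


@dataclass(order=True)  # additionally generates __lt__, __le__, __gt__, __ge__
class Ordered:
    value: int


@dataclass(repr=False)  # keeps the default object.__repr__
class NoRepr:
    value: int


print(Ordered(1) < Ordered(2))  # fields are compared as a tuple

try:
    Plain(1) < Plain(2)
except TypeError as exc:
    print("Plain is not orderable:", exc)
```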

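With field you can customize each field further: its default value, whether it takes part in repr, whether it takes part in hash, and so on. In this example from the documentation, no value is supplied for mylist, so default_factory is called to create one. See the documentation for everything else field can do.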

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class C:
        mylist: List[int] = field(default_factory=list)

    c = C()
    c.mylist += [1, 2, 3]
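The other field options mentioned above can be sketched in the same way. The Account class and its attributes are hypothetical, chosen only to show repr=False, compare=False and default_factory side by side.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Account:
    owner: str
    # Hidden from the generated __repr__.
    password: str = field(default="", repr=False)
    # Ignored by the generated __eq__ (and by ordering methods, if enabled).
    notes: List[str] = field(default_factory=list, compare=False)


a = Account("alice", password="secret")
b = Account("alice", password="secret")
a.notes.append("first login")

print(a)        # the password is not shown
print(a == b)   # notes does not take part in the comparison
print(b.notes)  # each instance received its own list
```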

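In addition, the dataclasses module provides several useful functions that convert a dataclass instance into a tuple, a dict, and so on. I have rewritten helpers like these myself more times than I care to count...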

    from dataclasses import dataclass, asdict
    from typing import List

    @dataclass
    class Point:
        x: int
        y: int

    @dataclass
    class C:
        mylist: List[Point]

    p = Point(10, 20)
    assert asdict(p) == {'x': 10, 'y': 20}

    c = C([Point(0, 0), Point(10, 4)])
    assert asdict(c) == {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
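The tuple conversion works the same way through astuple. The sketch below uses a hypothetical Path class to show both conversions on nested data, and that the result is a copy rather than a view of the original instance.

```python
from dataclasses import dataclass, asdict, astuple
from typing import List


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Path:
    name: str
    points: List[Point]


path = Path("diagonal", [Point(0, 0), Point(1, 1)])

print(astuple(path))  # nested dataclasses become nested tuples
print(asdict(path))   # nested dataclasses become nested dicts

# The converted containers are new objects, so editing them leaves `path` untouched.
d = asdict(path)
d["points"].append({"x": 2, "y": 2})
print(len(path.points))
```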

Hook init

The generated __init__ can be hooked. It is simple: the generated __init__ calls __post_init__ if the class defines it.

    from dataclasses import dataclass, field

    @dataclass
    class C:
        a: float
        b: float
        c: float = field(init=False)

        def __post_init__(self):
            self.c = self.a + self.b
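Besides computing derived fields, __post_init__ is a natural place for validation. The following is a small sketch under that assumption. Rectangle and its error message are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)  # not an __init__ parameter

    def __post_init__(self):
        # Runs at the end of the generated __init__.
        if self.width < 0 or self.height < 0:
            raise ValueError("width and height must be non-negative")
        self.area = self.width * self.height


r = Rectangle(2.0, 3.0)
print(r.area)

try:
    Rectangle(2.0, 3.0, 6.0)  # area cannot be passed in
except TypeError as exc:
    print("rejected:", exc)
```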

If you want a value that is accepted by __init__ and forwarded to __post_init__ but is not stored as a field on the instance, use the special type InitVar:

    from dataclasses import dataclass, InitVar

    # DatabaseType and my_database are assumed to be defined elsewhere.
    @dataclass
    class C:
        i: int
        j: int = None
        database: InitVar[DatabaseType] = None

        def __post_init__(self, database):
            if self.j is None and database is not None:
                self.j = database.lookup('j')

    c = C(10, database=my_database)
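Since the documentation example leaves the database undefined, here is a self-contained variant that can be run as is. FakeDatabase is a hypothetical stand-in written for this sketch, not part of any library. It shows that the InitVar value reaches __post_init__ but never becomes a field.

```python
from dataclasses import dataclass, fields, InitVar
from typing import Optional


class FakeDatabase:
    """Hypothetical stand-in for a real database handle."""

    def lookup(self, key: str) -> int:
        return {"j": 42}[key]


@dataclass
class C:
    i: int
    j: Optional[int] = None
    database: InitVar[Optional[FakeDatabase]] = None

    def __post_init__(self, database):
        # `database` arrives as an argument; it is never stored on self.
        if self.j is None and database is not None:
            self.j = database.lookup("j")


c = C(10, database=FakeDatabase())
print(c)                            # database does not appear in the repr
print([f.name for f in fields(c)])  # only real fields are listed
```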

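Immutability

Python has nothing like const. In principle, anything can be modified. If you insist on an immutable implementation, there is a fairly well-known recipe that takes fewer than 10 lines of code.

With dataclass, however, you can simply use @dataclass(frozen=True). The decorator then adds __setattr__ and __delattr__ methods to the class, and both raise FrozenInstanceError when called. The downside is a small performance penalty, because __init__ has to assign fields through object.__setattr__.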
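A minimal sketch of frozen behavior follows. It also uses dataclasses.replace, which the text above does not mention, as one way to obtain a modified copy of a frozen instance.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class Point:
    x: int
    y: int


p = Point(1, 2)

try:
    p.x = 10  # goes through the generated __setattr__
except FrozenInstanceError as exc:
    print("cannot assign:", exc)

# To "change" a frozen instance, build a modified copy instead.
q = replace(p, x=10)
print(p, q)

# With the default eq=True, frozen=True also makes instances hashable.
print({p: "found"}[Point(1, 2)])
```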

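Inheritance

For dataclasses in an inheritance hierarchy, the decorator walks the base classes in reverse MRO order (starting from object) and, for each base class, adds the fields it finds to an ordered mapping. Once all base classes have been processed, it adds the class's own fields and generates the magic methods from this mapping. The parameters of those methods therefore appear in the order in which the fields were first found. Because base classes are visited first, a field in a derived class that has the same name replaces the base-class definition. For example, from the documentation: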

    from dataclasses import dataclass
    from typing import Any

    @dataclass
    class Base:
        x: Any = 15.0
        y: int = 0

    @dataclass
    class C(Base):
        z: int = 10
        x: int = 15

The generated __init__ will then be:

    def __init__(self, x: int = 15, y: int = 0, z: int = 10):

Note that x and y keep the order they have in Base, but x takes the int type from C, overriding the Any declared in Base.
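The resulting order can be checked directly. This sketch re-declares the two classes and inspects them with dataclasses.fields and inspect.signature.

```python
import inspect
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Base:
    x: Any = 15.0
    y: int = 0


@dataclass
class C(Base):
    z: int = 10
    x: int = 15


print([f.name for f in fields(C)])    # field order: x, y, z
print(inspect.signature(C.__init__))  # parameters follow the same order
print(C())
```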

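The mutable default trap

The "Basic usage" section above used default_factory. Why not simply use [] as the default?

Experienced Python programmers know this pitfall: when a mutable object such as a list is used as a function's default argument, it is created once, when the function is defined, and then shared by every call, which leads to surprising bugs. For details, see Python Common Gotchas.

Consider the following code: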

    @dataclass
    class D:
        x: List = []

        def add(self, element):
            self.x += element

If this were allowed, it would generate code similar to:

    class D:
        x = []

        def __init__(self, x=x):
            self.x = x

        def add(self, element):
            self.x += element

    assert D().x is D().x

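This way, no matter how many objects are instantiated, x is shared between all instances that do not pass their own value. It is hard for dataclass to prevent this in general, so the design decision is: if a default value is a list, dict or set, an error is raised when the class is defined. The PEP describes this as a TypeError, while the CPython implementation raises ValueError. It is not a perfect solution, but it catches a large share of the common cases.

If a field needs a list as its default, use default_factory as described above.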
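Both halves of this advice can be seen in a short sketch. The class names Bad and D are only for illustration. The first definition is rejected at class-creation time, and the second gives each instance its own list.

```python
from dataclasses import dataclass, field
from typing import List

# A bare list default is rejected while the class is being created.
try:
    @dataclass
    class Bad:
        x: List[int] = []
except ValueError as exc:
    print("rejected:", exc)


@dataclass
class D:
    x: List[int] = field(default_factory=list)  # a new list per instance

    def add(self, element: int) -> None:
        self.x.append(element)


d1, d2 = D(), D()
d1.add(1)
print(d1.x, d2.x)
```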
