Python Memory Management

Date: 2019-11-10


Python's memory management rests on three mechanisms: reference counting, garbage collection, and memory pools.

1. Reference counting

(1) Variables and objects

In sum, variables are created when assigned, can reference any type of object, and must
be assigned before they are referenced. This means that you never need to declare names
used by your script, but you must initialize names before you can update them; counters,
for example, must be initialized to zero before you can add to them.

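Variables are created when they are assigned, and can point to (reference) any type of object.
- Everything in Python is an object; at the core of each one is a C struct: PyObject
Variables must be assigned before they are referenced.
- For example, if you define a counter you must initialize it to 0 before you can increment it.
Every object carries two header fields: a type designator and a reference counter.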
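
The "assign before you reference" rule can be sketched as follows. The name `counter` and the two helper functions are made up for illustration; inside a function the failure surfaces as `UnboundLocalError`, which is a subclass of `NameError`.

```python
def bump_uninitialized():
    """Try to increment a name that was never assigned."""
    try:
        counter += 1  # counter has no value yet, so this raises
    except NameError as exc:
        # UnboundLocalError is a subclass of NameError
        return type(exc).__name__
    return "no error"


def bump_initialized():
    """Initialize the counter first, then update it."""
    counter = 0   # the assignment creates the name
    counter += 1  # now the update works
    return counter


print(bump_uninitialized())
print(bump_initialized())
```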

The relationship looks like this:

Names and objects after running the assignment a = 3. Variable a becomes a reference to
the object 3. Internally, the variable is really a pointer to the object’s memory space created by running
the literal expression 3.

These links from variables to objects are called references in Python—that is, a reference
is a kind of association, implemented as a pointer in memory. Whenever the variables
are later used (i.e., referenced), Python automatically follows the variable-to-object
links. This is all simpler than the terminology may imply. In concrete terms:

Variables are entries in a system table, with spaces for links to objects.
Objects are pieces of allocated memory, with enough space to represent the values for which they stand.

References are automatically followed pointers from variables to objects.
objects have two header fields, a type designator and a reference counter.


In Python, things work more simply.
Names have no types; as stated earlier, types live with objects, not names. In the preceding
listing, we’ve simply changed a to reference different objects. Because variables
have no type, we haven’t actually changed the type of the variable a; we’ve simply made
the variable reference a different type of object. In fact, again, all we can ever say about
a variable in Python is that it references a particular object at a particular point in time.

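Variable names have no type; types belong to objects (a variable references an object, so the type comes with the object). In Python, a variable is simply a reference to a particular object at a particular point in time.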
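
A small sketch of the same idea: the name `a` is rebound three times, and each time the type reported is that of the object it currently references. The variable names are illustrative only.

```python
a = 3
first = type(a).__name__    # type of the object a references right now

a = "spam"                  # rebind a to a string object
second = type(a).__name__

a = [1, 2, 3]               # rebind a to a list object
third = type(a).__name__

print(first, second, third)
```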

(2) Shared references

>>> a = 3
>>> b = a
>>>
>>> id(a)
1747479616
>>> id(b)
1747479616
>>>
>>> hex(id(a))
'0x68286c40'
>>> hex(id(b))
'0x68286c40'
>>>

This scenario in Python—with multiple names referencing the same object—is usually
called a shared reference (and sometimes just a shared object). Note that the names a
and b are not linked to each other directly when this happens; in fact, there is no way
to ever link a variable to another variable in Python. 
Rather, both variables point to the same object via their references.

(1) id() is a Python built-in function that returns an object's identity, which in CPython is the object's memory address.

>>> help(id)
Help on built-in function id in module builtins:

id(obj, /)
    Return the identity of an object.
    
    This is guaranteed to be unique among simultaneously existing objects.
    (CPython uses the object's memory address.)
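
With a mutable object, a shared reference is easy to observe: a change made through one name is visible through the other. A minimal sketch, using a list instead of the integer from the session above:

```python
a = [1, 2, 3]
b = a                      # b references the same list object as a
b.append(4)                # mutate the object through b

same_object = a is b
same_id = id(a) == id(b)

c = list(a)                # a copy: equal value, but a separate object
print(a, same_object, same_id, c is a)
```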

(2) Testing what a reference points to

The is operator performs this test: it checks whether two references point to the same object.

Integers

>>> a = 256
>>> b = 256
>>> a is b
True
>>> c = 257
>>> d = 257
>>> c is d
False
>>>

Short strings

>>> e = "Explicit"
>>> f = "Explicit"
>>> e is f
True
>>>

Long strings

>>> g = "Beautiful is better"
>>> h = "Beautiful is better"
>>> g is h
False
>>>

Lists

>>> lst1 = [1, 2, 3]
>>> lst2 = [1, 2, 3]
>>> lst1 is lst2
False
>>>

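From these results:

(1) Python caches small integers and short strings, so each such value is stored only once in memory and every reference to it points to the same object; an assignment statement only creates a new reference, not a new object.

(2) Python does not cache long strings, lists, or other objects, so several equal objects can coexist, and an assignment statement can create a new object.

How it works: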

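# Two optimizations: caching within a code block, and the small-object pool.

# Code blocks
All code runs as code blocks (think of a principal issuing an order to one class at a time). A file is one code block,
and different files are different code blocks. In the interactive interpreter each command typed is its own code block,
which is why c and d in the session above end up as two separate 257 objects.

# Caching within a code block
When Python executes the object-creating statements of a single code block, it checks whether the value already exists and, if so, reuses it.
Put another way: within one code block, each variable and its value are recorded in a dictionary as they are created;
when a new variable is encountered, the dictionary is consulted first,
and if an equal value is already recorded, that earlier value is reused.
So when a file (a single code block) is executed, the two variables point to the same object;
values covered by this caching exist only once in memory, that is, they have the same id.

Note:
# This mechanism applies only within a single code block.
# Data types covered: int, str, bool.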


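# Small-object pool (also called interning; for strings, string interning or string caching)
An optimization that applies across different code blocks.
# Applicable types: str, bool, int
int: -5 to 256
str: strings that meet certain conditions
bool: both values


# Summary:
Within a single code block, the code-block cache applies.
Across different code blocks, the small-object pool applies.

# Advantages:
1. Saves memory.
2. Improves performance.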
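
Both mechanisms can be poked at from a script. The sketch below builds two equal strings at run time and uses `sys.intern()` to place them in the interning table, then compiles two assignments as one code block. The last result is a CPython implementation detail, so it is printed rather than relied on; the `"<one-block>"` filename is an arbitrary label.

```python
import sys

# Build the strings at run time so the compiler cannot merge them as constants.
words = ["Beautiful", "is", "better"]
g = " ".join(words)
h = " ".join(words)

equal_value = g == h                             # same text
interned_same = sys.intern(g) is sys.intern(h)   # interning yields one shared object

# Compile two assignments as ONE code block. In CPython, equal constants inside
# a single code object are typically stored once (an implementation detail).
namespace = {}
exec(compile("c = 257\nd = 257\nsame = c is d", "<one-block>", "exec"), namespace)

print(equal_value, interned_same, namespace["same"])
```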

The wtfpython repository on GitHub has detailed examples.

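(3) Viewing an object's reference count

In Python, every object keeps the total number of references that point to it: its reference count.

Use sys.getrefcount() to view an object's reference count.

When a variable is reassigned, what happens to the value it used to reference? In the example below, s is reassigned to the string 'apple'; where does the 6 go?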

>>> s = 6
>>> s = 'apple'

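The answer: when a variable is reassigned, the space of the object it used to point to may be reclaimed (garbage collected), provided no other variable or object still references it.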

The answer is that in Python, whenever a name is assigned to a new object, the space
held by the prior object is reclaimed if it is not referenced by any other name or object.
This automatic reclamation of objects’ space is known as garbage collection, and makes
life much simpler for programmers of languages like Python that support it.

Ordinary references

>>> import sys
>>> 
>>> a = "simple"
>>> sys.getrefcount(a)
2
>>> b = a
>>> sys.getrefcount(a)
3
>>> sys.getrefcount(b)
3
>>>

Note: passing a reference as an argument to getrefcount() creates a temporary reference, so the result is 1 higher than you might expect.
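
Because absolute counts vary between interpreter versions, it is often more telling to compare against a baseline. A minimal sketch, assuming CPython and using a fresh list so that nothing else references it; the names `data`, `alias`, and `holder` are illustrative:

```python
import sys

data = []                           # a fresh list, referenced only by `data`
baseline = sys.getrefcount(data)    # already includes the temporary argument reference

alias = data                        # one more name for the same object
holder = {"key": data}              # a container holds a reference too
after_two_refs = sys.getrefcount(data)

del alias                           # drop the extra name
holder.clear()                      # drop the container's reference
after_release = sys.getrefcount(data)

print(baseline, after_two_refs, after_release)
```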

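3. Garbage collection

As a program creates more and more objects and they occupy more and more memory, Python starts garbage collection to clear out objects that are no longer in use.

(1) How it works

When an object's reference count drops to 0, no reference points to it any more and the object becomes garbage to be reclaimed.

For example, when a newly created object is bound to a reference, its reference count becomes 1. If that reference is deleted, the count drops to 0 and the object can be garbage collected.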

Internally, Python accomplishes this feat by keeping a counter in every object that keeps
track of the number of references currently pointing to that object. As soon as (and
exactly when) this counter drops to zero, the object’s memory space is automatically
reclaimed. In the preceding listing, we’re assuming that each time x is assigned to a new
object, the prior object’s reference counter drops to zero, causing it to be reclaimed.

The most immediately tangible benefit of garbage collection is that it means you can
use objects liberally without ever needing to allocate or free up space in your script.
Python will clean up unused space for you as your program runs. In practice, this
eliminates a substantial amount of bookkeeping code required in lower-level languages
such as C and C++.

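(2) What del does

del removes a name and lowers the object's reference count by 1. Once the count reaches 0, the user can no longer reach or use the object in any way, and CPython reclaims its memory immediately, as the excerpt above says. The collector in the gc module is only needed for objects that are kept alive by reference cycles.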
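
The effect of del on the reference count can be observed with a finalizer. This is a sketch that assumes CPython's reference-counting behavior; the class `Tracked` and the list `events` are hypothetical names, and the cycle collector is switched off for the duration so that it cannot be what frees the object.

```python
import gc

events = []


class Tracked:
    def __del__(self):
        # Runs when the object is about to be reclaimed
        events.append("finalized")


was_enabled = gc.isenabled()
gc.disable()                  # rule out the cycle collector for this demo
try:
    obj = Tracked()
    other = obj               # two references to the same object
    del obj                   # count drops to 1: nothing happens yet
    after_first_del = list(events)
    del other                 # count drops to 0: the finalizer runs right away
    after_second_del = list(events)
finally:
    if was_enabled:
        gc.enable()

print(after_first_del, after_second_del)
```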

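Notes

(1) While garbage collection runs, Python cannot do other work, so frequent collection sharply reduces Python's efficiency.

(2) Python starts garbage collection automatically only under certain conditions (with little garbage there is no need to collect).

(3) At run time Python keeps track of the number of object allocations and object deallocations.

Garbage collection starts only when the difference between the two exceeds a threshold.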

>>> import gc
>>> 
>>> gc.get_threshold() # gc module function for viewing the collection thresholds
(700, 10, 10)
>>>

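Reading the thresholds:

700 is the threshold at which garbage collection starts;

every 10 generation-0 collections are accompanied by 1 generation-1 collection, and only every 10 generation-1 collections is there 1 generation-2 collection.

You can also start garbage collection manually:

>>> gc.collect()       # start a collection manually; returns the number of unreachable objects found
52
>>> gc.set_threshold(666, 8, 9) # gc module function for setting the collection thresholds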
>>>
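
In a script, it is usually safer to restore the original thresholds after experimenting. A minimal sketch; the values passed to `gc.set_threshold()` are arbitrary and not a tuning recommendation, and the defaults printed may differ between CPython versions.

```python
import gc

original = gc.get_threshold()     # e.g. (700, 10, 10) on many CPython builds
counts_before = gc.get_count()    # current counters for generations 0, 1, 2

gc.set_threshold(1000, 15, 15)    # illustrative values only
changed = gc.get_threshold()

gc.set_threshold(*original)       # put the previous thresholds back
restored = gc.get_threshold()

print(original, counts_before, changed, restored)
```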

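What is generational collection?

Python divides all objects into three generations: 0, 1, and 2.
Every newly created object starts in generation 0.
An object that survives a collection of its generation is moved to the next generation.

Generational collection is a classic space-for-time trade-off, and it is also a key technique in Java. Put simply: the longer an object has existed, the less likely it is to be garbage, so the less often it should be examined.
This reduces the extra work that the mark-and-sweep mechanism would otherwise incur. The objects to be collected are split into several generations, each of which is a linked list (a set); the longer the objects in a generation have survived, the longer the interval between mark-and-sweep passes over it.
As the code above shows, Python has three generations, each with its own threshold. By default, a generation-0 collection is triggered when allocations minus deallocations exceed 700. The two 10s count collections, not objects: generation 1 is examined once generation 0 has been examined more than 10 times, and generation 2 once generation 1 has been examined more than 10 times.
Collecting an older generation also collects every younger one: a generation-2 collection examines all three generations, a generation-1 collection examines generations 0 and 1, and a generation-0 collection examines only generation 0.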
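
The threshold bookkeeping can be modeled in a few lines. This is a deliberately simplified toy, not CPython's source: it assumes no object is ever deallocated, ignores CPython's extra heuristics for the oldest generation, and follows the rule in the gc module documentation that examining a generation also examines the younger ones. The function name `simulate` is made up.

```python
def simulate(allocations, thresholds=(700, 10, 10)):
    """Return [n0, n1, n2]: how many collections had generation i as their oldest."""
    counts = [0, 0, 0]   # per-generation counters, as in gc.get_count()
    runs = [0, 0, 0]
    for _ in range(allocations):
        counts[0] += 1                      # one more tracked allocation
        if counts[0] <= thresholds[0]:
            continue                        # generation 0 is not over its threshold
        # Pick the oldest generation whose counter exceeds its threshold.
        oldest = 0
        for gen in (2, 1):
            if counts[gen] > thresholds[gen]:
                oldest = gen
                break
        runs[oldest] += 1
        if oldest < 2:
            counts[oldest + 1] += 1         # the next generation saw one more collection
        for gen in range(oldest + 1):
            counts[gen] = 0                 # collected generations start counting again
    return runs


print(simulate(700))        # threshold not yet exceeded
print(simulate(701))        # first generation-0 collection
print(simulate(701 * 12))   # enough collections to reach generation 1
```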

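Mark and sweep

Mark and sweep, as the name suggests, first marks objects (garbage detection) and then sweeps away the garbage (garbage reclamation).
Initially every object is marked white. The root objects (which are never deleted) are identified and marked black (meaning the object is live).
Objects referenced by live objects are marked gray (reachable, but the objects they reference have not been checked yet); once everything a gray object references has been checked, the gray object is marked black.
This repeats until no gray nodes remain. Whatever is still white at the end is an object to be cleared.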
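
The coloring scheme can be sketched on a small object graph. This is a generic illustration of tri-color marking, not CPython's actual collector; the graph `heap`, the node names, and the functions `mark` and `sweep` are all made up. One simplification: roots start gray and turn black once their children have been queued.

```python
from collections import deque

WHITE, GRAY, BLACK = "white", "gray", "black"


def mark(graph, roots):
    """Color every node reachable from the roots black; leave the rest white."""
    color = {node: WHITE for node in graph}
    worklist = deque()
    for root in roots:
        color[root] = GRAY
        worklist.append(root)
    while worklist:
        node = worklist.popleft()
        for child in graph[node]:
            if color[child] == WHITE:   # first time we reach this object
                color[child] = GRAY
                worklist.append(child)
        color[node] = BLACK             # all of its references have been checked
    return color


def sweep(graph, color):
    """Return the nodes that stayed white, i.e. the garbage."""
    return sorted(node for node in graph if color[node] == WHITE)


heap = {
    "root": ["a"],
    "a": ["b"],
    "b": ["a"],   # a <-> b is a cycle, but reachable from the root
    "x": ["y"],
    "y": ["x"],   # x <-> y is a cycle that nothing else references
}
colors = mark(heap, ["root"])
garbage = sweep(heap, colors)
print(garbage)
```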

How can the memory leaks that circular references may cause be avoided?

More on Python Garbage Collection

Technically speaking, Python’s garbage collection is based mainly upon reference counters,
as described here; however, it also has a component that detects and reclaims
objects with cyclic references in time. This component can be disabled if you’re sure
that your code doesn’t create cycles, but it is enabled by default.

Circular references are a classic issue in reference count garbage collectors. Because
references are implemented as pointers, it’s possible for an object to reference itself, or
reference another object that does. For example, exercise 3 at the end of Part I and its
solution in Appendix D show how to create a cycle easily by embedding a reference to
a list within itself (e.g., L.append(L)). The same phenomenon can occur for assignments
to attributes of objects created from user-defined classes. Though relatively rare, because
the reference counts for such objects never drop to zero, they must be treated
specially.

For more details on Python’s cycle detector, see the documentation for the gc module
in Python’s library manual. The best news here is that garbage-collection-based memory
management is implemented for you in Python, by people highly skilled at the task.

The answer:

Use a weak reference, via ref in the weakref module
Explicitly set one of the references to None

import gc
import objgraph  # third-party package: pip install objgraph
import sys
import weakref


def quote_demo():
    class Person:
        pass

    p = Person()  # 1
    print(sys.getrefcount(p))  # 2  first

    def log(obj):
        # 4 (second print): the extra references exist only while the function runs
        print(sys.getrefcount(obj))

    log(p)  # 3

    p2 = p  # 2
    print(sys.getrefcount(p))  # 3
    del p2
    print(sys.getrefcount(p))  # 3 - 1 = 2


def circle_quote():
    # Circular reference
    class Dog:
        pass

    class Person:
        pass

    p = Person()
    d = Dog()

    print(objgraph.count("Person"))
    print(objgraph.count("Dog"))

    p.pet = d
    d.master = p

    # After deleting p and d, are the underlying objects released?
    del p
    del d

    print(objgraph.count("Person"))
    print(objgraph.count("Dog"))


def solve_cirecle_quote():
    # 1. Define two classes
    class Person:
        def __del__(self):
            print("Person object released")

    class Dog:
        def __del__(self):
            print("Dog object released")

    p = Person()
    d = Dog()

    p.pet = d
    d.master = p

    p.pet = None  # break the cycle by setting one reference to None
    del p
    del d

    gc.collect()

    print(objgraph.count("Person"))
    print(objgraph.count("Dog"))


def sovle_circle_quote_with_weak_ref():
    # 1. Define two classes
    class Person:
        def __del__(self):
            print("Person object released")

    class Dog:
        def __del__(self):
            print("Dog object released")

    p = Person()
    d = Dog()

    p.pet = d
    d.master = weakref.ref(p)

    del p
    del d

    gc.collect()

    print(objgraph.count("Person"))
    print(objgraph.count("Dog"))


if __name__ == "__main__":
    quote_demo()
    circle_quote()
    solve_cirecle_quote()
    sovle_circle_quote_with_weak_ref()
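
The listing above relies on the third-party objgraph package. The same two behaviors can be checked with the standard library alone, using a weak reference as a probe. This is a sketch that assumes CPython; the function names are hypothetical, and automatic collection is paused so that only the explicit `gc.collect()` call can break the cycle.

```python
import gc
import weakref


class Person:
    pass


class Dog:
    pass


def strong_cycle():
    """Report whether the Person survives `del`, then whether it survives gc.collect()."""
    p, d = Person(), Dog()
    p.pet = d
    d.master = p                  # strong back-reference: a cycle
    probe = weakref.ref(p)        # observes p without keeping it alive
    del p, d
    alive_after_del = probe() is not None
    gc.collect()                  # the cycle detector reclaims the pair
    alive_after_collect = probe() is not None
    return alive_after_del, alive_after_collect


def weak_back_reference():
    """Report whether the Person survives `del` when the back-reference is weak."""
    p, d = Person(), Dog()
    p.pet = d
    d.master = weakref.ref(p)     # weak back-reference: no cycle is formed
    probe = weakref.ref(p)
    del p                         # the last strong reference to the Person is gone
    return probe() is not None, d.master() is None


was_enabled = gc.isenabled()
gc.disable()                      # keep automatic collection out of the picture
try:
    cycle_result = strong_cycle()
    weak_result = weak_back_reference()
finally:
    if was_enabled:
        gc.enable()

print(cycle_result, weak_result)
```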

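4. Memory pool mechanism

Python distinguishes large and small allocations, with 256 bytes as the boundary (CPython 3.3 and later use 512 bytes):

Large allocations go through malloc
Small allocations are served from the memory pool
Python's memory pool (a pyramid of layers)

Layer +3: the top layer, where the user works directly with Python objects

Layers +1 and +2: the memory pool, reached through Python's interface function PyMem_Malloc

- A request of 1 to 256 bytes is handled by the pool allocator, which calls malloc to obtain memory,
- but only one large 256K block at a time; it does not call free to release that block, which stays in the pool for later reuse

Layer 0: large allocations -----> a request larger than 256 bytes is allocated with malloc and released with free.

Layers -1 and -2: handled by the operating system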
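
The reuse idea behind the pool can be illustrated with a toy model. This is not CPython's allocator: the class `ToyPool`, its block and chunk sizes, and the use of `bytearray` as a stand-in for one large malloc call are all assumptions made for the sketch.

```python
class ToyPool:
    """Toy fixed-size block pool: grab memory in big chunks, reuse freed blocks."""

    def __init__(self, block_size=32, blocks_per_chunk=8):
        self.block_size = block_size
        self.blocks_per_chunk = blocks_per_chunk
        self.chunks = []        # large buffers obtained from the "system"
        self.free_blocks = []   # (chunk_index, offset) pairs ready for reuse

    def _grow(self):
        # Stand-in for a single large malloc call
        self.chunks.append(bytearray(self.block_size * self.blocks_per_chunk))
        index = len(self.chunks) - 1
        for i in reversed(range(self.blocks_per_chunk)):
            self.free_blocks.append((index, i * self.block_size))

    def alloc(self):
        if not self.free_blocks:
            self._grow()        # only ask for more memory when the pool is empty
        return self.free_blocks.pop()

    def release(self, block):
        # The block goes back to the pool; the chunk is not handed back
        self.free_blocks.append(block)


pool = ToyPool()
first = pool.alloc()
pool.release(first)
second = pool.alloc()                        # reuses the block just released
blocks = [pool.alloc() for _ in range(8)]    # exhausts chunk 0, forcing a second chunk
print(first == second, len(pool.chunks))
```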