Python strings: encode and decode

Date: 2020-05-24

Tags: python, string, encode, decode. Category: Python


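# Overview: Python strings come in two kinds, byte strings and text (non-byte) strings

## python3

In Python 3, a string literal is a text string (`str`) by default: a sequence of Unicode code points. Its `encode` method converts it into a byte string (`bytes`) in an encoding such as ASCII, UTF-8, or UTF-16. Python 3 therefore treats only the text form as its standard string type (`str`).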

    Python 3.5.2 (default, Nov 23 2017, 16:37:01)
    [GCC 5.4.0 20160609] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> uni_str = 'abc'
    >>> type(uni_str)
    <class 'str'>
    >>> utf8_str = uni_str.encode('utf-8')
    >>> type(utf8_str)
    <class 'bytes'>
    >>> asc_str = uni_str.encode('ascii')
    >>> type(asc_str)
    <class 'bytes'>
    >>> uni_str
    'abc'
    >>> utf8_str
    b'abc'
    >>> asc_str
    b'abc'
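
The session above uses only ASCII characters, so every encoding yields the same bytes. The following is a minimal sketch with non-ASCII text, showing that the byte form depends on the encoding and that `decode` reverses `encode`. The sample string `text` is an assumption chosen for illustration and does not come from the original session.

```python
# Illustrative sample text (an assumption, not from the original session).
text = '中文'

utf8_bytes = text.encode('utf-8')     # 3 bytes per character for this text
utf16_bytes = text.encode('utf-16')   # 2-byte BOM plus 2 bytes per character

print(type(text), len(text))               # str, counted in characters
print(type(utf8_bytes), len(utf8_bytes))   # bytes, counted in bytes
print(len(utf16_bytes))

# decode() turns bytes back into str when given the matching encoding.
assert utf8_bytes.decode('utf-8') == text
assert utf16_bytes.decode('utf-16') == text

# ASCII cannot represent these characters, so encode() raises an error.
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    print('ascii encode failed:', exc.reason)
```

The length of the `str` counts characters, while the length of each `bytes` object counts encoded bytes, which is why the two differ.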

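## python2

In Python 2, a string literal is a byte string (`str`) by default, and ASCII is the default codec both for source files and for implicit conversions. Chinese text is still supported, but a source file containing it needs an encoding declaration. The `decode` method converts a byte string into a text string (`unicode`), and `encode` then converts that text string into bytes in another encoding such as UTF-8 or UTF-16. Python 2 therefore treats the encoded form, the byte string, as its standard string type (`str`).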

    Python 2.7.12 (default, Dec  4 2017, 14:50:18)
    [GCC 5.4.0 20160609] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> byte_str = 'abc'
    >>> type(byte_str)
    <type 'str'>
    >>> uni_str = byte_str.decode('ascii')
    >>> uni_str
    u'abc'
    >>> type(uni_str)
    <type 'unicode'>
    >>> utf8_str = uni_str.encode('utf-8')
    >>> utf8_str
    'abc'
    >>> type(utf8_str)
    <type 'str'>
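
The decode-then-encode chain shown above carries over to Python 3, where the two roles are played by `bytes` and `str`. The sketch below is written for Python 3, not Python 2; the helper name `transcode` and the GBK sample input are assumptions made for illustration.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes: decode them to str, then encode to the target."""
    return data.decode(source).encode(target)


# Hypothetical input: the text '中文' stored as GBK bytes.
gbk_bytes = '中文'.encode('gbk')
utf8_bytes = transcode(gbk_bytes, 'gbk', 'utf-8')

print(gbk_bytes)
print(utf8_bytes)

# Decoding with the wrong codec fails, because these bytes are not ASCII.
try:
    gbk_bytes.decode('ascii')
except UnicodeDecodeError as exc:
    print('ascii decode failed:', exc.reason)
```

Passing through the text form in the middle is what makes the conversion safe: the bytes are only ever interpreted with the encoding they were written in.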