[Python] 08: String Formatting, Encoding, bytes, and bytearray

Date: 2019-11-06

Tags: Python, string formatting, encoding, bytes, bytearray. Category: Python


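1. String formatting

(1) String formatting

String formatting is one way to build a string out of several pieces.

We have already used str.join() and + to concatenate str objects, but neither gives much control over the format.

There are two ways to format a str: printf style and str.format().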
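As a rough comparison, the sketch below builds the same sentence with +, str.join(), printf style, and str.format(). The variable names are illustrative and not part of the original notes.

```python
lang = 'python'
age = 18

# + needs an explicit str() for non-str values
by_plus = 'i love ' + lang + ', i am ' + str(age)

# str.join() also accepts only str items
by_join = ''.join(['i love ', lang, ', i am ', str(age)])

# printf style and str.format() convert the values themselves
by_printf = 'i love %s, i am %d' % (lang, age)
by_format = 'i love {}, i am {}'.format(lang, age)

print(by_plus)
print(by_plus == by_join == by_printf == by_format)
```

All four produce the same str; the formatting approaches keep the template readable in one place.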

(2) printf style

This style is inherited from C.

In [2]: s = 'i love %s'

A str that contains placeholders is a string waiting to be formatted.

Placeholder: % followed by a format control character.

In [3]: s
Out[3]: 'i love %s'

In [4]: s %('python',)
Out[4]: 'i love python'

In [5]: s %('python')
Out[5]: 'i love python'

In [6]: s %'python'
Out[6]: 'i love python'

In [7]: 'i love %s' %'python'
Out[7]: 'i love python'

The arguments replace the placeholders in the order they are passed. A new str is returned; the original str is unchanged.

In [9]: 'i love %s, i am %d' % ('python', 18)
Out[9]: 'i love python, i am 18'

In [11]: 'i love %s, i am %d' % ('python' 18)
  File "<ipython-input-11-b6c40f507b33>", line 1
    'i love %s, i am %d' % ('python' 18)
                                      ^
SyntaxError: invalid syntax

In [13]: 'i love %s, i am %d' % ('python',)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-b0f8a99953ee> in <module>()
----> 1 'i love %s, i am %d' % ('python',)

TypeError: not enough arguments for format string

In [14]: 'i love %s, i am %d' % ('python',"xxj")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-9819bcbd229f> in <module>()
----> 1 'i love %s, i am %d' % ('python',"xxj")

TypeError: %d format: a number is required, not str

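When the number of placeholders does not match the number of arguments, a TypeError is raised. A TypeError is also raised when an argument has the wrong type for its placeholder, as with %d and a str.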

In [25]: 'i love %s, i am %d' % ('python', 18)
Out[25]: 'i love python, i am 18'

In [26]: 'i love %s, i am %d' % ('python', "xxj")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-26-448f3d60565b> in <module>()
----> 1 'i love %s, i am %d' % ('python', "xxj")

TypeError: %d format: a number is required, not str

In [27]: 'i love %s, i am %d' % (18, 18)   # no error even though 18 is not a str: %s accepts any object
Out[27]: 'i love 18, i am 18'

In [28]: 'i love %s, i am %d' % ([1, 2], 18)
Out[28]: 'i love [1, 2], i am 18'

In [29]: 'i love %s, i am %d' % (1, 2, 18)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-29-de8412982695> in <module>()
----> 1 'i love %s, i am %d' % (1, 2, 18)

TypeError: not all arguments converted during string formatting

In [30]: 'i love %s, i am %d' % ((1, 2), 18)
Out[30]: 'i love (1, 2), i am 18'

%s accepts a str or any other object; a non-str object is converted implicitly by calling str() on it.
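A minimal sketch of the implicit str() call. The Point class is hypothetical and exists only for this illustration. It also shows why a tuple has to be wrapped in another tuple when the tuple itself is the value to format.

```python
class Point:
    """Hypothetical class used only to show the implicit str() call."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return 'Point({}, {})'.format(self.x, self.y)


p = Point(1, 2)
with_object = 'i love %s' % p          # %s calls str(p)
with_number = 'i love %s' % 18         # same text as 'i love ' + str(18)

# A tuple on the right-hand side is unpacked as the argument list,
# so wrap it in a one-element tuple to format the tuple itself.
with_tuple = 'i love %s' % ((1, 2),)

print(with_object)
print(with_number)
print(with_tuple)
```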

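Syntax:

print("String %format1 %format2 …" % (variable1, variable2, …))

Modifiers may appear between the % and the format character; if present, they must appear in this order:

%[(name)][flags][width][.precision]typecode

typecode is the format character referred to as format above, for example s, d, or f.

(name) is a key, written in parentheses, of the dictionary supplied on the right-hand side; it selects one specific item.

flags is one or more of the following:

-: left-align (the default is right-align)

+: include the sign of a number, so positive numbers get a leading "+"

0: pad with zeros

width is a number giving the minimum field width.

. separates the field width from the precision.

precision gives the maximum number of characters printed for a string, the number of digits after the decimal point for a float, or the minimum number of digits for an integer (padded with leading zeros).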
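The sessions in these notes do not exercise (name) or the + flag, so here is a small sketch of both, together with precision on %s and %f. The dictionary and variable names are illustrative.

```python
person = {'lang': 'python', 'age': 18}

# (name) selects a key from the dictionary on the right-hand side
by_name = 'i love %(lang)s, i am %(age)d' % person

# + forces a sign, 0 pads with zeros, width is the minimum field width
signed = '%+d' % 18
padded = '%+06d' % 18

# .precision: maximum characters for %s, digits after the point for %f
clipped = '%.3s' % 'python'
rounded = '%8.2f' % 3.14159

print(by_name)
print(signed, padded, clipped, repr(rounded))
```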

## %s

In [34]: 'I love %s, i am %d' % ("python", 18)
Out[34]: 'I love python, i am 18'

In [35]: 'I love %-s, i am %d' % ("python", 18)
Out[35]: 'I love python, i am 18'

In [36]: 'I love %-30s, i am %d' % ("python", 18)
Out[36]: 'I love python                        , i am 18'

In [37]: 'I love %30s, i am %d' % ("python", 18)
Out[37]: 'I love                         python, i am 18'

In [38]: 'I love %030s, i am %d' % ("python", 18)
Out[38]: 'I love                         python, i am 18'

In [39]: 'I love %-030s, i am %d' % ("python", 18)
Out[39]: 'I love python                        , i am 18'

In [40]: 'I love %-030.5s, i am %d' % ("python", 18)
Out[40]: 'I love pytho                         , i am 18'


## %d

In [49]: 'I love %s, i am %d' % ("python", 18)
Out[49]: 'I love python, i am 18'

In [50]: 'I love %s, i am %20d' % ("python", 18)
Out[50]: 'I love python, i am                   18'

In [51]: 'I love %s, i am %020d' % ("python", 18)
Out[51]: 'I love python, i am 00000000000000000018'

In [52]: 'I love %s, i am %-20d' % ("python", 18)
Out[52]: 'I love python, i am 18                  '

In [53]: 'I love %s, i am %-20.5d' % ("python", 18)
Out[53]: 'I love python, i am 00018               '

In [54]: 'I love %s, i am %-20.6d' % ("python", 18)
Out[54]: 'I love python, i am 000018              '


## %f

In [43]: 'I love %s, i am %f' % ("python", 18)
Out[43]: 'I love python, i am 18.000000'

In [44]: 'I love %s, i am %20f' % ("python", 18)
Out[44]: 'I love python, i am            18.000000'

In [45]: 'I love %s, i am %020f' % ("python", 18)
Out[45]: 'I love python, i am 0000000000018.000000'

In [46]: 'I love %s, i am %-020f' % ("python", 18)
Out[46]: 'I love python, i am 18.000000           '

In [47]: 'I love %s, i am %-020.5f' % ("python", 18)
Out[47]: 'I love python, i am 18.00000            '

In [48]: 'I love %s, i am %-020.5d' % ("python", 18)
Out[48]: 'I love python, i am 00018               '

printf-style formatting is easy to pick up for people coming from other languages, C in particular, but it is not the approach Python recommends.

(3) str.format()

In [67]: 'I love {}'.format('python')
Out[67]: 'I love python'

str.format() uses curly braces as placeholders.

When str.format() is called, the arguments passed to format() replace the braces.

In [68]: 'I love {}, i am {}'.format('python', 18)  
Out[68]: 'I love python, i am 18'

format() accepts a variable number of arguments.

In [70]: 'I love {}, i am {}'.format('python', 18)
Out[70]: 'I love python, i am 18'

In [71]: 'I love {}, i am {}'.format(18, 'python')
Out[71]: 'I love 18, i am python'

In [72]: 'I love {1}, i am {0}'.format(18, 'python')
Out[72]: 'I love python, i am 18'

In [73]: 'I love {1}, i am {1}'.format(18, 'python')
Out[73]: 'I love python, i am python'

In [74]: 'I love {1}, i am {1}'.format(18)       # the positional argument named by the number must exist
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-74-9777ef27de22> in <module>()
----> 1 'I love {1}, i am {1}'.format(18)

IndexError: tuple index out of range

In [75]: 'I love {0}, i am {1}'.format(18)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-75-1381c3981335> in <module>()
----> 1 'I love {0}, i am {1}'.format(18)

IndexError: tuple index out of range

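In [76]: 'I love {0}, i am {0}'.format(18)     # a number can reference the same positional argument several times
Out[76]: 'I love 18, i am 18'

A placeholder can carry a number to refer to a positional argument of format(), and the same positional argument can be referenced more than once.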

In [77]: 'I love {lang}, i am {age}'.format(lang='python', age=18)
Out[77]: 'I love python, i am 18'

In [78]: 'I love {lang}, i am {lang}'.format(lang='python', age=18)
Out[78]: 'I love python, i am python'

In [79]: 'I love {lang}, i am {lang}'.format(lang='python')
Out[79]: 'I love python, i am python'

In [81]: 'My name is {0}, i love {lang}, i am {age}'.format('xxj', lang='python', age=18)
Out[81]: 'My name is xxj, i love python, i am 18'

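An identifier inside a placeholder refers to a keyword argument.

Positional and keyword arguments can be used together.

When the placeholders and the arguments do not match, an exception is raised.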

In [82]: '{} {}'.format(18)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-82-21db8d47c754> in <module>()
----> 1 '{} {}'.format(18)

IndexError: tuple index out of range

In [83]: '{} {lang}'.format(18)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-83-afb26cfd80bf> in <module>()
----> 1 '{} {lang}'.format(18)

KeyError: 'lang'

In [84]: '{1} {2}'.format(0, 1, 2)
Out[84]: '1 2'

In [85]: '{1} {2}'.format("a", "b", "c")
Out[85]: 'b c'

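{} consumes the positional arguments in order.

{number} treats the positional arguments as a sequence args and looks up args[i]; when i is not a valid index of args, IndexError is raised.

{keyword} treats the keyword arguments as a dict kwargs and looks up kwargs[k]; when k is not a key of kwargs, KeyError is raised.

In Python 2.6, the number or keyword inside the braces cannot be omitted.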
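The lookup rules above can be sketched with argument unpacking. safe_format is a hypothetical helper written for this illustration; it is not part of the standard library.

```python
args = ('xxj',)
kwargs = {'lang': 'python', 'age': 18}
template = 'My name is {0}, i love {lang}, i am {age}'

# * and ** unpack a sequence and a dict into positional and keyword arguments
filled = template.format(*args, **kwargs)
print(filled)

# Extra arguments are simply ignored
extra = '{}'.format(1, 2)
print(extra)


def safe_format(fmt, *a, **kw):
    """Hypothetical helper: return the error name instead of raising."""
    try:
        return fmt.format(*a, **kw)
    except (IndexError, KeyError) as exc:
        return type(exc).__name__


print(safe_format('{} {}', 18))        # not enough positional arguments
print(safe_format('{} {lang}', 18))    # missing keyword argument
```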

How do you print a literal brace? Double it: {{ produces { and }} produces }.

In [90]: '{}'.format(18)
Out[90]: '18'

In [91]: '{{}}'.format(18)
Out[91]: '{}'

In [92]: '{{{}}}'.format(18)
Out[92]: '{18}'

In [93]: '{{{}}}'.format()
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-93-d7108b8b948d> in <module>()
----> 1 '{{{}}}'.format()

IndexError: tuple index out of range

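2. bytes

(1) bytes

bytes is a type newly introduced in Python 3.

str is a sequence of text characters; bytes is a sequence of bytes.

Text has an encoding (utf-8, gbk, GB18030, and so on); bytes have no notion of encoding.

A text encoding defines how characters are represented as bytes.

In Python 3, str.encode() uses UTF-8 by default.

Apart from encode, most str operations have a corresponding bytes version, but the arguments passed in must also be bytes.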

In [175]: b = b'i love python'

In [176]: b.find('o')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-176-03a41f7339b4> in <module>()
----> 1 b.find('o')

TypeError: a bytes-like object is required, not 'str'

In [177]: b.find(b'o')
Out[177]: 3

In [180]: s = '马哥教育'

In [181]: s.encode()
Out[181]: b'\xe9\xa9\xac\xe5\x93\xa5\xe6\x95\x99\xe8\x82\xb2'

In [182]: s.encode().find(b'\xac')  # bytes operations work byte by byte
Out[182]: 2

In [184]: b
Out[184]: b'i love python'

In [185]: b.decode()         # a method specific to bytes
Out[185]: 'i love python'

In [186]: b.hex()           # a method specific to bytes
Out[186]: '69206c6f766520707974686f6e'
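A short sketch of a few more bytes behaviors that follow from the session above: indexing yields an int, slicing yields bytes, and bytes.fromhex() reverses hex(). Variable names are illustrative.

```python
b = b'i love python'

first = b[0]            # indexing a bytes object gives an int
head = b[:1]            # slicing gives bytes
upper = b.upper()       # many str methods exist on bytes too
parts = b.split(b' ')   # arguments must be bytes as well

# bytes.fromhex() is the inverse of hex()
round_trip = bytes.fromhex(b.hex())

print(first, head, upper, parts)
print(round_trip == b)
```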

(2) Converting str to bytes

In [145]: help(str.encode)

Help on method_descriptor:

encode(...)
    S.encode(encoding='utf-8', errors='strict') -> bytes
    
    Encode S using the codec registered for encoding. Default encoding
    is 'utf-8'. errors may be given to set a different error
    handling scheme. Default is 'strict' meaning that encoding errors raise
    a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
    'xmlcharrefreplace' as well as any other name registered with
    codecs.register_error that can handle UnicodeEncodeErrors.

    
In [103]: s.encode()    # encode the str into bytes
Out[103]: b'\xe9\xa9\xac\xe5\x93\xa5\xe6\x95\x99\xe8\x82\xb2'  # each group of three bytes (shown in hex) is one Chinese character

In [106]: type(s.encode)
Out[106]: builtin_function_or_method

In [107]: type(s.encode())
Out[107]: bytes


In [109]: '马'.encode()
Out[109]: b'\xe9\xa9\xac'

In [127]: bin(0xe9)     # convert hexadecimal to binary
Out[127]: '0b11101001'

In [128]: bin(0xa9)
Out[128]: '0b10101001'

In [129]: bin(0xac)
Out[129]: '0b10101100'    

11101001 10101001 10101100 represents the character 马
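To see how those three bytes map back to a single character, the sketch below strips the UTF-8 marker bits by hand and compares the result with ord(). It assumes the standard three-byte UTF-8 layout; the variable names are illustrative.

```python
raw = '马'.encode('utf-8')             # b'\xe9\xa9\xac'

# A three-byte UTF-8 sequence has the layout 1110xxxx 10xxxxxx 10xxxxxx.
# Mask off the marker bits and join the remaining payload bits.
code_point = (
    ((raw[0] & 0b00001111) << 12)
    | ((raw[1] & 0b00111111) << 6)
    | (raw[2] & 0b00111111)
)

print(hex(code_point))                 # 0x9a6c
print(chr(code_point))                 # 马
print(code_point == ord('马'))         # True
```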


In [133]: s.encode()
Out[133]: b'\xe9\xa9\xac\xe5\x93\xa5\xe6\x95\x99\xe8\x82\xb2'

In [134]: s.encode(gbk)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-134-8fa822e76cc5> in <module>()
----> 1 s.encode(gbk)

NameError: name 'gbk' is not defined

In [135]: s.encode("gbk")                       # different encodings produce different bytes
Out[135]: b'\xc2\xed\xb8\xe7\xbd\xcc\xd3\xfd'

In [136]: s.encode("GBK")
Out[136]: b'\xc2\xed\xb8\xe7\xbd\xcc\xd3\xfd'

In [137]: s.encode("utf8")
Out[137]: b'\xe9\xa9\xac\xe5\x93\xa5\xe6\x95\x99\xe8\x82\xb2'

In [138]: s.encode("utf-8")
Out[138]: b'\xe9\xa9\xac\xe5\x93\xa5\xe6\x95\x99\xe8\x82\xb2'
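The help text above mentions the errors parameter, which the session does not exercise. A minimal sketch, using ASCII as a target encoding that cannot represent these characters:

```python
s = '马哥教育'

utf8 = s.encode('utf-8')   # 3 bytes per character for this text
gbk = s.encode('gbk')      # 2 bytes per character for this text
print(len(s), len(utf8), len(gbk))   # 4 12 8

# ASCII cannot represent these characters, so strict mode raises
try:
    s.encode('ascii')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

replaced = '马'.encode('ascii', errors='replace')            # b'?'
ignored = '马'.encode('ascii', errors='ignore')              # b''
as_xml = '马'.encode('ascii', errors='xmlcharrefreplace')    # b'&#39532;'

print(strict_failed, replaced, ignored, as_xml)
```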

(3) Converting bytes to str

In [143]: help(bytes.decode)

Help on method_descriptor:

decode(self, /, encoding='utf-8', errors='strict')
    Decode the bytes using the codec registered for encoding.
    
    encoding
      The encoding with which to decode the bytes.
    errors
      The error handling scheme to use for the handling of decoding errors.
      The default is 'strict' meaning that decoding errors raise a
      UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
      as well as any other name registered with codecs.register_error that
      can handle UnicodeDecodeErrors.


In [139]: s.encode().decode()  # decode() turns the bytes back into a str
Out[139]: '马哥教育'

In [140]: s.encode().decode("gbk")  # decoding only works with the encoding that was used to encode
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-140-e4970109fa53> in <module>()
----> 1 s.encode().decode("gbk")

UnicodeDecodeError: 'gbk' codec can't decode byte 0xac in position 2: illegal multibyte sequence
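A sketch of handling the mismatch shown above: round-trip with the matching encoding, and fall back to a lossy decode when the encoding is wrong. The fallback strategy is an illustration, not a recommendation from the original notes.

```python
s = '马哥教育'

# Round trip: decode with the encoding that produced the bytes
for name in ('utf-8', 'gbk'):
    assert s.encode(name).decode(name) == s

data = s.encode('utf-8')
try:
    text = data.decode('gbk')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
    # Lossy fallback: undecodable bytes become U+FFFD
    text = data.decode('gbk', errors='replace')

print(strict_ok, repr(text))
```

The lossy result is still a str, but it is not the original text, so this is only useful for diagnostics.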

3. bytearray

(1) bytearray

bytearray is the mutable version of bytes.

str and bytes are both immutable.
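The difference can be checked directly. try_assign is a hypothetical helper that attempts an item assignment and reports the outcome.

```python
def try_assign(obj):
    """Hypothetical helper: attempt obj[0] = 65 and report what happened."""
    try:
        obj[0] = 65
        return 'ok'
    except TypeError:
        return 'TypeError'


print(try_assign('abc'))               # str is immutable
print(try_assign(b'abc'))              # bytes is immutable
print(try_assign(bytearray(b'abc')))   # bytearray is mutable

buf = bytearray(b'abc')
buf[0] = 65                            # 65 is the byte value of 'A'
print(buf)                             # bytearray(b'Abc')
```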

In [197]: help(bytearray)

Help on class bytearray in module builtins:

class bytearray(object)
 |  bytearray(iterable_of_ints) -> bytearray
 |  bytearray(string, encoding[, errors]) -> bytearray
 |  bytearray(bytes_or_buffer) -> mutable copy of bytes_or_buffer
 |  bytearray(int) -> bytes array of size given by the parameter initialized with null bytes
 |  bytearray() -> empty bytes array
 |  
 |  Construct a mutable bytearray object from:
 |    - an iterable yielding integers in range(256) 
 |    - a text string encoded using the specified encoding
 |    - a bytes or a buffer object
 |    - any object implementing the buffer API.
 |    - an integer
 

In [206]: bytearray(10)
Out[206]: bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')

In [207]: bytearray(b"10")
Out[207]: bytearray(b'10')

In [208]: bytearray(b"abc")
Out[208]: bytearray(b'abc')

In [209]: bytearray("abc")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-209-355ddbfdfb18> in <module>()
----> 1 bytearray("abc")

TypeError: string argument without an encoding

In [210]: bytearray("abc", encoding)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-210-6baeeb8ceb00> in <module>()
----> 1 bytearray("abc", encoding)

NameError: name 'encoding' is not defined

In [211]: bytearray("abc", "utf")
Out[211]: bytearray(b'abc')

In [212]: bytearray("abc", "utf8")
Out[212]: bytearray(b'abc')

In [213]: bytearray("abc", "utf-8")
Out[213]: bytearray(b'abc')

In [215]: bytearray([1, 2])
Out[215]: bytearray(b'\x01\x02')


In [226]: b = bytearray(12)   # bytearray is mutable

In [227]: b
Out[227]: bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')

In [228]: b[3]
Out[228]: 0

In [229]: b[3]= 5

In [230]: b
Out[230]: bytearray(b'\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x00')

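Compared with bytes, bytearray adds the insert, append, extend, pop, remove, clear, and reverse methods, and supports assignment by index.

The values passed to insert, append, and remove must be int (count accepts an int or a bytes-like object). bytearray operates on bytes, but Python has no byte type, so an int is used to represent a byte. The int must be in range(256), that is, from 0 to 255.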
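A short walk through the mutating methods listed above, including the range check. The byte values are chosen for illustration.

```python
buf = bytearray(b'abc')

buf.append(100)          # 100 is the byte value of 'd'
buf.insert(0, 122)       # 122 is the byte value of 'z'
buf.extend(b'ef')        # extend takes an iterable of ints, such as bytes
buf.remove(98)           # removes the first 98 ('b')
last = buf.pop()         # returns the removed byte as an int
buf.reverse()

print(buf, last)

# Values outside range(256) are rejected with ValueError
try:
    buf.append(256)
    rejected = False
except ValueError:
    rejected = True

print(rejected)
```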