PyTips 0x07 - Python Strings

Date: 2019-12-07

Anyone who has used Python (2 & 3) has probably seen the following two error messages:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte

These are the "锟斤拷" of the Python world!
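As a minimal sketch of where these two errors come from: encoding non-ASCII text as ASCII raises the first, and decoding bytes with the wrong codec raises the second. The sample text and the choice of GBK as the mismatched encoding are assumptions made for illustration, and the exact wording of the messages depends on the input.

```python
text = "雨天"  # sample text, chosen only for illustration

# Encoding: ASCII has no representation for these characters.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    encode_error = exc  # keep a reference; `exc` is cleared after the block
    print(type(exc).__name__, "-", exc.reason)

# Decoding: bytes that were written as GBK are not valid UTF-8.
gbk_bytes = text.encode("gbk")
try:
    gbk_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_error = exc
    print(type(exc).__name__, "-", exc.reason)
```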

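This issue and the next few will focus on strings (str), bytes, and the conversion between the two (encode/decode) in Python. They may not fix every garbled-text problem for you at once, but hopefully they will help you locate the cause quickly.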

Definition

The Python documentation defines strings as follows:

Textual data in Python is handled with str objects, or strings. Strings are immutable sequences of Unicode code points.

In Python 3.5, a string is an immutable sequence of Unicode code points:

('S' 'T' 'R' 'I' 'N' 'G')

'STRING'
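To make "sequence of code points" concrete, here is a small sketch using the built-ins ord() and chr(), which convert between a one-character string and its integer code point:

```python
s = "STRING"

# ord() maps a one-character string to its code point (an int).
code_points = [ord(ch) for ch in s]
print(code_points)  # [83, 84, 82, 73, 78, 71]

# chr() is the inverse: code point -> one-character string.
rebuilt = "".join(chr(cp) for cp in code_points)
print(rebuilt == s)  # True
```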

Immutable means that the string itself cannot be modified in place:

s = 'Hello'
print(s[3])
s[3] = 'o'

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-2-ce8cf24852f9> in <module>()
      1 s = 'Hello'
      2 print(s[3])
----> 3 s[3] = 'o'


TypeError: 'str' object does not support item assignment
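Since item assignment is not allowed, the usual workaround is to build a new string. The sketch below shows two common ways; the variable names are only illustrative.

```python
s = "Hello"

# Slicing and concatenation create a new str; `s` itself is untouched.
t = s[:3] + "o" + s[4:]
print(t)  # Heloo
print(s)  # Hello

# Going through a mutable list of characters is another option.
chars = list(s)
chars[3] = "o"
u = "".join(chars)
print(u)  # Heloo
```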

Sequence, in turn, means that strings support the operations common to the sequence types (list/tuple/range):

[i.upper() for i in "hello"]

['H', 'E', 'L', 'L', 'O']
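A few more of the common sequence operations, applied to a string as a quick sketch (the same expressions would also work on a list or tuple):

```python
s = "hello"

print(len(s))        # 5
print(s[0], s[-1])   # h o
print(s[1:4])        # ell
print(s[::-1])       # olleh
print("ell" in s)    # True
print(s.count("l"))  # 2
print(s + "!" * 3)   # hello!!!
print(max(s))        # o
```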

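As for Unicode, for now you can think of it as a very large map that records every symbol in the world, where a code point is the coordinate of one symbol on that map (the details will be covered in later issues).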

s = '雨'
print(s)
print(len(s))
print(s.encode())

雨
1
b'\xe9\x9b\xa8'
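The output above mixes two different measures: len() counts code points, while encode() produces bytes whose number depends on the encoding. A short sketch of the difference, where UTF-16 is included only as a contrasting example:

```python
s = "雨"

print(hex(ord(s)))                 # 0x96e8, i.e. the code point U+96E8
print(len(s))                      # 1 code point
print(len(s.encode("utf-8")))      # 3 bytes in UTF-8
print(len(s.encode("utf-16-le")))  # 2 bytes in UTF-16 (little-endian, no BOM)

# decode() with the same encoding reverses encode().
round_trip = s.encode("utf-8").decode("utf-8")
print(round_trip == s)  # True
```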

Common operations

len: string length;
split & join
find & index
strip
upper & lower & swapcase & title & capitalize
endswith & startswith & is*
zfill

# split & join
s = "Hello world!"
print(",".join(s.split())) # a common split & rejoin operation

"https://github.com/rainyear/pytips".split("/", 2) # limit the number of splits

Hello,world!

['https:', '', 'github.com/rainyear/pytips']
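Two details of split are easy to miss: calling it with no argument collapses runs of whitespace, while an explicit separator does not, and rsplit counts its splits from the right. A small sketch with made-up sample strings:

```python
s = "a  b   c"  # note the runs of several spaces

no_arg = s.split()
print(no_arg)      # ['a', 'b', 'c']
with_sep = s.split(" ")
print(with_sep)    # ['a', '', 'b', '', '', 'c']

url = "https://github.com/rainyear/pytips"
tail = url.rsplit("/", 1)
print(tail)        # ['https://github.com/rainyear', 'pytips']

joined = "-".join(["2019", "12", "07"])
print(joined)      # 2019-12-07
```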

s = "coffee"
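print(s.find('f'))    # search from the left, return the lowest matching index
print(s.rfind('f'))   # search from the right, return the highest matching index

print(s.find('a'))    # returns -1 if not found
print(s.index('a'))   # raises ValueError if not found, otherwise same as find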

2
3
-1

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-6-59556fd9319f> in <module>()
      4 
      5 print(s.find('a'))    # returns -1 if not found
----> 6 print(s.index('a'))   # raises ValueError if not found, otherwise same as find


ValueError: substring not found
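Which of the two to use mostly depends on how a miss should be handled. A hedged sketch of the usual patterns:

```python
s = "coffee"

# find(): check the sentinel value -1 explicitly.
pos = s.find("a")
find_result = pos if pos != -1 else "not found"
print(find_result)  # not found

# index(): handle the exception instead.
try:
    index_result = s.index("a")
except ValueError:
    index_result = None
print(index_result)  # None

# For a plain membership test, `in` is enough.
print("ff" in s)  # True
```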

print(" hello world    ".strip())
print("helloworld".strip("heo"))
print("["+"          i         ".lstrip() +"]")
print("["+"          i         ".rstrip() +"]")

hello world
lloworld
[i         ]
[          i]
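Note that the argument to strip is treated as a set of characters, not as a prefix or suffix. The sketch below illustrates the difference; remove_prefix is a hypothetical helper written for this example, not a method from the original text.

```python
# The argument is a set of characters; their order is irrelevant.
print("helloworld".strip("oeh"))  # lloworld

# Surprise: this strips the characters 'h', 't', 'p', ':' and '/',
# not the literal prefix "http://".
print("http://pytips".strip("http://"))  # ytips


def remove_prefix(text, prefix):
    """Hypothetical helper: drop `prefix` only if `text` starts with it."""
    return text[len(prefix):] if text.startswith(prefix) else text


print(remove_prefix("http://pytips", "http://"))  # pytips
```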

print("{}\n{}\n{}\n{}\n{}".format(
    "hello, WORLD".upper(),
    "hello, WORLD".lower(),
    "hello, WORLD".swapcase(),
    "hello, WORLD".capitalize(),
    "hello, WORLD".title()))

HELLO, WORLD
hello, world
HELLO, world
Hello, world
Hello, World
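Two practical notes on the case methods, as a sketch: lower() is a simple way to compare strings case-insensitively, and title() starts a new word after every non-letter, which can surprise with apostrophes.

```python
a, b = "Hello, WORLD", "hello, world"
print(a == b)                  # False
print(a.lower() == b.lower())  # True

# title() starts a new word after any non-letter, apostrophes included.
print("they're here".title())  # They'Re Here

# capitalize() also lowercases everything after the first character.
print("hello, WORLD".capitalize())  # Hello, world
```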

print("""
{}|{}
{}|{}
{}|{}
{}|{}
{}|{}
{}|{}
""".format(
    "Python".startswith("P"),"Python".startswith("y"),
    "Python".endswith("n"),"Python".endswith("o"),
    "i23o6".isalnum(),"1 2 3 0 6".isalnum(),
    "isalpha".isalpha(),"isa1pha".isalpha(),
    "python".islower(),"Python".islower(),
    "PYTHON".isupper(),"Python".isupper(),
))

True|False
True|False
True|False
True|False
True|False
True|False
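A few edge cases of these predicates, sketched with made-up inputs: startswith/endswith accept a tuple of candidates, and the is* methods return False for an empty string.

```python
filename = "photo.PNG"  # hypothetical file name

# startswith/endswith accept a tuple of candidates.
is_image = filename.lower().endswith((".png", ".jpg"))
print(is_image)  # True

# The is* methods return False for the empty string.
print("".isalpha(), "".isalnum())  # False False

# islower/isupper ignore uncased characters such as digits and spaces.
print("python 3".islower())  # True

print("2019".isdigit(), "20.19".isdigit())  # True False
```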

"101".zfill(8)

'00000101'
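zfill pads on the left with zeros up to the given width, keeps a leading sign in front, and leaves longer strings unchanged. A short sketch, with rjust shown for contrast:

```python
print("42".zfill(5))      # 00042
print("-42".zfill(5))     # -0042
print("+42".zfill(5))     # +0042
print("123456".zfill(3))  # 123456

# rjust with a fill character pads similarly but is not sign-aware.
print("-42".rjust(5, "0"))  # 00-42

# A common use: fixed-width binary text.
binary = bin(5)[2:].zfill(8)
print(binary)  # 00000101
```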

format / encode

Formatted output with format is a very useful tool and will be covered separately; encode will be covered in detail in bytes-decode-Unicode-encode-bytes.