1. Splitting strings
Use re.split() with a regular expression.
>>> line = 'asdf fjdk; afed, fjek,asdf, foo'
>>> import re
>>> re.split(r'[;,\s]\s*', line)
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
If the pattern uses a capturing group (), the matched delimiters are also included in the result.
>>> re.split(r'(;|,|\s)\s*', line)
['asdf', ' ', 'fjdk', ';', 'afed', ',', 'fjek', ',', 'asdf', ',', 'foo']
If you do not want the delimiters in the result but still need parentheses for grouping, use a non-capturing group, (?:...).
>>> re.split(r'(?:,|;|\s)\s*', line)
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
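When the input starts or ends with a delimiter, re.split() returns empty strings at those positions. The following is a minimal sketch that reuses the pattern above and drops the empty fields; the helper name split_fields and the constant DELIMS are assumptions made for illustration, not part of the original post.

```python
import re

# Same delimiter pattern as above: one of ; , or whitespace,
# followed by any amount of extra whitespace.
DELIMS = re.compile(r'[;,\s]\s*')

def split_fields(line):
    """Split on the delimiters and drop empty fields (hypothetical helper)."""
    return [field for field in DELIMS.split(line) if field]

print(split_fields('asdf fjdk; afed, fjek,asdf, foo'))
print(split_fields(' a, b ;c '))
```

Filtering after the split keeps the pattern simple; whether empty fields should be dropped depends on the data, so treat this as one option rather than the required approach.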
2. Matching text at the start or end of a string
The simplest approach is str.startswith() and str.endswith().
>>> filename = 'spam.txt'
>>> filename.endswith('.txt')
True
>>> filename.startswith('file:')
False
To check against several choices, pass them to startswith() or endswith() as a tuple. A list raises TypeError.
>>> url = 'http://www.python.org'
>>> choices = ['http:', 'ftp:']
>>> url.startswith(choices)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: startswith first arg must be str, unicode, or tuple, not list
>>> choices = ('http:', 'ftp:')
>>> url.startswith(choices)
True
A regular expression also works, though it is overkill for a simple match.
>>> import re
>>> re.match('http:|https:|ftp:', url)
<_sre.SRE_Match object at 0x101253098>
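Because the choices often arrive as a list, converting them with tuple() at the call site avoids the TypeError shown above. This is a small sketch under that assumption; the helper name filter_by_suffix and the sample file names are made up for illustration.

```python
def filter_by_suffix(names, suffixes):
    """Return the names ending with any of the given suffixes (hypothetical helper)."""
    # str.endswith() accepts a str or a tuple of str, so convert
    # whatever iterable was passed in before calling it.
    suffixes = tuple(suffixes)
    return [name for name in names if name.endswith(suffixes)]

print(filter_by_suffix(['Makefile', 'foo.c', 'bar.py', 'spam.h'], ['.c', '.h']))
```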
3. Matching strings with shell wildcards
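>>> from fnmatch import fnmatch
>>> fnmatch('foo.txt', '*.txt')
True
>>> fnmatch('Dat45.csv', 'Dat[0-9]*')
True
fnmatch() follows the case-sensitivity rules of the underlying operating system, so the result can differ between platforms.
>>> fnmatch('foo.txt', '*.TXT')  # on Ubuntu 14.10
False
To get the same behavior everywhere, use fnmatchcase(), which is always case-sensitive.
>>> from fnmatch import fnmatchcase
>>> fnmatchcase('foo.txt', '*.TXT')
False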
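The same functions work on any strings, not only on real file names. The sketch below filters a list with fnmatchcase() so the result does not depend on the platform; the helper name select and the sample names are assumptions for illustration.

```python
from fnmatch import fnmatchcase

def select(names, pattern):
    """Return the names matching a shell-style pattern, case-sensitively (hypothetical helper)."""
    return [name for name in names if fnmatchcase(name, pattern)]

names = ['Dat1.csv', 'Dat2.csv', 'config.ini', 'foo.py', 'DAT3.CSV']
print(select(names, 'Dat*.csv'))
```

Here 'DAT3.CSV' is excluded on every platform because fnmatchcase() never folds case.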
4. Matching and searching text
For a literal substring, str.find() is enough; for patterns, use the re module.
>>> text = 'yeah, but no, but yeah, but no, but yeah'
>>> text.find('no')
10
>>> import re
>>> datepat = re.compile(r'\d+/\d+/\d+')
>>> text1 = '11/27/2012'
>>> datepat.match(text1)
<_sre.SRE_Match object at 0x7f0bda8cc4a8>
match() only matches at the start of the string. To find every occurrence, use findall().
>>> text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
>>> datepat.findall(text)
['11/27/2012', '3/13/2013']
findall() returns a list. If you want an iterator instead, use finditer().
>>> datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
>>> for m in datepat.finditer(text):
...     print(m.groups())
...
('11', '27', '2012')
('3', '13', '2013')
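If you use a regular expression frequently, it is best to compile it first with re.compile(). The module-level functions cache recently compiled patterns, so the performance gain is modest, but using your own compiled pattern still saves the extra lookup and processing.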
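One way to apply this advice is to compile the pattern once at module level and reuse it inside a function. The sketch below does that with finditer(); the names DATEPAT and find_dates are hypothetical, and reading the three groups as month/day/year is an assumption based on the sample dates.

```python
import re

# Compiled once at import time and reused on every call.
DATEPAT = re.compile(r'(\d+)/(\d+)/(\d+)')

def find_dates(text):
    """Return a list of (month, day, year) integer tuples found in text.

    The month/day/year order is an assumption about the input format.
    """
    return [tuple(int(part) for part in m.groups())
            for m in DATEPAT.finditer(text)]

print(find_dates('Today is 11/27/2012. PyCon starts 3/13/2013.'))
```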
5. Searching and replacing text
For simple replacements use str.replace(). It returns a new string and leaves the original unchanged.
>>> text = 'yeah, but no, but yeah, but no, but yeah'
>>> text.replace('yeah', 'yep')
'yep, but no, but yep, but no, but yep'
>>> text
'yeah, but no, but yeah, but no, but yeah'
For patterns, use re.sub(). In the replacement string, \1, \2 and \3 refer to the capture groups.
>>> text = 'Today is 11/27/2012. PyCon starts 3/13/2013.'
>>> re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', text)
'Today is 2012-11-27. PyCon starts 2013-3-13.'
>>> text
'Today is 11/27/2012. PyCon starts 3/13/2013.'
If you need to reuse the regular expression, compile it first.
>>> datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
>>> datepat.sub(r'\3-\1-\2', text)
'Today is 2012-11-27. PyCon starts 2013-3-13.'
To find out how many substitutions were made, use subn().
>>> newtext, n = datepat.subn(r'\3-\1-\2', text)
>>> newtext
'Today is 2012-11-27. PyCon starts 2013-3-13.'
>>> n
2
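The count returned by subn() is handy for reporting whether anything changed. A minimal sketch, assuming a hypothetical helper named normalize_dates that wraps the compiled pattern from above:

```python
import re

# Same pattern as above, compiled once for reuse.
DATEPAT = re.compile(r'(\d+)/(\d+)/(\d+)')

def normalize_dates(text):
    """Rewrite M/D/YYYY dates as YYYY-M-D and return (new_text, count).

    Hypothetical helper; it does not pad months or days with zeros.
    """
    return DATEPAT.subn(r'\3-\1-\2', text)

new_text, count = normalize_dates('Today is 11/27/2012. PyCon starts 3/13/2013.')
print(new_text)
print(count)
```

A count of 0 means the input contained no matching dates and the text is returned unchanged.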