Processing and Recognizing Image CAPTCHAs with Python

Date: 2019-12-01


Installing the image-processing library PIL

Download link for 32-bit Windows: https://pypi.python.org/pypi/Pillow/2.1.0#id2
Download link for 64-bit Windows: https://pypi.python.org/pypi/Pillow/2.1.0#downloads

Image processing example:

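from PIL import Image
from pytesser import *  # provides image_to_string and image_file_to_string

image = Image.open('7039.jpg')
# Recognize the text directly from the image file
print(image_file_to_string('7039.jpg'))
# Recognize the text from the opened PIL image object
print(image_to_string(image))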

Note: if you get the error ImportError: The _imaging C module is not installed, the likely cause is that the wrong build was downloaded. Install the 64-bit build instead.
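Since the choice between the two downloads depends on the interpreter rather than on Windows itself, it can help to check whether the running Python is 32-bit or 64-bit. The following is a minimal sketch using only the standard library; the function name python_bitness is made up for this illustration.

```python
import platform
import struct


def python_bitness():
    """Return 32 or 64, based on the pointer size of the running interpreter."""
    # struct.calcsize("P") is the size of a C pointer in bytes
    return struct.calcsize("P") * 8


print("Python %s, %d-bit" % (platform.python_version(), python_bitness()))
```

If this reports 32-bit on a 64-bit Windows machine, the 32-bit package is probably the one that matches.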

Image recognition

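pytesser is a module from Google's open-source OCR project. Importing it in Python lets you convert the text in an image into a string. However, pytesser calls tesseract internally, so tesseract must be installed first.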

tesseract download: https://bitbucket.org/3togo/python-tesseract/downloads/ . Choose the version that suits your system, then download and install it.
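After installing, a quick way to confirm that the executable can be found is to look it up on PATH. This is a sketch only: it assumes Python 3.3 or later (for shutil.which) and assumes the executable is named tesseract; the helper find_tesseract is hypothetical and not part of pytesser.

```python
import shutil


def find_tesseract(exe_name="tesseract"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(exe_name)


path = find_tesseract()
if path is None:
    print("tesseract was not found on PATH; install it or add its folder to PATH")
else:
    print("tesseract found at: " + path)
```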

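Installing pytesser
- Download: http://code.google.com/p/pytesser/ . The downloaded module is not a conventional installer package, so a few manual setup steps are needed.
- Extract the folder and create an empty __init__.py file in it.
- Download the Tesseract OCR engine: http://code.google.com/p/tesseract-ocr/ . After extracting it, copy its tessdata folder into pytesser, replacing the original folder.
- Copy pytesser into Lib\site-packages under the Python installation directory and add it to the environment variables (if these steps seem too complicated, you can simply place the source next to your own code).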
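The manual steps above can also be approximated from a script. The sketch below creates the empty __init__.py if it is missing and puts the extracted folder on sys.path for the current process, which is one way of making pytesser.py importable without touching the environment variables. The function name make_importable and the example path are assumptions made for illustration.

```python
import os
import sys


def make_importable(folder):
    """Create an empty __init__.py in `folder` if needed and add `folder` to sys.path."""
    folder = os.path.abspath(folder)
    init_file = os.path.join(folder, "__init__.py")
    if not os.path.exists(init_file):
        # An empty file is enough to mark the folder as a package
        open(init_file, "w").close()
    if folder not in sys.path:
        sys.path.insert(0, folder)
    return init_file


# Example (hypothetical path, adjust to wherever pytesser was extracted):
# make_importable(r"C:\pytesser")
```

The sys.path change only lasts for the running process, so this fits a quick test better than a permanent installation.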

Image recognition source code

from PIL import Image
from pytesser import *

image = Image.open('7039.jpg')
# Both calls return the recognized text as a string
print(image_file_to_string('7039.jpg'))
print(image_to_string(image))

Sample file: 7039.jpg

Possible problems and solutions:

raise IOError("cannot identify image file"): change import Image in your script to from PIL import Image.
Also change import Image in pytesser.py to from PIL import Image.
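If the same script has to run on machines with different PIL installations, the import can be made tolerant of both layouts instead of being edited by hand. This is only a sketch of that idea; import_first is a hypothetical helper, not something provided by PIL or pytesser.

```python
import importlib


def import_first(*names):
    """Import and return the first module in `names` that is available."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: " + ", ".join(names))


# Example: prefer the packaged layout, fall back to the old top-level module
# Image = import_first("PIL.Image", "Image")
```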