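I had heard of this Google OCR project long ago, and after using it I found it really is impressive. Here is a write-up of my first simple use of it.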
Google's open-source OCR project
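I downloaded these two files. The chi one is the extension needed to recognize Chinese. You only need to install the .exe, then configure the environment variable.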
C:\Users\27569>tesseract
Usage:
  tesseract --help | --help-extra | --version
  tesseract --list-langs
  tesseract imagename outputbase [options...] [configfile...]

OCR options:
  -l LANG[+LANG]        Specify language(s) used for OCR.
NOTE: These options must occur before any configfile.

Single options:
  --help                Show this help message.
  --help-extra          Show extra help for advanced users.
  --version             Show version information.
  --list-langs          List available languages for tesseract engine.
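The same check can be done from Python. The following is a minimal sketch, not part of the original setup: it looks for `tesseract` on PATH and runs the `--list-langs` option shown in the usage text above to see whether the Chinese data is installed. The helper names `parse_langs` and `installed_langs` are hypothetical, and the parser assumes the output is one header line followed by one language code per line.

```python
import shutil
import subprocess


def parse_langs(output):
    """Parse `tesseract --list-langs` output.

    Assumption: the first non-empty line is a header and every
    following non-empty line is one language code.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return lines[1:]


def installed_langs(exe="tesseract"):
    """Return the installed language codes, or None if exe is not on PATH."""
    path = shutil.which(exe)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--list-langs"], capture_output=True, text=True
    )
    # Fall back to stderr in case a build prints the list there.
    return parse_langs(result.stdout or result.stderr)


if __name__ == "__main__":
    langs = installed_langs()
    if langs is None:
        print("tesseract not found on PATH; check the environment variable")
    else:
        print("chi_sim installed:", "chi_sim" in langs)
```

If `installed_langs()` returns None, the environment variable has most likely not taken effect yet (a new terminal window is usually needed after changing it).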
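Next, a test that calls it from Python. On Windows, as I recall, my program did not work the first time; it only ran successfully after I changed a path in the source code of the tesseract file.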
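As an alternative to editing library source, pytesseract exposes a module-level setting, `pytesseract.pytesseract.tesseract_cmd`, that can be pointed at the executable from your own script. The sketch below is an assumption about how that might look here, not the fix actually used: the candidate paths are common installer defaults and may not match your machine, and `first_existing` and `configure_pytesseract` are hypothetical helper names.

```python
import os

# Assumed install locations; adjust to wherever the .exe was installed.
CANDIDATES = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
]


def first_existing(paths):
    """Return the first path that is an existing file, or None."""
    for p in paths:
        if os.path.isfile(p):
            return p
    return None


def configure_pytesseract(paths=CANDIDATES):
    """Point pytesseract at the first executable found; return its path."""
    # Imported here so first_existing() works even without pytesseract.
    import pytesseract

    exe = first_existing(paths)
    if exe is not None:
        pytesseract.pytesseract.tesseract_cmd = exe
    return exe
```

Calling `configure_pytesseract()` once at the top of the script, before any `image_to_string` call, keeps the installed package untouched, so the setting survives a reinstall or upgrade.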
requirements.txt
pillow
pytesseract
run.py
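import io
import re

import pytesseract
from PIL import Image


class Ocr:
    def __init__(self):
        # Raw strings keep the regex escapes (\d) intact.
        self.day_re = re.compile(r'(\d{4}-\d{2}-\d{2})')
        self.daytime_re1 = re.compile(r'(\d{2}:\d{2})')
        self.daytime_re2 = re.compile(r'(\d{2}:\d{2}-\d{2}:\d{2})')

    def prepare_img(self, img):
        """Preprocess the image to improve the recognition rate."""
        img = img.convert('L')  # convert to grayscale
        threshold = 200  # tune per image; 127 is another option
        table = []
        for i in range(256):
            if i < threshold:
                table.append(0)
            else:
                table.append(1)
        # Binarize: pixels below the threshold become black, the rest white.
        return img.point(table, '1')

    def ocr(self, img):
        """Run recognition."""
        img = self.prepare_img(img)
        # lang: eng for English, chi_sim for Chinese (requires the trained data).
        # '--psm 7' treats the image as a single text line; the option needs
        # its leading dashes, otherwise it is read as config file names.
        return pytesseract.image_to_string(img, lang='eng', config='--psm 7')


if __name__ == '__main__':
    c = Ocr()

    # First way to open an image with Image.open(): from in-memory bytes
    with open('0.jpg', 'rb') as f:
        image_binary = f.read()
    byte_arr = io.BytesIO(image_binary)
    img = Image.open(byte_arr)
    print(c.ocr(img))

    # Second way to open an image with Image.open(): from a file path
    img = Image.open('0.jpg')
    print(c.ocr(img))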
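The three patterns compiled in `Ocr.__init__` are not used anywhere in run.py. Presumably they are meant for pulling a date and time out of the recognized text; the sketch below shows one way that could work. The function name `extract_fields`, the returned dictionary layout, and the sample string are all hypothetical, chosen only to illustrate the patterns.

```python
import re

# Same patterns as in Ocr.__init__, written as raw strings.
DAY_RE = re.compile(r'(\d{4}-\d{2}-\d{2})')
TIME_RE = re.compile(r'(\d{2}:\d{2})')
TIME_RANGE_RE = re.compile(r'(\d{2}:\d{2}-\d{2}:\d{2})')


def extract_fields(text):
    """Extract a date, a time range, and all HH:MM times from OCR text."""
    day = DAY_RE.search(text)
    time_range = TIME_RANGE_RE.search(text)
    return {
        'day': day.group(1) if day else None,
        'time_range': time_range.group(1) if time_range else None,
        'times': TIME_RE.findall(text),
    }


if __name__ == '__main__':
    # Made-up sample standing in for the output of Ocr.ocr().
    print(extract_fields('2019-03-05 09:30-10:45'))
```

OCR output often contains stray spaces or misread characters, so in practice the text may need cleaning (for example, removing whitespace) before these strict patterns match.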