OCR6: Custom Traineddata

Date: 2019-11-06

Tags: ocr6 ocr custom traineddata

Reference: https://groups.google.com/forum/#!msg/tesseract-ocr/MSYezIbckvs/kO1VoNKMDMQJ

Code example for Tesseract v4:

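import pytesseract
from PIL import Image

# Recognize with the custom "teld" data plus Simplified Chinese.
# --psm 3: fully automatic page segmentation; --oem 1: LSTM engine only.
image = Image.open(r'src2\B1.jpg')  # raw string keeps the Windows backslash literal
text = pytesseract.image_to_string(image, lang='teld+chi_sim', config='--psm 3 --oem 1')
print(text.replace('”', ''))  # strip stray right double quotation marks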
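
The lang and config arguments in the call above are plain strings, so a typo in them only shows up at run time. Here is a minimal sketch of building them programmatically. It assumes Tesseract's convention of joining several traineddata names with "+" and its documented ranges for --psm (0 to 13) and --oem (0 to 3). The helper names join_langs and build_config are hypothetical and not part of pytesseract.

```python
def join_langs(langs):
    """Join traineddata names the way Tesseract expects, e.g. 'teld+chi_sim'."""
    if not langs:
        raise ValueError("at least one language is required")
    return "+".join(langs)


def build_config(psm=3, oem=1):
    """Build the extra command-line options passed through `config`."""
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    if not 0 <= oem <= 3:
        raise ValueError("oem must be between 0 and 3")
    return f"--psm {psm} --oem {oem}"


# Hypothetical helpers, used here with the values from the example above.
lang = join_langs(["teld", "chi_sim"])
config = build_config(psm=3, oem=1)
print(lang, "|", config)
```

The resulting strings can be passed as pytesseract.image_to_string(image, lang=lang, config=config).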

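Merging recognition results

When using the tesseract-ocr engine in practice, the first traineddata you build will often have a poor recognition rate and needs to be improved gradually over time. To do that, merge multiple corrected box files into a single traineddata file.

First, you need the sample images (.tif files) and the position files (.box files). As long as both are available, the training data can be merged.

Suppose the following sample images and corrected box files already exist: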

image.font.1.tif image.font.1.box
image.font.2.tif image.font.2.box
image.font.3.tif image.font.3.box
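
Since every .tif needs a matching .box file, it can help to check the pairs before training. The following is only a sketch: find_training_pairs is a hypothetical helper, and the demo creates empty placeholder files in a temporary directory rather than using real samples.

```python
import tempfile
from pathlib import Path


def find_training_pairs(folder):
    """Return (tif, box) name pairs and the .tif files that lack a .box file."""
    pairs, missing = [], []
    for tif in sorted(Path(folder).glob("*.tif")):
        box = tif.with_suffix(".box")  # image.font.1.tif -> image.font.1.box
        if box.exists():
            pairs.append((tif.name, box.name))
        else:
            missing.append(tif.name)
    return pairs, missing


# Demo in a throwaway directory; the third sample deliberately has no .box file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("image.font.1.tif", "image.font.1.box",
                 "image.font.2.tif", "image.font.2.box",
                 "image.font.3.tif"):
        Path(tmp, name).touch()
    pairs, missing = find_training_pairs(tmp)

print("pairs:", pairs)
print("missing box files:", missing)
```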

1. First generate the corresponding .tr files

tesseract image.font.1.tif image.font.1 nobatch box.train
tesseract image.font.2.tif image.font.2 nobatch box.train
tesseract image.font.3.tif image.font.3 nobatch box.train
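
With more than a handful of samples, typing one command per image gets tedious. As a sketch, the same commands can be built in Python; box_train_command is a hypothetical helper, and nothing is executed here.

```python
import shlex


def box_train_command(tif_name):
    """Build the `tesseract ... nobatch box.train` argument list for one image."""
    if not tif_name.endswith(".tif"):
        raise ValueError(f"expected a .tif file, got {tif_name!r}")
    base = tif_name[: -len(".tif")]  # output base; tesseract writes base + ".tr"
    return ["tesseract", tif_name, base, "nobatch", "box.train"]


images = ["image.font.1.tif", "image.font.2.tif", "image.font.3.tif"]
commands = [box_train_command(name) for name in images]
for cmd in commands:
    print(shlex.join(cmd))  # shlex.join needs Python 3.8 or later
```

Each argument list could then be run with subprocess.run(cmd, check=True), assuming tesseract is on the PATH.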

2. Extract the character set

unicharset_extractor image.font.1.box image.font.2.box image.font.3.box
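
To get a feel for what this step collects: each line of a box file has the form "symbol left bottom right top page", and the character set is built from the distinct symbols. The sketch below only gathers those symbols. It does not reproduce the real unicharset file, which also stores per-character properties, and the sample box text is invented for illustration.

```python
def symbols_in_box_text(box_text):
    """Collect the distinct symbols named in box-file text, in first-seen order."""
    seen = []
    for line in box_text.splitlines():
        fields = line.split()
        if len(fields) < 6:  # expected: symbol left bottom right top page
            continue
        symbol = fields[0]
        if symbol not in seen:
            seen.append(symbol)
    return seen


# Invented sample data, not taken from a real training set.
sample = """\
T 10 20 30 50 0
e 32 20 45 40 0
l 47 20 52 50 0
e 54 20 67 40 0
"""
symbols = symbols_in_box_text(sample)
print(symbols)
```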

3. Create the font properties file

echo font 0 0 0 0 0 >font_properties
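
Each line of font_properties has the form "fontname italic bold fixed serif fraktur", and the font name has to match the second dot-separated part of the training file names (lang.fontname.expN). As a sketch under that naming assumption, the lines can be derived from the file names; font_name and font_properties_lines are hypothetical helpers.

```python
def font_name(filename):
    """Extract <fontname> from a '<lang>.<fontname>.<exp>.<ext>' file name."""
    parts = filename.split(".")
    if len(parts) < 4:
        raise ValueError(f"unexpected training file name: {filename!r}")
    return parts[1]


def font_properties_lines(filenames, italic=0, bold=0, fixed=0, serif=0, fraktur=0):
    """Return one '<fontname> <italic> <bold> <fixed> <serif> <fraktur>' line per font."""
    flags = f"{italic} {bold} {fixed} {serif} {fraktur}"
    names = sorted({font_name(name) for name in filenames})  # one line per distinct font
    return [f"{name} {flags}" for name in names]


lines = font_properties_lines(["image.font.1.tif", "image.font.2.tif", "image.font.3.tif"])
print("\n".join(lines))
```

All three sample files share one font name, so the result is a single line.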

4. Run mftraining to generate the character feature files (inttemp, pffmtable, shapetable)

mftraining -F font_properties -U unicharset image.font.1.tr image.font.2.tr image.font.3.tr

5. Cluster all the .tr files with cntraining to produce normproto

cntraining image.font.1.tr image.font.2.tr image.font.3.tr

6. Rename the following files by adding the image. prefix (for example, unicharset becomes image.unicharset)

unicharset
inttemp
normproto
pffmtable
shapetable
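
The renaming can also be scripted. This is a minimal sketch, assuming the prefix is "image" so that it matches the final combine_tessdata image. call; add_prefix is a hypothetical helper, and the demo renames empty placeholder files in a temporary directory.

```python
import tempfile
from pathlib import Path

COMPONENTS = ("unicharset", "inttemp", "normproto", "pffmtable", "shapetable")


def add_prefix(folder, prefix):
    """Rename each training output to '<prefix>.<name>' and return the new names."""
    renamed = []
    for name in COMPONENTS:
        source = Path(folder, name)
        if source.exists():  # skip outputs that were not produced
            source.rename(source.with_name(f"{prefix}.{name}"))
            renamed.append(f"{prefix}.{name}")
    return renamed


# Demo with placeholder files standing in for the real training outputs.
with tempfile.TemporaryDirectory() as tmp:
    for name in COMPONENTS:
        Path(tmp, name).touch()
    renamed = add_prefix(tmp, "image")
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(renamed)
```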

7. Combine all the files into a single traineddata file

combine_tessdata image.
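
Putting the steps together, the whole merge can be written down as an ordered list of commands. The sketch below only builds and prints the argument lists without running anything. training_plan is a hypothetical helper, and it assumes a font_properties file already exists in the working directory.

```python
import shlex


def training_plan(bases, prefix, font_properties="font_properties"):
    """Return the commands for the merge, in order, as argument lists."""
    trs = [f"{base}.tr" for base in bases]
    boxes = [f"{base}.box" for base in bases]
    plan = [["tesseract", f"{base}.tif", base, "nobatch", "box.train"] for base in bases]
    plan.append(["unicharset_extractor", *boxes])
    plan.append(["mftraining", "-F", font_properties, "-U", "unicharset", *trs])
    plan.append(["cntraining", *trs])
    # The rename step (unicharset -> <prefix>.unicharset, and so on) belongs here.
    plan.append(["combine_tessdata", f"{prefix}."])
    return plan


plan = training_plan(["image.font.1", "image.font.2", "image.font.3"], "image")
for cmd in plan:
    print(shlex.join(cmd))  # shlex.join needs Python 3.8 or later
```

Keeping the plan as data makes it easy to add another corrected sample: append its base name to the list and rebuild.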

Example code:

REM Generate the box file
REM Alternative: tesseract teld.shz.exp0.tif teld.shz.exp0 -l chi_sim --psm 3 --oem 1 batch.nochop makebox
tesseract teld.shz.exp0.tif teld.shz.exp0 -l chi_sim batch.nochop makebox

REM Generate the font_properties file
echo shz 0 0 0 0 0 >font_properties

REM Generate the .tr training file
tesseract teld.shz.exp0.tif teld.shz.exp0 nobatch box.train

REM Generate the character set file
unicharset_extractor teld.shz.exp0.box

REM Generate the shape table
shapeclustering -F font_properties -U unicharset teld.shz.exp0.tr

REM Generate the character feature files
mftraining -F font_properties -U unicharset teld.shz.exp0.tr

REM Generate the character normalization feature file
cntraining teld.shz.exp0.tr

REM Rename the files
rename normproto teld.normproto
rename inttemp teld.inttemp
rename pffmtable teld.pffmtable
rename shapetable teld.shapetable
rename unicharset teld.unicharset

REM Combine the training files
combine_tessdata teld.

References

https://yq.aliyun.com/articles/297912
