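There are several ways to get the Tesseract source code: you can check it out directly from the repo or download an archive. Compiling it, however, often runs into all sorts of odd problems. This post shows a simple way to configure and build the source.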
Original reference: How to Build Tesseract OCR Library on Windows
During installation, select the Tesseract development files component.
In the installation directory, locate the vs2008 project directory.
Then locate all the libraries needed for the build.
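Open Visual Studio 2008 (if you do not have it, the Express edition can be downloaded from the official site), import the project, and build it. The build produces a Debug and a Release version of the DLL: libtesseract302d.dll and libtesseract302.dll.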
Note this passage in the README:
    Dependencies and Licenses
    =========================
    Leptonica is required. (www.leptonica.com). Tesseract no longer compiles without Leptonica.
    Libtiff is no longer required as a direct dependency.
Tesseract depends on the Leptonica library, so the next step is to see how Leptonica itself is built.
Leptonica is an image processing library written in C. It supports JPEG, PNG, TIFF, and GIF.
Unpack the three packages and arrange the build environment according to the following structure:
    BuildFolder\
      include\
      leptonica-1.68\
      lib\
BuildFolder\leptonica-1.68 contents:
    config\                      Not used for Windows builds
    prog\                        Regression tests, examples, utilities
    src\                         Source files for liblept
    vs2008\                      Visual Studio 2008 specific files
      DLL Debug\                 liblept DLL Debug build output
      DLL Release\               liblept DLL Release build output
      LIB Debug\                 liblept LIB Debug build output
      LIB Release\               liblept LIB Release build output
      prog_projects\             Projects for prog programs
        ioformats_reg\           Sample project for prog\ioformats_reg.exe
          DLL Debug\             DLL Debug build output for sample project
          DLL Release\           DLL Release build output for sample project
          LIB Debug\             LIB Debug build output for sample project
          LIB Release\           LIB Release build output for sample project
          ioformats_reg.vcproj   The ioformats_reg project file
      leptonica.sln              The Leptonica solution file
      leptonica.vcproj           The Leptonica project file
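If you would rather script the top-level layout than create the folders by hand, here is a minimal Python sketch. It only creates the three folders named in the BuildFolder structure (include, leptonica-1.68, lib). The helper name make_build_tree is made up for this post, and unpacking the archives into those folders is still a manual step.

```python
from pathlib import Path
import tempfile

# Top-level folders named in the BuildFolder layout described in this post.
LAYOUT = ("include", "leptonica-1.68", "lib")


def make_build_tree(root):
    """Create root/include, root/leptonica-1.68 and root/lib if they are missing.

    Hypothetical helper for illustration; returns the folder paths in LAYOUT order.
    """
    root = Path(root)
    folders = []
    for name in LAYOUT:
        folder = root / name
        # exist_ok=True makes the call safe to repeat on an existing tree.
        folder.mkdir(parents=True, exist_ok=True)
        folders.append(folder)
    return folders


if __name__ == "__main__":
    # Demo in a throwaway directory; pass your real BuildFolder path instead.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in make_build_tree(Path(tmp) / "BuildFolder"):
            print(folder.name)
```

Because the folders are created with exist_ok=True, running the script against a BuildFolder that already holds unpacked sources should leave the existing contents untouched.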
Open Visual Studio 2008, import the project, and build it. The build produces a Debug and a Release version of the DLL: liblept168d.dll and liblept168.dll.
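After both builds finish, a quick way to confirm that nothing was skipped is to search the build tree for the four DLL names mentioned in this post. The following Python sketch does only that: it matches file names and does not try to load the DLLs. The function name missing_dlls and the example path in the usage note are assumptions for illustration.

```python
from pathlib import Path

# DLL names reported in this post; the Debug builds carry the "d" suffix.
EXPECTED_DLLS = (
    "libtesseract302d.dll",
    "libtesseract302.dll",
    "liblept168d.dll",
    "liblept168.dll",
)


def missing_dlls(search_root, names=EXPECTED_DLLS):
    """Return the names from `names` that are not found anywhere under search_root.

    Hypothetical helper: compares file names case-insensitively, nothing more.
    """
    root = Path(search_root)
    found = {p.name.lower() for p in root.rglob("*") if p.is_file()}
    return [name for name in names if name.lower() not in found]
```

Usage would look like missing_dlls(r"C:\BuildFolder"), where the path is a placeholder for wherever your solution writes its output. An empty list means every expected DLL name was found.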