OCR with the kNN Algorithm in OpenCV 3, Using Python

Date: 2019-11-13

http://docs.opencv.org/master/d8/d4b/tutorial_py_knn_opencv.html

Goal

In this chapter:

• We will use our knowledge of kNN to build a basic OCR application.
• We will try it on the handwritten digit and alphabet data that comes with OpenCV.

OCR of Hand-written Digits

Our goal is to build an application that can read handwritten digits. For this we need some train_data and test_data. OpenCV comes with an image, digits.png (in the folder opencv/samples/data/, or samples/python2/data/ in some distributions), which contains 5000 handwritten digits (500 samples of each digit). Each digit is a 20x20 image, so our first step is to split this image into 5000 separate digits. We then flatten each digit into a single row of 400 pixels. That is our feature set: the intensity values of all pixels. It is the simplest feature set we can create. We use the first 250 samples of each digit as train_data and the remaining 250 samples as test_data. So let's prepare them first.
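The split-and-flatten step can be tried on a small synthetic array before touching digits.png. The sketch below is only an illustration under assumed sizes: a made-up 4x8 array of 2x2 cells stands in for the 50x100 grid of 20x20 digits, and the helper name split_cells is hypothetical.

```python
import numpy as np

def split_cells(gray, cell):
    """Cut a 2-D image into cell x cell tiles; result shape is (rows, cols, cell, cell)."""
    rows = gray.shape[0] // cell
    cols = gray.shape[1] // cell
    return np.array([np.hsplit(band, cols) for band in np.vsplit(gray, rows)])

# Synthetic stand-in: 2 "classes", one row of cells per class, 4 cells per row
gray = np.arange(4 * 8).reshape(4, 8)
x = split_cells(gray, 2)                      # shape (2, 4, 2, 2)

# First half of every row goes to training, second half to testing
train = x[:, :2].reshape(-1, 4).astype(np.float32)
test = x[:, 2:].reshape(-1, 4).astype(np.float32)

# Rows are flattened class by class, so the labels are simply repeated
train_labels = np.repeat(np.arange(2), 2)[:, np.newaxis]
test_labels = train_labels.copy()
```

Because reshape walks the array row by row, the flattened samples stay grouped by class, which is why np.repeat produces labels in the matching order.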

import numpy as np
import cv2

img = cv2.imread('digits.png')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)

# Now we split the image to 5000 cells, each 20x20 size
cells = [np.hsplit(row,100) for row in np.vsplit(gray,50)]

# Make it into a NumPy array. Its size will be (50,100,20,20)
x = np.array(cells)

# Now we prepare train_data and test_data.
train = x[:,:50].reshape(-1,400).astype(np.float32) # Size = (2500,400)
test = x[:,50:100].reshape(-1,400).astype(np.float32) # Size = (2500,400)

# Create labels for train and test data
k = np.arange(10)
train_labels = np.repeat(k,250)[:,np.newaxis]
test_labels = train_labels.copy()

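# Initiate kNN, train it on the training data, then test it with the test data for k=5
# (OpenCV 3 API: the classifier lives in cv2.ml and train() takes a sample-layout flag)
knn = cv2.ml.KNearest_create()
knn.train(train,cv2.ml.ROW_SAMPLE,train_labels)
ret,result,neighbours,dist = knn.findNearest(test,k=5)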

# Now we check the accuracy of classification
# For that, compare the result with test_labels and check which are wrong
matches = result==test_labels
correct = np.count_nonzero(matches)
accuracy = correct*100.0/result.size
print(accuracy)

So our basic OCR app is ready. This particular example gave me an accuracy of 91%. One way to improve accuracy is to add more training data, especially the digits that were classified wrongly. Rather than preparing this training data every time the application starts, it is better to save it, so that next time the data can be read directly from a file and classification can start right away. NumPy functions such as np.savetxt, np.savez and np.load help with this; see their documentation for details.

# save the data
np.savez('knn_data.npz',train=train, train_labels=train_labels)

# Now load the data
with np.load('knn_data.npz') as data:
    print(data.files)
    train = data['train']
    train_labels = data['train_labels']

On my system, the saved file takes around 4.4 MB. Since we are using intensity values (uint8 data) as features, it is better to convert the data to np.uint8 before saving it; the file then takes only 1.1 MB. When loading, convert the data back to float32.
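The uint8 round trip can be sketched as follows. This is a minimal illustration, not the tutorial's own code: random intensities stand in for the digit data, and the file name knn_data_u8.npz and the temporary directory are assumptions made for the example.

```python
import os
import tempfile
import numpy as np

# Stand-in data: random pixel intensities shaped like the digit training set
rng = np.random.default_rng(0)
train = rng.integers(0, 256, size=(2500, 400)).astype(np.float32)
train_labels = np.repeat(np.arange(10), 250)[:, np.newaxis]

# Hypothetical output location for this sketch
path = os.path.join(tempfile.mkdtemp(), 'knn_data_u8.npz')

# Intensities are whole numbers in 0..255, so uint8 stores them without loss
np.savez(path, train=train.astype(np.uint8), train_labels=train_labels)

# Load the data and convert the features back to float32 for kNN
with np.load(path) as data:
    train_loaded = data['train'].astype(np.float32)
    labels_loaded = data['train_labels']

size_bytes = os.path.getsize(path)
```

Only the features are narrowed here; the labels are small enough that their type makes little difference to the file size.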

OCR of English Alphabets

Next we will do the same for the English alphabet, but with a slight change in the data and the feature set. Here, instead of images, OpenCV comes with a data file, letter-recognition.data, in the opencv/samples/cpp/ folder. If you open it, you will see 20000 lines which may, at first sight, look like garbage. In fact, the first column of each row is a letter, which is our label, and the 16 numbers that follow it are its features. These features come from the UCI Machine Learning Repository, whose page for this dataset describes them in detail.

There are 20000 samples available, so we take the first 10000 as training samples and the remaining 10000 as test samples. We have to convert the letters to numbers first, because we can't work with the letters directly.
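The letter-to-number conversion can be sketched on a couple of rows before loading the full file. The rows below are made-up values that only imitate the layout described above (a letter followed by 16 integers), and parse_row is a hypothetical helper name.

```python
import numpy as np

# Made-up rows in the letter-recognition.data layout: letter, then 16 integers
rows = [
    "A,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,0",
    "Z,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1",
]

def parse_row(line):
    """Return (label, features): 'A'..'Z' becomes 0..25, features become floats."""
    fields = line.strip().split(',')
    label = ord(fields[0]) - ord('A')
    features = [float(v) for v in fields[1:]]
    return label, features

parsed = [parse_row(r) for r in rows]
labels = np.array([p[0] for p in parsed], dtype=np.float32)[:, np.newaxis]
features = np.array([p[1] for p in parsed], dtype=np.float32)

# Map numeric labels (or predictions) back to letters for display
letters = [chr(int(v) + ord('A')) for v in labels.ravel()]
```

The same inverse mapping can be applied to the classifier's output to print predicted letters instead of numbers.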

import cv2
import numpy as np

# Load the data; the converter turns the letter in column 0 into a number (A=0 ... Z=25)
data= np.loadtxt('letter-recognition.data', dtype= 'float32', delimiter = ',',
                    converters= {0: lambda ch: ord(ch)-ord('A')})

# split the data in two, 10000 rows each for train and test
train, test = np.vsplit(data,2)

# split train and test into responses (column 0) and features (the other 16 columns)
responses, trainData = np.hsplit(train,[1])
labels, testData = np.hsplit(test,[1])

# Initiate the kNN (OpenCV 3 API), classify, measure accuracy.
knn = cv2.ml.KNearest_create()
knn.train(trainData, cv2.ml.ROW_SAMPLE, responses)
ret, result, neighbours, dist = knn.findNearest(testData, k=5)

correct = np.count_nonzero(result == labels)
accuracy = correct*100.0/10000
print(accuracy)

This gives me an accuracy of 93.22%. Again, to increase accuracy, you can iteratively add the misclassified samples to the training data.
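One round of adding the misclassified samples could look like the sketch below. It is an assumption about how such a step might be written, not part of the original tutorial: add_errors_to_training is a hypothetical helper, and the tiny arrays are made up so the example runs without OpenCV.

```python
import numpy as np

def add_errors_to_training(train, train_labels, test, test_labels, predicted):
    """Append the misclassified test samples, with their true labels, to the training set."""
    wrong = predicted.ravel() != test_labels.ravel()
    new_train = np.vstack([train, test[wrong]])
    new_labels = np.vstack([train_labels, test_labels[wrong]])
    return new_train, new_labels, np.flatnonzero(wrong)

# Tiny made-up example: 3 test samples, the middle one was predicted wrongly
train = np.zeros((2, 4), dtype=np.float32)
train_labels = np.array([[0], [1]], dtype=np.float32)
test = np.ones((3, 4), dtype=np.float32)
test_labels = np.array([[0], [1], [1]], dtype=np.float32)
predicted = np.array([[0], [0], [1]], dtype=np.float32)

new_train, new_labels, wrong_idx = add_errors_to_training(
    train, train_labels, test, test_labels, predicted)
```

In the programs above, predicted would be the result array returned by findNearest. Once test samples have been folded into the training set, accuracy should be measured on data the classifier has not seen.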