Tesseract-Indic-OCR: Clipping accuracy

Some time last year I tried to push my matra clipping code upstream to Tesseract-OCR, but Ray Smith, the lead developer of the project, asked about the accuracy of the code, and I never got around to calculating it. Well, actually I still haven't calculated it, but I did something new.
Check the set of pictures I uploaded at . The first picture is the normal picture to be OCRed. The second picture is the clipped+thresholded image. The third image is the difference of the clipped+thresholded and thresholded images.

Here is the Python code that creates a new image out of two input images:

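#!/usr/bin/env python
# Builds an image showing where the clipped image differs from the thresholded one.

from PIL import Image, ImageChops

# Convert both inputs to 8-bit greyscale so they share the same mode.
th = Image.open("benth.tif").convert("L")      # thresholded image
clip = Image.open("bentest.tif").convert("L")  # clipped+thresholded image

# Absolute per-pixel difference: identical pixels become 0 (black).
new = ImageChops.difference(th, clip)
# Invert so that the differing pixels show up dark on a white background.
new = ImageChops.invert(new)

new.save("diff.tif", "TIFF")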
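
To make it clearer what the difference image contains, here is a small pure-Python sketch of the same per-pixel arithmetic: the difference is the absolute value of the two pixel values subtracted, and the inversion is 255 minus that. The helper names (difference, invert, changed_pixels) and the toy pixel values are made up for illustration and are not part of the clipping code or of Tesseract. The pixel count at the end is only a rough indication of how much the clipping changed the image, not an accuracy figure.

```python
def difference(a, b):
    """Per-pixel absolute difference of two equally sized 8-bit greyscale images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same size")
    return [[abs(pa - pb) for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def invert(img):
    """Invert an 8-bit greyscale image: 0 becomes 255 and vice versa."""
    return [[255 - p for p in row] for row in img]


def changed_pixels(a, b):
    """Count the pixels whose values differ between the two images."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


# Toy 3x4 images (hypothetical values): 0 = black ink, 255 = white paper.
thresholded = [
    [0, 0, 0, 0],
    [255, 0, 255, 0],
    [255, 0, 255, 0],
]
# Same image with two ink pixels removed from the top row.
clipped = [
    [0, 255, 0, 255],
    [255, 0, 255, 0],
    [255, 0, 255, 0],
]

# Changed pixels come out black (0), unchanged pixels white (255).
diff = invert(difference(thresholded, clipped))
for row in diff:
    print(row)

print("changed pixels:", changed_pixels(thresholded, clipped))
```

With real scans, the same count could be taken from the pixel data of the two input images, as long as both are the same size.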

I will now show this to Ray Smith. Let's see if he likes it.

Friday, April 17, 2009