Some time last year I tried to push my matra clipping code upstream to Tesseract-OCR, but Ray Smith, the lead developer of the project, asked about the accuracy of the code and I never got around to calculating it. Actually, I still haven't calculated it, but I did try something new.
Check the set of pictures I uploaded at
. The first picture is the normal picture to be OCRed. The second picture is the clipped+thresholded image. The third image is the difference of the clipped+thresholded and thresholded images.
Here is the Python code that creates the difference image from the two input images:
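#!/usr/bin/env python3
# Requires Pillow; the bare "import ImageChops, Image" form only works with the original PIL.
from PIL import Image, ImageChops

# Convert to 8-bit grayscale, since most ImageChops operations are implemented for 8-bit images.
th = Image.open("benth.tif").convert("L")      # thresholded image
clip = Image.open("bentest.tif").convert("L")  # clipped+thresholded image

# Per-pixel absolute difference: changed pixels come out white on black.
new = ImageChops.difference(th, clip)
# Invert so the changed pixels show as black on a white background.
new = ImageChops.invert(new)
new.save("diff.tif", "TIFF")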
I will now show this to Ray Smith. Let's see if he likes it.
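This is not the accuracy number Ray asked for, but the difference image could also be boiled down to a single figure: the fraction of pixels that the clipping step changed. Below is a minimal sketch of that idea, assuming Pillow is installed. The function name `changed_pixel_fraction` is made up for illustration, and the small synthetic images only stand in for benth.tif and bentest.tif.

```python
from PIL import Image, ImageChops


def changed_pixel_fraction(thresholded, clipped):
    """Return the fraction of pixels that differ between two same-sized images.

    Hypothetical helper for illustration; not part of Tesseract or Pillow.
    """
    a = thresholded.convert("L")
    b = clipped.convert("L")
    if a.size != b.size:
        raise ValueError("images must have the same dimensions")
    diff = ImageChops.difference(a, b)
    # Bin 0 of the histogram counts pixels whose difference is exactly zero.
    unchanged = diff.histogram()[0]
    total = a.size[0] * a.size[1]
    return (total - unchanged) / total


if __name__ == "__main__":
    # Synthetic 4x4 white images stand in for the real scans.
    th = Image.new("L", (4, 4), 255)
    clip = th.copy()
    clip.putpixel((1, 0), 0)
    clip.putpixel((2, 0), 0)
    print(changed_pixel_fraction(th, clip))  # 2 of 16 pixels differ
```

With the real files, the two arguments would be `Image.open("benth.tif")` and `Image.open("bentest.tif")`. The number only says how much of the page was touched, not whether the OCR output got any better.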