Tesseract-Indic-OCR: What next?

Sunday, April 10, 2011

What next?

Step one was to get the maatraa clipping code into Tesseract, and that is now done. The following issues still need to be resolved before we can reach excellent recognition rates:

We need to split the following kinds of glyphs into separate consonant and vowel-sign components.

1) Consonant + descending vowel sign

Example: