Sunday, April 10, 2011

What next?

Step one was to get the maatraa clipping code into Tesseract, and that is now done. The following issues still need to be resolved before we can expect excellent recognition rates:

We need to split the following kinds of glyphs into a separate consonant and vowel sign:

1) Consonant + descending vowel sign


2) Consonant + ascending vowel sign
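
As a rough illustration of the two cases above, here is a minimal Python sketch of the simplest possible split. It assumes the glyph is already a binary bitmap and that the vowel sign sits entirely in a band above or below the consonant body. The names `split_glyph`, `top_row`, `bottom_row` and `has_ink` are made up for this example and are not part of Tesseract. Real glyphs, where the sign touches or overlaps the consonant, would need something smarter than a straight horizontal cut.

```python
def split_glyph(glyph, top_row, bottom_row):
    """Cut a binary glyph bitmap into three horizontal bands.

    glyph      -- list of equal-length rows, each a list of 0/1 pixels
    top_row    -- index of the first row of the consonant body (assumed known)
    bottom_row -- index of the first row below the consonant body (assumed known)

    Returns (ascender, body, descender) as lists of rows.
    """
    if not 0 <= top_row <= bottom_row <= len(glyph):
        raise ValueError("need 0 <= top_row <= bottom_row <= glyph height")
    ascender = glyph[:top_row]           # candidate ascending vowel sign
    body = glyph[top_row:bottom_row]     # consonant
    descender = glyph[bottom_row:]       # candidate descending vowel sign
    return ascender, body, descender


def has_ink(band):
    """True if any pixel in the band is set."""
    return any(any(row) for row in band)


# Toy 4x4 glyph: one row of ascender, two rows of body, one row of descender.
glyph = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
ascender, body, descender = split_glyph(glyph, top_row=1, bottom_row=3)
```

A band only counts as a vowel sign if `has_ink` returns true for it. How to find `top_row` and `bottom_row` reliably is the hard part and is left open here.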


In summary, we need to be able to perform the following transformation before sending the image to Tesseract:


