Saturday, May 26, 2007
Delving into libkscan
Well, I've got the caller ID module almost done. I haven't written the plugin and D-Bus stuff yet, though. My mind keeps straying to scanning and OCR, so I had better follow.
Soon after posting the last entry, I was contacted by a fellow who is working on KTiny, a KDE frontend to TinyERP. It seems there is some interest in a means of scanning and OCR'ing invoices. Gamera is also working on this.
Since libkscan is already written, I figured I should use it. I am now building a KDE4 setup so I can link to the libraries. For my purposes I don't need a complicated scanning application, just something that scans at predetermined settings and saves the image. Preferably it would just be a matter of loading the scanner and pressing a button. Tesseract will do the OCR reliably, but it does not yet return the coordinates of the text. Someone has done a DLL for Windows, but has not released the source. I'm hoping that Tesseract will be fixed by the time I need to start experimenting with it.
My specific needs for document recognition are reasonably well defined. Invoices have things in common: a date, a number of some kind, a vendor identification, terms and shipping details, then a list of items showing quantity, description, shipped or back-ordered, price, discount, and total. Or simply a description and total. My users would typically deal with 5 or 6 major vendors, and maybe twice that many minor vendors. In other words, most of the paper going through would be very similar. I'm not sure if this would make sense, but if you had the text, the coordinates of the text, and a couple of example invoices from a vendor to show what changes and what stays the same across invoices, some logic could probably extract the desired information. We shall see.
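To make that idea a little more concrete, here is a rough sketch in Python. It is only an illustration under assumptions: the `Word` record (text plus the top-left corner of its box), the sample data, and the function names `find_anchors` and `value_right_of` are all hypothetical, and it assumes an OCR step that can report word coordinates, which Tesseract does not yet do. Words that show up with the same text in about the same place on two sample invoices are treated as fixed labels, and whatever sits just to the right of a label on a new invoice is taken as the value.

```python
from collections import namedtuple

# Hypothetical OCR output: a word and the top-left corner of its bounding box.
Word = namedtuple("Word", "text x y")


def find_anchors(sample_a, sample_b, tolerance=10):
    """Return words from sample_a that also appear in sample_b with the
    same text at nearly the same position (the parts that stay the same)."""
    anchors = []
    for a in sample_a:
        for b in sample_b:
            same_place = (abs(a.x - b.x) <= tolerance
                          and abs(a.y - b.y) <= tolerance)
            if a.text == b.text and same_place:
                anchors.append(a)
                break
    return anchors


def value_right_of(label, words, anchors, tolerance=10):
    """Find `label` on a new invoice near where the anchor says it should
    be, then return the nearest word to its right on the same line."""
    anchor = next((a for a in anchors if a.text == label), None)
    if anchor is None:
        return None
    here = next((w for w in words
                 if w.text == label
                 and abs(w.x - anchor.x) <= tolerance
                 and abs(w.y - anchor.y) <= tolerance), None)
    if here is None:
        return None
    candidates = [w for w in words
                  if abs(w.y - here.y) <= tolerance and w.x > here.x]
    if not candidates:
        return None
    return min(candidates, key=lambda w: w.x).text


# Two made-up sample invoices from the same vendor.
invoice_1 = [Word("Invoice", 50, 40), Word("No:", 120, 40), Word("10432", 170, 40),
             Word("Date:", 50, 70), Word("2007-05-12", 120, 70)]
invoice_2 = [Word("Invoice", 52, 41), Word("No:", 121, 41), Word("10519", 171, 41),
             Word("Date:", 51, 69), Word("2007-05-21", 121, 69)]

# A new invoice from that vendor, scanned slightly off.
new_invoice = [Word("Invoice", 49, 42), Word("No:", 119, 42), Word("10633", 169, 42),
               Word("Date:", 50, 71), Word("2007-05-26", 119, 71)]

anchors = find_anchors(invoice_1, invoice_2)
invoice_number = value_right_of("No:", new_invoice, anchors)
invoice_date = value_right_of("Date:", new_invoice, anchors)
```

A fixed pixel tolerance is the crudest possible way to deal with scans that are a bit shifted. Real invoices would likely need deskewing and some handling of multi-word values and item tables, none of which is attempted here.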