Tuesday, June 15, 2010

Paper paper everywhere

I have an itch. I need to get hundreds of pieces of paper to my bookkeeper, who happens to be on the other side of the country. And as usual, scratching an itch leads in interesting directions and opens up possibilities. So I started with Python, PyQt and the KDE Python bindings. The plan was to write a bulk scanning application with the ability to attach notes to the scans, bundle them up in a PDF, and send them off by email.

Some issues that I ran across.

The python-imaging-scan module is quite flaky. Repeated scans would lead to crashes. I encapsulated it within a subprocess, which keeps a crash from bringing the whole application down and somehow allows it to do its thing each time reasonably reliably. It still has issues, however.

The python-keyring-kde module works nicely. Very simple to use, and it works. I use it to store the password needed to log in to send emails.

The python-imaging module is very nice. I crop the images to remove the whitespace around them, and it is very easy. Creating a QImage is trivial using the ImageQt module.

The reportlab PDF creation module is very nice. Once you get your mind around the structure, which is much easier than the documentation makes it seem, it is very easy to create a simple PDF.

The code is at code.google.com. There are doubtless many bugs, and I have only tested with the scanner that I have.
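For anyone curious what wrapping a crash-prone library in a subprocess looks like, here is a minimal sketch. It is not the code from the application: the child script is a hypothetical stand-in that only simulates a scanner crash, the names run_isolated and scan_with_retry are made up for illustration, and it assumes Python 3.7 or newer.

```python
import subprocess
import sys

# Hypothetical stand-in for the real scanning code. It "crashes" (exits
# non-zero) when asked to, otherwise it prints some fake scan data.
CHILD_SCRIPT = r"""
import sys
if sys.argv[1] == "crash":
    sys.exit(3)  # simulate the scanner library dying
sys.stdout.write("scan-data-for-" + sys.argv[2])
"""


def run_isolated(mode, page, timeout=30):
    """Run one simulated scan in a separate interpreter.

    Returns the child's output, or None if the child crashed or hung.
    The parent process keeps running either way.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", CHILD_SCRIPT, mode, page],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return result.stdout


def scan_with_retry(page, modes):
    """Try each mode in turn until one attempt succeeds."""
    for mode in modes:
        data = run_isolated(mode, page)
        if data is not None:
            return data
    return None


if __name__ == "__main__":
    # The first attempt crashes, the second succeeds; the caller survives both.
    print(scan_with_retry("page1", ["crash", "ok"]))
```

The point of the design is that a crash in the child, even a hard one, only shows up in the parent as a non-zero return code, so the GUI can report the failure or simply try again.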
I now have a few hundred PDFs containing all kinds of interesting information. They are mostly structured documents, i.e. invoices. Now to figure out some way of parsing and categorizing the data so that I can use it in some way.

Comments:
"Parsing" as in using character recognition and extracting structure from a document? Quite difficult. I have spent some time with professional software, and it was still a mess. It is a time-consuming learning process (for both software and user), and there is much variation in documents. There is no room for error in invoices either.

Some digital standards are being promoted, and I hope they are adopted soon.
 
Indeed. I've tried such things also.

The only way I see it working is by writing a series of scripts/regexes that extract specific data. I have maybe 8-10 vendors, each with a specific layout.
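A rough sketch of what that per-vendor approach could look like, assuming the text has already been pulled out of the PDF somehow. The vendor names, field names and patterns below are all invented for illustration; real invoices would need their own patterns.

```python
import re

# Hypothetical vendors and layouts: one set of patterns per vendor.
VENDOR_PATTERNS = {
    "acme": {
        "invoice_no": re.compile(r"Invoice\s*#\s*(\d+)"),
        "total": re.compile(r"Total\s+Due:\s*\$([\d,]+\.\d{2})"),
    },
    "globex": {
        "invoice_no": re.compile(r"Inv\.\s*No\.\s*([A-Z]{2}-\d+)"),
        "total": re.compile(r"Amount\s+payable\s+([\d,]+\.\d{2})"),
    },
}


def extract_fields(vendor, text):
    """Apply one vendor's patterns to the text of an invoice.

    Fields whose pattern does not match come back as None, so a
    missing value is visible instead of silently guessed.
    """
    fields = {}
    for name, pattern in VENDOR_PATTERNS[vendor].items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


if __name__ == "__main__":
    sample = "ACME Corp\nInvoice # 10423\nTotal Due: $1,250.00\n"
    print(extract_fields("acme", sample))
```

Returning None for anything that fails to match matters here, since an invoice with a missing total should be flagged for a human rather than passed along.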
 
KDE SC really needs some kind of simple scanning application. Skanlite is not easy enough for basic users. It should be something like GNOME's "Simple Scan": http://bobthegnome.blogspot.com/2010/02/simple-scan-090.html

The basic scanning function should almost be built into Okular, so that you could just scan the paper, add annotations, save it as a new PDF, and then email it.
 