Friday, September 28, 2007


Some decent and lasting rainfall today. It hasn't rained significantly since June. Today is cool, 10 °C, and very wet.

Still plugging away on handling scans. Every roadblock that I hit seems to have been solved by someone. Interestingly, it seems the real challenge is defining the problem space.

Where am I? I'm about to test Otsu threshold code. I'm thinking that all I really need to do is basic cleanup, get the image segments defined, pass them to OCR, check if the results are reliable, and if not, do more extensive (and time-consuming) cleanup. It seems the only areas where there are real issues are text with a shaded background, which when scanned ends up a mess of blots, and places where the text size changes on the same horizontal line. Other than that, the OCR seems reliable. So why waste time preprocessing other areas?
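For reference, here is a rough sketch of the plan above, not the code I'm actually testing. It assumes 8-bit grayscale pixels in a flat list, and every name in it (otsu_threshold, binarize, ocr_with_fallback, the cleanup callables, the 0.8 confidence cutoff) is made up for illustration. The OCR engine is passed in as a plain function that returns text and a confidence score, which is an assumption about how an engine might be wrapped.

```python
def otsu_threshold(pixels, levels=256):
    """Pick the gray level that maximizes between-class variance (Otsu's method).

    Pixels with value <= threshold form one class, the rest the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of gray values of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # nothing in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break               # upper class is empty from here on
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 1 (above threshold) or 0 (at or below it)."""
    return [1 if p > threshold else 0 for p in pixels]


def ocr_with_fallback(segment, ocr, cheap_cleanup, heavy_cleanup,
                      min_confidence=0.8):
    """Try cheap cleanup first; only pay for heavy cleanup if OCR looks shaky.

    `ocr` is a hypothetical callable returning (text, confidence in 0..1).
    """
    text, confidence = ocr(cheap_cleanup(segment))
    if confidence >= min_confidence:
        return text
    text, _ = ocr(heavy_cleanup(segment))
    return text
```

The point of the second function is just the ordering: most segments should never reach the expensive path, so only shaded backgrounds and mixed text sizes pay for it.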

Sunday, September 16, 2007

Tipping Point

Some things of note: SANE, the scanner backend system, is going to do HAL notifications in the next version. So when a scanner is plugged in, and possibly when its buttons are pushed (not sure), HAL will broadcast over D-Bus so that apps can respond appropriately. Neat. Too late for 4.0, but SOLID would appreciate a patch, no doubt.

ASUS is going to release their small and cheap notebook, the Eee PC. Here is a hands-on review. This really interests me. I use a Palm right now, and would like a small portable notebook. Inexpensive helps. It seems the really small, light notebooks are also the most expensive, and the cheap ones don't have any realistic battery life. The Eee weighs less than 2 lbs, costs less than $500, and has a battery life of 3-4 hours. Quite resource limited, however. They will be available at the end of the month, possibly. It will be interesting to see how they work out.

I don't think I've ever written as much code as I have in the last couple of months, or thrown so much of it away. Sometimes the most difficult thing to do with a project is defining the problem. Once that is done, the solutions are evident. I'm working on cleaning up scanned documents, and coming up with blocks to feed through the OCR engines. There are utilities available, such as unpaper, but they are focussed on a different problem set. Playing with graphic data is endlessly fascinating, with the various edge detection algorithms, etc. Great fun.

Speaking of edges, are we on the edge of a realization that the structures that worked so well for development and distribution of server applications fall flat when applied to desktop systems? This LWN article describes the unintended consequences when the various players in enterprise distributions attempt to satisfy the need for stability. The mainstream kernel has drivers for new hardware, but the enterprise distros don't distribute newer kernels, which encourages hardware manufacturers to develop binary modules for those enterprise distros. Boom. All the work in encouraging free and open development goes to naught when hardware people can't get their drivers out except through binary modules.

What unintended consequences could arise from how things are structured in the desktop realm? Aaron was wondering about what it all means and where it's going. I have an idea. What thought process has led to a situation where a person wanting to develop a broadly capable app, using web, document creation, IPC, and desktop integration, would have to learn at least 3 completely different APIs? That is on top of the multiple APIs of supporting libraries. This is the situation in the most commonly distributed desktop available. Frankly, I'm not interested, freedom or not. KDE has the possibility of being the integrator of all the different APIs that developers would need. One-line calls to get a necessary service. This is neat and differentiates KDE from anything else out there.

Essentially the primary Linux desktop right now is equal to Windows 3.0 with a web browser. If anyone wants to know why the market share sucks, or why Dell is having trouble integrating, look no further. Unintended consequences of decisions made for good reasons.
