Scan Man

"You're thirty five years old, Mister Vale. Why are you such a derelict? Such a piece of human junk? The answer's simple. You're a scanner, but you don't realise it. That has been the source of all your agony. But I will show you now that it can be a source of great power."
- Lines from the film Scanners, directed by David Cronenberg

At two o'clock in the morning on Thursday, I had a small brainwave - one of those sideways but in-retrospect-obvious ideas had by computer programmers, philosophy students, and habitual problem solvers. And I've been all three.

If my OCR software has trouble recognising superscript numbers, use the training function to teach it what they look like. But where can I find a document to train it with? A page containing a hundred or so examples of superscript numerals embedded in normal text? Simple. I make one in a word processor. Print it out at different common sizes and fonts, then scan it at various slightly skewed orientations.
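
For anyone who would rather not type a hundred superscripts by hand, the text of such a training page could probably be generated instead. What follows is only a rough sketch in Python, not what I actually did: the function names, the filler words and the use of Unicode superscript digits are all assumptions made for illustration, and the output would still need pasting into a word processor to set the fonts and sizes.

```python
import random

# Unicode superscript digits 0-9 (assumption: the chosen font includes them).
SUPERSCRIPTS = "⁰¹²³⁴⁵⁶⁷⁸⁹"

# Hypothetical filler vocabulary; any ordinary prose would do.
FILLER = "the quick brown fox jumps over the lazy dog".split()


def to_superscript(n):
    """Render a non-negative integer as superscript digits."""
    return "".join(SUPERSCRIPTS[int(d)] for d in str(n))


def make_training_text(count=100, seed=0):
    """Return filler text with `count` superscript numbers attached to words."""
    rng = random.Random(seed)  # fixed seed so the page is reproducible
    words = []
    for i in range(1, count + 1):
        # A few ordinary words, then a footnote-style number on the last one.
        words.extend(rng.choice(FILLER) for _ in range(rng.randint(2, 5)))
        words[-1] += to_superscript(i)
    return " ".join(words)


if __name__ == "__main__":
    print(make_training_text())
```

The numbers run from 1 to 100 so that every digit turns up in both single and multi-digit form, which seems a reasonable spread for training, though that is a guess rather than anything the OCR software asks for.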

So that's what I spent the early hours of the morning and the late hours of the afternoon doing. In between, sleeping, signing on, and completing the sleep.

I want to give it some more training before re-OCRing the articles. But before that, there's some short stories (and the odd novel) that I wrote fifteen years ago. There's also a dozen books (mostly on languages or linguistics) that I've had in photocopy form for about the same amount of time.

Digitise them, and finally get to throw away the mountains of paper. What once took up tens of boxes, reduced to a few CDs.

Oh, and just in case you think I'm not a serious, hardcore scanning and OCR freak, mother and I are splitting the cost of a brand new fancy scanner designed for books.

The idea is to make digital copies of our somewhat unwieldy library of textbooks, Open University courses and trashy novels for our own use, and ditch the originals, possibly making back some of the cost by flogging them on eBay.

Two hours later, and 26 of 28 short stories scanned. Only one was so abysmally awful I decided to just bin it, and one was printed on thermal paper (remember thermal printers?), which has degraded too much to OCR.

I haven't been looking at the newspapers lately. I hear fragments. 80,000 deaths so far from the Indian earthquake, and maybe another 40,000 to come. The public have charity fatigue and the governments don't feel pressured to donate. There's a new wonderdrug to treat breast cancer, the Yorkshire Ripper is back in fashion, and the Tory leadership election is a choice between Davis (Hague clone) and Cameron (Blair clone).