[Linux-aus] Another view of scanning Re: [BUGA] Quality of commercial hardware
dan at shearer.org
Fri Jun 25 13:22:09 UTC 2004
On Fri, Jun 25, 2004 at 11:43:51AM +0930, Greg 'groggy' Lehey wrote:
> I'd be interested in any kind of discussion, including ideas on what
> to replace the scanner with.
You asked for it. I will open by saying that I have just completed a
scanning project, some of the results of which will be on the web within
a day or two. It was completely painless and required little investment.
More of that later in this message -- I'll happily give you my secret.
Now, as to bits of hardware. In scanners, almost nobody is open by
design or by default.
But I suspect one of the mistakes was to choose Canon. They have a
well-earned reputation for being the most closed of the closed.
This might be a starting point for a discussion of where in a notional
openness spectrum other manufacturers might fit. Others may have some
One angle to think about might be trying to extrapolate from printers to
scanners: many companies make both and some of those companies are quite
open with their printer specs and/or drivers.
Another angle might be specialist companies. Scanners are one area where
some niche suppliers continue to make a living because they are good at
what they do. These companies are, of course, being thinned out by the
increasing capabilities of the global companies' product ranges.
This should be sufficient for you to do the odd quick scan as part of
everyday needs, photocopy/fax replacement etc.
The Other Way of Scanning
May I suggest a Gordian Knot solution: if you have a specific bulk or
tricky job, get a specialist to do your scanning for you. I think it
makes sense from both a cost and a time point of view. All you
really need is some raw output, and then you can do whatever fancy
post-processing you want. Years ago, I understand, anyone who
really wanted good photos developed them at home in their own darkroom.
As it happens, ten minutes ago I left a meeting with Scan Conversion
Services, an SA company with a very large range of scanning, OCRing,
transcription and whatnot capabilities. They can do some pretty
interesting things. I've supervised one specialist job they did,
personally contracted them for a second, and am impressed. You can give
them books (which optionally they pull apart and professionally rebind,
losing 1-2mm of page in the process) or slides, 35mm negatives, bitmaps,
posters, bus tickets... and get back images, PDFs, Searchable PDFs,
HTML, HTML+images tastefully arranged, Word (yuk) and other formats.
They have extremely fair prices.
As to security and confidentiality, they are accredited military
contractors, which may (or may not) have a bearing on what they know
about it. I have had no problems with them in this respect anyway.
Scan Conversion appear to have just shot their website, but you can
browse an archived version of it at:
their contact details are at
and I have been dealing with Gavin Chambers, who has been most helpful.
Note: "Searchable PDFs" are not just PDFs with some magic flag set ("how
stupid", I said. "Imagine asking for a PDF that wasn't searchable -- of
course I want a Searchable PDF".) I was completely wrong -- a Searchable
PDF is a scan of the original, overlaid with an OCR of the text. So it
looks like a crinkly piece of paper complete with flyspots, but you can
search for keywords as normal. Oops :-)
With respect to OCR, they claim to have written some of their own
interpreting and analysis engines and to run text through several
engines automatically until more than one algorithm agrees on the text.
I can't verify this, but it sounds plausible to me. They also use plenty
of proprietary tools. Greg will be amazed but I've sent them off to look
at SANE! I did give them some analysis parameters though, including how
to evaluate cost of new equipment against funding programmer time to
make existing equipment continue to work with current hardware. This is
often the problem with scanners, under Windows or anything else: the
driver for version N is no longer developed once version N+1 is
produced. Scanners seem to change their internal technology faster than
some other devices, such as printers, so different generations of
scanners can have very different drivers. In the closed source model,
that makes backwards compatibility expensive for the manufacturer. (And
a coercive lever over customers comes for free :-)
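The "keep running OCR engines until more than one agrees" idea is easy to sketch. The Python below is a minimal illustration under stated assumptions, not Scan Conversion's actual pipeline (which I haven't seen): it assumes each engine can be treated as a plain function from an image region to a string, and the names `consensus_ocr`, `normalise` and the `min_votes` threshold are hypothetical.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# Hypothetical: an "engine" is any function turning an image region
# (treated here as an opaque object) into a text reading.
Engine = Callable[[object], str]


def normalise(text: str) -> str:
    """Collapse whitespace so trivial spacing differences don't block agreement."""
    return " ".join(text.split())


def consensus_ocr(region: object, engines: Sequence[Engine],
                  min_votes: int = 2) -> tuple[Optional[str], int]:
    """Run engines one at a time until min_votes of them agree on the text.

    Returns (agreed_text, engines_run). agreed_text is None when every
    engine was tried without agreement, i.e. a human should check it.
    """
    votes: Counter = Counter()
    for count, engine in enumerate(engines, start=1):
        reading = normalise(engine(region))
        votes[reading] += 1
        if votes[reading] >= min_votes:
            return reading, count  # stop early: enough engines agree
    return None, len(engines)


# Usage with three stand-in "engines" (hypothetical, for illustration only).
engines = [
    lambda r: "Qua1ity of hardware",   # misreads 'l' as '1'
    lambda r: "Quality  of hardware",  # extra space, normalised away
    lambda r: "Quality of hardware",
]
print(consensus_ocr("page-1", engines))  # prints ('Quality of hardware', 3)
```

Voting on whole strings is a simplification; a real system would more likely align and vote word by word or character by character, and might weight engines by their known accuracy.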
As I say, in a few days you'll be able to rate their work for yourself.
So Greg, hope that helps!
dan at shearer.org