Equipment in Our Lab and Book Digitization

by on June 25, 2012

A few years ago, Cynthia Henry and I gave a presentation to GWLA about the equipment we have in our digital lab.  Cynthia made the videos available on her blog here:

The presentation focused on book digitization, so we only showed our book scanning equipment.  Here is a rundown of the presentation:

Book projects: How are they different?

If you’ve never digitized a book before, you may not realize how different book projects are from other digitization projects.  They are more complex because there are more factors involved.  Instead of worrying only about a flat image, you now have to worry about the curvature of the book when it’s opened, which brings a whole other dimension to the digitization process.  This is complicated by the fact that people have become familiar with ebooks and born-digital content, so they expect a digitized book to be just like a born-digital ebook.  You may even feel that it needs to be this way, but processing a book to perfection is an incredible drain on resources.  The trick is knowing what you are willing to do and what you aren’t.

Another thing that makes books different is the sheer number of images per item.  One book can have anywhere from 200 to 600 pages, which means 200-600 images that then have to be stored and processed.  Batch processing becomes very important.
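To give a rough sense of what those image counts mean for storage, here is a minimal back-of-the-envelope sketch.  The page size (8.5 x 11 inches), the 300 dpi resolution, and the function names are all assumptions chosen for illustration, not our lab's actual settings, and the figures are for uncompressed images only.

```python
def uncompressed_bytes(width_in, height_in, dpi, channels, bits_per_channel=8):
    """Raw (uncompressed) size of one scanned page, in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * channels * bits_per_channel // 8


def book_bytes(pages, **page_kwargs):
    """Raw size of a whole book: one image per page."""
    return pages * uncompressed_bytes(**page_kwargs)


# Hypothetical page: 8.5 x 11 inches scanned at 300 dpi (assumed values).
page = dict(width_in=8.5, height_in=11, dpi=300)

for pages in (200, 600):
    color = book_bytes(pages, channels=3, **page)  # 24-bit RGB
    gray = book_bytes(pages, channels=1, **page)   # 8-bit grayscale
    print(f"{pages} pages: color {color / 1e9:.1f} GB, "
          f"grayscale {gray / 1e9:.1f} GB")
```

Under these assumptions a full-color scan takes three times the space of a grayscale one, which is one reason color book projects put so much more strain on storage and processing.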

Because of the curvature of the book, book digitization can be less precise than other digitization projects.  There might be no way to salvage the text from the center of a book, or no way to mitigate the curvature at the edge of the page.  If you’re a perfectionist, a book digitization project might drive you mad, especially if the books are old, with odd bindings and brittle pages, so that you can’t just bend the book flat.
Add Optical Character Recognition (OCR) on top of that.  OCR is the process that turns an image of a book page into searchable text, and it is a complicated issue.  Many factors affect the quality of OCR, and you can read about them in D-Lib Magazine.  Here are just a few good articles.
The point is, once you start adding in all these factors, you’re dealing with a process that is very different from other forms of digitization.  I’ve often said that scanning a book has more in common with photographing objects than it does with scanning flat media.  I will address these points in detail in the book.
If anyone has any horror stories or lessons learned, please feel free to mention them in the comments.
To get the discussion started, I’ll tell you a horror story of one of my first book digitization projects:
We had gotten used to scanning theses and dissertations (easy book scanning), and we took on a yearbook project.  The project completely broke our system.  Our computers weren’t powerful enough to process full-color images, so our programs kept crashing.  We blew out the motor on our automatic book scanner trying to have it pick up the slick magazine pages.  We ended up processing the pages in grayscale and doing as little processing as possible.  We also ended up scanning the entire collection by hand twice, because we had to redo the whole project.  This was 86 books that took months to finish, compared to our theses and dissertations, which we could finish by the thousands every month.  Even among book projects, there are vast differences.
