IMLS Awards $250,000 to the Digital Public Library of America for Digital Hubs Pilot Program

A good sign of good things. The Digital Public Library of America has received $250,000 from IMLS.

Efficiency is bad for digital projects

With all the use of machines and the focus on numbers, it can be easy to talk about efficiency when it comes to digitization. The desire is to tweak the process so that it goes just a little faster. This is especially true of large projects, where a 30-second difference per image can shave weeks or months off the schedule. However, efficiency is a double-edged sword. It allows you to get faster, but it limits your ability to handle variety. In the book Slack: Getting Past Burnout, Busywork, and the Myth of Total Efficiency, Tom DeMarco talks about how the more efficient you are, the less agile you can be. As efficiency goes up, your ability to handle complexity goes down. Something big happens, and suddenly you can’t change easily because you’re too efficient. I experienced this firsthand in our digital lab. We had our process down perfectly. The moment someone gave us some items that didn’t quite fit in, everything fell apart and I had to quickly retrain everyone to handle the new variety of items.

In addition to that, there’s the disruption of all these small efficiency changes themselves. I worked on a very large project that people were eager to have done quickly. They kept asking to speed the project up and I did my best to tweak the process. Then I realized that we had about 8000 items to go, and we were processing 3000 a month. At that point, I stopped trying to make the process more efficient because I realized that the disruption my changes would make would actually slow the process down.

So, be cautious about the urge to make the process more and more efficient.
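The arithmetic behind that decision can be sketched in a few lines. The 8,000 items and 3,000 items a month come from the project described above; the 10% speedup and the half month lost to retraining are made-up numbers for illustration only, not figures from the actual project.

```python
def months_remaining(items_left, items_per_month):
    """Time to finish if the process is left alone."""
    return items_left / items_per_month


def months_with_change(items_left, items_per_month, speedup, months_lost):
    """Time to finish if we first lose `months_lost` to disruption
    (retraining, mistakes) and then run `speedup` faster (0.10 = 10%)."""
    return months_lost + items_left / (items_per_month * (1 + speedup))


def break_even_months_lost(items_left, items_per_month, speedup):
    """Largest disruption the change can cause before it stops paying off."""
    return (items_left / items_per_month) * speedup / (1 + speedup)


as_is = months_remaining(8000, 3000)
# Hypothetical change: 10% faster, but half a month of disruption.
changed = months_with_change(8000, 3000, speedup=0.10, months_lost=0.5)
limit = break_even_months_lost(8000, 3000, speedup=0.10)

print(f"Leave it alone: {as_is:.2f} months")
print(f"With the change: {changed:.2f} months")
print(f"Change only pays off if disruption is under {limit:.2f} months")
```

Under those assumed numbers, the change would have to cost less than about a quarter of a month of disruption to be worth making. The closer a project is to the end, the smaller that margin gets.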

Dealing with Digitization Boredom

With all the fancy technology, expensive equipment, and interesting materials, digitization seems very exciting from the outside looking in. We have people come into our lab all the time and ooh and aah over our scanners and our computers, but the truth is that the work is repetitive and consistent and utterly boring.
Our student assistants last two years at most before they can’t take the mindless scanning and editing anymore. At worst, we lose them after a few days because they realize right off that this isn’t the job for them.
Digitization burnout is a real thing. The hard thing about it is that there’s no good method for dealing with it. If you have to digitize a mountain of stuff, there’s no way to make that more interesting. I have a few suggestions:

If you’re the person doing the digitization:

  • Take breaks. Get up and move around every hour at least. Get your eyes off the scanner, off the screen. Think about something else for five minutes.
  • Have a full and active life outside of the digitization. The best defense against boredom is contrasting it with a life well lived. Physical activity can help alleviate the stress of sitting at a scanner or computer for hours in a dark room.
  • Get lots of sunlight or take Vitamin D. When you work in a cave-like place, this is important.
  • Turn it into a game or challenge your mind by experimenting with different ways to make the work go by faster.

If you’re the person managing people who are doing the digitization:

I’m going to be honest. Most people are fine with sitting down and working at a computer for hours. The problem comes when something else is going on in their life, and the boredom of their work makes it worse and pushes them over the edge. Maybe they’re having family problems, or relationship problems, and having hours a day to sit and do mindless work just gives them time to stew on their problems. The point is, they start changing. Some jobs can make people feel better when they get stressed, but digitization is a lonely, dark, monotonous job that provides little comfort if someone is already not feeling well.

  • Be aware of signs of stress. Are they missing work? Are they gaining or losing a lot of weight?
  • Has the quality or quantity of their work changed?
  • Are they over-reacting to being corrected?

These might be signs that they need to change tasks, if that’s possible. In our lab, we offer students the chance to work in other departments for a time to give them something different. We also recognize that at some point, nothing is going to make the job better for them, and we recommend that they find a job that aligns with their interests.

Scanners are great, but people often forget that someone has to be there to run the scanner, and people aren’t machines.


Digital Preservation: TRAC Certification and the Digital Preservation Network

I touched on this in the comments of a previous post, but I am concerned that people in general are reacting to the DPN the wrong way. The initial push for the Digital Preservation Network was full of discussions of what could be, and why it was important. I think, for the most part, everyone agreed. But in conversations with my coworkers, when I try to talk about digital preservation, they say “But I thought DPN was going to take care of all that?” The answer is that maybe DPN will take care of “all that”, but it will be a number of years before that happens. My job is to make sure that when DPN happens, we still have good data to give them.

How do you do digital preservation? It may seem like such a big topic that it’s impossible to figure it all out, but really there are a few key things that you can do to make sure your data is safe:

  1. Have multiple copies in geographically distant locations, even if that means it’s just in another building.
  2. Check the data for integrity (easily done through creating checksums, and then checking those checksums at regular intervals).
  3. Keep things in current file formats. Simple formats are best. Lossless formats are best.
  4. Have some overall organization of the digital archive so that it is easy to find things and where they belong.
  5. Have clear policies about what goes in, how it goes in, how it’s checked, and how it goes out.
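The integrity check in item 2 can be sketched as follows. This is a minimal illustration, assuming SHA-256 checksums and a simple in-memory manifest; the function names (`sha256_of`, `build_manifest`, `verify_manifest`) are hypothetical and not part of any particular preservation tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so that large image files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_dir):
    """Map every file under archive_dir (by relative path) to its checksum."""
    root = Path(archive_dir)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(archive_dir, manifest):
    """Recompute checksums and return (missing, changed) lists of paths."""
    root = Path(archive_dir)
    missing, changed = [], []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            missing.append(rel_path)
        elif sha256_of(target) != expected:
            changed.append(rel_path)
    return missing, changed
```

In practice you would build the manifest once when files enter the archive, store it alongside each copy, and run the verification on a schedule. Any path reported as missing or changed is a cue to restore that file from one of the other copies.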

If you want more information about how to do Digital Preservation right, take a look at the Trusted Digital Repository Certification requirements. You might not be able to include everything they suggest (since getting TRAC/TDR certified is actually really difficult), but just being aware of what is required for a good digital archive can do wonders.


Minimum Digitization Capture Recommendations

A wonderful document has been released: the Minimum Digitization Capture Recommendations from ALA. What is linked to is the draft form, but it already contains very valuable standards. It is very detailed and has solid examples.

One of the things I really love about this document is its acceptance that there are different reasons to scan things, and that those different reasons have different results, different processes, and different challenges.

A quote from page 11:

“Digitizing for informational content is somewhat different than creating a surrogate for artifactual reasons where granules are clearly seen. Capturing the image such that the important elements are represented is usually adequate when one is concerned with the informational content, though it should be noted that photographs are commonly enlarged or magnified to view smaller elements clearly. It may be necessary to capture individual light sensitive granules when specific information on the original photographic process is important.”

Also, they talk about a variety of different kinds of objects, and the unique problems associated with them, and the specific recommendations for those types of objects.

Mind you, the report is talking about “minimum” requirements, which I think is a great way to approach this.  You can do more if you can, but no matter what you do, you should try to do at least this much.  Those are realistic guidelines.

I highly recommend taking a look.

What does it mean for something to be “Online”?

What does it mean for something to be “online”? If an item is put on a site that is regularly crawled by search engines, then most people can find it, provided that the words used in or about the item are accurate and match the terms users would actually use to find that item across a wide range of needs. So, even if Google crawls your site, how accessible your stuff is depends on how good your metadata is.

There is a term for how your site behaves in search results and for the efforts to affect that behavior. It’s called Search Engine Optimization (SEO). Google has a great guide for how to make your content and your pages effective on their Webmaster Tools: Search Engine Optimization page.

Once Google crawls your site, they often take a copy of everything they can touch.  They do this in case your site goes down and people still need to have access to the material.

There are other people and groups that crawl the internet besides search engines.  Many countries have a strong internet crawling effort geared toward gathering free resources.

If we were going to describe this in terms of traditional publishing, the item is “Super” published.  Not only is it available in many different sites, and there are multiple copies, but it’s actually fairly easy to show that the item you posted was the original.

This super availability scares some people.

It also means that once something has been online, it’s difficult to have the cached copy removed from Google, but there is a method. If you are the site administrator, you can go to Google’s Webmaster Tools and request that the page be taken down. If you are not the webmaster, you can make your case at the “Removing Content From Google” page.

On the flip side, it is possible to put an item up on the internet and have it effectively lost, accessible only to people who already know where to find it. Search engines don’t crawl everything, and in some content management systems used in digital libraries (CONTENTdm, for example), the structure of the system makes it very difficult to crawl. Even items in systems that are easy to crawl may still be hard to find. The lesson here is that just putting something “online” isn’t good enough to get it out to the world. You have to hook your items in with other resources. For example, we have a collection of Ships that was underutilized. We connected the collection to, and the collection is now one of our most used collections. Not only that, it’s now well indexed by search engines because of the heavy use, and because we provided links to the items externally (bypassing the structure of CONTENTdm that makes crawling difficult).

So, if your items are online but not getting much use, try going through the SEO information from Google, and start connecting your resources to other sites.

The Dangers of Momentum

I have been following the excitement that has developed over the Digital Public Library of America.  I’ve also been following the momentum for the Digital Preservation Network. I am excited about the sudden interest and excitement in both of these initiatives. I am also cautious of the developing momentum.

Momentum can be great for projects. People get excited, and people work harder. But I have personal experience with how momentum can go wrong.

I was involved in a digital initiative before Google announced their book project, so before October 2004. Our library had decided to digitize books on a massive scale, and we were gearing up to do it just as Google announced their plans. Since no one really knew what the Google book project meant, we went forward with our projects. We developed a mentality that we were there to scan hundreds of thousands of books. There were, in total, 16 people involved in the planning and maintenance of the digital initiatives, so change was difficult. Even when it became obvious that Google was going to scan the things we were planning on scanning, and that they were going to scan them faster and better than we could, we marched on because we had already developed momentum toward mass digitization of books. We convinced ourselves that even if we were scanning the same books as Google, we would make them different because of the high quality of our scans. This mentality lasted from 2004 all the way to 2010 (and a little beyond that).

We had gained momentum in the wrong direction. It was as if we were driving toward a cliff faster and faster, and when we finally saw the cliff, we couldn’t easily turn one way or another because there were too many people involved. We were so big that we had no agility to change. It took going from 16 people (with 6 leads) to a new unit with a single lead, and three people following (4 people total) in order to change the direction of the momentum we had going.

So, our new direction is not book based. There is a book element, but we are only scanning books Google can’t do. Our focus is scanning the rare and unique materials available to us and our community that Google could never scan. This is the true benefit of individual scanning efforts by libraries and archives. They are often the only people who have access to some of these materials, who care about them, and who have the resources to make them visible to the world.

The lesson I took from all of this was that the more people are invested in something, the harder it becomes for that thing to change when circumstances change.  I doubt the Digital Public Library of America and the Digital Preservation Network will go the wrong direction, but it’s worth it to keep the dangers of momentum in mind while they build something as important as a national digital library.