March 4, 2008

Going Back to Schools and Their Archives

During my tenure as a director of business development for a digital conversion vendor specializing in libraries, I was surprised to learn how great the need and interest were among academic institutions to convert selected parts of their school archives. Of particular interest were yearbooks and student newspapers. There was strong interest as well in school publications, commencement and sports programmes, course catalogs, department newsletters and journals, and trustee and departmental minutes and reports.

The main difficulty for many of the smaller institutions that I approached was digitization ability or capacity. That is where I came into the picture, surveying the various constraints that they operated under and trying to work out solutions. To that end, I often advised on proper shipping and handling of materials (to ensure that nothing untoward happened to them), recommended specifications for the creation of “archival masters” and display files, advised on file naming requirements, and surveyed project size and scheduling. I became, in some ways, an ad hoc project manager on their behalf.

So what did I learn during this time? Well, consider yearbooks. These are important documents to academic institutions. They are a source of information about alumni that is critical to any school’s institutional history as well as its fundraising efforts. But yearbooks are complicated. Their interior layouts are highly variable, making them visually rich but challenging to search textually. Because of the preponderance of photos, imaging must inevitably be done in grayscale or color, creating large image files. Text capture is difficult as well, since the text comprises names and all sorts of data that require a high degree of accuracy, raising the price of keying it. (The consolation is that yearbooks tend to have relatively little text because they are so graphically rich.)

Or take student newspapers. These are affected by problems of irregular issuance, variable trim sizes over time, and fragility (when not properly cared for). Because of their visual richness, they, too, pose imaging challenges with respect to image quality and file size. On the other hand, depending on your approach, individual issues tend to be short. So a 12-page PDF of an issue of a student newspaper is likely to be a far smaller file than a 150-page PDF of that same school’s yearbook, making it easier to read online or download.

Course catalogs, minutes, and even journals, by contrast, can be some of the easiest material to digitize. Because they lack photographs and color, they can more than likely be imaged “bitonally” (just black and white, like the output of older copying machines). This reduces the file size tremendously and allows for far easier display of the content in an online environment. It’s cheaper, too, of course.
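To give a rough sense of how much the imaging mode matters, here is a back-of-the-envelope sketch in Python. The page size (8.5 x 11 inches) and resolution (300 dpi) are assumptions chosen purely for illustration, not specifications from any project I worked on, and the function name is made up for this example. It estimates raw, uncompressed sizes only; real files will be smaller once compressed, and real formats may pad each row of pixels slightly.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate the raw (uncompressed) size of one scanned page in bytes.

    Ignores compression, file headers, and per-row padding, so treat the
    result as an approximation for comparing imaging modes.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Bits per pixel for the three imaging modes discussed above.
MODES = {"bitonal": 1, "grayscale": 8, "color": 24}

if __name__ == "__main__":
    # Assumed page: US letter (8.5 x 11 in) scanned at 300 dpi.
    for name, bpp in MODES.items():
        size = uncompressed_bytes(8.5, 11, 300, bpp)
        print(f"{name:9s} {size / 1_000_000:6.2f} MB per page")
```

Under these assumptions a bitonal page comes to roughly 1 MB raw, a grayscale page to about 8 MB, and a 24-bit color page to about 25 MB, which is why a catalog imaged bitonally is so much lighter than a yearbook imaged in color.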

The trick to addressing these many variables, as I’ve learned, is having access to a variety of quality digital equipment, a staff who know how to handle library materials as well as image them, and a strong “post-production” process. Post-production refers to cleaning up the files: straightening and cropping the images, correcting color, capturing the text by OCR or keying, and, of course, checking the results afterwards to guarantee the quality. My own experience suggests that libraries that have tried to do the work on the cheap (buying equipment and using their own students, or hiring vendors who generally neither handle library materials nor understand how they work) pay for it in the long term.