Friday, October 19, 2012

Digesting Ingest


Alcohol metabolism
Harkiran Dhindsa and Rioghnach Ahern, Digital Ingest Officers working on the Wellcome Digital Library, describe their experiences of using Goobi on a daily basis, and some of the lessons learnt as we scaled up into full production over this past summer:

Goobi is a workflow management system that allows us to track and manage various digitisation projects, be they archives, books, film or audio files. Many steps are fully automated, including the conversion of TIFF to JPEG2000 and the ingest of content into our repository, Safety Deposit Box.

We found the user interface of Goobi to be intuitive, and training in basic ingest processes was quick. Several members of the team use the system, and with regular use we soon became familiar with its functionality and were working efficiently. METS editing is done through a web form that allows JPEG images of individual pages to be viewed. Using such a system eliminates the need to keep separate spreadsheets. Because Goobi tracks the workflow by registering each step, different staff can continue with tasks at any open step. If an error is noticed at any point, for example a missing image in a book, a correction message can be sent back along the workflow to the appropriate person.

Goobi produces METS files, which describe objects including their access and license status. Although Goobi writes the METS files, the structure of an object is created manually, depending on the project. Much of our time is spent on METS editing, particularly adding restrictions for material that contains sensitive data. Goobi can handle a number of projects at the same time, so we can easily switch between working on archives and books. It can also run different tasks simultaneously. For example, an ingest officer can let an image upload task run for an archive collection while continuing to edit METS data on books.

Lessons learnt

As daily users of Goobi, here are some of the lessons we have learnt:

Prior to import into Goobi, catalogued items are photographed and the digitised images are then checked for data sensitivity. In the early stages of the project, it became obvious where the workflow could be made more accurate and efficient. In one of the first archival collections to be digitised, some of the images that had been held as a backlog, and were due to be uploaded into Goobi, did not reflect the archive catalogue (CALM). This was because changes had been made to the catalogue after photography. The lesson learnt from this experience is that photography should only be carried out after cataloguing has been fully completed, so that the arrangement of material is firmly established.
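A mismatch of this kind could in principle be caught before upload by comparing the references used in the image filenames with the references currently in the catalogue. The following is only a minimal sketch of that idea, not a description of how we or Goobi do it. It assumes a plain list of references exported from the catalogue and a hypothetical filename convention of `<reference>_<sequence>.tif`; both the function name and the convention are illustrative.

```python
from pathlib import Path
from typing import Iterable, Set, Tuple


def compare_with_catalogue(
    catalogue_refs: Iterable[str], image_names: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (catalogue refs with no images, image refs not in the catalogue)."""
    catalogue = set(catalogue_refs)
    # Assumed (hypothetical) naming convention: "<reference>_<sequence>.tif",
    # so the reference is everything before the last underscore.
    photographed = {Path(name).stem.rsplit("_", 1)[0] for name in image_names}
    return catalogue - photographed, photographed - catalogue


if __name__ == "__main__":
    missing, unexpected = compare_with_catalogue(
        ["COLL_A_1", "COLL_A_2"],
        ["COLL_A_1_0001.tif", "COLL_A_1_0002.tif", "COLL_A_3_0001.tif"],
    )
    print("Catalogued but not photographed:", sorted(missing))
    print("Photographed but not in catalogue:", sorted(unexpected))
```

Anything reported in the second set would suggest that the catalogue has changed since photography, which is the situation described above.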

To upload images into Goobi, they are first copied from a working network directory to a temporary drive created by Goobi for the user who has accepted the upload task. The transfer can be cut short by other activity if the network connection to the local PC is running at full capacity. When this happens we have to redo the transfer, which adds to the time needed to complete the task. Thankfully, this image upload task will be automated in the future, bypassing the local PCs completely. However, running several tasks simultaneously will still be limited by server capacity when uploading large files.
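Until the upload is automated, one way to reduce the cost of an interrupted transfer is to make the copy resumable: verify each file by checksum and, on a re-run, skip anything that already arrived intact. The sketch below illustrates that approach using only the Python standard library. It is not part of Goobi, and the function names, directory arguments and `*.tif` pattern are assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(source_dir: Path, target_dir: Path, pattern: str = "*.tif") -> List[str]:
    """Copy matching files, skipping those already copied intact.

    Returns the names of the files copied during this run.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in sorted(source_dir.glob(pattern)):
        target = target_dir / source.name
        # A file left over from an earlier run is kept only if it matches the source.
        if target.exists() and sha256_of(target) == sha256_of(source):
            continue
        shutil.copy2(source, target)
        if sha256_of(target) != sha256_of(source):
            raise IOError(f"Checksum mismatch after copying {source.name}")
        copied.append(source.name)
    return copied
```

If a transfer is interrupted, calling `copy_verified` again with the same two directories copies only the files that are missing or incomplete, rather than starting from the beginning.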

After METS editing was completed on one of the archive collections, we were given further sensitivity data. To add these new sensitivity restrictions, we had to “roll back” processes that had already been ingested, thereby re-running part of the workflow. Because the roll-back process is less intuitive and not intended for regular use, it is very easy to trigger a second ingest into the digital asset management system along the way, resulting in duplicated sets of files. Again, we have learnt an important lesson. It will always be necessary to edit METS files. Changes to the workflow steps in Goobi to make this more straightforward would be useful, but it would be even better to finalise sensitivity lists before METS editing is completed, in order to minimise duplication of effort.

A workflow system such as Goobi becomes essential when ingesting mass collections of archives and books. Images that have gone through the complete ingest process in Goobi will be accessed online via the Player. Seeing the images in an attractive interface is a satisfying part of this work, as this is where all of the different tasks come to fruition: the digitised archives and books presented in a user-friendly form, soon to be publicly available!

Authors: Harkiran Dhindsa and Rioghnach Ahern