Free Lunch: Open Source Solutions for the Digitisation Workflow

In the world of digital imaging and digitisation, the current ubiquity of large proprietary brands can sometimes be seen as a hindrance when considering embarking on a digitisation project. Within digitisation, the issue of digital preservation is a constant reminder of the vulnerability of our digital assets, and it is possibly this ubiquity of proprietary software and formats that is seen as the most contentious issue.

Keywords: Digitisation. Software. Open source. Workflow. Digital preservation.


INTRODUCTION
Digitisation is fast becoming an all-pervasive activity, not only within the cultural heritage sector but increasingly across academia. Within libraries, archives and collections, academic institutions are recognising not only the importance of engaging in the digitisation process, but also the importance of conducting such work at the highest possible level. However, in the world of digital imaging and digitisation, the current ubiquity of large proprietary brands can sometimes be seen as a hindrance when considering embarking on a digitisation project. Within digitisation, the issue of digital preservation is a constant reminder of the vulnerability of our digital assets, and it is possibly this ubiquity of proprietary software and formats that is seen as the most contentious issue. Proprietary formats can have support removed, or be withdrawn altogether, leaving stakeholders without any clear course of action or redress. It should also be noted that, for some smaller projects, digital imaging and digitisation are often seen as an expensive undertaking, with multiple software licences costing thousands of pounds, often forcing these projects to hit the buffers before they get started.
Open source software can and should therefore be seen as a legitimate and robust alternative to proprietary formats and software, offering, as it does, ever-evolving solutions supported by the wider stakeholder community. We are all aware of open source software, but quite often only in the areas where we have had a need for it. However, if one looks at the subject of digitisation, it is possible to see that there is a large selection of open source alternatives that can cater specifically to the needs of each element within the digitisation workflow.
This presentation does not attempt to provide hard and fast answers or solutions to the issue of digitisation and workflow in terms of software; rather, it illustrates the feasibility of creating a digitisation workflow using entirely open source software and formats.
At every stage of the digitisation workflow that requires software or the use of digital file formats, apart from capture, it is possible to use an open source alternative.

Goobi
Goobi is open source software intended to support and manage workflows in digitisation projects for cultural heritage institutions. The software implements international standards such as METS, MODS and other formats maintained by the Library of Congress.
Goobi consists of several independent modules serving different purposes such as controlling the digitisation workflow, enriching descriptive and structural metadata, and presenting the digitised media to the public in a convenient way.

Figure 1: Workflow schema
Goobi was developed by the Göttinger Digitalisierungszentrum but is increasingly used throughout the U.K. heritage sector. Goobi allows users to model, manage and supervise digitisation production processes, including the importing of data from library catalogues, scanning and content-based indexing, and the digital presentation and delivery of results in standardised formats.

LightZone
LightZone is a high-level digital darkroom software package for Windows, Mac OS X, and Linux, with an interface similar to that of its near namesake. It includes RAW processing and editing; however, rather than using layers in the way that other photo editors do, LightZone lets the user build up a stack of tools which can be rearranged, readjusted, turned off and on, and removed from the stack. It is a completely non-destructive editor, in which any of the tools can be readjusted or modified later. A tool stack can even be copied to a batch of photos at one time. LightZone always operates in a 16-bit linear colour space with the wide gamut of ProPhoto RGB.
While many of LightZone's tools are familiar ones, they also have shared, multiple modification possibilities built in that amplify their power and flexibility. LightZone also offers tools for tonal control inspired by the Zone System and HDR tone-mapping. These tools make LightZone especially useful for working with black-and-white imagery.
The combination of the individual tools' inherent flexibility, the flexibility of the tool stack, completely non-destructive editing in a 16-bit wide gamut colour space, and an intuitive GUI makes LightZone a robust alternative for those not entirely comfortable with proprietary software packages.

GIMP
GIMP, an acronym for GNU Image Manipulation Program, is a Free Software application covered by the General Public License (GPL). The GPL provides users with the freedom to access and alter the source code that makes up computer programs. GIMP is a multiplatform photo manipulation tool suitable for a variety of image manipulation tasks, including photo retouching, image composition, and image construction.
GIMP has many capabilities. It can be used as a retouching program, an online batch processing system, a mass production image renderer and an image format converter.
GIMP is expandable and extensible. It is designed to be augmented with plug-ins and extensions, and the advanced scripting interface allows everything from the simplest task to the most complex image manipulation procedures to be easily scripted.
Most GNU/Linux distributions include GIMP as a standard application. GIMP is also available for other operating systems such as Microsoft Windows™ or Apple's Mac OS X™.

Delta.E
By definition, Delta-E (ΔE) is the metric that describes the distance between two colours. The capital "E" stands for Empfindung, the German word for sensation, and the Greek character delta (Δ) denotes a difference. A ΔE therefore describes how your senses relate two colours.
One of the constant issues for digitisation projects is colour management: the rendering and fidelity of colour and image quality within the digital image. Unless the project has a dedicated imaging department, with technicians well versed in the issues surrounding colour management, this issue will always be a stumbling block. In order to improve image quality, it is necessary to know where one stands. It is not advisable to rely on the display; one has to look at the numbers. In an average quality assessment, a few thousand numbers describe the image quality, and it is then possible to relate them to one another. Delta.E is a web-based solution that enables the user to upload a digital capture of a colour reference, such as the Kodak IT-8, and compare it to a set of given references. In this way, the user can obtain the image's average ΔE. Delta.E can also assess other issues relating to digital images, such as lighting uniformity, geometry, resolution, sharpness, over-sharpening and colour registration.
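As a rough illustration of the arithmetic behind an average ΔE, the sketch below uses the CIE76 formula, in which ΔE is the Euclidean distance between two colours in CIELAB space. This is an assumption made for illustration only: nothing here states which ΔE formula Delta.E itself applies, and the function names and patch values are hypothetical.

```python
import math

def delta_e_76(lab1, lab2):
    """CIE76 colour difference: Euclidean distance between two (L*, a*, b*) triples."""
    return math.dist(lab1, lab2)

def mean_delta_e(measured, reference):
    """Average ΔE over matching patches of a captured chart and its reference values."""
    if len(measured) != len(reference):
        raise ValueError("measured and reference must contain the same number of patches")
    return sum(delta_e_76(m, r) for m, r in zip(measured, reference)) / len(measured)

# Hypothetical CIELAB values for three chart patches (not real target data).
reference = [(50.0, 0.0, 0.0), (70.0, 20.0, -10.0), (30.0, -15.0, 25.0)]
measured = [(50.0, 3.0, 4.0), (70.0, 20.0, -10.0), (31.0, -15.0, 25.0)]

print(mean_delta_e(measured, reference))
```

The per-patch differences in this made-up example are 5, 0 and 1, so a single poorly rendered patch dominates the average; this is one reason to inspect the individual numbers as well as the mean.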

Archivematica
All Archivematica code is released under a GNU Affero General Public License (AGPL 3.0), making it possible to study, modify, improve, and distribute it. Archivematica uses a micro-services design pattern to provide an integrated suite of software tools that allows users to process digital objects from ingest to access in compliance with the ISO-OAIS functional model. Users monitor and control the micro-services via a web-based dashboard. Archivematica uses METS, PREMIS (events, agents, rights and restrictions), Dublin Core, the Library of Congress BagIt specification and other best practice standards.
Archivematica provides several decision points that give the user control over choices about format identification tools, printing the original order of the directories ingested, examining contents for private and personal information, extracting contents of packages and forensic images, and transcribing content. Users can also preconfigure most of these options for seamless ingest to archival storage and access. Archivematica allows for various ingest workflows: metadata and submission documentation import, zipped and unzipped Bag ingest, digital forensic image processing, SIP arrangement, manual normalisation, and dataset management.
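To make the idea of a "Bag" more concrete, the following is a minimal sketch of the BagIt layout: a payload directory named data, a bagit.txt declaration, and a checksum manifest. It is not Archivematica code; the function name make_bag is hypothetical, and the choice of BagIt version 1.0 and SHA-256 is an assumption made for illustration.

```python
import hashlib
import shutil
from pathlib import Path

def make_bag(source_dir, bag_dir):
    """Copy source_dir into bag_dir/data and write the bag declaration and manifest."""
    source, bag = Path(source_dir), Path(bag_dir)
    payload = bag / "data"
    shutil.copytree(source, payload)  # also creates bag_dir; fails if it already exists

    # Bag declaration. Version 1.0 is assumed here for illustration.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )

    # Payload manifest: one "<checksum>  <path relative to the bag>" line per file.
    # read_bytes() loads each file fully into memory, which is fine for a sketch
    # but would need chunked reading for very large master images.
    lines = []
    for path in sorted(p for p in payload.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(bag).as_posix()}\n")
    (bag / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag
```

Because the manifest records a checksum for every payload file, a receiving system can verify that nothing was lost or altered in transfer before ingest begins.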
In the Format Policy Registry (FPR), Archivematica implements its default format policies based on an analysis of the significant characteristics of file formats. The FPR also offers an editable, flexible framework for format identification, package extraction, transcription and normalisation for preservation and access.
Memory institutions have dedicated resources over the past couple of decades to implementing various software platforms and tools to manage digital objects. For this reason, Archivematica has integrated with them wherever possible. These include AtoM, DSpace, CONTENTdm, Islandora, LOCKSS, DuraCloud, Arkivum, OpenStack and Archivists' Toolkit. The software applications integrated into Archivematica are each released under their own open source licence, and these are checked for licence compatibility before they are integrated into the project.

Drupal
Drupal is open source software maintained and developed by a community of over 1,000,000 users and developers. It is distributed under the terms of the GNU General Public License, which means anyone is free to download it and share it with others. This open development model means that people are constantly working to make sure Drupal is a cutting-edge platform that supports the latest technologies that the Web has to offer. The Drupal project's principles encourage modularity, standards, collaboration and ease of use.
Dries Buytaert began the Drupal software as a message board in 1999. Within a year or so, more people became interested in using and contributing to Drupal, so the project was made open source. Drupal.org came online in 2001, and the Drupal community gained momentum in 2005 with several code sprints and conferences.
Drupal strives to create a balance between simplicity and flexibility by providing its users with the tools they need to make their own content management solution, while still providing some pre-built components to help them get started.Thus, it can be described both as a content management system (CMS) and a content management framework (CMF); one system that strives to have the strengths of both, without their deficiencies.
Drupal provides a modular system. Developers have already made the building blocks required to create a site that suits your needs, whether that is a news site, an online store, a social network, blog, wiki, or something else altogether.

IIIF
Access to image-based resources is fundamental to research, scholarship and the transmission of cultural knowledge. Digital images are a container for much of the information content in the Web-based delivery of images, books, newspapers, manuscripts, maps, scrolls, single sheet collections, and archival materials. Yet much of the Internet's image-based resources are locked up in silos, with access restricted to bespoke, locally built applications.
A growing community of the world's leading research libraries and image repositories have embarked on an effort to collaboratively produce an open source, interoperable technology and community framework for image delivery which will hopefully, eventually, bring these disparate collections to a wider audience.

IIIF (International Image Interoperability Framework) has the following goals:
1. To give scholars an unprecedented level of uniform and rich access to image-based resources hosted around the world.
2. To define a set of common application programming interfaces that support interoperability between image repositories.
3. To develop, cultivate and document shared technologies, such as image servers and web clients, that provide a high-quality user experience in viewing, comparing and annotating images.
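One of the common interfaces referred to in the second goal is the IIIF Image API, in which a region of an image at a given size is requested purely through the structure of the URL: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. The sketch below builds such URLs. The server address and identifier are hypothetical, the helper name iiif_image_url is ours, and the size keyword "max" assumes a recent version of the API.

```python
from urllib.parse import quote

def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an Image API request URL of the form
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}"""
    # Reserved characters in the identifier, such as "/", must be percent-encoded.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"

# Hypothetical server and identifier, for illustration only.
base = "https://example.org/iiif"

# The whole image at the largest size the server will deliver.
print(iiif_image_url(base, "ms-001/f1r"))

# The top-left 1024 x 1024 pixel region, scaled to 512 pixels wide.
print(iiif_image_url(base, "ms-001/f1r", region="0,0,1024,1024", size="512,"))
```

Because every compliant server answers the same URL pattern, a viewer written for one repository can, in principle, display images held by another without bespoke integration.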

IIPImage
IIPImage is an advanced, open source, high-performance, feature-rich image server system for web-based streamed viewing and zooming of high-resolution images. It is designed to be fast and bandwidth-efficient, with low processor and memory requirements. The system can comfortably handle gigapixel-size images as well as advanced image features such as 8, 16 and 32 bits per channel, CIELAB colorimetric images and scientific imagery such as multispectral images and digital elevation maps.
Streaming is tile-based, making it possible to view, navigate and zoom in real time around gigapixel-size images that would be impossible to download and manipulate on local machines. It also makes the system very scalable, as the number of image tile downloads needed to fill the viewer remains the same regardless of the size of the source image.
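The scalability claim can be checked with a little arithmetic. The sketch below is not IIPImage code; it assumes a conventional tiled pyramid in which each resolution level halves the previous one, and a tile size of 256 pixels, both of which are assumptions made for illustration.

```python
import math

def pyramid_levels(width, height, tile=256):
    """Number of resolution levels when each level halves the previous one
    until the whole image fits inside a single tile."""
    levels = 1
    while width > tile or height > tile:
        width = math.ceil(width / 2)
        height = math.ceil(height / 2)
        levels += 1
    return levels

def max_tiles_for_viewport(view_w, view_h, tile=256):
    """Upper bound on the tiles intersecting a viewport, allowing for a
    viewport that is not aligned to the tile grid."""
    return (math.ceil(view_w / tile) + 1) * (math.ceil(view_h / tile) + 1)

# A larger source image adds pyramid levels...
print(pyramid_levels(4_000, 3_000))
print(pyramid_levels(100_000, 80_000))

# ...but the tiles needed to fill the screen depend only on the viewport.
print(max_tiles_for_viewport(1920, 1080))
```

Note that max_tiles_for_viewport takes no image dimensions at all: the cost of drawing one screen is bounded by the viewport and tile size, while image size only affects how many levels the pyramid has.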
Source images can be in either TIFF or JPEG2000 format. Whole images or regions within images can also be rapidly and dynamically resized and exported by the server from a single source image without the need to store multiple files in various sizes.

JPEG2000
Wavelet compression has been around for some time, but has only recently been applied to image compression. It is now used within several file formats, of which the best known is JPEG2000. Several of the fundamental differences between the common JPEG and JPEG2000 are directly related to the different approaches they take to compression. These include the option of lossless compression in JPEG2000, which is unavailable in JPEG; the smoothness of highly compressed JPEG2000 images compared to the 'blockiness' of JPEG; and the additional display functionality, including zooming, offered by JPEG2000. JPEG2000's wavelet compression is superior to the common JPEG's compression because it is able to treat larger areas of the image at once, and in a more discriminating way. The image can be compressed more tightly while at the same time preserving the detail and avoiding any 'blockiness'.
In addition to improving the quality and efficiency of compression, the products of a wavelet transform can be used to enhance the delivery of an image. The decomposition process produces a series of increasingly simplified versions of the image (either smaller or less detailed, depending on how they are encoded). If these are 'played back' in reverse as the image is reconstructed and displayed, the result is a picture that literally grows in size (i.e. resolution) or in detail (fidelity). JPEG2000 offers both of these among its display options.
There is potential for them to be further exploited by software developers to develop zooming and panning facilities.
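The decomposition and 'play back' described above can be sketched on a single row of pixel values. The example uses an integer Haar transform, chosen because it is the simplest reversible wavelet; JPEG2000 itself uses different, longer wavelet filters, so this is an illustration of the principle rather than of the standard. All function names are hypothetical.

```python
def haar_forward(signal):
    """One level of a reversible integer Haar transform on an even-length list.
    Returns a half-length approximation and the details needed to undo it."""
    approx, detail = [], []
    for a, b in zip(signal[0::2], signal[1::2]):
        d = a - b
        approx.append(b + (d >> 1))  # equals floor((a + b) / 2)
        detail.append(d)
    return approx, detail

def haar_inverse(approx, detail):
    """Exactly undo haar_forward."""
    out = []
    for s, d in zip(approx, detail):
        b = s - (d >> 1)
        out.extend([b + d, b])
    return out

def decompose(signal, levels):
    """Repeatedly transform the approximation, giving ever coarser versions.
    The signal length must be divisible by 2 ** levels."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_forward(approx)
        details.append(detail)
    return approx, details

def reconstruct(approx, details):
    """Play the levels back in reverse, doubling the resolution each time."""
    for detail in reversed(details):
        approx = haar_inverse(approx, detail)
    return approx

row = [12, 14, 200, 202, 50, 52, 90, 94]  # hypothetical pixel values
coarse, details = decompose(row, 2)
print(coarse)                               # quarter-resolution version of the row
print(reconstruct(coarse, details) == row)  # the original is recovered exactly
```

A viewer can display the coarse version immediately and refine it as each set of details arrives, which is the behaviour that zooming and panning tools exploit. Because only integer arithmetic is used, nothing is lost, which is the sense in which a wavelet scheme can be lossless.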
It is interesting to note that, at the time of writing, a number of larger museums and libraries are turning to JPEG2000 as the digital master file format due to its flexibility, lossless compression and smaller file size compared to TIFF or other master formats. As capture technology improves and the resulting file sizes increase, there is an ever-present demand for greater repository capacity.
The choice is therefore either to invest in ever-growing server capacity or to look at reducing individual file sizes.

CONCLUSION
Hopefully it is possible to see that these open source software solutions provide a clearly defined, supported and robust alternative to proprietary options throughout the digitisation workflow. There is, however, one area where open source solutions have made little impact, namely capture. With JPEG2000, together with Adobe's openly documented DNG and TIFF formats, being widely adopted within the digitisation workflow, there is sufficient argument to suggest that open formats should be adopted by capture technologies. At this point in time, camera and scanner manufacturers provide a limited format palette, namely TIFF, JPEG or RAW. Since TIFF files are commonly either uncompressed or LZW-compressed, which is lossless but comparatively inefficient for photographic images, RAW formats are largely proprietary, and JPEG uses lossy compression, it would seem intuitive to suggest not only that JPEG2000 should be adopted as a capture format, but also that an open source RAW format should be developed. If this were to happen, it would also be the perfect opportunity to consider a RAW format that utilises lossless wavelet compression in the same way as JPEG2000, making it the perfect format for digitisation capture.