Digitizing - File Formats

Q. What file formats should I use?

Before you digitize your physical items, take some time to determine which digital file format you will save your output in. Many institutions save each item in both a preservation format and an access format. The access file is often smaller, which makes it quicker to download or send to the user, and it can be in a file format that is less sustainable. Preservation file formats should always be long-lasting, and the resulting files are often much bigger than access copies. The idea is to save the item in a format that can be used far in the future to create new access versions or to migrate to newer and more sustainable file formats as technology changes.

 

Take action

  • Determine what file formats you will use for your digitized collection

 

Review use cases

  • Arms, Carolyn & Karl Fleischhauer. Sustainability of Digital Formats: Planning for the Library of Congress.
The Digital Formats Web site provides information about digital content formats. The analyses and resources presented here will increase and be updated over time.
At the end of the first section, there are the FCLA preservation ratings for oral history media formats.
"This web page contains format considerations and recommendations for creating digital content suited for long-term preservation and use. This information was compiled for users of the Harvard DRS but could be applied more generally to any digital content intended for long-term preservation."

 

Read

This document is one of a series of guidance notes produced by The National Archives, giving general advice on issues relating to the preservation and management of electronic records. It is intended for use by anyone involved in the creation of electronic records that may need to be preserved over the long-term, as well as by those responsible for preservation. This guidance note provides information for the creators and managers of electronic records about file format selection.
"Many digital file formats can be considered for preservation. CENDI agencies, however, are most concerned with formats that best preserve text documents such as technical reports and journal articles. For this reason the report focuses on four major formats in the context of document preservation – TIFF, PDF, PDF/A, and XML."
Describes the quantifiable file format risk assessment method, which can be used to define digital preservation strategies for specific file formats, and intends to inspire other cultural heritage institutions to define their own quantifiable file format evaluation method.
  • Rosenthal, David.  "Format Obsolescence: Assessing the Threat and the Defenses."  Library Hi-Tech 28 no. 2 (2010): 195-210.  http://dx.doi.org/10.1108/07378831011047613 (subscription required to access this resource)
Aims to examine the approach to format obsolescence, preparing for format migration, that has guided most digital preservation work for the last 15 years; makes the case that the commonly accepted approach to digital preservation devotes resources to activities that are unlikely to be effective.
Ed Pinsent wrote a blog post in response to the Malcolm Todd report. Rusbridge responded on December 7, 2009, and raised questions about lossy migration; Ashley responded on December 8, 2009, and added some further information about the seminar that led to the Technology Watch Report.
  • Thompson, Dave.  "A Pragmatic Approach to Preferred File Formats for Acquisition."  Ariadne 63 (April 2010).  http://www.ariadne.ac.uk/issue63/thompson (subscription required to access this resource)
Sets out the Wellcome Library's decision not explicitly to specify preferred file formats for long-term preservation and discusses a pragmatic approach in which technical appraisal of the material is used to assess the Library's likelihood of preserving one format over another.
Considers various criteria for file formats: adoption, technological dependencies, disclosure, transparency, metadata support, reusability/interoperability, robustness/complexity, stability, intellectual property/digital rights production, the ability of formats to convey content information, extent of format, and cost.
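
If you want to compare candidate formats against criteria like those listed in the readings above, one possible approach is a simple weighted score. The sketch below is only an illustration and is not taken from any of the cited reports: the weights, the 0-5 rating scale, the subset of criteria, and the placeholder names format_a and format_b are all assumptions that you would replace with your own institution's values.

```python
# Hypothetical weights for a subset of the criteria named above.
# These numbers are assumptions for illustration, not published values.
WEIGHTS = {
    "adoption": 5,
    "disclosure": 5,
    "transparency": 4,
    "metadata_support": 3,
    "stability": 3,
}

# Illustrative ratings on an assumed 0-5 scale (5 = best).
# "format_a" and "format_b" are placeholders, not real formats.
CANDIDATES = {
    "format_a": {"adoption": 5, "disclosure": 4, "transparency": 4,
                 "metadata_support": 3, "stability": 5},
    "format_b": {"adoption": 3, "disclosure": 2, "transparency": 2,
                 "metadata_support": 4, "stability": 3},
}


def format_score(ratings, weights=WEIGHTS):
    """Return the weighted average of the per-criterion ratings."""
    missing = set(weights) - set(ratings)
    if missing:
        # Refuse to score a format that has not been rated on every criterion.
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[c] * ratings[c] for c in weights) / total_weight


def rank_formats(candidates, weights=WEIGHTS):
    """Return format names ordered from highest to lowest score."""
    return sorted(
        candidates,
        key=lambda name: format_score(candidates[name], weights),
        reverse=True,
    )


if __name__ == "__main__":
    for name in rank_formats(CANDIDATES):
        print(name, round(format_score(CANDIDATES[name]), 2))
```

A score like this is only as good as the ratings behind it, so record who assigned each rating and why, and revisit the weights as your collection and technology change.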


Last updated on 08/27/13, 12:15 pm by callee

 
