
How do I avoid file format obsolescence?

Take action

  • Determine whether or not the data file format is suitable for long-term preservation
  • Identify the tools and resources required to convert data file formats for preservation purposes while maintaining the integrity of the data themselves
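The second action, converting a file while maintaining the integrity of the data, can be checked mechanically by reading the converted file back and comparing it with the source. The sketch below is a minimal illustration under stated assumptions: the JSON-to-CSV migration and the function names `json_records_to_csv` and `conversion_preserves_values` are hypothetical and were chosen only because Python's standard library handles both formats. It is not a recommendation of CSV as a preservation format.

```python
import csv
import io
import json


def json_records_to_csv(json_text: str) -> str:
    """Hypothetical migration: a JSON array of flat objects -> CSV text."""
    records = json.loads(json_text)
    # Union of all keys in first-seen order, so no field is silently dropped.
    fieldnames = list(dict.fromkeys(key for record in records for key in record))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def conversion_preserves_values(json_text: str) -> bool:
    """Check that every source value survives the JSON -> CSV conversion.

    CSV has no data types, so values are compared as strings; a stricter
    check would record the original types as separate metadata.
    """
    original = json.loads(json_text)
    converted = list(csv.DictReader(io.StringIO(json_records_to_csv(json_text))))
    if len(original) != len(converted):
        return False
    for source_row, csv_row in zip(original, converted):
        for key, value in source_row.items():
            # csv writes None as an empty string, so a JSON null is flagged here.
            if csv_row.get(key) != str(value):
                return False
    return True
```

For example, `conversion_preserves_values('[{"id": 1, "note": null}]')` returns `False`, because the null becomes an empty CSV field and cannot be distinguished from an empty string: the kind of silent loss a format conversion should surface before the original is discarded.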

 

Explore

The goal of digital curation is to ensure the appropriate usability of managed digital assets over time. Format is a fundamental characteristic of a digital asset that governs its ability to be used effectively... This instalment investigates aspects of format description, validation, and characterisation that may assist with long-term curation and usability of data.
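Format description and identification usually start from a file's internal signature ("magic number") rather than its extension, which is easy to change. The sketch below assumes a small, hypothetical signature table covering a few well-known formats; dedicated tools such as DROID (which draws on the PRONOM registry) hold far larger signature sets, and validation tools such as JHOVE go further by checking that a file's internal structure actually conforms to its format. A signature match identifies a file; it does not validate it.

```python
# Hypothetical, deliberately small signature table: (leading bytes, format name).
SIGNATURES: list[tuple[bytes, str]] = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    # ZIP is also the container for DOCX, XLSX, ODF and others,
    # so a match here needs further inspection to pin down the format.
    (b"PK\x03\x04", "ZIP container"),
]


def identify(header: bytes) -> str:
    """Return the first format whose signature matches the leading bytes."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first few bytes of a file and identify it."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))
```

A result of `"unknown"` is itself useful for preservation planning: it flags files whose format cannot be recognised and may need investigation before they can be assessed for long-term suitability.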

This webpage provides an overview of the importance of data formats for data preservation.

  • Florida Center for Library Automation (FCLA). “Recommended Data Formats for Preservation Purposes in the Florida Digital Archive.” Florida Center for Library Automation (FCLA), March 2012. http://fclaweb.fcla.edu/uploads/recFormats.pdf.

This table is intended to help Florida university administrators develop guidelines for preparing and submitting files to the Florida Digital Archive. It is not intended to suggest that these file formats are allowable formats for electronic theses and dissertations (ETDs) or any other official publication of any Florida university.

The format and software in which research data are created usually depend on how researchers choose to collect and analyse data, often determined by discipline-specific standards and customs. Ensuring long-term usability of data requires consideration of the most appropriate software and file formats. The UK Data Archive provides guidance on file formats and software for data management.

 

Read

Recent work in the semantics of markup languages may offer a way to achieve more reliable results for format conversion, or at least a way to state the goal more explicitly.  In the work discussed, the meaning of markup in a document is taken as the set of things accepted as true because of the markup’s presence, or equivalently, as the set of inferences licensed by the markup in the document.  It is possible, in principle, to apply a general semantic description of a markup vocabulary to documents encoded using that vocabulary and to generate a set of inferences (typically rather large, but finite) as a result.  An ideal format conversion translating a digital object from one vocabulary to another, then, can be characterized as one which neither adds nor drops any licensed inferences; it is possible to check this equivalence explicitly for a given conversion of a digital object, and possible in principle (although probably beyond current capabilities in practice) to prove that a given transformation will, if given valid and semantically correct input, always produce output that is semantically equivalent to its input.  This approach is directly applicable to the XML formats frequently used for scientific and other data, but it is also easily generalized from SGML/XML-based markup languages to digital formats in general; at a high level, it is equally applicable to document markup, to database exchanges, and to ad hoc formats for high-volume scientific data.  Some obvious complications and technical difficulties arising from this approach are discussed, as are some important implications.
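The idea of an ideal conversion as one that "neither adds nor drops any licensed inferences" can be made concrete with a toy example. The sketch below assumes two hypothetical XML vocabularies (A uses `<book>`/`<author>`/`<title>`, B uses `<record>`/`<creator role="author">`/`<name>`) and hand-written functions standing in for each vocabulary's semantic description. It is not the method from the work described; it only shows the comparison step: derive the set of facts from the source and from the converted output, then report what was dropped or added.

```python
import xml.etree.ElementTree as ET

# A "fact" is a (subject, property, value) triple licensed by the markup.
Fact = tuple[str, str, str]


def _text(element: ET.Element) -> str:
    """Return an element's text content, treating an empty element as ""."""
    return (element.text or "").strip()


def facts_vocab_a(xml_text: str) -> set[Fact]:
    """Hypothetical semantics for vocabulary A."""
    facts: set[Fact] = set()
    for book in ET.fromstring(xml_text).iter("book"):
        subject = book.get("id", "")
        for author in book.findall("author"):
            facts.add((subject, "hasAuthor", _text(author)))
        for title in book.findall("title"):
            facts.add((subject, "hasTitle", _text(title)))
    return facts


def facts_vocab_b(xml_text: str) -> set[Fact]:
    """Hypothetical semantics for vocabulary B."""
    facts: set[Fact] = set()
    for record in ET.fromstring(xml_text).iter("record"):
        subject = record.get("key", "")
        for creator in record.findall("creator"):
            if creator.get("role") == "author":
                facts.add((subject, "hasAuthor", _text(creator)))
        for name in record.findall("name"):
            facts.add((subject, "hasTitle", _text(name)))
    return facts


def convert_a_to_b(xml_text: str) -> str:
    """A hypothetical A -> B transformation whose fidelity we want to check."""
    records = ET.Element("records")
    for book in ET.fromstring(xml_text).iter("book"):
        record = ET.SubElement(records, "record", key=book.get("id", ""))
        for author in book.findall("author"):
            ET.SubElement(record, "creator", role="author").text = author.text
        for title in book.findall("title"):
            ET.SubElement(record, "name").text = title.text
    return ET.tostring(records, encoding="unicode")


def compare_inferences(source_a: str, target_b: str) -> dict[str, set[Fact]]:
    """An ideal conversion neither drops nor adds licensed inferences."""
    before = facts_vocab_a(source_a)
    after = facts_vocab_b(target_b)
    return {"dropped": before - after, "added": after - before}
```

Checking a converter that kept only the first author of a multi-author book would report the missing author under `"dropped"`. A fuller treatment would also include facts derived by inference rules, not just those read directly off the markup, which this sketch leaves out.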

The CRiB is a Service Oriented Architecture (SOA) designed to assist cultural heritage institutions in the implementation of migration-based preservation interventions.  The CRiB system works by assessing the quality of distinct conversion applications or services to produce recommendations of optimal migration strategies.  The recommendations produced by the system take into account the specific preservation requirements of each client institution.
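One way to picture "assessing the quality of distinct conversion applications" against "the specific preservation requirements of each client institution" is a weighted ranking. The sketch below is a hypothetical illustration only, not CRiB's actual algorithm or interface: `ConversionService`, `recommend`, the criterion names and the scores are all assumptions. Each service carries measured quality scores, an institution supplies weights for the criteria it cares about plus optional hard minimums, and services are ranked by weighted score.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConversionService:
    """A candidate migration tool or service (hypothetical model)."""
    name: str
    target_format: str
    # Hypothetical quality measurements in [0, 1], e.g. from test migrations.
    scores: dict[str, float] = field(default_factory=dict)


def recommend(
    services: list[ConversionService],
    weights: dict[str, float],
    minimums: Optional[dict[str, float]] = None,
) -> list[tuple[float, str]]:
    """Rank services by a weighted sum of their quality scores.

    `weights` encodes how much an institution values each criterion;
    `minimums` lists hard requirements that disqualify a service outright.
    """
    minimums = minimums or {}
    ranked: list[tuple[float, str]] = []
    for service in services:
        # Skip services that fail any mandatory requirement.
        if any(service.scores.get(c, 0.0) < floor for c, floor in minimums.items()):
            continue
        total = sum(w * service.scores.get(c, 0.0) for c, w in weights.items())
        ranked.append((total, service.name))
    # Highest weighted score first.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked
```

With one service scoring well on content preservation and another on appearance, an institution that weights content heavily gets the first ranked on top, while one that weights appearance gets the reverse; the same measurements yield different recommendations per institution.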

The rash of new information technology has raised major problems for librarians, faced with a flood of new kinds of media, such as audio and video tape or computer disks. Most of these technologies were designed and manufactured without permanence as a prime consideration. A worse problem is the short life of the reading devices; even the less-than-permanent tapes or disks often do not deteriorate until long after the machines to read them become unavailable. Digital preservation thus depends upon copying, not on the survival of the physical media. Librarians must prepare for reformatting as a regular step in information management. In this, they must join with computer center managers, individual computer owners, and all others who need to keep machine-readable data for more than a few years.
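If preservation depends on copying, each copy should be verified rather than trusted. A common practice is a fixity check: compute a cryptographic checksum before and after copying and confirm they match. The sketch below assumes SHA-256 and uses hypothetical function names (`sha256_of`, `refresh_copy`). It covers bit-level copying onto new media only; a reformatting step deliberately changes the bytes, so it needs a content-level comparison instead of a checksum match.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_copy(source: Path, destination: Path) -> str:
    """Copy a file to new storage and confirm the copy is bit-identical.

    Returns the checksum so it can be recorded alongside the copy.
    """
    expected = sha256_of(source)
    shutil.copyfile(source, destination)
    actual = sha256_of(destination)
    if actual != expected:
        raise OSError(f"fixity check failed for {destination}")
    return expected
```

Storing the returned checksum with the copy means later copies can be checked against the original value, not just against whichever copy they were made from.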

Section on Data Conversion (page 13) summarises best practice.
