Storage Media - Identify

Q. What should I do to identify the materials with which I am working?

When pulling digital information off physical media, you need to accurately identify not only the storage media you are working with, but also the file formats stored on that media and the risks associated with those formats.

Take action

  • Identify the storage media you have and what content you may be acquiring
  • Identify the formats of files

Explore Tools

  • Disk Inventory X. http://www.derlien.com/
    Disk Inventory X identifies and visualizes the sizes of files and folders. It runs in Mac OS X 10.3 (and later). Note: You must be able to mount the drive in order to run Disk Inventory X on it; it doesn't work for disk images or disks that have filesystems that your computer's operating system can't recognize.
  • Disk Usage Analyzer. http://www.marzocca.net/linux/baobab/
    Disk Usage Analyzer is a graphical disk usage analyzer that runs in GNOME on Linux/Unix. It can be run against an entire volume or against individual directories.
  • DROID http://sourceforge.net/projects/droid/
    DROID (Digital Record Object Identification) is an automatic file format identification tool. It is the first in a planned series of tools developed by The National Archives under the umbrella of its PRONOM technical registry service.
  • FIDO (Format Identification for Digital Objects). https://github.com/openplanets/fido
    FIDO is a command-line tool to identify the file formats of digital objects.
  • file (Unix). http://en.wikipedia.org/wiki/File_%28command%29
    file is a command-line utility for identifying file types. It was initially introduced in the 1970s, and it is included with every major distribution of Unix/Linux.
  • Global Digital Format Registry http://www.gdfr.info/
    The GDFR is meant to be a distributed and replicated registry of format information populated and vetted by experts and enthusiasts world-wide.
  • Grand Perspective. http://grandperspectiv.sourceforge.net/
    Grand Perspective identifies and visualizes the sizes of files and folders. It runs in Mac OS X. Note: You must be able to mount the drive in order to run Grand Perspective on it; it doesn't work for disk images or disks that have filesystems that your computer's operating system can't recognize.
  • JHOVE (JSTOR/Harvard Object Verification Environment). http://sourceforge.net/projects/jhove/
    JHOVE is open source software for format-specific identification, validation, and characterization of digital objects. See also JHOVE2.
  • KDirStat. http://kdirstat.sourceforge.net/
    KDirStat is a graphical disk usage utility that runs in KDE on Linux/Unix.
  • MagicDisc - Magic ISO. http://www.magiciso.com/tutorials/miso-magicdisc-overview.htm
    MagicDisc is a free utility "designed for creating and managing virtual CD drives and CD/DVD discs." It can be used to mount disk images as drives in a Windows environment.
  • OSFMount - PassMark Software. http://www.osforensics.com/tools/mount-disk-images.html
    OSFMount is a free utility for mounting disk images (dd and .iso) in Windows.
  • PRONOM: The Technical Registry http://www.nationalarchives.gov.uk/PRONOM/Default.aspx
    PRONOM is a resource for anyone requiring impartial and definitive information about the file formats, software products and other technical components required to support long-term access to electronic records and other digital objects of cultural, historical or business value.
  • TreeSize Professional - JAM Software. http://www.jam-software.com/treesize/
    TreeSize can be used to identify and visualize the contents of a drive. It also supports basic analysis, export, reporting and duplicate identification. Note: You must be able to mount the drive in order to run TreeSize on it; it doesn't work for disk images or disks that have filesystems that TreeSize or your computer's operating system can't recognize.
  • WinDirStat. http://windirstat.info/
    WinDirStat identifies and visualizes the sizes of files and folders. It runs in Windows 95 or later. Note: You must be able to mount the drive in order to run WinDirStat on it; it doesn't work for disk images or disks that have filesystems that your computer's operating system can't recognize.
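
The disk usage tools listed above all report how much space files and folders take up on a mounted volume. As a rough illustration of that idea only (this is not how any of the listed tools is implemented), the Python sketch below walks a directory tree and totals file sizes by extension. The function name `sizes_by_extension` is hypothetical, and grouping by extension is an assumption made for the example; an extension is only a hint about a file's format, not an identification.

```python
import os
from collections import Counter


def sizes_by_extension(root):
    """Return a Counter mapping lowercase file extension to total bytes.

    Hypothetical helper for illustration. Symbolic links and unreadable
    files are skipped rather than reported.
    """
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not follow or count symbolic links
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or cannot be read
            # Files without an extension are grouped under "(none)".
            ext = os.path.splitext(name)[1].lower() or "(none)"
            totals[ext] += size
    return totals
```

Calling `sizes_by_extension("/path/to/mounted/volume").most_common(10)` would list the ten extensions that account for the most bytes. As with the tools above, this only works on a volume your operating system can mount and read.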

Watch

  • Underwood, William.  "File Format Identification Technologies."  YouTube, 39:36, from a presentation on June 25, 2010.  Posted by usnationalarchives.  October 13, 2010.  http://www.youtube.com/watch?v=dVMs5YnZ0HU
    Gives a progress report on a promising technology for File Format Identification for use in NARA archival processes. Dr. Underwood is at the Georgia Tech Research Institute (GTRI).

Read

  • Arms, Caroline R., Carl Fleischhauer, and Jimi Jones. "Sustainability of Digital Formats: Planning for Library of Congress Collections," last updated December 12, 2011. http://www.digitalpreservation.gov/formats/
    The Digital Formats Web site provides information about digital content formats. The analyses and resources presented here will increase and be updated over time.
  • Born, Günter. The File Formats Handbook. London: International Thomson Computer Press, 1995.
    Born documents a variety of file formats that archivists and librarians are likely to find in collections of materials from the 1980s and 1990s. Of particular note are numerous figures that include hex dumps (hexadecimal representation of the contents) of files in particular formats.
  • Kessler, Gary C. "File Signatures Table." http://www.garykessler.net/library/file_sigs.html
    This document contains more than 400 file signatures (aka "magic numbers") that can be used to identify specific file types. Several of the tools listed above make use of magic numbers, and you can also view magic numbers by opening a file in a hex editor.
  • Lechich, Roy.  "File Format Identification and Validation Tools."  Yale University: Integrated Library & Technology Systems, February 2007. http://www.library.yale.edu/iac/DPC/FileIDandValidate.pdf
    Provides a brief overview of file type identification and file format validation. Analyzes some of the currently available open source academic tools related to file format.
  • Mediapedia. National Library of Australia. http://mediapedia.nla.gov.au/home.php
    Mediapedia "is intended to enable the identification of various physical media carrier types for assisting with collection planning, assessment, documentation, infrastructure and preservation planning for the content they hold. These could include media across various genres such as cine, video, photo, audio, data, paper carriers, microfilm, etc."
  • Pearson, David and Colin Webb.  "Defining File Format Obsolescence: A risky journey."  International Journal of Digital Curation 3, no. 1 (2008): 89-106.  http://www.ijdc.net/index.php/ijdc/article/download/76/44 -- link opens a PDF file
    This paper reports on the AONS (Automatic Obsolescence Notification System) II Project, which aimed to refine and develop a software tool that would automatically find and report indicators of obsolescence risks, to help repository managers decide if preservation action is needed.
  • Todd, Malcolm.  "Technology Watch Report: File formats for preservation."  Digital Preservation Coalition, October 2009.  http://www.dpconline.org/component/docman/doc_download/375-file-formats-for-preservation -- link opens a PDF file
    Considers various criteria for file formats: adoption, technological dependencies, disclosure, transparency, metadata support, reusability/interoperability, robustness/complexity, stability, intellectual property/digital rights protection, the ability of formats to convey content information, extent of format, and cost.
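
The file signatures ("magic numbers") described in the Kessler entry above can be checked with a few lines of code. The Python sketch below is a minimal illustration, assuming a small hand-picked table of well-known signatures; it is not a substitute for DROID, FIDO or file, which draw on far larger signature sets. The names `SIGNATURES`, `identify_bytes` and `identify_file` are hypothetical.

```python
# A handful of widely documented signatures, chosen for illustration only.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF", "PDF document"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify_bytes(header):
    """Return a label for the first signature that matches, else 'unknown'."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def identify_file(path, length=16):
    """Read the first bytes of a file; return (label, hex view of the bytes)."""
    with open(path, "rb") as f:
        header = f.read(length)
    # The hex string is the same view a hex editor gives of the file's start.
    hex_view = " ".join(f"{byte:02x}" for byte in header)
    return identify_bytes(header), hex_view
```

A match is a hint rather than proof. The ZIP signature, for instance, also begins DOCX, ODT and EPUB files, so a dedicated identification tool is still needed to tell such formats apart.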

 

Last updated on 08/26/13, 8:24 pm by callee

 
