
Storage Media - Prepare

Q. How should I prepare to acquire information from storage media?

Before you begin extracting information from physical media, assess your current situation. Take stock of the physical media you already hold and the media you are considering adding to your holdings. Make sure you understand the processes of acquiring information from physical media, including the human resources, technology, and costs involved. It is also a good idea to understand how information is saved and accessed in computer systems. Information professionals can learn a lot from the field of digital forensics, which has established methods and tools for making trustworthy and complete copies of information and then extracting relevant data and metadata.
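
One common way to check that a copy is complete is to compare cryptographic checksums of the original and the copy. The following is a minimal sketch in Python, not a description of any particular forensic tool's method. The function names (file_sha256, copies_match) and the file paths in the usage note are hypothetical and chosen only for illustration.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file at `path`.

    The file is read in fixed-size chunks so that large disk images
    do not have to fit in memory all at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source_path, copy_path):
    """Return True when both files have the same SHA-256 digest."""
    return file_sha256(source_path) == file_sha256(copy_path)
```

For example, copies_match("original.img", "working_copy.img") would return True only if the two (hypothetical) files are byte-for-byte identical. Recording the digest at acquisition time also lets you re-verify the copy later.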

Take action

  • Review use cases, watch videos, and read literature to gain a greater understanding of computing principles
  • Perform needs and resource assessments
  • Establish which monetary, human, and technological resources you need and which you already have available
  • Prepare clearly defined policies for processes

Explore Tools

  • BitCurator. http://bitcurator.net
    The BitCurator project is developing and disseminating a suite of open source tools to integrate digital forensics tools and techniques into archival and library workflows. The tools are being developed and tested in a Linux environment, but the software on which they depend can be compiled for Windows environments (and in most cases is currently distributed as both source code and Windows binaries). There are two primary paths to implementing the software:
    • As a ready-to-run Linux (Ubuntu) environment that can be used either as a virtual machine or installed as a host operating system. This environment is customized to provide users with graphic user interface (GUI)-based scripts that provide simplified access to common functions associated with handling media, including facilities to prevent inadvertent write-enabled mounting (software write-blocking).
    • As a set of individual software tools, packages, support scripts, and documentation to reproduce full or partial functionality of the ready-to-run BitCurator environment.
  • Digital Forensics Tools http://digitalcurationexchange.org/node/2038
    This is a list of digital forensics tools from the appendices of the paper "Digital Forensics and Born-Digital Content in Cultural Heritage Collections" by Matthew G. Kirschenbaum, Richard Ovenden, and Gabriela Redwine, with research assistance from Rachel Donahue.
  • EnCase - Guidance Software. http://www.guidancesoftware.com/encase-forensic.htm
    EnCase is a commercial digital forensics package with a variety of features.
  • Forensic Tool Kit (FTK) - AccessData. http://www.accessdata.com/products/digital-forensics/ftk
    FTK is a commercial digital forensics package with a variety of features.
  • ILooKIX - Perlustro. http://www.perlustro.com/solutions/e-forensics/ilookix
    ILooKIX is a commercial digital forensics package with a variety of features.
  • OSForensics. http://www.osforensics.com/osforensics.html
    OSForensics supports "hash matching, drive signature comparisons, e-mails, memory and binary data. It lets you extract forensic evidence from computers quickly with advanced file searching and indexing and enables this data to be managed effectively."
  • The Sleuth Kit (TSK) - Basis Technology. http://www.sleuthkit.org/
    TSK is an open-source suite of digital forensics tools that "can be used to analyze disk images and perform in-depth analysis of file systems (such as NTFS, FAT, HFS+, Ext3, and UFS) and several volume system types." It can be run on Windows, Linux/Unix, and Mac OS X.
  • Tools - Forensics Wiki. http://www.forensicswiki.org/wiki/Category:Tools
    The Forensics Wiki includes information about a variety of tools that can be used to acquire, manage and analyze data from storage media. It was created by Simson Garfinkel.

Watch

  • Shaw, Seth.  "Preparing Storage Media: Inventory."  YouTube, 1:18, July 2, 2012.  Posted by CDCGUNC.  February 15, 2013.  http://www.youtube.com/watch?v=InI8841pC0k

    Seth Shaw is the Electronic Records Archivist at Duke University. He provided advice about the importance of conducting a thorough inventory of acquired storage media.
  • Shaw, Seth.  "Preparing Storage Media: Challenges."  YouTube, 2:00, July 2, 2012.  Posted by CDCGUNC.  February 15, 2013.  http://www.youtube.com/watch?v=8AzZaj-Fe-w

    Seth Shaw is the Electronic Records Archivist at Duke University. He described some of the challenges of handling acquired storage media.

Read

  • AIMS Working Group. "AIMS Born-Digital Collections: An Inter-Institutional Model for Stewardship." 2012. http://www2.lib.virginia.edu/aims/whitepaper/
    "The AIMS project evolved around a common need among the project partners — and most libraries and archives — to identify a methodology or continuous framework for stewarding born-digital archival materials." "The AIMS Framework was developed to define good practice in terms of archival tasks and objectives necessary for success. The Framework, as defined in the White Paper found below, presents a practical approach but also a recognition that there is no single solution for many of the issues that institutions face when dealing with born-digital collections. Instead, the AIMS project partners developed this framework as a further step towards best practice for the profession."
  • Beek, Christiaan. "Introduction to File Carving." McAfee. 2011. http://www.mcafee.com/us/resources/white-papers/foundstone/wp-intro-to-file-carving.pdf
    "'File carving,' or sometimes simply 'carving,' is the process of extracting a collection of data from a larger data set. Data carving techniques frequently occur during a digital investigation when the unallocated file system space is analyzed to extract files. The files are 'carved' from the unallocated space using file type-specific header and footer values. File system structures are not used during the process. File carving is a powerful technique for recovering files and fragments of files when directory entries are corrupt or missing. The block of data is searched block by block for residual data matching the file type-specific header and footer values."
  • Carrier, Brian. "Computer Foundations." In File System Forensic Analysis, 17-45. Boston, MA: Addison-Wesley, 2005. [See also "Hard Disk Data Acquisition" (47-66).]
    "The goal of this chapter is to cover the low-level basics of how computers operate. In the following chapters of this book, we examine, in detail, how data are stored, and this chapter provides background information for those who do not have programming or operating system design experience. This chapter starts with a discussion about data and how they are organized on disk. We discuss binary versus hexadecimal values and little- and big-endian ordering. Next, we examine the boot process and code required to start a computer. Lastly, we examine hard disks and discuss their geometry, ATA commands, host protected areas, and SCSI."
  • Farmer, Dan and Wietse Venema.  "File System Basics."  In Forensic Discovery.  Upper Saddle River, NJ: Addison-Wesley, 2005.  http://www.porcupine.org/forensics/forensic-discovery/chapter3.html
    "In this chapter we will explore some fundamental properties of file systems. As the primary storage component of a computer the file system can be the source of a great deal of forensic information. We'll start with the basic organization of file systems and directories, including how they may be mounted on top of each other to hide information. We'll then move onto various types of files along with their limits and peculiarities, as well as the basic inode and data block relationship. Next we outline the lowest levels of the file system - partitions, zones, inode and data bitmaps, and the superblock. Along the way we'll discuss and introduce a variety of tools and methods to facilitate our exploration and analysis."
  • Garfinkel, Simson, and David Cox. "Finding and Archiving the Internet Footprint." Paper presented at the First Digital Lives Research Conference: Personal Digital Archives for the 21st Century, London, UK, February 9-11, 2009. http://simson.net/clips/academic/2009.BL.InternetFootprint.pdf
    "With the move to “cloud” computing, archivists face the increasingly difficult task of finding and preserving the works of an originator so that they may be readily used by future historians. This paper explores the range of information that an originator may have left on computers “out there on the Internet,” including works that are publicly identified with the originator; information that may have been stored using a pseudonym; anonymous blog postings; and private information stored on web-based services like Yahoo Calendar and Google Docs. Approaches are given for finding the content, including interviews, forensic analysis of the originator’s computer equipment, and social network analysis. We conclude with a brief discussion of legal and ethical issues."
  • Hillis, W. Daniel. The Pattern on the Stone: The Simple Ideas That Make Computers Work. 1st ed. New York: Basic Books, 1998. [Nuts and Bolts (1-19); Universal Building Blocks (21-38)]
    "Most people are baffled by how computers work and assume that they will never understand them. What they don’t realize—and what Daniel Hillis’s short book brilliantly demonstrates—is that computers’ seemingly complex operations can be broken down into a few simple parts that perform the same simple procedures over and over again. Computer wizard Hillis offers an easy-to-follow explanation of how data is processed that makes the operations of a computer seem as straightforward as those of a bicycle. Avoiding technobabble or discussions of advanced hardware, the lucid explanations and colorful anecdotes in The Pattern on the Stone go straight to the heart of what computers really do. Hillis proceeds from an outline of basic logic to clear descriptions of programming languages, algorithms, and memory. He then takes readers in simple steps up to the most exciting developments in computing today—quantum computing, parallel computing, neural networks, and self-organizing systems. Written clearly and succinctly by one of the world’s leading computer scientists, The Pattern on the Stone is an indispensable guide to understanding the workings of that most ubiquitous and important of machines: the computer."
  • John, Jeremy Leighton. "Adapting Existing Technologies for Digitally Archiving Personal Lives: Digital Forensics, Ancestral Computing, and Evolutionary Perspectives and Tools." Paper presented at iPRES 2008: The Fifth International Conference on Preservation of Digital Objects, London, UK, September 29-30, 2008. http://www.bl.uk/ipres2008/presentations_day1/09_John.pdf
    "The adoption of existing technologies for digital curation, most especially digital capture, is outlined in the context of personal digital archives and the Digital Manuscripts Project at the British Library. Technologies derived from computer forensics, data conversion and classic computing, and evolutionary computing are considered. The practical imperative of moving information to modern and fresh media as soon as possible is highlighted, as is the need to retain the potential for researchers of the future to experience the original look and feel of personal digital objects. The importance of not relying on any single technology is also emphasised."
  • John, Jeremy Leighton. "Digital Forensics and Preservation." DPC Technology Watch Report 12-03. Digital Preservation Coalition, November 2012. http://www.dpconline.org/component/docman/doc_download/810-dpctw12-03pdf
    "In recent years, digital forensics has emerged as an essential source of tools and approaches for facilitating digital preservation and curation, specifically for protecting and investigating evidence from the past. Institutional repositories and professionals with responsibilities for personal archives can benefit from forensics in addressing digital authenticity, accountability and accessibility. Digital personal information must be handled with due sensitivity and security while demonstrably protecting its evidential value. Forensic technology makes it possible to: identify privacy issues; establish a chain of custody for provenance; employ write protection for capture and transfer; and detect forgery or manipulation. It can extract and mine relevant metadata and content; enable efficient indexing and searching by curators; and facilitate audit control and granular access privileges. Advancing capabilities promise increasingly effective automation in the handling of ever higher volumes of personal digital information. With the right policies in place, the judicious use of forensic technologies will continue to offer theoretical models, practical solutions and analytical insights. The purpose of this paper is to provide a broad overview of digital forensics, with some pointers to resources and tools that may benefit cultural heritage and, specifically, the curation of personal digital archives."
  • John, Jeremy Leighton, Ian Rowlands, Peter Williams, and Katrina Dean. “Digital Lives: Personal Digital Archives for the 21st Century >> An Initial Synthesis.” Version 0.2. March 3, 2010. http://britishlibrary.typepad.co.uk/files/digital-lives-synthesis02-1.pdf.
    "The digital era has changed the nature and scope of personal archiving forever. The Digital Lives research project accordingly has examined both theoretical and practical aspects of curating personal digital objects, or eMANUSCRIPTS, over the entire archival life cycle. This initial synthesis offers an overview of the emerging field of personal informatics and personal curation. It contemplates three audiences: the individual who is leading a digital life and creating a personal digital archive, the practicing professional archivist and curator, and the scholar and scientist who is accessing the contents of personal archives for research purposes."
  • Kernighan, Brian W. "Bits, Bytes, and Representation of Information." In D Is for Digital: What a Well-Informed Person Should Know About Computers and Communications, 21-34. DisforDigital.net, 2012.
    "In this chapter I'm going to discuss three fundamental ideas about how computers represent information. First, computers are digital processors: they store and process information that comes in discrete chunks and takes on discrete values, basically just numbers, by contrast with analog information, which implies smoothly varying values. Second, computers represent information in bits. A bit is a binary digit, that is, a number that is either 0 or 1. Everything inside the computer is represented with bits. The binary number system is used internally, not the familiar decimal numbers that people use. Third, groups of bits represent larger things. Numbers, letters, words, names, sounds, pictures, movies, and the instructions that make up the programs that process them-all of these are represented as groups of bits."
  • Kirschenbaum, Matthew G., Erika Farr, Kari M. Kraus, Naomi L. Nelson, Catherine Stollar Peters, Gabriela Redwine, and Doug Reside. "Approaches to Managing and Collecting Born-Digital Literary Materials for Scholarly Use." College Park, MD: University of Maryland, 2009. http://mith.umd.edu/wp-content/uploads/whitepaper_HD-50346.Kirschenbaum....
    This document reports on "a series of site visits and planning meetings for personnel working with the born-digital components of three significant collections of literary material: the Salman Rushdie papers at Emory University’s Manuscripts, Archives, and Rare Books Library (MARBL), the Michael Joyce Papers (and other collections) at the Harry Ransom Humanities Research Center at The University of Texas at Austin, and the Deena Larsen Collection at the Maryland Institute for Technology in the Humanities (MITH) at the University of Maryland."
  • Kirschenbaum, Matthew G., Richard Ovenden, and Gabriela Redwine. "Digital Forensics and Born-Digital Content in Cultural Heritage Collections." Washington, DC: Council on Library and Information Resources, 2010. http://www.clir.org/pubs/reports/pub149/pub149.pdf
    "While the purview of digital forensics was once specialized to fields of law enforcement, computer security, and national defense, the increasing ubiquity of computers and electronic devices means that digital forensics is now used in a wide variety of cases and circumstances. Most records today are born digital, and libraries and other collecting institutions increasingly receive computer storage media as part of their acquisition of "papers" from writers, scholars, scientists, musicians, and public figures. This poses new challenges to librarians, archivists, and curators—challenges related to accessing and preserving legacy formats, recovering data, ensuring authenticity, and maintaining trust. The methods and tools developed by forensics experts represent a novel approach to these demands. For example, the same forensics software that indexes a criminal suspect's hard drive allows the archivist to prepare a comprehensive manifest of the electronic files a donor has turned over for accession. This report introduces the field of digital forensics in the cultural heritage sector and explores some points of convergence between the interests of those charged with collecting and maintaining born-digital cultural heritage materials and those charged with collecting and maintaining legal evidence."
  • Petzold, Charles. Code: The Hidden Language of Computer Hardware and Software. Redmond, WA: Microsoft Press, 1999. [Bit by Bit by Bit (69-85); Bytes and Hex (180-189)]
    "Code: The Hidden Language of Computer Hardware and Software is a unique exploration into bits, bytes, and the inner workings of computers."
  • Ross, Seamus, and Ann Gow. "Digital Archaeology: Rescuing Neglected and Damaged Data Resources." London: British Library, 1999. http://www.ukoln.ac.uk/services/elib/papers/supporting/pdf/p2.pdf

    "The study examines the approaches to accessing digital materials where the media has become damaged (through disaster or age) or where the hardware or software is either no longer available or unknown. The study begins by looking at the problems associated with media."
  • Rothenberg, Jeff. "Ensuring the Longevity of Digital Information." Washington, DC: Council on Library and Information Resources, 1999. http://www.clir.org/pubs/archives/ensuring.pdf [See especially: "Old bit streams never die--they just become unreadable" and "It's all in the program" (2-11)].

    This piece is a "classic" in the digital preservation literature. In it, Rothenberg provides very clear explanations of various fundamental technical issues that make digital preservation a challenge. Of particular relevance to acquiring and managing data from storage media is his early discussion about types of representation information and software dependencies.
  • Thomas, Susan, Renhart Gittens, Janette Martin, and Fran Baker. "Workbook on Digital Private Papers." 2007. Paradigm Project. http://www.paradigm.ac.uk/workbook/introduction/index.html.

    "The Paradigm Workbook is an evolving resource based on an exemplar project at the academic research libraries of the Universities of Oxford and Manchester. Between January 2005 and February 2007, the Paradigm project explored the issues involved in the long term preservation of personal digital archives using today's politicians and their personal archives as a testbed."
  • White, Ron and Timothy Edward Downs.  How Computers Work.  9th ed. Indianapolis, IN: Que, 2008.

    "Definitive illustrated guide to the world of PCs and technology. In this new edition, you’ll find detailed information not just about every last component of hardware found inside your PC, but also in-depth explanations about home networking, the Internet, PC security, and even how cell phone networks operate."
  • Woods, Kam, Christopher A. Lee, and Simson Garfinkel. “Extending Digital Repository Architectures to Support Disk Image Preservation and Access.” In JCDL '11: Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, 57-66. New York, NY: ACM Press, 2011. http://www.ils.unc.edu/callee/p57-woods.pdf
    "Disk images (bitstreams extracted from physical media) can play an essential role in the acquisition and management of digital collections by serving as containers that support data integrity and chain of custody, while ensuring continued access to the underlying bits without depending on physical carriers. Widely used today by practitioners of digital forensics, disk images can serve as baselines for comparison for digital preservation activities, as they provide fail-safe mechanisms when curatorial actions make unexpected changes to data; enable access to potentially valuable data that resides below the file system level; and provide options for future analysis. We discuss established digital forensics techniques for acquiring, preserving and annotating disk images, provide examples from both research and educational collections, and describe specific forensic tools and techniques, including an object-oriented data packaging framework called the Advanced Forensic Format (AFF) and the Digital Forensics XML (DFXML) metadata representation."

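The Beek entry above describes file carving as searching raw data for file type-specific header and footer values without using file system structures. The following Python sketch illustrates that idea for JPEG data only. It is a deliberately simplified illustration, not the method of any tool listed on this page: the function name carve_jpegs is hypothetical, and the sketch assumes unfragmented files with no embedded thumbnails or stray footer bytes, which real carving tools must handle.

```python
# JPEG files begin with the start-of-image marker (FF D8) followed by
# another marker byte (FF), and end with the end-of-image marker (FF D9).
JPEG_HEADER = b"\xff\xd8\xff"
JPEG_FOOTER = b"\xff\xd9"


def carve_jpegs(data):
    """Return a list of byte strings carved from `data`.

    Each result starts at a JPEG header and ends at the first footer
    found after it. No file system metadata is consulted.
    """
    carved = []
    position = 0
    while True:
        start = data.find(JPEG_HEADER, position)
        if start == -1:
            break  # no more headers in the remaining data
        end = data.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break  # header without a footer: stop rather than guess
        end += len(JPEG_FOOTER)  # include the footer bytes in the result
        carved.append(data[start:end])
        position = end  # continue searching after this carved region
    return carved
```

In practice the input would be the raw bytes of a disk image or of its unallocated space, read in binary mode. Carved output should be treated as candidate files to be validated, since a matching header and footer do not guarantee an intact file.
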
Last updated on 09/12/13, 2:41 pm by tibbo
