Q: What should I do with data that I've acquired from storage media (process, metadata, tools and workflow)?
Chan, Peter. "Processing Born Digital Materials Using AccessData FTK at Special Collections, Stanford University Libraries." YouTube, 14:46, posted by peterchanws, March 11, 2011. http://www.youtube.com/watch?v=hDAhbR8dyp8
This video covers creating a case in FTK, technical metadata, obsolete file formats, viewing image file thumbnails, restricted files, filters, series, bookmarks, and labels.
Shaw, Seth. "Managing Storage Media: Authenticity." YouTube, 2:01, July 2, 2012. Posted by CDCGUNC, February 15, 2013. http://www.youtube.com/watch?v=Z7wzmQS5rlM
Seth Shaw is the Electronic Records Archivist at Duke University. In this video, he describes the importance of preserving the authenticity of records acquired on storage media.
Shaw, Seth. "Managing Storage Media: Resources." YouTube, 1:00, July 2, 2012. Posted by CDCGUNC, February 15, 2013. http://www.youtube.com/watch?v=cAQntgMcVhY
Seth Shaw is the Electronic Records Archivist at Duke University. In this video, he shares resources for managing records acquired on storage media.
AIMS Working Group. "AIMS Born-Digital Collections: An Inter-Institutional Model for Stewardship." 2012. http://www2.lib.virginia.edu/aims/whitepaper/
"The AIMS project evolved around a common need among the project partners — and most libraries and archives — to identify a methodology or continuous framework for stewarding born-digital archival materials." "The AIMS Framework was developed to define good practice in terms of archival tasks and objectives necessary for success. The Framework, as defined in the White Paper found below, presents a practical approach but also a recognition that there is no single solution for many of the issues that institutions face when dealing with born-digital collections. Instead, the AIMS project partners developed this framework as a further step towards best practice for the profession."
bwFla (Baden-Wuerttemberg Functional Longterm Archiving and Access) Project. http://bw-fla.uni-freiburg.de/wordpress/?page_id=7
"The bwFla project (Baden-Wuerttemberg Functional Longterm Archiving and Access) is a two-year state sponsored project with the goal of defining and providing a practical implementation of archival workflows for the rendering of digital objects (i.e. easily accessed by users) in its original environment (i.e. application). Thereby, the project focuses on supporting the user during object INGEST to identify, provide and describe all secondary objects required as well as create necessary technical meta data for long-term ACCESS through emulation. The emulation proposed uses an INGEST workflow, which requires no further migration of other objects. The further aim of these newly developed workflows is to have them integrated into existing library and archival systems."
Elford, Douglas, Nicholas Del Pozo, Snezana Mihajlovic, David Pearson, Gerard Clifton, and Colin Webb. "Media Matters: Developing Processes for Preserving Digital Objects on Physical Carriers at the National Library of Australia." Paper presented at the 74th IFLA General Conference and Council, Québec, Canada, August 10-14, 2008. http://www.ifla.org/IV/ifla74/papers/084-Webb-en.pdf
"The National Library of Australia has a relatively small but important collection of digital materials on physical carriers, including both published materials and unpublished manuscripts in digital form. To date, preservation of the Library’s physical format digital collections has been largely hand-crafted, but this approach is insufficient to deal effectively with the volume of material requiring preservation. The Digital Preservation Workflow Project aims to produce a semi-automated, scalable process for transferring data from physical carriers to preservation digital mass storage, helping to mitigate the major risks associated with the physical carriers: deterioration of the media and obsolescence of the technology required to access them. The workflow system, expected to be available to Library staff from June 2008, also aims to minimise the time required for acquisition staff to process relatively standard physical media, while remaining flexible to accommodate special cases when required. The system incorporates a range of primarily open source tools, to undertake processes including media imaging, file identification and metadata extraction. The tools are deployed as services within a service-oriented architecture, with workflow processes that use these services being coordinated within a customised system architecture utilising Java based web services. This approach provides flexibility to add or substitute tools and services as they become available and to simplify interactions with other Library systems."
Garfinkel, Simson L. "AFF: A New Format for Storing Hard Drive Images." Communications of the ACM 49, no. 2 (2006): 85-87. http://simson.net/clips/academic/2006.CACM.AFF.pdf
"...we designed a new file format for our forensic work. Called the Advanced Forensics Format (AFF), this format is both open and extensible. Like the EnCase format, AFF stores the imaged disk as a series of pages or segments, allowing the image to be compressed for significant savings. Unlike EnCase, AFF allows metadata to be stored either inside the image file or in a separate, companion file. Although AFF was specifically designed for use in projects involving hundreds or thousands of disk images, it works equally well for practitioners who work with just one or two images. And in the event the disk image is corrupted, AFF internal consistency checks are designed to allow the recovery of as much image data as possible. The AFF format is unencumbered by any patents or trade secrets, and the open source implementation is distributed under a license that allows the code to be freely integrated into either open source or proprietary programs."
Garfinkel, Simson. “Digital Forensics XML and the DFXML Toolset.” Digital Investigation 8 (2012): 161-174.
"Digital Forensics XML (DFXML) is an XML language that enables the exchange of structured forensic information. DFXML can represent the provenance of data subject to forensic investigation, document the presence and location of file systems, files, Microsoft Windows Registry entries, JPEG EXIFs, and other technical information of interest to the forensic analyst. DFXML can also document the specific tools and processing techniques that were used to produce the results, making it possible to automatically reprocess forensic information as tools are improved. This article presents the motivation, design, and use of DFXML. It also discusses tools that have been created that both ingest and emit DFXML files."
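To give a sense of what working with DFXML looks like, here is a minimal Python sketch (standard library only) that pulls a file inventory out of a DFXML-style document. The sample is hand-written and simplified, and its hash values are placeholders. The element names `fileobject`, `filename`, `filesize` and `hashdigest` are assumed from typical fiwalk output rather than taken from the article. Real files carry a namespace and many more elements, so the tag matching below ignores namespaces.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified DFXML-style sample (illustrative only; hashes are placeholders).
SAMPLE = """<dfxml version="1.0">
  <volume>
    <fileobject>
      <filename>letters/draft1.doc</filename>
      <filesize>24576</filesize>
      <hashdigest type="md5">0123456789abcdef0123456789abcdef</hashdigest>
    </fileobject>
    <fileobject>
      <filename>photos/cat.jpg</filename>
      <filesize>102400</filesize>
      <hashdigest type="md5">fedcba9876543210fedcba9876543210</hashdigest>
    </fileobject>
  </volume>
</dfxml>"""


def local(tag):
    """Drop any '{namespace}' prefix so matching works with or without one."""
    return tag.rsplit("}", 1)[-1]


def file_inventory(xml_text):
    """Return a list of (filename, size, md5) tuples, one per fileobject element."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter():
        if local(elem.tag) != "fileobject":
            continue
        name, size, md5 = None, None, None
        for child in elem:
            tag = local(child.tag)
            if tag == "filename":
                name = child.text
            elif tag == "filesize":
                size = int(child.text)
            elif tag == "hashdigest" and child.get("type", "").lower() == "md5":
                md5 = child.text
        rows.append((name, size, md5))
    return rows


for name, size, md5 in file_inventory(SAMPLE):
    print(f"{name}\t{size}\t{md5}")
```

Because DFXML is plain XML, the same inventory could be loaded into a spreadsheet or finding-aid workflow without any forensic software installed.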
Garfinkel, Simson L. "Forensic feature extraction and cross-drive analysis." Digital Investigation 3S (2006): S71-81. http://simson.net/clips/academic/2006.DFRWS.pdf [Specifically: Sections 1-3, pp. S71-S75]
"This paper introduces Forensic Feature Extraction (FFE) and Cross-Drive Analysis (CDA), two new approaches for analyzing large data sets of disk images and other forensic data. FFE uses a variety of lexigraphic techniques for extracting information from bulk data; CDA uses statistical techniques for correlating this information within a single disk image and across multiple disk images. An architecture for these techniques is presented that consists of five discrete steps: imaging, feature extraction, first-order cross-drive analysis, cross-drive correlation, and report generation. CDA was used to analyze 750 images of drives acquired on the secondary market; it automatically identified drives containing a high concentration of confidential financial records as well as clusters of drives that came from the same organization. FFE and CDA are promising techniques for prioritizing work and automatically identifying members of social networks under investigation. We believe it is likely to have other uses as well."
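The two-stage idea (extract features, then correlate them across drives) can be illustrated with a toy Python sketch. This is not Garfinkel's implementation. It uses a deliberately simple e-mail regex as the only feature type, treats each "drive" as an in-memory byte string, and scores drive pairs by the number of distinct features they share, which only loosely mirrors the paper's first-order cross-drive analysis. The drive names and addresses are made up.

```python
import re
from collections import defaultdict
from itertools import combinations

# Deliberately simple e-mail pattern; real feature extractors are far more careful.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_features(image_bytes):
    """Feature extraction: the set of (lower-cased) e-mail addresses in raw bytes."""
    return {m.group().decode("ascii").lower() for m in EMAIL_RE.finditer(image_bytes)}


def cross_drive(images):
    """images: dict of drive_id -> raw bytes.

    Returns (feature -> set of drives it appears on,
             (drive_a, drive_b) -> number of distinct features they share).
    """
    where = defaultdict(set)
    for drive, data in images.items():
        for feat in extract_features(data):
            where[feat].add(drive)
    pair_scores = defaultdict(int)
    for drives in where.values():
        for a, b in combinations(sorted(drives), 2):
            pair_scores[(a, b)] += 1
    return dict(where), dict(pair_scores)


images = {
    "drive1": b"...From: alice@example.com ... bob@example.org",
    "drive2": b"\x00\x00alice@example.com\x00carol@example.net",
    "drive3": b"nothing here but dave@example.com",
}
where, pairs = cross_drive(images)
for (a, b), n in sorted(pairs.items()):
    print(a, b, n)  # drive1 drive2 1
```

Scanning raw bytes rather than parsed files is what lets this kind of analysis pick up features in deleted or unallocated space, at the cost of more false positives.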
Gengenbach, Martin J. “‘The Way We Do it Here’: Mapping Digital Forensics Workflows in Collecting Institutions.” A Master’s paper for the M.S. in L.S. degree. August 2012. http://digitalcurationexchange.org/system/files/gengenbach-forensic-workflows-2012.pdf
"This paper presents the findings of semi-structured interviews with archivists and curators applying digital forensics tools and practices to the management of born-digital content. The interviews were designed to explore which digital forensic tools are in use, how they are implemented within a digital forensics workflow, and what further challenges and opportunities such use may present. Findings indicate that among interview participants these tools are beneficial in the capture and preservation of born-digital content, particularly with digital media such as external hard drives, and optical or floppy disks. However, interviews reveal that metadata generated from the use of such tools is not easily translated into the arrangement, description, and provision of access to born-digital content."
Kirschenbaum, Matthew G., Erika Farr, Kari M. Kraus, Naomi L. Nelson, Catherine Stollar Peters, Gabriela Redwine, and Doug Reside. "Approaches to Managing and Collecting Born-Digital Literary Materials for Scholarly Use." College Park, MD: University of Maryland, 2009. http://mith.umd.edu/wp-content/uploads/whitepaper_HD-50346.Kirschenbaum.WP.pdf
This white paper reports on "a series of site visits and planning meetings for personnel working with the born-digital components of three significant collections of literary material: the Salman Rushdie papers at Emory University’s Manuscripts, Archives, and Rare Books Library (MARBL), the Michael Joyce Papers (and other collections) at the Harry Ransom Humanities Research Center at The University of Texas at Austin, and the Deena Larsen Collection at the Maryland Institute for Technology in the Humanities (MITH) at the University of Maryland."
Lee, Christopher A., Matthew Kirschenbaum, Alexandra Chassanoff, Porter Olsen, and Kam Woods. "BitCurator: Tools and Techniques for Digital Forensics in Collecting Institutions." D-Lib Magazine 18, No. 5/6 (May/June 2012).
This paper introduces the BitCurator Project, which aims to incorporate digital forensics tools and methods into collecting institutions' workflows. BitCurator is a collaborative effort led by the School of Information and Library Science (SILS) at the University of North Carolina at Chapel Hill and Maryland Institute for Technology in the Humanities (MITH) at the University of Maryland. The project arose from a perceived need in the library/archives community to develop digital forensics tools with interfaces, documentation, and functionality that can support the workflows of collecting institutions. This paper describes current efforts, ongoing work, and implications for future development of forensic-based, analytic software for born-digital materials.
Underwood, William, Marlit Hayslett, Sheila Isbell, Sandra Laib, Scott Sherrill, and Matthew Underwood. “Advanced Decision Support for Archival Processing of Presidential Electronic Records: Final Scientific and Technical Report.” Technical Report ITTL/CSITD 09-05. October 2009. http://perpos.gtri.gatech.edu/publications/TR%2009-05-Final%20Report.pdf
"The overall objective of this project is to develop and apply advanced information technology to decision problems that archivists at the Presidential Libraries encounter when processing electronic records. Among issues and problems to be addressed are areas responsive to national security, including automated content analysis, automatic summarization, advanced information retrieval, advanced support of decision making for access restrictions and declassification, information security, and Global Information Grid technology, which are also important research areas for the U.S. Army." "A method for automatic document type recognition and metadata extraction has been implemented and successfully tested. The method is based on the method for automatically annotating semantic categories such as person’s names, dates, and postal addresses. It extends this method by: (1) identifying about 100 types of intellectual elements of documents, (2) parsing these elements using context-free grammars defining the documentary form of document types, (3) interpreting the pragmatics of the form of the document to identify some or all of the following metadata: the chronological date, author(s), addressee(s), and topic. This metadata can be used for indexing and searching collections of records by person, organization and location names, topics, dates, author’s and addressee’s names and document types. It can also be used for automatically describing items, file units and record series."
Woods, Kam and Christopher A. Lee. “Acquisition and Processing of Disk Images to Further Archival Goals." In Proceedings of Archiving 2012 (Springfield, VA: Society for Imaging Science and Technology, 2012), 147-152. http://www.ils.unc.edu/callee/p147-woods.pdf
"Disk imaging can provide significant data processing and information extraction benefits in archival ingest and preservation workflows, including more efficient automation, increased accuracy in data triage, assurance of data integrity, identifying personally identifying and sensitive information, and establishing environmental and technical context. Information located within disk images can also assist in linking digital objects to other data sources and activities such as versioning information, backups, related local and network user activity, and system logs. We examine each of these benefits and discuss the incorporation of modern digital forensics technologies into archival workflows."
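One of the benefits listed above, assurance of data integrity, usually rests on recording a cryptographic digest of the image at acquisition and recomputing it later. Below is a minimal Python sketch, assuming a raw (dd-style) image file. Forensic containers such as AFF store their own internal hashes, which this sketch does not read. The function names are hypothetical.

```python
import hashlib
import os
import tempfile


def image_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a (possibly very large) disk image file in fixed-size chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read 1 MiB at a time so the whole image never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_image(path, recorded_digest, algorithm="sha256"):
    """Fixity check: True if the image still matches the digest recorded at acquisition."""
    return image_digest(path, algorithm) == recorded_digest.strip().lower()


# Usage: a small temporary file stands in for a real disk image.
with tempfile.NamedTemporaryFile(delete=False, suffix=".raw") as tmp:
    tmp.write(b"\x00" * 4096)
recorded = image_digest(tmp.name)      # store this alongside the image's metadata
print(verify_image(tmp.name, recorded))  # True
os.remove(tmp.name)
```

Recording the algorithm name next to the digest matters as much as the digest itself, since a later fixity check has to recompute with the same function.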
Woods, Kam, Christopher Lee, and Sunitha Misra. “Automated Analysis and Visualization of Disk Images and File Systems for Preservation.” In Proceedings of Archiving 2013 (Springfield, VA: Society for Imaging Science and Technology, 2013), 239-244.
"We present work on the analysis and visualization of disk images and associated filesystems using open source digital forensics software as part of the BitCurator project. We describe methods and software designed to assist in the acquisition of forensically-packaged disk images, analysis of the filesystems they contain, and associated triage tasks in archival workflows. We use open source forensics tools including fiwalk, bulk extractor, and The Sleuth Kit to produce technical metadata. These metadata are then reprocessed to aid in triage tasks, serve as documentation of digital collections, and to support a range of preservation decisions."
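As a rough illustration of reprocessing technical metadata for triage, the sketch below summarizes a file inventory: counts by extension, total size, and groups of files with identical hashes. It assumes the inventory has already been reduced to (filename, size, md5) tuples, for example from fiwalk's DFXML output. The tuples and placeholder hash values here are invented, and this is not BitCurator's own reporting code.

```python
import os
from collections import Counter, defaultdict


def triage_summary(rows):
    """rows: iterable of (filename, size_in_bytes, md5) tuples.

    Returns (Counter of file extensions, total bytes, {md5: [names]} for duplicates).
    """
    by_ext = Counter()
    total = 0
    by_hash = defaultdict(list)
    for name, size, md5 in rows:
        ext = os.path.splitext(name)[1].lower() or "(none)"
        by_ext[ext] += 1
        total += size
        if md5:
            by_hash[md5].append(name)
    # Only hashes shared by two or more files indicate duplicates.
    dupes = {h: names for h, names in by_hash.items() if len(names) > 1}
    return by_ext, total, dupes


rows = [
    ("letters/draft1.doc", 24576, "aaaa"),
    ("letters/draft1 copy.doc", 24576, "aaaa"),
    ("photos/cat.JPG", 102400, "bbbb"),
    ("README", 512, "cccc"),
]
by_ext, total, dupes = triage_summary(rows)
print(by_ext.most_common())
print(total)  # 152064
print(dupes)
```

A summary like this can help an archivist decide quickly whether a disk warrants item-level review, or mostly holds duplicates and system files.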
Last updated on 08/26/13, 9:20 pm by callee