
Simmons GSLIS Launches Online Digital Stewardship Post-Master's Certificate


The Simmons College Graduate School of Library and Information Science is pleased to announce a new post-master's certificate in Digital Stewardship.

Screening the Future 2012: Play, Pause and Press Forward

Date: Monday, May 21, 2012 (All day) - Wednesday, May 23, 2012 (All day)

Storage Media - Access

Q. How do I provide access to information I've acquired from storage media?

Most often, the fruit of your digital curation efforts is providing access to the content you have worked so hard to collect and preserve.  There are many different methods and tools you can use to do this and many considerations you must take into account in deciding how you will approach it.

Take action

  • Determine who you would like to have access to your content
  • Assess tools for providing access to your content
  • Implement a system for providing access to your content

Explore tools

Read

  • Cornell University Library. "Balancing Access Issues."  Chap. 5 in Digital Preservation Management Tutorial: Implementing Short-term Strategies for Long-term Problems, 2003-2007.  http://www.dpworkshop.org/dpm-eng/challenges/security.html
    Provides a brief overview of the concerns of security and ease of access along with an exercise and some suggested resources.
  • Woods, Kam and Geoffrey Brown. "Creating Virtual CD-ROM Collections." International Journal of Digital Curation 4, no. 2 (2009): 184-198.
    "Over the past 20 years, more than 100,000 CD-ROM titles have been published including thousands of collections of government documents and data. CD-ROMs present preservation challenges at the bit level and in ensuring usability of the preserved artifact. We present techniques we have developed to archive and support user access to a collection of approximately 2,900 CD-ROMs published under the Federal Depository Library Program (FDLP) by the United States Government Printing Office (GPO). The project provides web-based access to CD-ROM contents using both migration and emulation and supports remote execution of the raw CD-ROM images. Our project incorporates off-the-shelf, primarily open-source software. The raw data and (METS) metadata are made available through AFS, a standard distributed file system, to encourage sharing among libraries."
  • Woods, Kam, and Geoffrey Brown. "From Imaging to Access - Effective Preservation of Legacy Removable Media." In Archiving 2009: Preservation Strategies and Imaging Technologies for Cultural Heritage Institutions and Memory Organizations: Final Program and Proceedings, 213-18. Springfield, VA: Society for Imaging Science and Technology, 2009. http://www.digpres.com/publications/woodsbrownarch09.pdf
    We describe how existing media collections can be virtualized through digital copies that are accessible via ordinary workstations. In the model presented, bit-identical images of the original media are served from a distributed file system to provide convenient access on local or remote workstations. Our approach incorporates mature open source libraries to provide high-quality, low-cost image extraction, file format identification, metadata creation, and web-driven access; we include specific examples of custom scripting designed to enhance usability.
  • Woods, Kam and Geoffrey Brown. “Migration Performance for Legacy Data Access.” International Journal of Digital Curation 3, no. 2 (2008): 74-88.
    "We present performance data relating to the use of migration in a system we are creating to provide web access to heterogeneous document collections in legacy formats. Our goal is to enable sustained access to collections such as these when faced with increasing obsolescence of the necessary supporting applications and operating systems. Our system allows searching and browsing of the original files within their original contexts utilizing binary images of the original media. The system uses static and dynamic file migration to enhance collection browsing, and emulation to support both the use of legacy programs to access data and long-term preservation of the migration software. While we provide an overview of the architectural issues in building such a system, the focus of this paper is an in-depth analysis of file migration using data gathered from testing our software on 1,885 CD-ROMs and DVDs. These media are among the thousands of collections of social and scientific data distributed by the United States Government Printing Office (GPO) on legacy media (CD-ROM, DVD, floppy disk) under the Federal Depository Library Program (FDLP) over the past 20 years."

Last updated on 12/31/69, 7:00 pm by Anonymous

Storage Media - Legal and Ethical Issues

Q. What legal and ethical issues should I consider?

Digital collection management presents a number of legal issues to take into consideration.  Before you start any digital information management activity, you should take some time to consider how you will address these issues and document your decisions in a formal plan and/or policy.

 

Take action

Consider:

  • Personal data
  • Sensitive data
  • Confidential data
  • Informed consent
  • Anonymity
  • Copyright

 

Review use cases and resources

 

Read

  • Van den Eynden, Veerle, Louise Corti, et al.  "Managing and sharing data."  Colchester, UK Data Archive, 2011.  http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
    Section on Ethics and Consent (pages 22-31) suggests three factors to be considered in gathering research data about people: consent, anonymising data, and controlling access to data.
  • Cornell University Library. "Legal Issues."  Chap. 5 in Digital Preservation Management Tutorial: Implementing Short-term Strategies for Long-term Problems, 2003-2007.  http://www.dpworkshop.org/dpm-eng/challenges/accountability.html
    Provides a brief overview of the legal issues involving copyright along with a case study and some suggested resources.

  • Library of Congress NDIIPP, JISC, OAK Law Project, and the SURFfoundation.  "International Study on the Impact of Copyright Law on Digital Preservation."  September 2008.  http://www.digitalpreservation.gov/documents/digital_preservation_final_report2008.pdf
    This study focuses on the copyright and related laws of Australia, the Netherlands, the United Kingdom and the United States and the impact of those laws on digital preservation of copyrighted works.  It also addresses proposals for legislative reform and efforts to develop non-legislative solutions to the challenges that copyright law presents for digital preservation.
  • Coyle, Karen.  “Rights in the PREMIS Data Model: A Report for the Library of Congress.”  Washington, D.C.: Library of Congress, December 2006.  http://www.loc.gov/standards/premis/Rights-in-the-PREMIS-Data-Model.pdf
    The PREMIS standard contains a rights entity that allows the association of rights with specific digital preservation actions.  This paper looks at the various definitions of rights, the state of rights metadata, and surveys legislative actions taking place in many nations that will provide a legal standing for digital preservation activities.
  • Hirtle, Peter B.  "Digital Preservation and Copyright."  Stanford University Libraries & Academic Information Resources.  http://fairuse.stanford.edu/commentary_and_analysis/2003_11_hirtle.html
    Considers copyright law in the light of making digital preservation copies.
  • Hirtle, Peter.  "Copyright Term and the Public Domain in the United States."  Last updated January 3, 2012.  http://copyright.cornell.edu/resources/docs/copyrightterm.pdf
    This table details U.S. copyright law applications for various types of works as of January 1, 2012.
  • JISC.  "Copyright and Intellectual Property Law."  http://www.jisclegal.ac.uk/LegalAreas/CopyrightIPR.aspx
    JISC Legal offers sector specific guidance and detailed publications to assist you in this area of law, as well as a varied range of FAQs with relevant examples and useful recommended links to other resources.
  • Besek, June M.  "Copyright Issues Relevant to the Creation of a Digital Archive: A Preliminary Assessment."  Council on Library and Information Resources & Library of Congress, January 2003.  http://www.clir.org/pubs/reports/pub112/reports/pub112/pub112.pdf
    This paper describes copyright rights and exceptions and highlights issues potentially involved in the creation of a nonprofit digital archive.

 


 

Storage Media - Management Skills

Q. How do I gain the skills I need to manage a project?

Preparing yourself for the role of running a project can involve many different things.  Because projects focused on pulling digital information from storage media involve a wide array of stakeholders and many different moving parts, having a solid foundation in project management is key.  In addition to the people skills inherent in project management, an institutional repository manager should have a firm grasp of the history and overall landscape of institutional repository development.

 

Take action

  • Take some project management courses
  • Read up on project management principles

 

Read

  • Horine, Greg.  Absolute Beginner's Guide to Project Management. 2nd ed.  Indianapolis, IN: Que, 2009.
    You’ve just been handed your department's biggest project. Absolute Beginner's Guide to Project Management will show you exactly where to start–and walk you step by step through your entire project! Expert project manager Gregory Horine shows you exactly what works and what doesn’t, drawing on the field’s proven best practices. Understand your role as a project manager...gain the skills and discover the personal qualities of great project managers...learn how to organize, estimate, and schedule projects effectively...manage deliverables, issues, changes, risks, quality, vendors, communications, and expectations...make the most of technology...manage virtual teams...avoid the problems that trip up new project managers! This new edition jumpstarts your project management expertise even faster, with all-new insights on Microsoft Project, challenging project situations and intriguing project management topics of the day.
  • Campbell, G. Michael.  Communications Skills for Project Managers.  New York: AMACOM, 2009.
    The number one factor in the success or failure of projects is the quality and consistency of communications.  If you’re a project manager, the bulk of this responsibility falls to you. In Communications Skills for Project Managers, Michael Campbell unlocks this critical component of project success, illustrating how to keep every project stakeholder in the loop every step of the way—from concept through delivery and beyond.  A veteran of countless projects on every conceivable scale, Campbell gives you the universal elements of all communications as they pertain to the specific demands of a project management environment.  And you’ll get a generous selection of powerful tools to help you.
  • Allan, Barbara.  Project Management: Tools and Techniques for Today's ILS Professional.  London: Facet, 2004.
    Offers in-depth guidance on project management for librarians working alone and for those working in large organizations.  Topics covered include project life cycle and analysis, planning, implementation, evaluation and dissemination, finance, personnel, partnerships, and more.  Allan explores both paper-based and management software approaches to large and small scale project management.
  • Carpenter, Julie.  Project Management in Libraries, Archives and Museums: Working with Government and Other External Partners.  Oxford: Chandos Publishing, 2011.
    Aimed at practitioners and managers, this practical handbook provides a source of guidance on project management techniques for the academic and cultural heritage sectors, focusing on managing projects involving public sector and other external partners.  Issues under consideration and illustration include: different approaches to managing projects and how to select appropriate methods; using project management tools and other applications in project development and implementation; ensuring the sustainability of project outcomes and transferability into practice; realistic monitoring methodologies and specifying and commissioning evaluation work that has real value.
  • JISC Digital Media.  "Project Management for a Digitisation Project."  Last updated November 14, 2008. http://www.jiscdigitalmedia.ac.uk/crossmedia/advice/project-management-for-a-digitisation-project/
    This paper takes a look at the role and responsibilities of the digitisation project manager. It addresses common managerial challenges such as balancing the expectations of stakeholders and ensuring the quality of output. It is intended to be of use to the management team of time-limited digitisation projects or to resource management staff planning to digitise their collection.

 


 

Storage Media - Protect

Q. What actions should I take to protect the information I pull from storage media?

Once you pull digital information from physical media, you will have to take measures to protect it from corruption, theft, decay, and loss. 

 

Take action

  • Run a virus scan on all disks and files examined
  • Make sure you are in a secure/virtual environment
  • Make multiple backups of your content (at least two, but optimally six)
  • Perform checksum procedures to detect changes
  • Develop and follow policies to manage obsolescence
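
The checksum step above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed workflow: it assumes SHA-256 as the checksum algorithm, and the function names (`sha256_of`, `build_manifest`, `compare`) are hypothetical helpers introduced here. The idea is to record a checksum for every file when content is first acquired, then rebuild the record later and compare the two.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Report files that changed, went missing, or appeared since `old`."""
    return {
        "changed": sorted(k for k in old if k in new and old[k] != new[k]),
        "missing": sorted(k for k in old if k not in new),
        "added": sorted(k for k in new if k not in old),
    }
```

Running `build_manifest` against each backup copy and comparing the results with `compare` is one way to confirm that the copies still match one another.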

 

Watch

  • Long Term Digital Preservation (Some Initiatives in India and Germany). (2011, August)
  • Who Is Doing a Good Job in Digital Preservation? David Giaretta, director of Alliance for Permanent Access

Review use cases

Read

 


 

Physical Media - Storing Bits

Q. How should I store the bits I pull from digital media?

In reviewing your options for storing the bits you pulled from physical media, consider these six criteria when selecting your storage media:

  1. Longevity - your chosen media should have a proven life span of at least ten years
  2. Capacity - make sure that your media can adequately store the content you have now and the content you plan to collect in the future
  3. Viability - the media you choose should support error detection and data recovery 
  4. Obsolescence - choose technology that is well established and widely available
  5. Cost - consider both the cost for purchasing the media and the cost for maintaining it over time
  6. Susceptibility - the media you choose should show a low susceptibility to data loss and physical damage *

* From: Brown, Adrian. "Selecting Storage Media for Long-Term Preservation." The National Archives, June 2003. http://www.nationalarchives.gov.uk/documents/selecting-storage-media.pdf
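
One way to apply the six criteria above is to rate each candidate medium against them and compare totals. The sketch below is only an illustration under stated assumptions: the 1-to-3 rating scale, the equal weighting, and the names `CRITERIA`, `score_medium`, and `rank` are all hypothetical choices made here, not something taken from the guidance cited above.

```python
# The six criteria from the list above, used as dictionary keys.
CRITERIA = (
    "longevity",
    "capacity",
    "viability",
    "obsolescence",
    "cost",
    "susceptibility",
)


def score_medium(ratings: dict[str, int]) -> int:
    """Sum the ratings for one medium (assumed scale: 1 = poor, 3 = good)."""
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {', '.join(missing)}")
    for criterion in CRITERIA:
        if not 1 <= ratings[criterion] <= 3:
            raise ValueError(f"{criterion} rating must be between 1 and 3")
    return sum(ratings[c] for c in CRITERIA)


def rank(candidates: dict[str, dict[str, int]]) -> list[tuple[str, int]]:
    """Return (name, total) pairs, highest total first, ties by name."""
    scored = ((name, score_medium(r)) for name, r in candidates.items())
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

The ratings themselves have to come from your own review of each medium; the code only keeps the comparison consistent across candidates.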

Take action

  • Review storage available to you
  • Review current storage options
  • Choose storage
  • Implement storage
  • Monitor storage

Review use cases

Read

  • Brown, Adrian. "Selecting Storage Media for Long-Term Preservation." The National Archives, June 2003. http://www.nationalarchives.gov.uk/documents/selecting-storage-media.pdf
    This guide provides information for the creators and managers of electronic records about the selection of physical storage media in the context of long-term preservation. The scope of this guidance note is limited to removable storage media.

 


 

Storage Media - Selection

Q.

Storage Media - Identify

Q. What should I do to identify the materials with which I am working?

As part of the process of pulling digital information from physical media, you'll not only have to accurately identify what storage media you are working with, but also the digital file formats that are contained on the media and the risks associated with these file formats.

Take action

  • Identify the storage media you have and what content you may be acquiring
  • Identify the formats of files

Explore Tools

  • Disk Inventory X. http://www.derlien.com/
    Disk Inventory X identifies and visualizes the sizes of files and folders. It runs in Mac OS X 10.3 (and later). Note: You must be able to mount the drive in order to run Disk Inventory X on it; it doesn't work for disk images or disks that have filesystems that your computer's operating system can't recognize.
  • Disk Usage Analyzer. http://www.marzocca.net/linux/baobab/
    Disk Usage Analyzer is a graphical disk usage analyzer that can run in GNOME on Linux/Unix. It can be run against an entire volume or individual directories.
  • DROID http://sourceforge.net/projects/droid/
    DROID (Digital Record Object Identification) is an automatic file format identification tool. It is the first in a planned series of tools developed by The National Archives under the umbrella of its PRONOM technical registry service.
  • FIDO (Format Identification for Digital Objects). https://github.com/openplanets/fido
    FIDO is a command-line tool to identify the file formats of digital objects.
  • file (Unix). http://en.wikipedia.org/wiki/File_%28command%29
    file is a command-line utility for identifying file types. It was initially introduced in the 1970s, and it is included with every major distribution of Unix/Linux.
  • Global Digital Format Registry http://www.gdfr.info/
    The GDFR is meant to be a distributed and replicated registry of format information populated and vetted by experts and enthusiasts world-wide.
  • Grand Perspective. http://grandperspectiv.sourceforge.net/
    Grand Perspective identifies and visualizes the sizes of files and folders. It runs in Mac OS X. Note: You must be able to mount the drive in order to run Grand Perspective on it; it doesn't work for disk images or disks that have filesystems that your computer's operating system can't recognize.
  • JHOVE (JSTOR/Harvard Object Verification Environment). http://sourceforge.net/projects/jhove/
    JHOVE is open source software for format-specific identification, validation, and characterization of digital objects. See also JHOVE2
  • KDirStat. http://kdirstat.sourceforge.net/
    KDirStat is a graphical disk usage utility that runs in KDE on Linux/Unix.
  • MagicDisc - Magic ISO. http://www.magiciso.com/tutorials/miso-magicdisc-overview.htm
    MagicDisc is a free utility "designed for creating and managing virtual CD drives and CD/DVD discs." It can be used to mount disk images as drives in a Windows environment.
  • OSFMount - PassMark Software. http://www.osforensics.com/tools/mount-disk-images.html
    OSFMount is a free utility for mounting disk images (dd and .iso) in Windows.
  • PRONOM: The Technical Registry http://www.nationalarchives.gov.uk/PRONOM/Default.aspx
    PRONOM is a resource for anyone requiring impartial and definitive information about the file formats, software products and other technical components required to support long-term access to electronic records and other digital objects of cultural, historical or business value.
  • TreeSize Professional - JAM Software. http://www.jam-software.com/treesize/
    TreeSize can be used to identify and visualize the contents of a drive. It also supports basic analysis, export, reporting and duplicate identification.  Note: You must be able to mount the drive in order to run TreeSize on it; it doesn't work for disk images or disks that have filesystems that TreeSize or your computer's operating system can't recognize.
  • WinDirStat. http://windirstat.info/
    WinDirStat identifies and visualizes the sizes of files and folders. It runs in Windows 95 or later. Note: You must be able to mount the drive in order to run WinDirStat on it; it doesn't work for disk images or disks that have filesystems that your computer's operating system can't recognize.

Watch

  • Underwood, William.  "File Format Identification Technologies."  YouTube, 39:36, from a presentation on June 25, 2010.  Posted by usnationalarchives.  October 13, 2010.  http://www.youtube.com/watch?v=dVMs5YnZ0HU
    Gives a progress report on a promising technology for File Format Identification for use in NARA archival processes. Dr. Underwood is at the Georgia Tech Research Institute (GTRI).

Read

  • Arms, Caroline R., Carl Fleischhauer, and Jimi Jones. "Sustainability of Digital Formats: Planning for Library of Congress Collections," last updated December 12, 2011. http://www.digitalpreservation.gov/formats/
    The Digital Formats Web site provides information about digital content formats. The analyses and resources presented here will increase and be updated over time.
  • Born, Günter. The File Formats Handbook. London: International Thomson Computer Press, 1995.
    Born documents a variety of file formats that archivists and librarians are likely to find in collections of materials from the 1980s and 1990s. Of particular note are numerous figures that include hex dumps (hexadecimal representation of the contents) of files in particular formats.
  • Kessler, Gary C. "File Signatures Table." http://www.garykessler.net/library/file_sigs.html
    This document contains more than 400 file signatures (aka "magic numbers") that can be used to identify specific file types. Several of the tools listed above make use of magic numbers, and you can also view magic numbers by opening a file in a hex editor.
  • Lechich, Roy.  "File Format Identification and Validation Tools."  Yale University: Integrated Library & Technology Systems, February 2007. http://www.library.yale.edu/iac/DPC/FileIDandValidate.pdf
    Provides a brief overview of file type identification and file format validation. Analyzes some of the currently available open source academic tools related to file format.
  • Mediapedia. National Library of Australia. http://mediapedia.nla.gov.au/home.php
    Mediapedia "is intended to enable the identification of various physical media carrier types for assisting with collection planning, assessment, documentation, infrastructure and preservation planning for the content they hold. These could include media across various genres such as cine, video, photo, audio, data, paper carriers, microfilm, etc."
  • Pearson, David and Colin Webb.  "Defining File Format Obsolescence: A risky journey."  International Journal of Digital Curation 3, no. 1 (2008): 89-106.  http://www.ijdc.net/index.php/ijdc/article/download/76/44 -- link opens a PDF file
    This paper reports on the AONS (Automatic Obsolescence Notification System) II Project, which aimed to refine and develop a software tool that would automatically find and report indicators of obsolescence risks, to help repository managers decide if preservation action is needed.
  • Todd, Malcolm.  "Technology Watch Report: File formats for preservation."  Digital Preservation Coalition, October 2009.  http://www.dpconline.org/component/docman/doc_download/375-file-formats-for-preservation -- link opens a PDF file
    Considers various criteria for file formats: adoption, technological dependencies, disclosure, transparency, metadata support, reusability/interoperability, robustness/complexity, stability, intellectual property/digital rights production, the ability of formats to convey content information, extent of format, and cost.
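
The file signatures ("magic numbers") mentioned in the Kessler entry above can be illustrated with a short Python sketch. It checks the first bytes of a file against a handful of widely documented signatures. The table here is deliberately tiny, and the names `SIGNATURES`, `identify`, and `identify_file` are hypothetical; dedicated tools such as DROID or file rely on far larger signature sets and more careful matching.

```python
# A few widely documented leading-byte signatures. This list is
# illustrative only and is nowhere near complete.
SIGNATURES = [
    (b"%PDF", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP archive (or a format built on ZIP)"),
]


def identify(header: bytes) -> str:
    """Return a label for the first matching signature, else 'unknown'."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def identify_file(path) -> str:
    """Read the first 16 bytes of a file and identify them."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))
```

A match on leading bytes only suggests a format; it does not confirm that the file is valid, which is where validation tools come in.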

 


 

Storage Media - Prepare

Q. How should I prepare to acquire information from storage media?

Before you begin extracting information from physical media, it will be helpful to assess your current situation. Take a look at what physical media you already have and what media you are considering adding to your holdings. Make sure you understand the processes of acquiring information from physical media – including the human resources, technology, and costs. It is also a good idea to understand how information is saved and accessed in computer systems.  Information professionals can learn a lot from the field of digital forensics, which has established methods and tools for making trustworthy and complete copies of information, and then extracting relevant data and metadata.

Take action

  • Review use cases, watch videos, and read literature to gain a greater understanding of computing principles
  • Perform needs and resource assessments
  • Establish the monetary, human, and technological resources you need and what you have available
  • Prepare clearly defined policies for processes

Explore Tools

  • BitCurator. http://bitcurator.net
    The BitCurator project is developing and disseminating a suite of open source tools to integrate digital forensics tools and techniques into archival and library workflows. The tools are being developed and tested in a Linux environment, but the software on which they depend can be compiled for Windows environments (and in most cases are currently distributed as both source code and Windows binaries). There are two primary paths to implement the software: 
    • As a ready-to-run Linux (Ubuntu) environment that can be used either as a virtual machine or installed as a host operating system. This environment is customized to provide users with graphic user interface (GUI)-based scripts that provide simplified access to common functions associated with handling media, including facilities to prevent inadvertent write-enabled mounting (software write-blocking).
    • As a set of individual software tools, packages, support scripts, and documentation to reproduce full or partial functionality of the ready-to-run BitCurator environment.
  • Digital Forensics Tools http://digitalcurationexchange.org/node/2038
    This is a list of digital forensics tools from the appendices of the paper "Digital Forensics and Born-Digital Content in Cultural Heritage Collections" by Matthew G. Kirschenbaum, Richard Ovenden, and Gabriela Redwine, with research assistance from Rachel Donahue.
  • EnCase - Guidance Software. http://www.guidancesoftware.com/encase-forensic.htm
    EnCase is a commercial digital forensics package with a variety of features.
  • Forensic Tool Kit (FTK) - AccessData. http://www.accessdata.com/products/digital-forensics/ftk
    FTK is a commercial digital forensics package with a variety of features.
  • ILooKIX - Perlustro. http://www.perlustro.com/solutions/e-forensics/ilookix
    ILooKIX is a commercial digital forensics package with a variety of features.
  • OSForensics. http://www.osforensics.com/osforensics.html
    OSForensics supports "hash matching, drive signature comparisons, e-mails, memory and binary data. It lets you extract forensic evidence from computers quickly with advanced file searching and indexing and enables this data to be managed effectively."
  • The Sleuth Kit (TSK) - Basis Technology. http://www.sleuthkit.org/
    TSK is an open-source suite of digital forensics tools that "can be used to analyze disk images and perform in-depth analysis of file systems (such as NTFS, FAT, HFS+, Ext3, and UFS) and several volume system types." It can be run on Windows, Linux/Unix, and Mac OS X.
  • Tools - Forensics Wiki. http://www.forensicswiki.org/wiki/Category:Tools
    The Forensics Wiki includes information about a variety of tools that can be used to acquire, manage and analyze data from storage media. It was created by Simson Garfinkel.

Watch

  • Shaw, Seth.  "Preparing Storage Media: Inventory."  YouTube, 1:18, July 2, 2012.  Posted by CDCGUNC.  February 15, 2013.  http://www.youtube.com/watch?v=InI8841pC0k
    Seth Shaw is the Electronic Records Archivist at Duke University. He provided advice about the importance of conducting a thorough inventory of acquired storage media.
  • Shaw, Seth.  "Preparing Storage Media: Challenges."  YouTube, 2:00, July 2, 2012.  Posted by CDCGUNC.  February 15, 2013.  http://www.youtube.com/watch?v=8AzZaj-Fe-w
    Seth Shaw is the Electronic Records Archivist at Duke University. He described some of the challenges of handling acquired storage media.

Read

  • AIMS Working Group. "AIMS Born-Digital Collections: An Inter-Institutional Model for Stewardship." 2012. http://www2.lib.virginia.edu/aims/whitepaper/
    "The AIMS project evolved around a common need among the project partners — and most libraries and archives — to identify a methodology or continuous framework for stewarding born-digital archival materials." "The AIMS Framework was developed to define good practice in terms of archival tasks and objectives necessary for success. The Framework, as defined in the White Paper found below, presents a practical approach but also a recognition that there is no single solution for many of the issues that institutions face when dealing with born-digital collections. Instead, the AIMS project partners developed this framework as a further step towards best practice for the profession."
  • Beek, Christiaan. "Introduction to File Carving." McAfee. 2011. http://www.mcafee.com/us/resources/white-papers/foundstone/wp-intro-to-file-carving.pdf
    "'File carving,' or sometimes simply 'carving,' is the process of extracting a collection of data from a larger data set. Data carving techniques frequently occur during a digital investigation when the unallocated file system space is analyzed to extract files. The files are 'carved' from the unallocated space using file type-specific header and footer values. File system structures are not used during the process. File carving is a powerful technique for recovering files and fragments of files when directory entries are corrupt or missing. The block of data is searched block by block for residual data matching the file type-specific header and footer values."
  • Carrier, Brian. "Computer Foundations." In File System Forensic Analysis, 17-45. Boston, MA: Addison-Wesley, 2005. [See also "Hard Disk Data Acquisition" (47-66).]
    "The goal of this chapter is to cover the low-level basics of how computers operate. In the following chapters of this book, we examine, in detail, how data are stored, and this chapter provides background information for those who do not have programming or operating system design experience. This chapter starts with a discussion about data and how they are organized on disk. We discuss binary versus hexadecimal values and little- and big-endian ordering. Next, we examine the boot process and code required to start a computer. Lastly, we examine hard disks and discuss their geometry, ATA commands, host protected areas, and SCSI."
  • Farmer, Dan and Wietse Venema.  "File System Basics."  In Forensic Discovery.  Upper Saddle River, NJ: Addison-Wesley, 2005.  http://www.porcupine.org/forensics/forensic-discovery/chapter3.html
    "In this chapter we will explore some fundamental properties of file systems. As the primary storage component of a computer the file system can be the source of a great deal of forensic information. We'll start with the basic organization of file systems and directories, including how they may be mounted on top of each other to hide information. We'll then move onto various types of files along with their limits and peculiarities, as well as the basic inode and data block relationship. Next we outline the lowest levels of the file system - partitions, zones, inode and data bitmaps, and the superblock. Along the way we'll discuss and introduce a variety of tools and methods to facilitate our exploration and analysis."
  • Garfinkel, Simson, and David Cox. "Finding and Archiving the Internet Footprint." Paper presented at the First Digital Lives Research Conference: Personal Digital Archives for the 21st Century, London, UK, February 9-11, 2009. http://simson.net/clips/academic/2009.BL.InternetFootprint.pdf
    "With the move to “cloud” computing, archivists face the increasingly difficult task of finding and preserving the works of an originator so that they may be readily used by future historians. This paper explores the range of information that an originator may have left on computers “out there on the Internet,” including works that are publicly identified with the originator; information that may have been stored using a pseudonym; anonymous blog postings; and private information stored on web-based services like Yahoo Calendar and Google Docs. Approaches are given for finding the content, including interviews, forensic analysis of the originator’s computer equipment, and social network analysis. We conclude with a brief discussion of legal and ethical issues."
  • Hillis, W. Daniel. The Pattern on the Stone: The Simple Ideas That Make Computers Work. 1st ed. New York: Basic Books, 1998. [Nuts and Bolts (1-19); Universal Building Blocks (21-38)]
    "Most people are baffled by how computers work and assume that they will never understand them. What they don’t realize—and what Daniel Hillis’s short book brilliantly demonstrates—is that computers’ seemingly complex operations can be broken down into a few simple parts that perform the same simple procedures over and over again. Computer wizard Hillis offers an easy-to-follow explanation of how data is processed that makes the operations of a computer seem as straightforward as those of a bicycle. Avoiding technobabble or discussions of advanced hardware, the lucid explanations and colorful anecdotes in The Pattern on the Stone go straight to the heart of what computers really do. Hillis proceeds from an outline of basic logic to clear descriptions of programming languages, algorithms, and memory. He then takes readers in simple steps up to the most exciting developments in computing today—quantum computing, parallel computing, neural networks, and self-organizing systems. Written clearly and succinctly by one of the world’s leading computer scientists, The Pattern on the Stone is an indispensable guide to understanding the workings of that most ubiquitous and important of machines: the computer."
  • John, Jeremy Leighton. "Adapting Existing Technologies for Digitally Archiving Personal Lives: Digital Forensics, Ancestral Computing, and Evolutionary Perspectives and Tools." Paper presented at iPRES 2008: The Fifth International Conference on Preservation of Digital Objects, London, UK, September 29-30, 2008. http://www.bl.uk/ipres2008/presentations_day1/09_John.pdf
    "The adoption of existing technologies for digital curation, most especially digital capture, is outlined in the context of personal digital archives and the Digital Manuscripts Project at the British Library. Technologies derived from computer forensics, data conversion and classic computing, and evolutionary computing are considered. The practical imperative of moving information to modern and fresh media as soon as possible is highlighted, as is the need to retain the potential for researchers of the future to experience the original look and feel of personal digital objects. The importance of not relying on any single technology is also emphasised."
  • John, Jeremy Leighton. "Digital Forensics and Preservation." DPC Technology Watch Report 12-03.  Digital Preservation Coalition, November 2012  http://www.dpconline.org/component/docman/doc_download/810-dpctw12-03pdf
    "In recent years, digital forensics has emerged as an essential source of tools and approaches for facilitating digital preservation and curation, specifically for protecting and investigating evidence from the past. Institutional repositories and professionals with responsibilities for personal archives can benefit from forensics in addressing digital authenticity, accountability and accessibility. Digital personal information must be handled with due sensitivity and security while demonstrably protecting its evidential value. Forensic technology makes it possible to: identify privacy issues; establish a chain of custody for provenance; employ write protection for capture and transfer; and detect forgery or manipulation. It can extract and mine relevant metadata and content; enable efficient indexing and searching by curators; and facilitate audit control and granular access privileges. Advancing capabilities promise increasingly effective automation in the handling of ever higher volumes of personal digital information. With the right policies in place, the judicious use of forensic technologies will continue to offer theoretical models, practical solutions and analytical insights. The purpose of this paper is to provide a broad overview of digital forensics, with some pointers to resources and tools that may benefit cultural heritage and, specifically, the curation of personal digital archives."
  • John, Jeremy Leighton, Ian Rowlands, Peter Williams, and Katrina Dean. “Digital Lives: Personal Digital Archives for the 21st Century >> An Initial Synthesis.” Version 0.2. March 3, 2010. http://britishlibrary.typepad.co.uk/files/digital-lives-synthesis02-1.pdf.
    "The digital era has changed the nature and scope of personal archiving forever. The Digital Lives research project accordingly has examined both theoretical and practical aspects of curating personal digital objects, or eMANUSCRIPTS, over the entire archival life cycle. This initial synthesis offers an overview of the emerging field of personal informatics and personal curation. It contemplates three audiences: the individual who is leading a digital life and creating a personal digital archive, the practicing professional archivist and curator, and the scholar and scientist who is accessing the contents of personal archives for research purposes."
  • Kernighan, Brian W. "Bits, Bytes, and Representation of Information." In D Is for Digital: What a Well-Informed Person Should Know About Computers and Communications, 21-34. DisforDigital.net, 2012.
    "In this chapter I'm going to discuss three fundamental ideas about how computers represent information. First, computers are digital processors: they store and process information that comes in discrete chunks and takes on discrete values, basically just numbers, by contrast with analog information, which implies smoothly varying values. Second, computers represent information in bits. A bit is a binary digit, that is, a number that is either 0 or 1. Everything inside the computer is represented with bits. The binary number system is used internally, not the familiar decimal numbers that people use. Third, groups of bits represent larger things. Numbers, letters, words, names, sounds, pictures, movies, and the instructions that make up the programs that process them-all of these are represented as groups of bits."
  • Kirschenbaum, Matthew G., Erika Farr, Kari M. Kraus, Naomi L. Nelson, Catherine Stollar Peters, Gabriela Redwine, and Doug Reside. "Approaches to Managing and Collecting Born-Digital Literary Materials for Scholarly Use." College Park, MD: University of Maryland, 2009. http://mith.umd.edu/wp-content/uploads/whitepaper_HD-50346.Kirschenbaum....
    This document reports on "a series of site visits and planning meetings for personnel working with the born-digital components of three significant collections of literary material: the Salman Rushdie papers at Emory University’s Manuscripts, Archives, and Rare Books Library (MARBL), the Michael Joyce Papers (and other collections) at the Harry Ransom Humanities Research Center at The University of Texas at Austin, and the Deena Larsen Collection at the Maryland Institute for Technology in the Humanities (MITH) at the University of Maryland."
  • Kirschenbaum, Matthew G., Richard Ovenden, and Gabriela Redwine. "Digital Forensics and Born-Digital Content in Cultural Heritage Collections." Washington, DC: Council on Library and Information Resources, 2010. http://www.clir.org/pubs/reports/pub149/pub149.pdf
    "While the purview of digital forensics was once specialized to fields of law enforcement, computer security, and national defense, the increasing ubiquity of computers and electronic devices means that digital forensics is now used in a wide variety of cases and circumstances. Most records today are born digital, and libraries and other collecting institutions increasingly receive computer storage media as part of their acquisition of "papers" from writers, scholars, scientists, musicians, and public figures. This poses new challenges to librarians, archivists, and curators—challenges related to accessing and preserving legacy formats, recovering data, ensuring authenticity, and maintaining trust. The methods and tools developed by forensics experts represent a novel approach to these demands. For example, the same forensics software that indexes a criminal suspect's hard drive allows the archivist to prepare a comprehensive manifest of the electronic files a donor has turned over for accession. This report introduces the field of digital forensics in the cultural heritage sector and explores some points of convergence between the interests of those charged with collecting and maintaining born-digital cultural heritage materials and those charged with collecting and maintaining legal evidence."
  • Petzold, Charles. Code: The Hidden Language of Computer Hardware and Software. Redmond, WA: Microsoft Press, 1999. [Bit by Bit by Bit (69-85); Bytes and Hex (180-189)]
    "Code: The Hidden Language of Computer Hardware and Software is a unique exploration into bits, bytes, and the inner workings of computers."
  • Ross, Seamus, and Ann Gow. "Digital Archaeology: Rescuing Neglected and Damaged Data Resources." London: British Library, 1999. http://www.ukoln.ac.uk/services/elib/papers/supporting/pdf/p2.pdf
    "The study examines the approaches to accessing digital materials where the media has become damaged (through disaster or age) or where the hardware or software is either no longer available or unknown. The study begins by looking at the problems associated with media."
  • Rothenberg, Jeff. "Ensuring the Longevity of Digital Information." Washington, DC: Council on Library and Information Resources, 1999. http://www.clir.org/pubs/archives/ensuring.pdf [See especially: "Old bit streams never die--they just become unreadable" and "It's all in the program" (2-11)].
    This piece is a "classic" in the digital preservation literature. In it, Rothenberg provides very clear explanations of various fundamental technical issues that make digital preservation a challenge. Of particular relevance to acquiring and managing data from storage media is his early discussion about types of representation information and software dependencies.
  • Thomas, Susan, Renhart Gittens, Janette Martin, and Fran Baker. "Workbook on Digital Private Papers." 2007. Paradigm Project. http://www.paradigm.ac.uk/workbook/introduction/index.html.
    "The Paradigm Workbook is an evolving resource based on an exemplar project at the academic research libraries of the Universities of Oxford and Manchester. Between January 2005 and February 2007, the Paradigm project explored the issues involved in the long term preservation of personal digital archives using today's politicians and their personal archives as a testbed."
  • White, Ron and Timothy Edward Downs.  How Computers Work.  9th ed. Indianapolis, IN: Que, 2008.
    "Definitive illustrated guide to the world of PCs and technology. In this new edition, you’ll find detailed information not just about every last component of hardware found inside your PC, but also in-depth explanations about home networking, the Internet, PC security, and even how cell phone networks operate."
  • Woods, Kam, Christopher A. Lee, and Simson Garfinkel. “Extending Digital Repository Architectures to Support Disk Image Preservation and Access.” In JCDL '11: Proceeding of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, 57-66. New York, NY: ACM Press, 2011. http://www.ils.unc.edu/callee/p57-woods.pdf
    "Disk images (bitstreams extracted from physical media) can play an essential role in the acquisition and management of digital collections by serving as containers that support data integrity and chain of custody, while ensuring continued access to the underlying bits without depending on physical carriers. Widely used today by practitioners of digital forensics, disk images can serve as baselines for comparison for digital preservation activities, as they provide fail-safe mechanisms when curatorial actions make unexpected changes to data; enable access to potentially valuable data that resides below the file system level; and provide options for future analysis. We discuss established digital forensics techniques for acquiring, preserving and annotating disk images, provide examples from both research and educational collections, and describe specific forensic tools and techniques, including an object-oriented data packaging framework called the Advanced Forensic Format (AFF) and the Digital Forensics XML (DFXML) metadata representation."

Last updated on 12/31/69, 7:00 pm by Anonymous

Building IRs - Access

DISH 2011

Focus: 
Date: 
Tuesday, December 6, 2011 (All day) - Friday, December 9, 2011 (All day)

 

iConference 2012 Registration

iConference 2012: Early-bird registration available through Dec. 15, 2011

Acquiring Data from Storage Media - Provide

The primary purpose of most digital collections is to provide users with access to information.  To do so, you must determine how digital content will be made accessible and who will be allowed to access it.  It is important to understand your users' needs and their preferred methods of accessing the content you provide.  When necessary, you must also apply appropriate access restrictions.

 

Acquiring Data from Storage Media - Manage

Because the landscape of digital content is continually shifting, it requires constant, active management.  Being responsible for a collection of digital content involves the continued management of not just the technology used to store, preserve, and provide access, but also the monetary and human resources.  Managing digital collections requires strong project management skills.  It requires the ability to manage relationships with all of your stakeholders, including your user base, colleagues within your department and across departments, your institutional leaders, and other institutions.  As technology changes, it is important to stay abreast of these changes and to be able to adopt new approaches and workflows as necessary.

 

 

Acquiring Data from Storage Media - Protect

It is necessary to take active measures to protect your digital content from loss that can take the form of changes, obsolescence, inappropriate access, and disasters.  One of the first lines of defense is to make multiple copies of your content and store them in different locations.  You should also monitor your content for inadvertent or deliberate changes by performing checksum procedures that can detect and alert you to even the smallest changes in the digital objects in your collection.  You should make plans and policies for addressing obsolescence of storage media, platforms and formats over time. As you are developing your policies during the Prepare stage, you should be sure to also prepare a policy for disaster preparedness.
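
The checksum monitoring described above can be sketched roughly as follows. This is a minimal illustration, assuming SHA-256 as the checksum algorithm and a simple in-memory manifest that maps relative file paths to digests. The function names (sha256_of, build_manifest, find_changes) are hypothetical and are not taken from any particular preservation tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map every file under root (by relative path) to its checksum."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_changes(old, new):
    """Compare two manifests and list changed, missing, and added files."""
    return {
        "changed": sorted(k for k in old if k in new and old[k] != new[k]),
        "missing": sorted(k for k in old if k not in new),
        "added": sorted(k for k in new if k not in old),
    }
```

In practice, a manifest would be built when content is first stored, saved somewhere separate from the content itself, and compared against a freshly built manifest on a regular schedule. Any entry reported as changed or missing would then be restored from one of the other copies.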

 

 

Acquiring Data from Storage Media - Store

Once you have selected your digital content, you will have to determine how your content should be stored for the long-term.  When you are planning for the storage of your digital content, you will need to consider cost of storage, quantity of storage devices, level of expertise necessary, partners you have or may want to work with, and the services that you want to build from what you have stored.  Also remember that in order to have accessible, sustainable digital content, you will need to create and store various kinds of metadata either with or linked to your digital content.

 

 

Acquiring Data from Storage Media - Select

After determining what you have, you'll need to select the content to manage and preserve.  As part of this process, you should assess your institutional and departmental mission statements and determine if the content in question fits within your mission.  You will also need to determine if the content has value to your institution, if it's feasible to preserve the content, and if you can provide your users with reliable access to the content.  Once you have determined what content you will accept and process, document the selection choices you have made as well as who is permitted access to the content.

 

 

Acquiring Data from Storage Media - Identify

An important part of managing your digital collections is identifying what you are working with. This includes identifying what digital content you have, what you are already preserving, and what content you may be acquiring. You will also need to identify the types of media that you have and assess the risk associated with those media types.

 

Acquiring Data from Storage Media - Prepare

There are a variety of things that you can do to prepare.  Many of them involve skills, concepts and tools from the field of digital forensics.

 

Managing Data - Identify

Q. What do I need to identify in order to properly manage data?

Many funding agencies, publishers, and sponsoring institutions have issued data management requirements that affect researchers in every disciplinary domain.  Whether their data are considered "big data" or "long-tail" data, researchers are seeking tools and services to help manage them.  In order to provide assistance, it is important to first identify the scope of their data management needs.  This includes identifying the disciplinary domain in which the data were generated, the file formats and the software applications required to process those file formats, the metadata standards that apply to the data, any risks associated with handling the data, and any policies imposed on the data that may affect how they are managed.

Digitizing - Identify

An important part of managing your digital collections is identifying everything that you are working with. This includes identifying what digital content you have, what you are already preserving, and what content you may be acquiring. You will also need to identify the file formats you have and assess the risk associated with these formats.

 

Take action

  • Identify the content you have, what you are already preserving, and what content you may be acquiring
  • Identify the digital file formats you will be collecting and assess risks associated with these file formats
  • Use file format identification tools to identify file formats you already have in your collection
  • Record date information such as the date the files were received, file creation date, and the date the file was last updated
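
As a rough illustration of what file format identification tools do, the sketch below checks the first bytes of a file against a handful of well-known format signatures ("magic numbers"). It is only a minimal example under stated assumptions: the SIGNATURES table and the function names are hypothetical, and dedicated identification tools rely on far larger signature registries and more careful matching rules than this.

```python
# A small, illustrative table of leading-byte signatures.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),       # little-endian TIFF
    (b"MM\x00*", "TIFF"),       # big-endian TIFF
    (b"PK\x03\x04", "ZIP container"),
]


def identify_bytes(data):
    """Return a format name if data starts with a known signature."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def identify_file(path):
    """Identify a file by reading only its first few bytes."""
    with open(path, "rb") as handle:
        return identify_bytes(handle.read(16))
```

Note that a file's extension is never consulted here. Identification based on content rather than file name is what makes this kind of check useful for files that have been renamed or that arrived without extensions.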

 

Building IRs - Identify

Q: What do I need to identify in order to build an institutional repository?

Identifying all of the pieces and understanding how they fit together is crucial to successfully creating and

Archiving Web Sites - Identify

Q. What do I need to identify in order to archive web sites?

An important part of managing your digital collections is identifying everything with which you are working.  This includes identifying what digital content you have, what you are already preserving, and what content you may be acquiring.  You will also need to identify the file formats you have and assess the risk associated with these formats.  

In cases where the goal is to document the lives of individuals, there are two distinct selection strategies for homing in on materials related to the individuals [Lee]:

  • Work from the individual outward (e.g., ask the person or find information on his/her computer that helps to identify points of entry to his/her online presence, such as logins, browsing histories, and favorite sites). [See Garfinkel and Cox]
  • Work from the wider web inward toward the individual (e.g., use web searches to locate information that leads to elements of his/her web presence).

One of the primary challenges of collecting information about or by given individuals from the Web is “web presence identification” [Bekkerman]—determining what pages on the Web are actually by or about a given individual.

For many institutions, it is important to identify web resources that are "at risk."

Another essential activity can be identifying what constitutes records to be retained from web sites.

Take action

  • Identify the web content you have, what you are already preserving, and what content you may be acquiring
  • Identify the digital file formats you will be collecting with the web sites and assess risks associated with these file formats
  • Use file format identification tools to identify file formats you already have in your collection
  • Record date information such as the date the files were received, file creation date, and the date the file was last updated
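
The date information mentioned in the last item above could be captured with something like the following sketch. It assumes the dates are read from the file system and that the received date is supplied by whoever accepts the files. The date_record function and its field names are hypothetical. A file creation time is only exposed on some platforms, so it is recorded as None when unavailable.

```python
import os
from datetime import datetime, timezone


def _iso(timestamp):
    """Format a POSIX timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


def date_record(path, received=None):
    """Collect received, created, and last-modified dates for one file."""
    info = os.stat(path)
    # Default to "now" when no explicit received date is given.
    received = received or datetime.now(timezone.utc)
    # st_birthtime is only present on some platforms (an assumption to check).
    birth = getattr(info, "st_birthtime", None)
    return {
        "path": str(path),
        "date_received": received.isoformat(),
        "file_created": _iso(birth) if birth is not None else None,
        "file_last_modified": _iso(info.st_mtime),
    }
```

Records like these are best created at the moment of receipt, before any copying or processing step has a chance to alter the file system timestamps.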


Read

  • Bekkerman, Ron, and Andrew McCallum, "Disambiguating Web Appearances of People in a Social Network," in Proceedings of the 14th International Conference on World Wide Web, WWW 2005: Chiba, Japan, May 10–14, 2005, ed. Allan Ellis and Tatsuya Hagino, 463-70. New York: ACM Press, 2005. http://wwwconference.org/proceedings/www2005/docs/p463.pdf
    "Say you are looking for information about a particular person. A search engine returns many pages for that person's name but which pages are about the person you care about, and which are about other people who happen to have the same name? Furthermore, if we are looking for multiple people who are related in some way, how can we best leverage this social network? This paper presents two unsupervised frameworks for solving this problem: one based on link structure of the Web pages, another using Agglomerative/Conglomerative Double Clustering (A/CDC) - an application of a recently introduced multi-way distributional clustering method. To evaluate our methods, we collected and hand-labeled a dataset of over 1000 Web pages retrieved from Google queries on 12 personal names appearing together in someones in an email folder. On this dataset our methods outperform traditional agglomerative clustering by more than 20%, achieving over 80% F-measure."
  • Collaboration and Transformation Shared Interest Group. "Best Practices Study of Social Media Records Policies." Fairfax, VA: American Council for Technology. March 2011.
    "The purpose of this study is to build a discussion around the use of Web 2.0 collaborative technologies, also known as social media, to help government and its citizens connect more closely, collaboratively, and openly. The study involved interviews at 10 agencies regarding records management processes addressing the use of social media. The C&T SIG sought to explore and capture government best practices of retention policies for social media used to support agency missions."
  • Garfinkel, Simson, and David Cox. "Finding and Archiving the Internet Footprint." Paper presented at the First Digital Lives Research Conference: Personal Digital Archives for the 21st Century, London, UK, February 9-11, 2009. http://simson.net/clips/academic/2009.BL.InternetFootprint.pdf
    "With the move to “cloud” computing, archivists face the increasingly difficult task of finding and preserving the works of an originator so that they may be readily used by future historians. This paper explores the range of information that an originator may have left on computers “out there on the Internet,” including works that are publicly identified with the originator; information that may have been stored using a pseudonym; anonymous blog postings; and private information stored on web-based services like Yahoo Calendar and Google Docs. Approaches are given for finding the content, including interviews, forensic analysis of the originator’s computer equipment, and social network analysis. We conclude with a brief discussion of legal and ethical issues."
  • Koehler, Wallace. "A Longitudinal Study of Web Pages Continued: A Consideration of Document Persistence." Information Research 9, no. 2 (2004). http://informationr.net/ir/9-2/paper174.html
    "It is well established that Web documents are ephemeral in nature. The literature now suggests that some Web objects are more ephemeral than others. Some authors describe this in terms of a Web document half-life, others use terms like 'linkrot' or persistence. It may be that certain 'classes' of Web documents are more or less likely to persist than are others. This article is based upon an evaluation of the existing literature as well as a continuing study of a set of URLs first identified in late 1996. It finds that a static collection of general Web pages tends to 'stabilize' somewhat after it has 'aged'. However 'stable' various collections may be, their instability nevertheless pose problems for various classes of users. Based on the literature, it also finds that the stability of more specialized Web document collections (legal, educational, scientific citations) vary according to specialization. This finding, in turn, may have implications both for those who employ Web citations and for those involved in Web document collection development."
  • Kumar, B.T. Sampath, and Manoj Kumar. "Decay and half-life period of online citations cited in open access journals." International Information & Library Review 44, No. 4 (2012): 202-211. http://dx.doi.org/10.1016/j.iilr.2012.09.002
    "This study investigates the decay and half-life of online citations cited in four open access journals published between 2000 and 2009. A total of 1158 online citations cited in 1086 research articles published in two science and social science journals spanning a period of 10 years (2000–2009) were extracted. Study found that 24.58% (267 out of 1086) of articles had online citations and these articles contained a substantially very less number of online citations (2.98%) compared to previous study results. 30.56% (26% in Science and 52.73% in Social Science) of online citations were not accessible and remaining 69.44% of online citations were still accessible. The ‘HTTP 404 error message-page not found’ was the overwhelming message encountered and represented 67.79% of all HTTP message. Domains associated with .ac and .net had higher successful access rates while .org and .com/.co had lowest successful access rates. The half-life of online citations was computed to be approximately 11.5 years and 9.07 years in Science and Social science journal articles respectively."
  • Lee, Christopher A. "Collecting the Externalized Me: Appraisal of Materials in the Social Web." In I, Digital: Personal Collections in the Digital Era, edited by Christopher A. Lee, 202-238. Chicago, IL: Society of American Archivists, 2011.
    "With the adoption of highly interactive web technologies (frequently labeled “Web 2.0”), forms of individual documentation and expression also often are inherently social and public. Such online environments allow for personal documentation, but they also engage external audiences in ways not previously possible. This opens up new opportunities and challenges for collecting personal materials, particularly within the context of archival appraisal. This chapter explores various ways in which principles of archival appraisal can be operationalized in an environment in which collecting takes the form of submitting queries and following links."
  • McCluskey, Michael. "Website content persistence and change: Longitudinal analysis of pro-white group identity." Journal of Information Science (2012): 1-10.
    "Despite the ability of websites to quickly evolve, little attention has been paid to persistence and change in site content. Longitudinal examination of 163 pro-white advocacy group websites, in which establishing a core group identity is a critical strategic goal, showed a half-life of 2.40 years and 34% remained active after five years. Analysis of text content from 28 sites collected annually from 2007 to 2012 (n=1947) showed that persistence was more likely for advocacy group identity, while examples of group goals were transient. Content persistence trends reflect broader phenomena of ideologically oriented website persuasive material."
  • Mardani, A.H, and M. Sangari. "An Analysis of the Availability and Persistence of Web Citations in Iranian LIS Journals." International Journal of Information Science and Management 11, No. 1 (2013).
    "To discover the current situation and characteristics of web citations accessibility, the present study examined the accessibility of 4,253 web citations in six key Iranian LIS journals published from 2006 to 2010. The proportion percentage of web citations increased from 11% in 2006 to 30% in 2010. The most widely cited top level domains in URLs include the .edu and .org with respectively 37% and 23%. This study provides further evidence that organizations websites have become increasingly vulnerable to URL decay. The results show that only 3467 web citations remain accessible in 2011, of which 71% allowed easy and long-term access to the authors' information intended in URLs. Long time inaccessibility to the authors' intended information was shown to be mostly from URLs that returned the 404 error and also the URLs that had gone through information update. An about 4 year half-life was estimated for Iran's LIS Publications. Ultimately, the results suggest that the decay of URLs is a grave problem in the publication of Iran's LIS researchers and cannot be overlooked. These authors need to gain the necessary knowledge about using web citations as major sources of information for their publications."
  • Moreau, Luc. "The Foundations for Provenance on the Web." Foundations and Trends in Web Science 2, No. 2/3 (2010): 99-241. http://eprints.soton.ac.uk/271691/
    "Using multiple data sources, we have compiled the largest bibliographical database on provenance so far. This large corpus allows us to analyse emerging trends in the research community. Specifically, using the CiteSpace tool, we identify clusters of papers that constitute research fronts, from which we derive characteristics that we use to structure our foundational framework for provenance on the Web. We note that such an endeavour requires a multi-disciplinary approach, since it requires contributions from many computer science sub-disciplines, but also other non-technical fields given the human challenge that is anticipated. To develop our vision, it is necessary to provide a definition of provenance that applies to the Web context. Our conceptual definition of provenance is expressed in terms of processes, and is shown to generalise various definitions of provenance commonly encountered." The "Open Provenance Model is an emerging community-driven representation of provenance, which has been actively used by some twenty teams to exchange provenance information according to the Open Provenance Vision. Having identified an open approach and a model for provenance, we then look at techniques that have been proposed to expose provenance over the Web. We also study how Semantic Web technologies have been successfully exploited to express, query and reason over provenance."
  • Saberi, M.K., and H. Abedi. "Accessibility and decay of web citations in five open access ISI journals." Internet Research 22, No. 2 (2012): 234-247.
    "After acquiring all the papers published by these journals during 2002-2007, their web citations were extracted and analyzed from an accessibility point of view. Moreover, for initially missed citations complementary pathways such as using Internet Explorer and the Google search engine were employed." "The study revealed that at first check 73 per cent of URLs are accessible, while 27 per cent have disappeared. It is notable that the rate of accessibility increased to 89 per cent and the rate of decay decreased to 11 per cent after using complementary pathways. The '.net' domain, with an availability of 96 per cent (a decay of 4 per cent) has the greatest stability and persistence among all domains, while the most stable file format is PDF, with an availability of 93 per cent (a decay of 7 per cent)."
  • U.S. National Archives and Records Administration. Guidance on Managing Records in Web 2.0/Social Media Platforms. October 20, 2010. http://www.archives.gov/records-mgmt/bulletins/2011/2011-02.html

 

Digital Curation Librarian

Focus: 

Managing Data - Manage

Q.

Managing Data - Provide

Q.

Managing Data - Protect

Q.
