
How do I prevent unauthorized access to restricted and/or sensitive data?

Take action

  • Review and understand laws and regulations that govern copyright and protection of research participants
  • De

What type of information do I need for making backups & storage?

Take action

  • In reviewing your options for storing your data collection, consider these six issues* when selecting your storage media:
  1. Longevity - Your chosen media should have a proven life span of at least ten years
  2. Capacity - Make sure that your media can adequately store the content you have now and the content you plan to collect in the future
  3. Viability - The media you choose should support error detection and data recovery 
  4. Obsolescence - Choose technology that is well established and widely available
  5. Cost - Consider both the cost for purchasing the media and the cost for maintaining it over time
  6. Susceptibility - The media you choose should show a low susceptibility to data loss and physical damage 

* Brown, Adrian.  "Selecting storage media for long-term preservation."  Note 2 in Digital Preservation Guidance.  National Archives, August 2008.  http://www.nationalarchives.gov.uk/documents/selecting-storage-media.pdf
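
The "Viability" criterion above asks for error detection. Whatever media you choose, one common complementary practice is to record a checksum for every file when a backup is made and to re-verify the checksums later. The following is a minimal sketch of that idea, not a method taken from the guidance cited above; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose checksum no longer matches."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

In use, you would call build_manifest when the backup is written, store the result alongside it (for example as a JSON file), and run changed_files on a schedule. A non-empty result signals that a copy should be restored from another backup.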

 

Explore

The DataONE website includes a Best Practices guide that offers recommendations on how to effectively store and backup data for long-term preservation.

The UK Data Archive website provides several pages of content directly related to backing up and storing data.  It outlines considerations for the selection of a data storage strategy.

 

Watch

  • Rumsey, Abby Smith. But Storage Is Cheap: Digital Preservation in the Age of Abundance. New Haven, CT: Yale University, 2011. http://youtu.be/Yk9ccNP9xTk.

Abby Smith Rumsey, historian and consulting analyst on the use of the cultural record in a variety of media, gives her views on digital preservation and the role of research libraries in the preservation of electronic media.

 

Read

  • Baker, Mary, Mehul Shah, David S. H. Rosenthal, Mema Roussopoulos, Petros Maniatis, TJ Giuli, and Prashanth Bungale. “A Fresh Look at the Reliability of Long-term Digital Storage.” In Proceedings of the 2006 EuroSys Conference, 40:221–234. Leuven, Belgium: ACM Press, 2006. doi:10.1145/1217935.1217957.

Emerging Web services, such as email, photo sharing, and web site archives, must preserve large volumes of quickly accessible data indefinitely into the future. The costs of doing so often determine whether the service is economically viable. We make the case that these applications' demands on large scale storage systems over long time horizons require us to reevaluate traditional system designs. 

It is now abundantly clear that researchers must consider the preservation and sharing of their data as a key component of any research effort. The problem that arises, for the researcher and the granting agency, is how to fund and manage such preservation in a sustainable way. Grant funding typically is for projects of limited duration. How can we fund and sustain long-term, indefinite preservation of research data if our grant models involve short-term, limited resourcing? This article proposes a model for doing exactly that. The model can be summed up with the phrase: Pay Once, Store Forever (POSF).

This study examined the current provision and future needs of curation of primary research data in the UK, particularly within the e-Science context.  It summarises the strategic and policy analyses and outlines proposals for the organisational structuring of curation provision and provides a table showing which recommendations address the findings.

Challenges someone to create the negative click, positive value repository (which he calls a Research Repository System) and suggests it should contain these elements: web orientation, researcher identity management, authoring support, object disclosure control, data management support, persistent storage, full preservation archive, and spinoffs.

Section on Storing Your Data (pages 17-21) provides best practice for making back-ups, data storage, data security, data transmission and encryption, and data disposal.

Last updated on 12/31/69, 7:00 pm by Anonymous

What models are available for reference?

Take action

  • Review available research data curation models
  • Decide which models are most appropriate for your data management needs

 

Digitizing - Costs

Knowing how much your projects and programs will cost is both the most elusive and one of the most important elements of digital curation practice.  Costs arise around human resources and technical and physical infrastructure.

Take action

  • Plan and budget for your digitization program
  • Consider selection, preparation, metadata creation, preservation, digitization, quality control, technical infrastructure, and  continued maintenance
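
As a starting point for budgeting, the activities listed above can be laid out as line items with per-item, one-time, and recurring costs, then totalled over a planning horizon. The sketch below is only one possible way to structure such an estimate; the Activity class, the project_budget function, and every figure in the example are hypothetical placeholders, not real costs.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    name: str
    cost_per_item: float = 0.0  # variable cost for each digitized item
    fixed_cost: float = 0.0     # one-time cost (equipment, setup)
    annual_cost: float = 0.0    # recurring cost per year (storage, maintenance)


def project_budget(activities, item_count, years):
    """Return (per-activity totals, grand total) over the planning horizon."""
    totals = {
        a.name: a.cost_per_item * item_count + a.fixed_cost + a.annual_cost * years
        for a in activities
    }
    return totals, sum(totals.values())


# Placeholder figures for illustration only; substitute your own estimates.
example_activities = [
    Activity("selection and preparation", cost_per_item=0.5),
    Activity("digitization and quality control", cost_per_item=2.0),
    Activity("metadata creation", cost_per_item=1.0),
    Activity("technical infrastructure", fixed_cost=1000.0, annual_cost=100.0),
    Activity("preservation and continued maintenance", annual_cost=200.0),
]

totals, grand_total = project_budget(example_activities, item_count=500, years=5)
for name, amount in totals.items():
    print(f"{name}: {amount:.2f}")
print(f"total: {grand_total:.2f}")
```

Separating recurring costs from one-time costs makes it easier to see how much of the budget continues after the scanning itself is finished.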

 

Watch

  • Ayris, Paul.  "A New Digital Republic of Letters?"  YouTube video, 3:52, June 14, 2011. http://www.youtube.com/watch?v=k8rv1lHVQiw 
    Dr. Paul Ayris of University College London, Director and Acting Group Manager at UCL Library Service and President of LIBER (the Association of European Research Libraries), talks about digital preservation and how it can be made economically sustainable.

 

Read

  • Velarde, Daniel. (2013). Illusion and Achievement in Open-Access Digitization. Feliciter, 59, 37-39.

"The article looks at visionary ideas and achievements of open-access (OA) library digitization projects in Canada. According to the author, a national OA library faces barriers including copyright, a digital deficit and the need for a workable business model for reliable service delivery. The author describes Canadian academic Michael Geist's vision of a national OA library. The author notes that public funding is not a sustainable way to support OA."

 


  • Center for Technology in Government. "Opening Gateways: A Practical Guide for Designing Electronic Records Access Programs," 2002.  http://www.ctg.albany.edu/publications/guides/gateways/gateways.pdf
    See especially the cost estimation tool and the appendix.
  • RLG Worksheet for Estimating Digital Reformatting Costs.  http://www.oclc.org/research/activities/past/rlg/digimgtools/RLGWorksheet.pdf
    This worksheet is a guide to the preparation of a budget for activities involving digitization.  It can be used for in-house scanning projects or for those utilizing an outside vendor. The activities are organized in eleven steps.
  • Beagrie, Neil, Julia Chruszcz, and Brian Lavoie.  "Keeping Research Data Safe: A Cost Model and Guidance for UK Universities."  April 2008.  http://www.jisc.ac.uk/media/documents/publications/keepingresearchdatasafe0408.pdf
    This study has investigated the medium to long term costs to Higher Education Institutions (HEIs) of the preservation of research data and developed guidance to HEFCE and institutions on these issues.  It has provided an essential methodological foundation on research data costs for the forthcoming HEFCE-sponsored feasibility study for a UK Research Data Service.  It will also assist HEIs and funding bodies wishing to establish strategies and TRAC costings for longterm data management and archiving.
  • Woodyard-Robinson, Deborah.  “Institutional Strategies – Costs and business modelling.”  Chap. 3.7 in Preservation Management of Digital Materials: The Handbook, 61-67.  http://www.dpconline.org/component/docman/doc_download/299-digital-preservation-handbook-digital-preservation-handbook?q=handbook -- link opens PDF file
    Considers costs, labour, object types and storage size, repository boundaries, preservation service level, and timing and provides business models.
  • Nationaal Archief.  "Costs of Digital Preservation."  The Hague: Digital Preservation Testbed, May 2005.  http://www.nationaalarchief.nl/sites/default/files/docs/kennisbank/codpv1.pdf
    Testbed has studied the costs involved in the long term preservation of digital records, drawn up a list of indicators which exert an influence on the total cost of preservation, designed a computational model for the calculation of these costs, and compared the costs involved in the various methods for the creation of digital records and in the various preservation strategies.
  • University College London and the British Library. “LIFE: Life Cycle Information for E-literature.”  [http://www.life.ac.uk/]
  • Chapman, Stephen.  "Counting the Costs of Digital Preservation: Is Repository Storage Affordable?"  Journal of Digital Information 4 no. 2 (May 2003).  http://journals.tdl.org/jodi/article/view/100/99
    Evaluates the fee structures of the Harvard University Library and the Online Computer Library Center, Inc.
  • Sanett, Shelby.  "The Cost to Preserve Authentic Electronic Records in Perpetuity: Comparing Costs across Cost Models and Cost Frameworks."  April 3, 2003.  http://nationalarchives.gov.uk/documents/sanett.ppt -- link opens PowerPoint file. 
    These slides explore issues related to cost modeling and propose a methodology that might be used to evaluate costing frameworks and models, which could then be adapted for use by libraries, archives, museums and other cultural heritage institutions.
  • “Project to Programs: Mainstreaming Digital Imaging Initiatives.”  In Moving Theory into Practice: Digital Imaging for Libraries and Archives, edited by Anne R. Kenney and Oya Y. Rieger, 153-76.  Mountain View, CA: Research Libraries Group, 2000. 
    This book discusses selection strategies, digital image creation, quality control, image management, use of metadata, rights management, access control, and preservation.

 


 

Digitizing - Ethics

Explore

  • The Copyright Information Center at Cornell University

http://copyright.cornell.edu/

  • The Association for Computing Machinery (ACM) copyright page

www.acm.org/usacm/copyright

  • Electronic Privacy Information Center

http://epic.org/

Read

  • Kelley, Michael. (2013). Sounds of Copyright Reform. Library Journal, 138, 8-19.

"The article presents the author's views regarding the preservation of audio recordings, with information on collections in need of restoration according to a report by the U.S. National Recording Preservation Plan released by the Library of Congress (LC). Topics include national plans for the digitization of audio materials, copyright reform, and the legal aspects of sound preservation."

  • Velarde, Daniel. (2013). Illusion and Achievement in Open-Access Digitization. Feliciter, 59, 37-39.

"The article looks at visionary ideas and achievements of open-access (OA) library digitization projects in Canada. According to the author, a national OA library faces barriers including copyright, a digital deficit and the need for a workable business model for reliable service delivery. The author describes Canadian academic Michael Geist's vision of a national OA library. The author notes that public funding is not a sustainable way to support OA."

  • Daigle, Bradley J. (2012). The Digital Transformation of Special Collections. Journal of Library Administration, 52, 244-264.

"The effect of digital technology on special collections has been profound and ongoing. The purpose of this article is to explore the effect born digital materials, digitization, and intellectual property have had on special collections in the 21st century. In particular this study will focus on how archival materials have been significantly transformed by interacting with digital technology—providing both challenges in management and opportunities for new online environments to expose this content worldwide. Finally, a research experiment underway at the University of Virginia Library offers a framework that may help highlight some strategies for exploiting new opportunities going forward." [ABSTRACT FROM PUBLISHER]

  • Bynum, Terrell.  “Computer and Information Ethics.”  Stanford Encyclopedia of Philosophy.  October 23, 2008.  http://plato.stanford.edu/archives/win2008/entries/ethics-computer/
    “Computer and information ethics,” in the broadest sense of this phrase, can be understood as that branch of applied ethics which studies and analyzes such social and ethical impacts of ICT.  The present essay concerns this broad new field of applied ethics.
  • Cox, Richard J.  “Teaching, Researching, and Preaching Archival Ethics Or, How These New Views Came to Be.”  Journal of Information Ethics 19 no. 1 (Spring 2010): 20-32.  doi: 10.3172/JIE.19.2.20.  http://mcfarland.metapress.com/content/338266rp375k2178/fulltext.pdf  (subscription required to access this resource)
    Provides definitions of archival ethics and explains how records and archives generate archival issues.  Situates archival ethics within the modern university.  Also provides context for the student essays that follow this article.
  • Fallis, Don.  “Information Ethics for Twenty-First Century Library Professionals.”  Library Hi-Tech 25 no. 1 (2007): 23-36.  doi: 10.1108/07378830710735830.  http://www.emeraldinsight.com/journals.htm?articleid=1597973 (subscription required to access this resource)
    This paper argues for the importance of information ethics to twenty-first century library professionals.  It describes what various authors have said about how information ethics can be applied to the ethical dilemmas faced by library professionals.
  • Floridi, Luciano.  “Information Ethics: An Environmental Approach to the Digital Divide.”  Philosophy in the Contemporary World 9 no. 1 (Spring/Summer 2001): 1-7.  http://www.philosophyofinformation.net/publications/pdf/ieeadd.pdf
    Floridi suggests that in order to create an ethical information society, it is vital to embrace "the fundamental principles of respect for information, its conservation and valorisation.  It must be an ecological ethics for the information environment."
  • Lee, Cal.  “Computer-Supported Elicitation of Curatorial Intent.”  Dagstuhl Seminar Proceedings 10291 (September 20, 2010).  http://www.dagstuhl.de/Materials/Files/10/10291/10291.LeeCal.Paper.pdf
    Proposes the development and testing of computer-supported mechanisms to elicit the curatorial intent of individuals in relation to digital objects being transferred to repositories.

 


 

Digitizing - Rights Management

Q. What do I need to know about rights management?

 

Take action

 

Review case studies

Documents the process archival staff used in their extensive copyright research and concludes that such a process is not scalable.

 

Watch

Interview with researcher from the Netherlands Institute for Heritage.  In the video, he talks about: work and goals of the Netherlands Institute for Heritage, the copyright issue, the importance for a nation to preserve its cultural heritage, opportunities brought by the Internet to the institutions of preservation, and choice of standards and interoperability.

 

Read

  • Stanford University.  "Copyright Renewal Database."  2006.  http://collections.stanford.edu/copyrightrenewals/
    This database makes searchable the copyright renewal records received by the US Copyright Office between 1950 and 1992 for books published in the US between 1923 and 1963.  Note that the database includes ONLY US Class A (book) renewals.
  • Coyle, Karen.  “Rights in the PREMIS Data Model: A Report for the Library of Congress.”  Washington, D.C.: Library of Congress, December 2006.  http://www.loc.gov/standards/premis/Rights-in-the-PREMIS-Data-Model.pdf
    The PREMIS standard contains a rights entity that allows the association of rights with specific digital preservation actions.  This paper looks at the various definitions of rights, the state of rights metadata, and surveys legislative actions taking place in many nations that will provide a legal standing for digital preservation activities.
  • Section 108 Study Group.  "Executive Summary."  March 2008.  http://www.section108.gov/docs/Sec108ExecSum.pdf
    This Report is addressed first to the Librarian of Congress and the Register of Copyrights, who convened the Study Group.  The conveners intended the work of the group to provide a basis on which legislation could be drafted and recommended to Congress.  This Report summarizes its recommendations, conclusions, and discussions.
  • Library of Congress.  "Copyright Matters: Digitization and Public Access."   http://blogs.loc.gov/copyrightdigitization/
    This blog represents a long term effort to convert non-digital records of copyright ownership and transfers and assignment of rights and to make them widely available online via the web.  The Library of Congress is planning periodic posts with information about plans and progress and welcomes input and comments.
  • "Rights Management."  Chap. 4 in NINCH Guide to Good Practice.  National Initiative for a Networked Cultural Heritage, October 2002.  http://www.nyu.edu/its/pubs/pdfs/NINCH_Guide_to_Good_Practice.pdf (click on Chapter IV in the Table of Contents). 
    Focuses on the copyright issues that most concern cultural institutions: how they can legally digitize material in which they may not hold the copyright and how they can ensure that no one else can use the materials they have digitized without their approval (tacit or otherwise).
  • Besek, June M.  "Copyright Issues Relevant to the Creation of a Digital Archive: A Preliminary Assessment."  Council on Library and Information Resources & Library of Congress, January 2003.  http://www.clir.org/pubs/reports/pub112/reports/pub112/pub112.pdf
    This paper describes copyright rights and exceptions and highlights issues potentially involved in the creation of a nonprofit digital archive.
  • Gasaway, Laura N.  "America's Cultural Record: A Thing of the Past?"  Houston Law Review 40 no. 3 (2003): 643-71.  http://www.unc.edu/~unclng/America%27s%20cultural%20record.htm
    This article’s focus on preservation of this record considers whether the commercial interests of copyright proprietors should prevail over the long-term preservation of the nation’s scholarly, cultural and political history.
  • Ryan, Alicia.  "Contract, Copyright, and the Future of Digital Preservation."  Boston University Journal of Science and Technology Law 10 no. 1 (Winter 2004): 152-76.  http://heinonline.org/HOL/Page?handle=hein.journals/jstl10&collection=journals&id=158 (subscription required to access this resource)
    Challenges Congress to create new legal rights for all non-profit libraries and archives so that these organizations can ensure that their best preservation efforts will be legally acceptable.
  • Hirtle, Peter B.  "Digital Preservation and Copyright."  Stanford University Libraries & Academic Information Resources.  http://fairuse.stanford.edu/commentary_and_analysis/2003_11_hirtle.html
    Considers copyright law in the light of making digital preservation copies.
  • Hirtle, Peter.  "Copyright Term and the Public Domain in the United States."  Last updated January 3, 2012.  http://copyright.cornell.edu/resources/docs/copyrightterm.pdf
    This table details U.S. copyright law applications for various types of works as of January 1, 2012.

 


 

Digitizing - Metadata

In order to boost accessibility and to aid in preservation, you will need to collect and store metadata with the web content you collect.  Look into collecting and storing administrative, structural, and descriptive metadata.

 

Take action

  • Determine the kinds of metadata you will collect
  • Choose the metadata model you will use
  • Decide how to capture and/or create the metadata
  • Cornell University Library.  "Metadata."  Chap. 5 in Moving Theory into Practice: Digital Imaging Tutorial, 2000-2004.  http://www.library.cornell.edu/preservation/tutorial/metadata/metadata-01.html.  Includes a definition, an explanation of the types of metadata and their functions, and a link to a detailed table summarizing the goals, elements, and sample implementations of the three categories of metadata.
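
To make the idea of capturing descriptive metadata concrete, the sketch below serializes a few Dublin Core elements for a single digitized item using Python's standard library. It is an illustration under stated assumptions rather than a recommended implementation: the wrapper element name "record", the function name, and the sample values are hypothetical, and only the Dublin Core namespace and element names (title, creator, date, format, rights) come from the standard itself. Administrative and structural metadata are usually recorded with other schemas, such as PREMIS and METS.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, str]) -> str:
    """Serialize descriptive metadata as a flat record of dc: elements."""
    record = ET.Element("record")  # hypothetical wrapper element
    for element, value in fields.items():
        child = ET.SubElement(record, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(record, encoding="unicode")


# Sample values are invented for illustration.
print(dublin_core_record({
    "title": "Letter to the editor",
    "creator": "Unknown",
    "date": "1902",
    "format": "image/tiff",
    "rights": "Copyright status undetermined",
}))
```

Keeping the element names in a plain dictionary makes it straightforward to generate the same record from a spreadsheet or database export, which is how descriptive metadata is often captured in practice.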

 

Read

  • Library of Congress.  "Metadata Encoding & Transmission Standard (METS)."  Last updated June 7, 2012.   http://www.loc.gov/standards/mets/
    Provides technical documentation, community building and news pages for this standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library.
  • Research Libraries Group.  "Descriptive Metadata Guidelines for RLG Cultural Materials."  Mountain View, CA: RLG, 2005.  http://www.oclc.org/research/activities/past/rlg/culturalmaterials/RLG_desc_metadata.pdf
    To ensure that the digital collections submitted to RLG Cultural Materials can be discovered and understood, RLG has compiled these Descriptive Metadata Guidelines for contributors.  While these guidelines reflect the needs of one particular service, they also represent a case study in information sharing across community and national boundaries.
  • "Dublin Core Metadata Initiative."  Last updated June 14, 2012.  http://dublincore.org/
    Includes sections on Metadata Basics and DCMI Specifications.
  • Research Libraries Group. PREMIS Web Site. (PREservation Metadata Implementation Strategies). [http://www.oclc.org/research/activities/past/orprojects/pmwg/default.htm]
  • PREMIS Editorial Committee.  "PREMIS Data Dictionary for Preservation Metadata."  Version 2.0, March 2008.  http://www.loc.gov/standards/premis/v2/premis-2-0.pdf
    Includes a Data Dictionary, a comprehensive, practical resource for implementing preservation metadata in digital archiving systems that was largely based on a survey of about 70 organizations thought to be active in or interested in digital preservation.  The initial goal was creating an implementable set of "core" preservation metadata elements, with broad applicability within the digital preservation community.
  • Library of Congress.  "PREMIS Maintenance Activity."  Last updated June 11, 2012.  http://www.loc.gov/standards/premis/
    This official web site maintains links to the original documentation and also posts updates and changes for PREservation Metadata: Implementation Strategies.
  • The Library of Congress Technical Standards for Digital Conversion of Text and Graphic Materials. December, 2006

http://memory.loc.gov/ammem/about/techStandards.pdf

  • Government Printing Office (GPO) FDsys Operational Specification for Converted Content (Version 3.3). February, 2006

http://www.fdlp.gov/home/repository/doc_download/821-gpos-digitization-specification-33-final

  • Sutton, Shan C. (2012). Balancing Boutique-Level Quality and Large-Scale Production: The Impact of "More Product, Less Process" on Digitization in Archives and Special Collections. RBM: A Journal of Rare Books, Manuscripts, & Cultural Heritage, 13, 50-63.

"The article examines the influence of More Products, Less Process (MPLP) on digitization practices and discusses how a major digitization project involving the John Muir Papers had decision making that reflects elements of the MPLP philosophy. It further analyzes how the minimalist metadata practices supported by MPLP require careful implementation with the context of evolving user expectations for locating information on the Web. It cites the relevance of MPLP to digitization in making user access paramount, embracing flexibility and expending greatest effort on the most deserving materials. It notes how the Muir Papers project shows how different series in the same collection can legitimately merit differing levels of image capture and description."

  • Eklund, Janice L. (2011). Cultural Objects Digitization Planning: Metadata Overview. Visual Resources Association Bulletin, 38, 1-19.

"This document offers an overview of image metadata types, applications, and best practice considerations for planning cultural object digitization projects." [ABSTRACT FROM AUTHOR]

  • Zhang, Jane, and Mauney, Dayne. (2013). When Archival Description Meets Digital Object Metadata: A Typological Study of Digital Archival Representation. American Archivist, 76, 174-195.

"The relationship between archival description and descriptive metadata of digital objects has not been explicitly discussed in the literature. The discussion will enhance our understanding of the relationship between archival context and digital content, a significant topic in a networked digital environment. The data collected in this study show that archivists have made conscious efforts to build connections between archival description (context) and digital items (content), and, as a result, distinct representation models have emerged from digital archival practice. However, at the level of integration of archival context and digital content in digital archival representation, archivists are challenged to achieve an ultimate goal of making digital archives more accessible and better contextualized in the digital world." [ABSTRACT FROM AUTHOR]

 

  • NDSAB Digitization Standards Charter (LOC)

Federal Agency Digitization Guidelines Still Image Digitization Working Group Charter July 17, 2008
http://www.digitizationguidelines.gov/guidelines/StillImageCharter.pdf

  • Guidelines: Technical Guidelines for Digitizing Cultural Heritage Materials (Still Image Working Group)

"The Technical Guidelines for Digitizing Cultural Heritage Materials that can be reproduced by still images represents shared best practices followed by agencies participating in the Federal Agencies Digitization Guidelines Initiative (FADGI).  This group is involved in a cooperative effort to develop common digitization guidelines for still image materials (such as textual content, maps, and photographic prints and negatives) found in cultural heritage institutions. This document draws substantially on the National Archives and Records Administration’s Technical Guidelines for Digitizing Archival Records for Electronic Access: Creation of Production Master Files – Raster Images (June 2004), but has been revised and updated in several areas to reflect the current recommendations of the working group and to reflect changes that have occurred in the digitization field since 2004. These Guidelines were prepared by members of the working group during the winter of 2009-2010. Readers will find updated sections covering equipment and image performance metrics, quality management, and metadata in this revision."
http://www.digitizationguidelines.gov/guidelines/digitize-technical.html

 

  • "Metadata."  Appendix B in NINCH Guide to Good Practice.  National Initiative for a Networked Cultural Heritage, October 2002. 

Provides a detailed description of the kinds of metadata and metadata standards that are of greatest importance to the cultural heritage sector.
http://www.nyu.edu/its/humanities/ninchguide/appendices/metadata.html (click on Appendix B in the Table of Contents). 

 

  • Gill, Tony, Anne J. Gilliland, Maureen Whalen, and Mary S. Woodley.  "Introduction to Metadata: Setting the Stage."  Los Angeles, CA: Getty Research Institute, 2008.  Online Edition, Version 3.0 

Gives a general introduction to metadata and explains some of the key tools, concepts, and issues associated with using metadata to build authoritative, reliable, and useful digital resources.
http://www.getty.edu/research/publications/electronic_publications/intrometadata/

  • "Understanding Metadata."  Bethesda, MD: NISO Press, 2004.  

Defines metadata and provides an overview of the various metadata schemes and element sets.
http://www.niso.org/publications/press/UnderstandingMetadata.pdf

 

  • Lagoze, Carl and Sandra Payette.  “Metadata: Principles, Practices, & Challenges.”  Chap. 5 in Moving Theory into Practice: Digital Imaging for Libraries and Archives, edited by Anne R. Kenney and Oya Y. Rieger, 84-100.  Mountain View, CA: Research Libraries Group, 2000.
This book discusses selection strategies, digital image creation, quality control, image management, use of metadata, rights management, access control, and preservation.

 

This report focuses on looking for criteria for distinguishing preservation metadata from other forms of metadata at a level somewhere above the descriptive/structural/administrative distinction.

 


 

Digitizing - Reasons

Q. Why should I digitize?

The two most frequent reasons that physical collections are digitized are to provide more widespread access to the items, and to help preserve these items by reducing their handling. Digitization can make "invisible" collections visible, even if only finding aids are digitized and not the entire collections. Beyond discovery, digitization can provide the data for new types of scholarship and analysis. Researchers can use new forms of analysis on digitized text, sounds, images, and video resulting in scholarly use not possible with analog collections. Digitization also fosters integration of related materials from multiple institutions in virtual collections.

The idea persists that creating digital surrogates of books and other physical information items will preserve their information in perpetuity. This is not true, and it is not a reason to digitize analog materials. Because digital objects carry their own preservation challenges, stewards of digital materials must grapple with these issues in addition to preserving the analog originals. The costs of preserving digitized objects must be seen as an addition to, rather than a replacement for, the costs of preserving analog materials.

 

Take action

  • Create a strategic policy document that clearly states your institution's reasons for conducting digitization projects.
  • Establish criteria for selecting materials to be digitized, including costs and benefits.

 

Watch

  • Puntoni, Pedro.  "Public Institutions Must Ensure Public Spaces on the Internet."  FLi Multimidia, 9:39, April 2010.  http://vimeo.com/13020267
    Interview with historian and coordinator of the Brasiliana USP project.

Read

  • Conway, Paul.  "Overview: Rationale for Digitization and Preservation."  Handbook for Digital Projects: A Management Tool for Preservation and Access. Andover, MA: NEDCC, 2000.  Last modified May 13, 2003.  http://www.nedcc.org/resources/digitalhandbook/ii.htm 
    This chapter provides a foundation for understanding the preservation implications of digital conversion projects. Following a brief description of the advantages and disadvantages of digital technologies, the author defines preservation in the digital context and describes how the underlying principles of traditional preservation practice relate to the creation of digital products.
  • Coyle, Karen.  "Mass Digitization of Books."  Journal of Academic Librarianship 32 no. 6 (2006): 641-645. http://www.kcoyle.net/jal-32-6.html 
    Compares mass digitization with non-mass digitization and large-scale digitization and discusses the issues involved in mass digitization.
  • Grindley, Neil.  "Saving for the Future."  Research Information (February/March 2009).  http://www.researchinformation.info/features/feature.php?feature_id=205 
    Describes the importance of preserving digital information and some of the major projects that are helping with this.
  • Johnson, Richard.  "In Google's Broad Wake: Taking Responsibility for Shaping the Global Digital Library."  Special issue, ARL 250 (February 2007): 1-15.  http://www.arl.org/bm~doc/arlbr250digprinciples.pdf 
    Analyzes the agreements for digital materials as negotiated by Google and seven other organizations.
  • Smith, Abby.  "Why Digitize?"  Council on Library and Information Resources, February 1999.  http://www.clir.org/pubs/reports/pub80-smith/pub80.html 
    This paper was written in response to discussions of digitization at meetings of the National Humanities Alliance (NHA).  NHA asked the Council on Library and Information Resources (CLIR) to evaluate the experiences of cultural institutions with digitization projects to date and to summarize what has been learned about the advantages and disadvantages of digitizing culturally significant materials. This report remains a classic statement as to why institutions should digitize materials and what issues they need to consider.
  • Tibbo, Helen R.  "On the Nature and Importance of Archiving in the Digital Age."  Advances in Computers 57 (2003): 1-67.  http://pdn.sciencedirect.com.libproxy.lib.unc.edu/science?_ob=MiamiImageURL&_cid=277250&_user=130907&_pii=S0065245803570012&_check=y&_origin=browse&_zone=rslt_list_item&_coverDate=2003-12-31&wchp=dGLzVlS-zSkWA&md5=ed37a4cda94740dfec21b8e978f0506f&pid=1-s2.0-S0065245803570012-main.pdf (requires UNC Onyen for access) 
    Argues that archiving, and the preservation tools to facilitate it, must become ubiquitous if society is to preserve its memory in the digital age.

 

Last updated on 12/31/69, 7:00 pm by Anonymous

 

Digital Curation Glossaries

Below are several helpful digital curation glossaries.

 


 

OLD: Managing Data - Description

Q. How do I describe data?

Having informative descriptions is deeply valuable in providing a key to accessing, understanding, and using the datasets you are managing.  The best practice is to educate data creators and to have them provide the necessary description of their data.  If the data you are managing does not have adequate description, and you are not in contact with its creators, you may have to describe the data in-house to the best of your and your staff's abilities.

 

Take action

Use the tools and resources below that work best for your project.

 

Review use cases

  • DeRoure, D. and J.A. Hendler.  "E-Science: the Grid and the Semantic Web."  IEEE Intelligent Systems 19 no. 1 (Jan/Feb 2004): 65-71.  http://ieeexplore.ieee.org/iel5/5254/28315/01265888.pdf 
    Over the past few years, researchers have been treated to two visions of the Internet's future. One is the Semantic Web, the next generation of World Wide Web technology.  The second is grid computing, the next generation of internetworked processing.  The Semantic Web is described as "an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation."  Grid computing is defined as "flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources."  It discusses their differences and, more importantly, their similarities, and explores the work needed to bring the two together.  It focuses particularly on scientists' needs, an area in which the high-power computing made possible by grid computing and the large-scale, distributed information management enabled by Semantic Web technologies will need to be integrated.  In particular, this enables new approaches to interdisciplinary scientific endeavors made possible by these new technologies.
  • ArcGIS Data Interoperability  http://www.esri.com/software/arcgis/extensions/datainteroperability
    ArcGIS Data Interoperability eliminates barriers to data sharing by providing state-of-the-art direct data access; data translation tools; and the ability to build complex spatial extraction, transformation, and loading (ETL) processes.  ArcGIS Data Interoperability allows you to use any standard GIS data, regardless of format, within the ArcGIS for Desktop environment for mapping, visualization, and analysis.  The Workbench application, included with the extension, enables you to build complex spatial ETL tools for data validation, migration, and distribution.
  • National Institute of Standards and Technology.  Last updated October 2000.  http://physics.nist.gov/cuu/Units/  
  • Data Conservancy  http://dataconservancy.org/ 
    The Data Conservancy is a growing community that is developing software using industry best practices, the open archival information system (OAIS) reference model, service-oriented architecture (SOA), meta-data standards, and much more.

 

Watch

  • Data Conservancy: A Web Science View of Data Curation.  YouTube, 1:03.  Posted by Library of Congress.  December 17, 2010.  http://www.youtube.com/watch?v=fDOSsKQBNyg 
    The Data Conservancy embraces a shared vision: data curation is not an end, but rather a means to collect, organize, validate and preserve data to address grand research challenges.  Sayeed Choudhury provides an overview of the Data Conservancy with an emphasis on the data framework aspects of the project.

 

Read

  • Hook, Les A., Suresh K. Santhana Vannan, Tammy W. Beaty, Robert B. Cook, and Bruce E. Wilson.  "Best Practices for Preparing Environmental Data Sets to Share and Archive."  September 2010.  http://daac.ornl.gov/PI/BestPractices-2010.pdf 
    Read pages 12-21: 2.2. Use Consistent Data Organization, Use Consistent File Structure, 2.3. Assign Descriptive File Names and Stable File Formats For Tabular and Image Data, and 2.4. Assign Descriptive File Names.
  • Abbott, Daisy.  "Interoperability."  Digital Curation Centre, February 4, 2009.  http://www.dcc.ac.uk/resources/briefing-papers/introduction-curation/interoperability 
    Sections include: short-term benefits and long-term value, examples of interoperability in practice, higher/further education perspective, e-Science perspective, issues to be considered, and additional resources.
  • Borer, Elizabeth T., Eric W. Seabloom, et al.  "Some Simple Guidelines for Effective Data Management." Bulletin of the Ecological Society of America (April 2009): 205-14.  http://www.esajournals.org/doi/pdf/10.1890/0012-9623-90.2.205 
    This document provides some simple guidelines for effective data management, which, if put into practice, will benefit the original data owner as well as enhance prospects for the long-term preservation and re-use of the data by other researchers.  The rules of thumb presented here facilitate rapid and accurate interpretation of the data, and pre-dispose the data to more effective processing by computers.
  • Lord, Philip and Alison Macdonald.  "e-Science Curation Report: Data curation for e-Science in the UK: an audit to establish requirements for future curation and provision."  Joint Information Systems Committee, 2003.  http://www.jisc.ac.uk/uploaded_documents/e-ScienceReportFinal.pdf 

    This study examined the current provision and future needs of curation of primary research data in the UK, particularly within the e-Science context.  It summarises the strategic and policy analyses and outlines proposals for the organisational structuring of curation provision and provides a table showing which recommendations address the findings.

From the Organizing Data section:

  • Van den Eynden, Veerle, Louise Corti, et al.  "Managing and sharing data."  Colchester: UK Data Archive, 2011.  http://www.data-archive.ac.uk/media/2894/managingsharing.pdf

    Section on Organising Files and Folders (pages 13-14) summarises best practice.

  • Barkstrom, Bruce and Mike Folk.  "Attributes of file formats for long-term preservation of scientific and engineering data in digital libraries."  Joint Council on Digital Libraries, 2003.  http://www.ncsa.uiuc.edu/NARA/Sci_Formats_and_Archiving.doc -- opens a Word document 
    This paper describes the need to consider how file formats affect the ease and effectiveness with which scientific and engineering data may be stored and accessed in long term archives.  They identify a number of attributes of file formats that can help or hinder them as candidates for long-term digital preservation and consider how these attributes appear to a number of different audiences for long-term archiving.
  • Digital Curation Centre.  "Standard Naming Conventions."  Last updated February 28, 2012.  http://www.dcc.ac.uk/resources/external/standard-naming-conventions-electronic-records 
    Written by the Records Management Section at the University of Edinburgh, this document provides a common set of rules to apply to the naming of electronic records.  The conventions are primarily intended for use with Windows-based software and documents such as word-processed documents, spreadsheets, presentations, e-mails and project plans.  "File names" are the names that are listed in the file directory and that users give to new files when they save them for the first time.  The conventions assume that a logical directory structure or filing scheme is in place and that similar conventions are used for naming the levels and folders within the directory structure.
  • Davidson, Joy.  "Placing our stuff so we can find it later: A meta-learning essential."  Digital Curation Centre. April 10, 2006.  http://www.dcc.ac.uk/news/placing-our-stuff-so-we-can-find-it-later-meta-learning-essential 
    Learning from our previous work is often inhibited by difficulties in finding relevant materials after a period of time and, when found, making sense of them.  Presented here are several practical approaches for alleviating this difficulty.  The suggestions are (1) craft meaningful, contextual file names; (2) place things where they can be easily found; and, (3) relentlessly discard useless items.  It is widely accepted that meta-learning is a matter of acquiring skills.  This paper suggests how three specific skills related to using computing technology can be improved, thus enabling enhanced learning for these populations.  Following this discussion, the paper will link back up with a meta-learning perspective on the value of improving these skills.

 


 

Archiving Web Sites - Preservation Strategies

Q. How can I preserve web content that I collect?

 

Society of American Archivists Annual Meeting: Archives 360

Date: 
Sunday, August 21, 2011 (All day) - Saturday, August 27, 2011 (All day)

ARCHIVES 360° is the premier educational event of the year for archives professionals, featuring:

How do I ensure data quality?

Take action

 

Review use case studies and resources

  • Chapman, Arthur.  "Principles of Data Quality."  Version 1.0. in Report for the Global Biodiversity Information Facility, Copenhagen, 2005.  http://imsgbif.gbif.org/CMS_ORC/?doc_id=1229&download=1 -- link opens a PDF file
    This paper was commissioned from Arthur Chapman in 2004 by the GBIF DIGIT programme to highlight the importance of data quality as it relates to primary species occurrence data.
  • United States Environmental Protection Agency.  "Guidance on Environmental Data Verification and Data Validation."  November 2002.  http://www.epa.gov/QUALITY/qs-docs/g8-final.pdf 
    The U.S. Environmental Protection Agency (EPA) has developed an Agency-wide program of quality assurance for environmental data.  Data verification and data validation are important steps in the project life cycle, supporting its ultimate goal of defensible products and decisions.  This guidance document provides practical advice to individuals implementing these steps.

 

Read

  • Van den Eynden, Veerle, Louise Corti, et al.  "Managing and sharing data."  Colchester: UK Data Archive, 2011.  http://www.data-archive.ac.uk/media/2894/managingsharing.pdf

    Section on Quality Assurance (page 14) lists quality control measures for data collection and processing.

  • Edwards, D.  "Data Quality Assurance."  In William K. Michener and James W. Brunt, eds. Ecological Data: design, management, and processing.  Oxford: Blackwell Science, 2000.  Pages 70-91.
  • Hook, Les A., Suresh K. Santhana Vannan, Tammy W. Beaty, Robert B. Cook, and Bruce E. Wilson.  "Best Practices for Preparing Environmental Data Sets to Share and Archive."  September 2010.  http://daac.ornl.gov/PI/BestPractices-2010.pdf 

    Read pages 12-14: 2.2 Use Consistent Data Organization.


How do I organize data?

Take action

Use resources to ensure good data organization.  For more details see another Digital Curation Exchange page:

 

Read

  • Van den Eynden, Veerle, Louise Corti, et al.  "Managing and sharing data."  Colchester: UK Data Archive, 2011.  http://www.data-archive.ac.uk/media/2894/managingsharing.pdf

    Section on Organising Files and Folders (pages 13-14) summarises best practice.

  • Barkstrom, Bruce and Mike Folk.  "Attributes of file formats for long-term preservation of scientific and engineering data in digital libraries."  Joint Council on Digital Libraries, 2003.  http://www.ncsa.uiuc.edu/NARA/Sci_Formats_and_Archiving.doc -- opens a Word document 

    This paper describes the need to consider how file formats affect the ease and effectiveness with which scientific and engineering data may be stored and accessed in long term archives.  They identify a number of attributes of file formats that can help or hinder them as candidates for long-term digital preservation and consider how these attributes appear to a number of different audiences for long-term archiving.

  • Digital Curation Centre.  "Standard Naming Conventions."  Last updated February 28, 2012.  http://www.dcc.ac.uk/resources/external/standard-naming-conventions-electronic-records 

    Written by the Records Management Section at the University of Edinburgh, this document provides a common set of rules to apply to the naming of electronic records.  The conventions are primarily intended for use with Windows-based software and documents such as word-processed documents, spreadsheets, presentations, e-mails and project plans.  "File names" are the names that are listed in the file directory and that users give to new files when they save them for the first time.  The conventions assume that a logical directory structure or filing scheme is in place and that similar conventions are used for naming the levels and folders within the directory structure.

  • Davidson, Joy.  "Placing our stuff so we can find it later: A meta-learning essential."  Digital Curation Centre. April 10, 2006.  http://www.dcc.ac.uk/news/placing-our-stuff-so-we-can-find-it-later-meta-learning-essential 

    Learning from our previous work is often inhibited by difficulties in finding relevant materials after a period of time and, when found, making sense of them.  Presented here are several practical approaches for alleviating this difficulty.  The suggestions are (1) craft meaningful, contextual file names; (2) place things where they can be easily found; and, (3) relentlessly discard useless items.  It is widely accepted that meta-learning is a matter of acquiring skills.  This paper suggests how three specific skills related to using computing technology can be improved, thus enabling enhanced learning for these populations.  Following this discussion, the paper will link back up with a meta-learning perspective on the value of improving these skills.

  • Hook, Les A., Suresh K. Santhana Vannan, Tammy W. Beaty, Robert B. Cook, and Bruce E. Wilson.  "Best Practices for Preparing Environmental Data Sets to Share and Archive."  September 2010.  http://daac.ornl.gov/PI/BestPractices-2010.pdf 

    Read pages 12-14: 2.2. Use Consistent Data Organization.


How do I avoid file format obsolescence?

Take action

  • Determine whether or not the data file format is suitable for long-term preservation
  • Identify the tools and

Which metadata standard should I use to manage data?

In the context of data management, metadata are a subset of core standardized and structured data documentation that explains the origin, purpose, time reference, geographic location, creator, acce

OLD: Managing Data - Authenticity

Q. What information do I need for version control and authenticity?

Just as quality control activities check that the data is valid, authenticity and version control activities are undertaken to ensure the data is what you say it is.  You may have to make changes to the data or migrate it to a current file format, but you can maintain its authenticity by tracking and documenting all of the changes and versions over time. 

 

Take action

  • Decide what versions to keep and record version
  • Track location of files
  • Keep a master file of data
  • Create a version control table
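
One way to approach the last two steps is a small script that records each version of a file in a table. The sketch below is only an illustration: it assumes a CSV file serves as the version control table and that a SHA-256 checksum is acceptable evidence that a file has not changed. The function names, column names, and file paths are hypothetical and do not come from any of the resources listed on this page.

```python
import csv
import hashlib
from datetime import date
from pathlib import Path

# Hypothetical column layout for the version control table.
FIELDS = ["file", "version", "date", "sha256", "change_note"]


def sha256_of(path):
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_version(table_path, data_path, version, change_note, when=None):
    """Append one row describing a version of data_path to the table."""
    table = Path(table_path)
    is_new = not table.exists()
    row = {
        "file": str(data_path),
        "version": version,
        "date": (when or date.today()).isoformat(),
        "sha256": sha256_of(data_path),
        "change_note": change_note,
    }
    with open(table, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()  # write the header only for a new table
        writer.writerow(row)
    return row


def is_unchanged(table_path, data_path, version):
    """Compare a file with the checksum recorded for the given version."""
    with open(table_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            if row["file"] == str(data_path) and row["version"] == version:
                return row["sha256"] == sha256_of(data_path)
    return False  # no matching row was recorded
```

Calling record_version each time a master file is changed or migrated builds the table over time, and is_unchanged can be run later to confirm that a stored copy still matches what was recorded.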

 

Review use cases and resources

  • MacNeil, Heather, et al.  "Authenticity Task Force Report."  InterPARES (2002).  http://www.interpares.org/book/interpares_book_d_part1.pdf

    Describes two sets of requirements: one includes requirements that support the presumption of the authenticity of electronic records before they are transferred to the preserver’s custody; the other includes requirements that support the production of authentic copies of electronic records after they have been transferred to the preserver’s custody.

  • CASPAR  http://www.casparpreserves.eu/

    CASPAR: Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval is an Integrated Project co-financed by the European Union within the Sixth Framework Programme.  For the programme goals, see http://www.casparpreserves.eu/caspar-project.html.

  • Data Life Cycle - Authenticity and Integrity. From Digital Curation Centre. [Link]

 

Read

  • Van den Eynden, Veerle, Louise Corti, et al.  "Managing and sharing data."  Colchester: UK Data Archive, 2011.  http://www.data-archive.ac.uk/media/2894/managingsharing.pdf

    Section on Quality Assurance (page 14) lists quality control measures for data collection and processing.

  • DataONE.  "Provide version information for use and discovery."  http://www.dataone.org/best-practices/provide-version-information-use-and-discovery

    Outlines best practice for versioning data products.

  • Bearman, David and Jennifer Trant.  "Authenticity of Digital Resources: Towards a Statement of Requirements in the Research Process."  D-Lib Magazine 6 (1998).  http://www.dlib.org/dlib/june98/06bearman.html

    Calls for further definition of requirements for digital authenticity and the associated assessment of mechanisms being offered in order to hasten the development of trusted and widely adopted solutions.

  • Cullen, Charles T. et al.  "Authenticity in a Digital Environment."  Washington, DC: Council on Library and Information Resources, May 2000.  http://www.clir.org/pubs/reports/pub92/pub92.pdf
    Arose from a project intended to begin a discussion among different communities that have a stake in the authenticity of digital information and to create a common understanding of key concepts surrounding authenticity and of the terms various communities use to articulate them.
  • Hook, Les A., Suresh K. Santhana Vannan, Tammy W. Beaty, Robert B. Cook, and Bruce E. Wilson.  "Best Practices for Preparing Environmental Data Sets to Share and Archive."  September 2010.  http://daac.ornl.gov/PI/BestPractices-2010.pdf 

    Read pages 27-29: 2.7 Provide Data Set Documentation and Metadata.

  • Lord, Philip and Alison Macdonald.  "e-Science Curation Report: Data curation for e-Science in the UK: an audit to establish requirements for future curation and provision."  Joint Information Systems Committee, 2003.  http://www.jisc.ac.uk/uploaded_documents/e-ScienceReportFinal.pdf 

    This study examined the current provision and future needs of curation of primary research data in the UK, particularly within the e-Science context.  It summarises the strategic and policy analyses and outlines proposals for the organisational structuring of curation provision and provides a table showing which recommendations address the findings.

 


 

What file formats will I need to be familiar with?

Q. What file formats will I need to be familiar with?

Digital data can come in hundreds of different file formats. Being familiar with as many file formats as you can will help you manage the data sets in your care. 

 

Take action

  • Determine what formats you will be working with
  • Become familiar with the digital data file formats listed below, which are grouped by type of data

Quantitative tabular data with extensive metadata: a dataset with variable labels, code labels, and defined missing values, in addition to the matrix of data

  • SPSS portable format (.por)
  • delimited text and command (‘setup’) file (SPSS, Stata, SAS, etc.) containing metadata information
  • some structured text or mark-up file containing metadata information, e.g. DDI XML file

Quantitative tabular data with minimal metadata: a matrix of data with or without column headings or variable names, but no other metadata or labelling

  • comma-separated values (CSV) file (.csv)
  • tab-delimited file (.tab)
  • delimited text of given character set with SQL data definition statements where appropriate

Geospatial data: vector and raster data

  • ESRI Shapefile (essential: .shp, .shx, .dbf ; optional: .prj, .sbx, .sbn)
  • geo-referenced TIFF (.tif, .tfw)
  • CAD data (.dwg)
  • tabular GIS attribute data

Qualitative data: textual

  • eXtensible Mark-up Language (XML) text according to an appropriate Document Type Definition (DTD) or schema (.xml)
  • Rich Text Format (.rtf)
  • plain text data, ASCII (.txt)

Digital image data

  • TIFF version 6 uncompressed (.tif)

Digital audio data

  • Free Lossless Audio Codec (FLAC) (.flac)

Digital video data

  • MPEG-4 (.mp4)
  • motion JPEG 2000 (.jp2)

Documentation

  • Rich Text Format (.rtf)
  • PDF/A or PDF (.pdf)
  • OpenDocument Text (.odt)
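
As a starting point for determining what formats you will be working with, the formats above can be expressed as a lookup table that sorts the files in a collection by type of data. This is a minimal sketch: the mapping only restates the list above, while the function name and the "unrecognized" label are assumptions made for illustration. A file extension alone does not prove a file's format, so treat the result as a first pass rather than a validation.

```python
from collections import defaultdict
from pathlib import Path

# Extensions restated from the list above. Some extensions (.xml, .rtf, .tif)
# appear under more than one heading there, so these labels are a simplification.
RECOMMENDED = {
    ".por": "quantitative tabular data (extensive metadata)",
    ".csv": "quantitative tabular data (minimal metadata)",
    ".tab": "quantitative tabular data (minimal metadata)",
    ".shp": "geospatial data",
    ".shx": "geospatial data",
    ".dbf": "geospatial data",
    ".tfw": "geospatial data",
    ".dwg": "geospatial data",
    ".xml": "qualitative textual data",
    ".txt": "qualitative textual data",
    ".rtf": "qualitative textual data or documentation",
    ".tif": "digital image data or geo-referenced TIFF",
    ".flac": "digital audio data",
    ".mp4": "digital video data",
    ".jp2": "digital video data",
    ".pdf": "documentation",
    ".odt": "documentation",
}


def group_by_category(filenames):
    """Group file names by the data type suggested by their extension."""
    groups = defaultdict(list)
    for name in filenames:
        extension = Path(name).suffix.lower()  # compare case-insensitively
        groups[RECOMMENDED.get(extension, "unrecognized")].append(name)
    return dict(groups)
```

Files that land in the "unrecognized" group are the ones worth investigating first, since they may be in formats that need migration or closer inspection.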

 

Review use cases

  • Arc/Info Binary Coverage Format Analysis.  Last updated June 14, 2006.  http://avce00.maptools.org/docs/v7_bin_cover.html 
    This is an attempt to document the binary vector coverage files used by Arc/Info V7.x for Unix and Windows NT.
  • Arc/Info Export (E00) Format Analysis.  Last updated February 24, 2000.  http://avce00.maptools.org/docs/v7_e00_cover.html 
    This is an updated version of the (world famous) "ANALYSIS OF ARC EXPORT FILE FORMAT FOR ARC/INFO (REV 6.1.1)."
  • JHOVE - JSTOR/Harvard Object Validation Environment  http://hul.harvard.edu/jhove/ 
    JHOVE provides functions to perform format-specific identification, validation, and characterization of digital objects.
  • PRONOM: The Technical Registry http://www.nationalarchives.gov.uk/PRONOM/Default.aspx

    PRONOM is a resource for anyone requiring impartial and definitive information about the file formats, software products and other technical components required to support long-term access to electronic records and other digital objects of cultural, historical or business value.

 

Read

  • Abrams, Stephen.  "File Formats."  Digital Curation Centre, October 2007.  http://www.dcc.ac.uk/sites/default/files/documents/resource/curation-manual/chapters/file-formats/file-formats.pdf   
    The DCC Digital Curation Manual instalments provide detailed and practical information aimed at digital curation practitioners. They are designed to assist data creators, curators and re-users to better understand and address the challenges they face and to fulfil the roles they play in creating, managing, and preserving digital information over time.  Each instalment will place the topic on which it is focused in the context of digital curation by providing an introduction to the subject, case studies, and guidelines for best practice.
  • Van den Eynden, Veerle, Louise Corti, et al.  "Managing and sharing data."  Colchester: UK Data Archive, 2011.  http://www.data-archive.ac.uk/media/2894/managingsharing.pdf

    Section on Formatting Your Data (pages 11-13) lists recommended file formats.

  • Florida Center for Library Automation.  "Recommended Data Formats for Preservation Purposes."  http://fclaweb.fcla.edu/uploads/recFormats.pdf 
    This table is intended to help Florida university administrators develop guidelines for preparing and submitting files to the Florida Digital Archive.
  • Rog, Judith and Carolina van Wijk.  "Evaluating File Formats for Long-term Preservation."  National Library of the Netherlands, 2007.  http://www.kb.nl/sites/default/files/docs/KB_file_format_evaluation_method_27022008.pdf

    Describes the quantifiable file format risk assessment method, which can be used to define digital preservation strategies for specific file formats, and intends to inspire other cultural heritage institutions to define their own quantifiable file format evaluation method.

  • Arms, Carolyn and Carl Fleischhauer.  "Digital Formats: Factors for Sustainability, Functionality and Quality."  Proceedings Society for Imaging Science and Technology, 2005.  http://memory.loc.gov/ammem/techdocs/digform/Formats_IST05_paper.pdf 
    The Library of Congress is drafting a decision-support framework pertaining to the preservation of digital content.  The framework is presented through a Web site that identifies and documents digital content formats that are promising (or unpromising) for long-term sustainability, together with some explanatory essays.
  • DataONE.  "Document and store data using stable file formats."  http://www.dataone.org/best-practices/document-and-store-data-using-stable-file-formats

    Outlines best practice for file formats.

  • Lord, Philip and Alison Macdonald.  "e-Science Curation Report: Data curation for e-Science in the UK: an audit to establish requirements for future curation and provision."  Joint Information Systems Committee, 2003.  http://www.jisc.ac.uk/uploaded_documents/e-ScienceReportFinal.pdf 

    This study examined the current provision and future needs of curation of primary research data in the UK, particularly within the e-Science context.  It summarises the strategic and policy analyses and outlines proposals for the organisational structuring of curation provision and provides a table showing which recommendations address the findings.  Pages 31-34 include section 4.10 Heterogeneity and categories of data.

 


 

Archiving Web Sites - Prepare

Q. How should I prepare to archive web sites?

Before you dive head first into collecting and preserving web sites, it is important that you take some time to assess your current situation. Take a look at what web content you may already have collected and what content you are considering adding to your collection. Make sure you understand all aspects of web archiving, including the human resources, technology, and costs, before you begin. It may also be a good idea to understand some of the history of the Web and the basics of how the Web works.

Take action

  • Establish the monetary, human, and technological resources you need and what you have available
  • Perform needs and resource assessments
  • Prepare clearly defined policies for all web archiving processes
  • Review use cases, watch videos, and read literature to gain a greater understanding of web archiving principles

Review Examples of Web Archive Collections

  • Center for History and New Media and American Social History Project/Center for Media and Learning.  The September 11 Digital Archive.  http://911digitalarchive.org/ 

    The September 11 Digital Archive uses electronic media to collect, preserve, and present the history of September 11, 2001 and its aftermath. The Archive contains more than 150,000 digital items, a tally that includes more than 40,000 emails and other electronic communications, more than 40,000 first-hand stories, and more than 15,000 digital images.  In September 2003, the Library of Congress accepted the Archive into its collections, an event that both ensured the Archive's long-term preservation and marked the library's first major digital acquisition.

  • Federal Web Harvests. U.S. National Archives and Records Administration. http://webharvest.gov/collections/
    The National Archives and Records Administration (NARA) preserved a one-time snapshot of agency public web sites as they existed on or before January 20, 2001, as an archival record in the National Archives of the United States. NARA also conducted a harvest (i.e., capture) of Federal Agency public web sites in 2004 and of Congressional web sites in 2006, 2008 and 2010. In January 2005, NARA issued "Guidance on Managing Web Records," which addresses agencies' responsibilities for identifying, managing and scheduling web materials they identify as Federal records. Accordingly, each agency is now responsible, in coordination with NARA, for determining how to manage its web records, including whether to preserve a periodic snapshot of its entire web page.
  • Internet Archive. http://archive.org/index.php 
    The Internet Archive, a 501(c)(3) non-profit, is building a digital library of Internet sites and other cultural artifacts in digital form.  Like a paper library, it provides free access to researchers, historians, scholars, and the general public.
  • Lecher, Hanno E. "Small Scale Academic Web Archiving: DACHS." In Web Archiving, edited by Julien Masanès, 213-25. New York, NY: Springer, 2006.
    "The main objectives of the DACHS are to identify and archive Internet resources relevant for Chinese Studies in order to ensure their long-term accessibility. Selection plays an important role in this process, and special emphasis is put on social and political discourse as reflected by articulations on the Chinese Internet."
  • Library of Congress Web Archives (LCWA). http://lcweb2.loc.gov/diglib/lcwa/
    The Library of Congress Web Archives (LCWA) is composed of collections of archived web sites selected by subject specialists to represent web-based information on a designated topic.  It is part of a continuing effort by the Library to evaluate, select, collect, catalog, provide access to, and preserve digital materials for future generations of researchers.  The early development project for Web archives was called MINERVA.
  • Library of Congress.  "United States Election 2002 Web Archive."  Last updated August 5, 2011.  http://lcweb2.loc.gov/diglib/lcwa/html/elec2002/elec2002-overview.html
    The Election 2002 Web Archive includes Web sites associated with United States 2002 mid-term Congressional elections, gubernatorial elections, and mayoral elections in 15 major United States cities (including Washington, DC).
  • National Diet Library (Japan). "Survey on Comprehensive Collection, Storage, and Archiving of Japanese Web Sites." 2006. http://www.ndl.go.jp/en/aboutus/bulkresearch2005summary_e.html
    "From October 2004 to March 2005, a survey of web data in Japan was conducted for the purpose of studying the feasibility of and methodology for collecting, storing and archiving Japanese web sites. According to the survey, the total amount of web data in Japan as of March 2005 was estimated at 18.4 TB, and the total number of files at 450 million. These results are presented below, along with the results of studies on web archiving requirements."
  • Our Digital Island: A Tasmanian Web Archive. State Library of Tasmania. http://odi.statelibrary.tas.gov.au/
    "Our Digital Island provides access to Tasmanian Web sites that have been preserved for posterity by the LINC Tasmania."
  • September 11 Archive. Internet Archive.  http://archive.org/details/911 
    The 9/11 Television News Archive is a library of news coverage of the events of 9/11/2001 and their aftermath as presented by U.S. and international broadcasters.  A resource for scholars, journalists, and the public, it presents one week of news broadcasts for study, research and analysis.
  • UK Government Web Archive. The National Archives (UK).  http://www.nationalarchives.gov.uk/webarchive/ 
    The National Archives is preserving government information published on the Web by archiving UK Central Government Websites.
  • UK Web Archive.  http://www.webarchive.org.uk/ukwa/
    Here you can see how sites have changed over time, locate information no longer available on the live Web and observe the unfolding history of a spectrum of UK activities represented online.  Sites that no longer exist elsewhere are found here and those yet to be archived can be saved for the future by nominating them.  The Archive contains sites that reflect the rich diversity of lives and interests throughout the UK. Search is by Title of Website, Full Text or URL, or browse by Subject, Special Collection or Alphabetical List.
  • WebArchiv: Archive of the Czech Republic. http://en.webarchiv.cz/
  • WebBase Project. Stanford University. http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/
    "The Stanford WebBase project has been collecting topic focused snapshots of Web sites. All the resulting archives are available to the public via fast download streams. For example, we collected pages from 350 sites every day for several weeks after the Katrina hurricane disaster. We also collect pages from government Web sites on a regular basis. In addition, the project examines how our archives can be explored by historians, sociologists, and public policy professionals. "

Review Examples of Web Archiving Projects and Initiatives

  • ARCOMEM (Collect-All ARchives to COmmunity MEMories). http://www.arcomem.eu/
    Intended outcomes include: "innovative models and tools for Social Web driven content appraisal and selection, and intelligent content acquisition; novel methods for Social Web analysis, Web crawling and mining, event and topic detection and consolidation, and multimedia content mining; reusable components for archive enrichment and contextualization; two complementary example applications, the first for media-related Web archives and the second for political archives; and a standards-oriented ARCOMEM demonstration system."
  • BlogForever. http://blogforever.eu/
    "BlogForever will create a software platform capable of aggregating, preserving, managing and disseminating blogs. Any user or organization will be able to use the BlogForever software & guidelines to create a digital repository containing their own selection of blogs."
  • LiWA: Living Web Archives. http://liwa-project.eu/
    Focuses on "long term interpretability as archives evolve," "improved archive fidelity by filtering out irrelevant noise," and "considering a wide variety of content."
  • Memento. http://www.mementoweb.org/
    "Memento proposes a technical framework aimed at better integrating the current and the past Web. The framework adds a time dimension to the HTTP protocol and, inspired by content negotiation, introduces the notion of datetime negotiation. The proposed framework can lead to more Web browsing fun as old versions of Web resources (e.g. in Web Archives and in Content Management Systems) become easier to access. But Memento also suggest a generic approach for versioning Web resources that can help bootstrap a variety of novel, temporal Web applications."
  • Netarchive.dk. http://netarkivet.dk/
    "Since 2005 the collection and preservation of the Danish part of the internet is included in the Danish Legal Deposit Law. The task is undertaken by the two legal deposit libraries in Denmark, State and University Library and The Royal Library. Netarchive.dk cannot be accessed by the general public. The archive is only accessible to researchers who have requested and been granted special permission to use the collection for specific research purposes. This website, Netarkivet.dk, is designed to inform researchers, website owners, and other interested parties about the Danish web archive. For the time being most of the website is in Danish."
  • PANDORA (Preserving and Accessing Networked Documentary Resources of Australia). National Library of Australia. http://pandora.nla.gov.au/
    "PANDORA, Australia's Web Archive, is a growing collection of Australian online publications, established initially by the National Library of Australia in 1996, and now built in collaboration with nine other Australian libraries and cultural collecting organisations."
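
The datetime negotiation that the Memento entry above describes can be illustrated with a minimal sketch. It assumes the `Accept-Datetime` request header defined in the Memento specification (RFC 7089), which is not named on this page. The helper name `accept_datetime_headers` is hypothetical, and the sketch only builds the header; it does not contact any archive.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_headers(when: datetime) -> dict:
    """Return the request header a client would send to ask for a past version.

    `when` must be a timezone-aware datetime. Hypothetical helper for illustration.
    """
    # The header value is an HTTP-date, which is always expressed in GMT.
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return {"Accept-Datetime": http_date}


# Ask for the version of a resource as it existed at the start of 2000.
headers = accept_datetime_headers(datetime(2000, 1, 1, tzinfo=timezone.utc))
print(headers)
```

A client would attach these headers to an ordinary HTTP request; a server that supports Memento would then choose the archived version closest to the requested date.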

Familiarize Yourself with Related Tools and Services

See also specifically:

Watch

  • "Web Archiving."  November 30, 2009.  Library of Congress, 3:11.  http://www.youtube.com/watch?v=T0943YkhLWU

    "Web content changes all the time.  If we don't save that content before it disappears, a major part of our cultural history will be lost.  The Library of Congress is working to provide permanent access to web content of historical importance.  It selects websites for collection, requests permissions from the website owners, addresses the technology of collecting web sites and preserves the web sites and makes them available.  This video examines those four challenges."
  • "Web Archiving and the IIPC."  2011.  International Internet Preservation Consortium, 5:23.  http://vimeo.com/26276709

    World scholars discuss the necessity of archiving the Web for future access.  This video is also available in German, Spanish, French, Japanese and Arabic.

Read

  • Ball, Alex. "Web Archiving." Edinburgh, UK: Digital Curation Centre, 2010. http://lac-repo-live7.is.ed.ac.uk/bitstream/1842/3327/1/Ball%20sarwa-v1....
    "Web archiving is important not only for future research but also for organisations’ records management processes. There are technical, organisational, legal and social issues that Web archivists need to address, some general and some specific to types of content or archiving operations of a given scope. Many of these issues are being addressed in current research and development projects, as are questions concerning how archived Web material may integrate with the live Web."
  • Bergman, Michael K.  "The Deep Web: Surfacing Hidden Value."  Journal of Electronic Publishing 7 no. 1 (2001).  doi: 10.3998/3336451.0007.104.  http://quod.lib.umich.edu/cgi/t/text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0007.104 (subscription required to access this resource)
    A study at the NEC Research Institute, published in Nature, estimated that the search engines with the largest number of web pages indexed (such as Google or Northern Light) each index no more than sixteen per cent of the surface Web.  Since they are missing the deep Web when they use such search engines, Internet searchers are therefore searching only 0.03% — or one in 3,000 — of the pages available to them today.  Clearly, simultaneous searching of multiple surface and deep web sources is necessary when comprehensive information retrieval is needed.
  • Berners-Lee, Tim and Dan Connolly.  Hypertext Markup Language – 2.0.  Network Working Group, 1995. http://www.ietf.org/rfc/rfc1866.txt
    This document specifies an Internet standards track protocol for the Internet community and requests discussion and suggestions for improvements.
  • Bragg, Molly, Kristine Hanna, Lori Donovan, Graham Hukill, and Anna Peterson. "The Web Archiving Life Cycle Model." Internet Archive, March 2013. http://archive-it.org/static/files/archiveit_life_cycle_model.pdf
    "The model is an attempt to distill the different steps and phases an institution experiences as they develop and manage their web archiving program."
  • Brown, Adrian.  Archiving Websites: A Practical Guide for Information Management Professionals.  Facet Publishing, 2006.
    This book is targeted at policy-makers, information management professionals, and web site owners and webmasters.  It provides an overview of best practice that can be applied to anything from archiving a national domain to an organizational web site.  The chapters include: the development of web archiving, selection, collection methods, quality assurance and cataloguing, preservation, delivery to users, legal issues, managing a web archiving programme, and future trends.
  • Brügger, Niels. "Step-by-step guide to archiving a website." In Archiving Websites: General Considerations and Strategies. Århus, Denmark: The Centre for Internet Research, 2005. http://cfi.au.dk/fileadmin/www.cfi.au.dk/publikationer/archiving_underside/guide.pdf
    "Since an archived website to a certain degree is only shaped in the archiving, it should be accompanied by a document containing methodical considerations of why and how the website has been archived. The following step-by-step guide is meant as an aid to the outline of such a document. In addition, it will naturally also act as a practical aid in connection with the actual archiving (and, of course, the following is to be seen in the context of the previous pages’ general deliberations and strategies, which it condenses in an itemised, tabular form).
    The guide is divided into two main parts: 1) prior to archiving, 2) the archiving process."
  • Cho, Junghoo and Hector Garcia-Molina.  "The Evolution of the Web and Implications for an Incremental Crawler."  In Proceedings of the 26th International Conference on Very Large Data Bases, 200-209. San Francisco, CA: Morgan Kaufmann, 2000. http://www.vldb.org/conf/2000/P200.pdf
    This paper studies how to build an effective incremental crawler.  The crawler selectively and incrementally updates its index and/or local collection of web pages, instead of periodically refreshing the collection in batch mode.  The incremental crawler can improve the freshness of the collection significantly and bring in new pages in a more timely manner.  It first presents results from an experiment conducted on more than half a million web pages over 4 months to estimate how web pages evolve over time.  Based on these experimental results, it compares various design choices for an incremental crawler and discusses their trade-offs.  It proposes an architecture for the incremental crawler, which combines the best design choices.
  • Day, Michael. "Collecting and Preserving the World Wide Web: A Feasibility Study Undertaken for the JISC and Wellcome Trust." Joint Information Systems Committee (JISC) and Wellcome Trust, 2003. http://www.jisc.ac.uk/uploaded_documents/archiving_feasibility.pdf
    This document reports on an "evaluation and feasibility study of Web archiving" supported by the Joint Information Systems Committee (JISC) and the Library of the Wellcome Trust. "The aims of this study are to provide the JISC and Wellcome Trust with:
    • An analysis of existing Web archiving arrangements to determine to what extent they address the needs of the UK research and FE/HE communities. In particular this is focused on an evaluation of sites available through the Internet Archive's Wayback Machine, to see whether these would meet the needs of their current and future users.
    • To provide recommendations on how the Wellcome Library and the JISC could begin to develop Web archiving initiatives to meet the needs of their constituent communities."
  • Farrell, Susan ed. "A Guide to Web Preservation." 2010. http://jiscpowr.jiscinvolve.org/wp/guide/
    This guide is based on the earlier (2008) "PoWR: The Preservation of Web Resources Handbook."
  • Fitch, Kent.  "Web site archiving: an approach to recording every materially different response produced by a website."  Paper presented at the AusWeb 2003: The Ninth Australian World Wide Web Conference, Sanctuary Cove, Australia.  http://ausweb.scu.edu.au/aw03/papers/fitch/paper.html
    This paper discusses an approach to capturing and archiving all materially distinct responses produced by a web site, regardless of their content type and how they are produced.  This approach does not remove the need for traditional records management practices but rather augments them by archiving the end results of changes to content and content generation systems.  It also discusses the applicability of this approach to the capturing of web sites by harvesters.
  • Gillies, James and Robert Cailliau.  How the Web was born: The story of the World Wide Web.  Oxford: Oxford University Press, 2000.
    Chapters include The Foundation, Setting the Scene at CERN, Bits and PCs, Enquire Within Upon Everything, What Are We Going To Call This Thing?, Sharing What We Know, The Beginning of the Future, and It's Official.
  • Kenney, Anne R., Nancy McGovern, Peter Botticelli, Richard Entlich, Carl Lagoze, and Sandra Payette. “Preservation Risk Management for Web Resources: Virtual Remote Control in Cornell’s Project Prism.” D-Lib Magazine 8, no. 1 (2002). http://www.dlib.org/dlib/january02/kenney/01kenney.html
    "Project Prism's approach begins with characterizing the nature of preservation risks in the Web environment, develops a risk management methodology for establishing a preservation monitoring and evaluation program, and leads to the creation of management tools and policies for virtual remote control. The approach will demonstrate how Web crawlers and other automated tools and utilities can be used to identify and quantify risks; to implement appropriate and effective measures to prevent, mitigate, recover from damage to and loss of Web-based assets; and to support post-event remediation."
  • Lyman, Peter.  "Archiving the World Wide Web."  In Building a National Strategy for Preservation: Issues in Digital Media Archiving.  Council on Library and Information Resources, April 2002.  http://www.clir.org/pubs/reports/pub106/web.html
    This section of the Building a National Strategy for Preservation report analyzes the cultural, technical, economic, and legal issues surrounding Web archiving.
  • McGovern, Nancy, Anne R. Kenney, Richard Entlich, William R. Kehoe, and Ellie Buckley. “Virtual Remote Control: Building a Preservation Risk Management Toolbox for Web Resources.” D-Lib Magazine 10, no. 4 (2004). http://www.dlib.org/dlib/april04/mcgovern/04mcgovern.html
    "Unlike most web preservation projects, Cornell University Library's Virtual Remote Control (VRC) initiative is based on monitoring websites over time—identifying and responding to detected risk as necessary, with capture as a last resort." "VRC leverages risk management as well as the fundamental precepts of records management to define a series of stages through which an organization would progress in selecting, monitoring, and curating target web resources. The first part of this article presents the stages of the VRC approach, identifying both human and automated responses at each stage. The second part describes the development of a toolbox to enable the VRC approach. The conclusion sets out our intentions for the future of VRC."
  • Masanès, Julien, ed. Web Archiving. New York, NY: Springer, 2006.
    "Julien Masanès, Director of the European Archive, has assembled contributions from computer scientists and librarians that altogether encompass the complete range of tools, tasks and processes needed to successfully preserve the cultural heritage of the Web. His book serves as a standard introduction for everyone involved in keeping alive the immense amount of online information, and it covers issues related to building, using and preserving Web archives both from the computer scientist and librarian viewpoints."
  • Masanès, Julien.  "Towards Continuous Web Archiving: First Results and an Agenda for the Future."  D-Lib Magazine 8 no. 12 (2002).  doi: 10.1045/december2002-masanes.  http://www.dlib.org/dlib/december02/masanes/12masanes.html

    This article outlines the contribution of the National Library of France (BnF) to the Web archiving discussion.  BnF began a research project on Web archiving in late 1999.  Their work on Web archiving is divided into two parts.  The first part is to improve crawlers for continuous and adapted archiving.  This means being able to automatically focus the crawler for satisfactory archiving. Apart from getting existing, hands-on tools, this part of the project, which is presented in this article, consists of defining and testing good parameters toward that aim.  The second part of their work is testing every step of the process for depositing web content.

  • NDSA Content Working Group. "National Digital Stewardship Alliance Web Archiving Survey Report." June 19, 2012. http://www.digitalpreservation.gov/ndsa/working_groups/documents/ndsa_we...
    "From October 3 through October 31, 2011, the Content Working Group conducted a survey of organizations in the United States that are actively involved in, or planning to start, programs to archive content from the web. The goal of the survey was to better understand the landscape of web archiving activities in the United States, including identifying the organizations or individuals involved, the types of web content being preserved, the tools and services being used, and the types of access being provided. This summary report examines participant responses for the purposes of discerning trends, themes, and emerging practices and challenges in web-based content acquisition and preservation."
  • O'Neill, Edward T., Brian F. Lavoie, and Rick Bennett.  "Trends in the Evolution of the Public Web."  D-Lib Magazine 9 no. 4 (2003).  doi: 10.1045/april2003-lavoie.  http://www.dlib.org/dlib/april03/lavoie/04lavoie.html
    This article examines three key trends in the development of the public Web — size and growth, internationalization, and metadata usage — based on data from the OCLC Office of Research Web Characterization Project, an initiative that explores fundamental questions about the Web and its content through a series of Web samples conducted annually since 1998.
  • PADI: Preserving Access to Digital Information. Web Archiving. http://www.nla.gov.au/padi/topics/92.html [Not updated or maintained since 2010, but provides an extensive annotated list of resources up to that date.]
  • "PoWR: The Preservation of Web Resources Handbook." ULCC, UKOLN and JISC, 2008. http://jiscpowr.jiscinvolve.org/wp/files/2008/11/powrhandbookv1.pdf
    This Handbook is one of the outputs from the JISC-funded PoWR (Preservation Of Web Resources) project.
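
Two readings above (Fitch on recording every materially different response, and Cho and Garcia-Molina on incremental crawling) rest on the same idea: store a new copy only when the content has changed. The following is a minimal sketch of that idea, not either paper's method. It assumes that "different" simply means the response bytes differ, which is cruder than the notion of material difference Fitch discusses. The class name `ResponseArchive` and its methods are hypothetical.

```python
import hashlib


class ResponseArchive:
    """Track the distinct response bodies seen for each URL (illustrative only)."""

    def __init__(self) -> None:
        # Maps each URL to the SHA-256 digests of the bodies already recorded.
        self._versions: dict[str, list[str]] = {}

    def record(self, url: str, body: bytes) -> bool:
        """Record a response; return True only if this body is new for the URL."""
        digest = hashlib.sha256(body).hexdigest()
        seen = self._versions.setdefault(url, [])
        if digest in seen:
            return False  # Unchanged content: nothing new to store.
        seen.append(digest)
        return True

    def version_count(self, url: str) -> int:
        """Return how many distinct bodies have been recorded for the URL."""
        return len(self._versions.get(url, []))


archive = ResponseArchive()
archive.record("http://www.example.com/", b"<html>first version</html>")
archive.record("http://www.example.com/", b"<html>first version</html>")
archive.record("http://www.example.com/", b"<html>second version</html>")
print(archive.version_count("http://www.example.com/"))
```

A real archive would also keep the bodies themselves and their capture times; the sketch stores only digests to keep the comparison step visible.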

Last updated on 12/31/69, 7:00 pm by Anonymous

Archiving Web Sites - Legal and Ethical Considerations

Q. What legal and ethical issues should I consider in web archiving?

Archiving websites is not as simple as collecting whatever web pages you want and saving them. There are a number of legal and ethical issues to consider as well. Archiving websites can involve complications with intellectual property rights, privacy, content liability and human rights, and regulatory compliance and public accountability.

 

Take action

  • Consider intellectual property rights, privacy, content liability and human rights, and regulatory compliance and public accountability, and their effect on your web archive
  • Write policies that reflect your institution's stance on these legal and ethical considerations

 

Review use cases

 

Read

 

  • Charlesworth, Andrew.  "Legal issues relating to the archiving of Internet resources in the UK, EU, USA and Australia: A study undertaken for the JISC and Wellcome Trust."  Version 1.0.  February 25, 2003.  http://www.jisc.ac.uk/uploaded_documents/archiving_legal.pdf 
    This paper examines the key legal issues of Web archiving in relation to the United Kingdom and how potential risks to a UK based web archive might be minimised.  It also surveys the approaches to web archiving taken in some other jurisdictions, including several EU countries, the U.S. and Australia.
  • Copyright Management Center. Indiana and Purdue Universities. [http://www.iupui.edu/~webtrain/web_samples/cmc.html]
    The Copyright Management Center no longer exists. A page of links from Indiana University is available instead at http://copyright.iu.edu/resources
  • Field, Tom.  "IP Basics: Copyright on the Internet."  http://law.unh.edu/thomasfield/ipbasics/copyright-on-the-internet.php 
    This discussion addresses U.S. copyright issues of concern to those who post to or own email lists or host web pages.  It also deals with situations where someone might want to forward or archive another's email posting or to copy material from another's web page.
  • Glanville, Lachlan. "Web archiving: ethical and legal issues affecting programmes in Australia and the Netherlands." Australian Library Journal (2010).
    "This paper will examine the barriers faced by web archiving programmes in national libraries, such as the Koninklijke Bibliotheek in the Netherlands and the National Library of Australia's PANDORA. The report will analyse how these programmes deal with the difficulties and limitations inherent in such programmes by examining how they approach issues of selection, access and copyright, while drawing comparisons between the programmes of the two institutions and the legal frameworks in which they function. "
  • Harper, Georgia K.  "Copyright Crash Course."  University of Texas, 2007.  http://copyright.lib.utexas.edu/ 
    Includes sections on Own Manage Share, Building On Others' Creative Expression, Copyright in the Library, and University Administrative Interests.  Also includes an Online Tutorial  with 12 questions related to the information in the Web site.
  • Kavčič-Čolić, Alenka.  "Archiving the Web - some legal aspects."  68th IFLA Council and General Conference, Glasgow.  August 18-24, 2002.  http://archive.ifla.org/IV/ifla68/papers/116-163e.pdf 
    The paper presents some legal aspects of archiving the web pages, concerning the harvesting, providing public access to them, and long-term preservation.
  • Minow, Mary.  "Copyright: What You Need to Know about Your Library's Web Page."  California Library Association.  Reprinted from California Libraries 10 no. 3 (2002).  http://www.cla-net.org/resources/articles/minow_copyright.php
  • Library of Congress. "United States Copyright Office." http://www.copyright.gov/
    Provides links to copyright basics, FAQs, law and policy, etc.  Also includes a circular entitled "Copyright Registration for Online Works" at http://www.copyright.gov/circs/circ66.pdf.
  • Library of Congress.  Copyright Statement for the September 11 Web Archive.  Last updated August 5, 2011.  http://lcweb2.loc.gov/diglib/lcwa/html/sept11/sept11-overview.html#copyright 
    Includes an explanation of copyright restrictions for the Web sites included in the collection.
  • National Library of Australia.  "Preserving Access to Digital Information (PADI): Intellectual property rights management."  Last updated August 2000.  http://pandora.nla.gov.au/pan/10691/20110824-1153/www.nla.gov.au/padi/topics/28.html 
    Provides background on copyright and preservation strategies, legislation, and access.  Includes annotated lists of Web sites on the following categories: Articles, Events, Organisations and Websites, Policies, Strategies & Guidelines, Project and Case Studies, Journals & Newsletters, Discussion Lists, and Surveys.
  • Rauber, Andreas, Max Kaiser, and Bernhard Wachter. "Ethical Issues in Web Archive Creation and Usage - Towards a Research Agenda." Paper presented at the International Web Archiving Workshop, Aarhus, Denmark, September 18-19 2008.
  • Stanford University Libraries.  "Websites: Five ways to stay out of trouble."  2010.  http://fairuse.stanford.edu/Copyright_and_Fair_Use_Overview/chapter6/6-a.html
    Because the Web is freely accessible and because of the ease of copying material from one site to another, many myths have developed regarding the right to use copyrighted materials and trademarks on the Web. Without repeating the copyright and trademark rules established in other chapters, this section provides five simple rules for your website.
  • Thelwall, Mike, and David Stuart. "Web Crawling Ethics Revisited: Cost, Privacy and Denial of Service." Journal of the American Society for Information Science and Technology 57, no. 13 (2006): 1771-79.
    "Ethical aspects of the employment of web crawlers for information science research and other contexts are reviewed. The difference between legal and ethical uses of communications technologies is emphasized as well as the changing boundary between ethical and unethical conduct. A review of the potential impacts on web site owners is used to underpin a new framework for ethical crawling and it is argued that delicate human judgments are required for each individual case, with verdicts likely to change over time. Decisions can be based upon an approximate cost-benefit analysis, but it is crucial that crawler owners find out about the technological issues affecting the owners of the sites being crawled in order to produce an informed assessment."


Archiving Web Sites - Selection

Q. What do I need to consider for developing a selection policy?

After you have identified what web site content you have and what content you may be acquiring, you will have to decide what content you will continue to preserve and what content you should acquire.  With digital storage costs dropping, it may seem like you can grab and preserve anything, but remember that while storage is cheap, managed storage is expensive.  For every web site you wish to preserve, you must also collect and store the metadata necessary to understand and access it in the future. 

 

Take action

  • Set selection policy goals to meet the institutional mission
  • Build the basis for planning, analysis, and coordination with other archives if needed
  • Set priorities
  • Consider elements that policy should address: content and scope; target audience; anticipated use; depth of coverage; exclusions; review and revision
  • Develop policies

Review use cases

  • Library and Archives Canada, "Digital Collection Development Policy."  Last updated January 18, 2007.  http://www.collectionscanada.gc.ca/collection/003-200-e.html

    This policy indicates the directions Library and Archives Canada takes to ensure the collection of digital documentary heritage materials of enduring interest to the history and culture of Canada, and in collaboration with others, to enable the collection of other digital information resources of value to Canadians.

  • National Library of Australia, "Selection Guidelines."  Last updated April 27, 2011. http://pandora.nla.gov.au/guidelines.html

    Provides specific selection guidelines for each of PANDORA's participating agencies.

  • National Library of Australia, "Policy and Practice Statement."  Last updated October 5, 2011. http://pandora.nla.gov.au/policy_practice.html

    PANDORA, Australia’s Web Archive, is a collection of Australian online publications and web sites which is being built by the National Library of Australia and ten other participants. This initiative was commenced in 1996 by the National Library in recognition of the fact that an increasing volume of Australia’s documentary heritage was being published in online formats only.  Given the mandate under the National Library Act, 1960 to build a comprehensive collection of Australian published materials, collecting online resources was seen as a necessary extension of the Library’s collecting responsibilities.

Read

  • International Federation of Library Associations and Institutions, Section on Acquisition and Collection Development.  "Guidelines for a Collection Development Policy using the Conspectus Model."  March 2001.  http://www.ifla.org/files/acquisition-collection-development/publications/gcdp-en.pdf (subscription required to access this resource)
    This booklet is a brief guide on how to write a collection development policy, making use of the Conspectus methodology.  It is the result of the recognition by the IFLA Acquisition and Collection Development Section that its worldwide members lacked a handy introduction to this important subject.  The guide is intended to be of particular value to staff new to collection development and in areas where there is little written tradition of collection development.  We hope that it will be of practical use to librarians setting out on the sometimes daunting task of writing a collection development policy.
  • Library of Congress.  "Collection Policy Statements Supplementary Guideline."  November 2008.  http://www.loc.gov/acq/devpol/webarchive.pdf 
    Focuses on Web Archiving, with sections on Scope, Research Strengths, Collecting Policy, Acquisition Source: Current and Future, and Collecting Levels.
  • Viégas, Fernanda B. “Bloggers’ Expectations of Privacy and Accountability: An Initial Survey.” Journal of Computer-Mediated Communication 10, no. 3 (2005), http://jcmc.indiana.edu/vol10/issue3/viegas.html.
    "This article presents an initial snapshot, based on an online survey of weblog authors, of bloggers' subjective sense of privacy, and of their perceptions of liability. The findings suggest that the social norms of bloggers are emergent and self-imposed. When confronted with questions of defamation and legal liability, respondents in the survey expressed contradictions between their actions and their knowledge of how the technology works. They generally believed that they were liable for what they published online, although they were not concerned about the persistence of their entries. In general, bloggers do not feel as if they know their audiences. For the most part, blog authors have no control over who accesses their entries, and this inability to define their audiences leads them to make a number of assumptions about who their readers are. "

 

Digital curation guide activity types

The Digital Curation Exchange hosts guides for the following activities:


Digitizing - Selection Criteria

Q. How should I select materials for digitization?

Selecting which materials you would like to digitize involves examining a number of different criteria. You will need to develop criteria that suit your individual purposes, but as you do, consider whether or not digitizing particular content fits in with your institution's mission statement and whether or not you can preserve and provide access to the content.

 

Take action

  • Review other institutions' selection criteria for digitization
  • Develop and document your own selection criteria for digitization

 

Watch

Andrea Jackson, Archivist at the Robert W. Woodruff Library of the Atlanta University Center explains how the RWWL selected documents for the HBCU digital collection.
Interview with the coordinator of the French digital library Gallica.  Discusses the work and goals of Gallica, copyright restrictions, criteria for selection (around 5:54), the need for standards, and the future of libraries.

 

Review use cases

From the webpage: "This policy provides a collection development framework for the Library's digitization projects. It is intended to offer guidance in understanding how our digitization work reflects and supports collection development at Dartmouth in broad terms and it sets out criteria to help determine if potential use warrants the human and fiscal resources necessary to undertake the digital project. The policy's aim is to create a consistent, structured approach to reformatting our collections. Whether digitization is done at the object level or at the collection level, the framework remains the same."
Begins by answering the question, "Why do you need to assess material?" and then references some previous studies.  Defines various classes of material and suggests evaluation based on need and feasibility.
Includes sections on Determining Your Selection Criteria, Documenting Your Selection Criteria, a Conclusion, and suggestions for Further Reading.
The Southern Historical Collection staff "use archival theory and practice" to frame their digitization activities. "In digitizing the collection, the SHC staff employs the archival principle of provenance: organizing and maintaining the individual collections based on the origins of the materials, rather than piecing together new collections of selected documents based on topics, geography, or chronology, or other characteristics." This site discusses how the SHC determines the order in which items are digitized and what the key issues are for each collection. The site also provides a decision matrix.
Offers criteria in the categories of Collection Development, Preservation/Archiving, Organizational/funding, and Access.

 

Read

Considers selection criteria, content value, intellectual property rights, and technical aspects.
Covers such issues as copyright, potential use, format, and cost and benefits. This volume includes a thoughtful decision-making matrix.
Discusses selection criteria of value, condition, characteristics of originals, acceptability of the resulting digital object, and access aids.
  • National Information Standards Organization. Framework Working Group.  "A Framework of Guidance for Building Good Digital Collections."  3rd ed. Baltimore, MD: NISO, December 2007.  http://www.niso.org/publications/rp/framework3.pdf
Provides an overview, identifies existing resources, and encourages community participation in developing best practices. Aimed at both cultural heritage organizations and funding organizations.
From the Introduction: "This article presents the results of a close reading of current practices and guidelines for digitisation, in an attempt to further the movement towards greater consensus on this issue. From the existing myriad approaches found in the field, the article formulates a set of common criteria for selection by way of a sector-independent longlist. In this way the article illustrates the complex nature of selection, which may be seen to depend upon significantly greater number of criteria than have so far been put forward in any single guiding document, but it also proposes a base-terminology that can be used in any institutional setting. Thus, it puts forward a possible common ground for selection practices and argues that the adoption of a more uniform language, and a more open and communicative approach, may not only help structure the decision-making process but is also a vital part of good governance."
Acknowledges the changes in scale and quality that have resulted from mass digitization projects, considers preservation strategies, and calls for greater collaboration.
Considers the issues you need to examine when selecting material -- whether selecting physical originals for digitization, or reviewing born digital materials for preservation or republication -- as well as how to ensure that this process takes into account the aims and characteristics of your organization, the profile and needs of your users, and the characteristics of your collections.
Synthesizes the experiences of libraries in digitizing collections, offers several case studies, and questions which sorts of collections should be prioritized for digitization.
The questions and choices reflected here will assist the ultimate decision to accept or reject long-term preservation responsibility.

 

Last updated on 12/31/69, 7:00 pm by Anonymous

 

Digitizing - Planning

Q.

HCIST 2011 - International Workshop on Health and Social Care Information Systems and Technologies

Focus: 
Date: 
Tuesday, October 4, 2011 - 18:00 - Thursday, October 6, 2011 - 18:00

International Workshop on Health and Social Care Information Systems and Technologies, a CENTERIS 2011 workshop, to be held in Vilamoura, Algarve, Portugal, from 5 to 7 October, 2011.

Who are the stakeholders for my IR?

 Take action

  • Talk to all of the stakeholders, from senior leadership to middle managers to the people "in the trenches," and find out what they think about the project and what their priorities are.
  • Use shared readings to establish a common language and framework among the different stakeholders.
  • Establish clear roles, responsibilities, and accountability measures.

Watch

  • Neal, James. "Institutional Repositories." YouTube, 3:40.  Posted by CDRS at Columbia University.  August 12, 2010. [Link]

"Columbia University Librarian James Neal discusses the multifaceted role online repositories, such as Columbia's Academic Commons, are playing in the scholarly communication system."

Read

  • Barton, Mary R. and Margaret M. Waters.  "Planning Your Institutional Repository Service." Chapter 2 in Creating an Institutional Repository: LEADIRS Workbook.  MIT Libraries, 2004-2005. [Link]  

    The Learning About Digital Institutional Repositories Seminars programme (LEADIRS) aims to describe and illustrate how to build an online institutional repository.  This workbook supplements the seminar presentations and offers practical advice as well as worksheets you can use to get started with your own repository programme.  Where possible, it points you to real-world examples of planning aids or presentations used by university library teams in the UK and around the world.

  • Bailey, Jr., Charles W.  "Institutional Repository Bibliography." [Link]  

    The Institutional Repository Bibliography primarily includes published articles, books, and technical reports. Coverage of conference papers and unpublished e-prints is very selective.  All included works are in English.  The bibliography does not cover digital media works (such as MP3 files), editorials, e-mail messages, interviews, letters to the editor, news articles, presentation slides or transcripts, or web log postings.  Most sources have been published from 2000 through the present; however, a limited number of key sources published prior to 2000 are also included.

  • Digital Preservation Coalition. "Interactive Assessment: Selection of Digital Materials for Long-term Retention." [Link]
    This decision tree considers the issues of Selection, Rights and Responsibilities, Technical/Costs, Documentation and Metadata/Costs.
  • Kendrick, Tom.  Results Without Authority: Controlling a Project When the Team Doesn't Report to You - A Project Manager's Guide.  New York: AMACOM, 2006. 
    This book delivers proven techniques for controlling projects and managing diverse teams in a wide variety of situations, and bringing those projects to successful closure.



Digital Curation in Research Libraries & the Learning Commons, Post #3


May 25, 2011

Chicago State University Archivist


 

How to Use the Guides

Each Getting Started Guide is designed to provide cultural heritage information professionals with fundamental information to begin working with digital materials in new ways. These are resource guides, organized around key questions, that provide the reader with up-to-date webliographies and interviews with experts.  The guides are not designed to turn a novice into an expert, but they are intended to start individuals on that path and to share the experience and wisdom of those who have blazed the digital curation trails. Happy Traveling!

 

 

For museum professionals, we provide the following pathfinder to help pinpoint resources within CDCG that are particularly relevant for the museum community.

 


 
