
Archiving Web Sites - Metadata

Q. What kind of metadata do I need for my web archive?

To improve access to your web archive and to support its long-term preservation, you will need to collect and store metadata along with the web content you capture.  Consider collecting and storing administrative, structural, and descriptive metadata.
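As a minimal sketch only, a per-capture record could keep the three kinds of metadata in separate groups. The class name, field names, and values below are hypothetical and are not drawn from any particular standard. The SHA-256 checksum is an assumed example of preservation (fixity) information filed under administrative metadata.

```python
import hashlib
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class CaptureMetadata:
    """Hypothetical record for one captured web page, grouped by kind of metadata."""
    # Administrative: how, when, and under what terms the content was acquired,
    # plus fixity information needed for preservation.
    administrative: Dict[str, str] = field(default_factory=dict)
    # Structural: how this file relates to other captured files.
    structural: Dict[str, List[str]] = field(default_factory=dict)
    # Descriptive: what the content is about, to support discovery and access.
    descriptive: Dict[str, str] = field(default_factory=dict)


# Placeholder bytes standing in for a captured HTML page.
page_bytes = b"<html><head><title>Example</title></head><body>Hello</body></html>"

record = CaptureMetadata(
    administrative={
        "source_url": "https://www.example.org/",
        "capture_date": "2024-01-15T12:00:00Z",
        "rights": "Permission granted by site owner",
        "sha256": hashlib.sha256(page_bytes).hexdigest(),
    },
    structural={
        "links_to": ["https://www.example.org/about.html"],
        "embedded": ["https://www.example.org/logo.png"],
    },
    descriptive={
        "title": "Example Organization home page",
        "subject": "Web archiving",
    },
)

print(sorted(asdict(record)))  # ['administrative', 'descriptive', 'structural']
```

Keeping the groups separate makes it easier to map each one onto whichever metadata model you choose later.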


Take action

  • Determine the kinds of metadata you will collect
  • Choose the metadata model you will use
  • Decide how to capture and/or create the metadata
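Some metadata can be captured automatically from the page itself, while other metadata must be created by a person. The sketch below uses Python's standard-library `html.parser`. The class name and the choice of fields (title, meta description, outbound links) are illustrative assumptions, and the HTML and subject term are placeholders.

```python
from html.parser import HTMLParser
from typing import List, Optional


class PageMetadataExtractor(HTMLParser):
    """Hypothetical helper: collects metadata that can be captured automatically."""

    def __init__(self) -> None:
        super().__init__()
        self.title: Optional[str] = None
        self.description: Optional[str] = None
        self.links: List[str] = []  # structural: outbound link targets
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content")
        elif tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # The title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title = (self.title or "") + data


html = """<html><head><title>Example Org</title>
<meta name="description" content="Home page of Example Org"></head>
<body><a href="/about.html">About</a></body></html>"""

parser = PageMetadataExtractor()
parser.feed(html)
parser.close()

# Captured automatically from the page itself.
captured = {"title": parser.title, "description": parser.description, "links": parser.links}
# Created by a cataloguer: subject terms usually cannot be derived reliably.
created = {"subject": "Nonprofit organizations"}
print({**captured, **created})
```

Automatic capture scales well across large crawls. Descriptive fields such as subject terms generally still need human review.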


Review use cases

  • National Library of Australia.  "Electronic Resources Cataloguing Manual."  2nd ed.  Last updated April 24, 2006.
    Provides a glossary and links to information about cataloguing electronic resources.


Review metadata models and standards

  • "Dublin Core Metadata Initiative."  Last updated June 14, 2012.
    Includes sections on Metadata Basics and DCMI Specifications.

  • Library of Congress.  "Encoded Archival Description" (EAD).  Version 2.0.  Last updated November 1, 2011.
    Includes general information about EAD, along with schemas and tag libraries for both Version 2.0 and Version 1.0.



  • Dollar Consulting.  "Metadata Preservation Model."  Appendix 2 in Archival Preservation of Smithsonian Web Resources: Strategies, Principles, and Best Practices.  July 20, 2001.
    The metadata requirements for tracking and preserving Web sites and HTML pages that are identified in this Appendix draw upon recordkeeping principles and requirements, best archival practices, and the Dublin Core. Specifically, these requirements incorporate the guidelines, recommendations, and best practices identified in the Public Record Office Victoria (Australia) VERS Metadata Scheme (PROS 99/007 Specification 2); the University of British Columbia study "Protecting the Integrity of Electronic Records"; the University of Pittsburgh "Metadata Specifications Derived from Functional Requirements: A Reference Model for Business Acceptable Communications"; DOD 5015.2, "Design Specifications for Electronic Records Management Software Applications"; and 36 CFR Part 123 (National Archives).  These requirements are organized into three areas: (1) A General Description of the Format; (2) Web site and HTML page identification data; and (3) Preservation data for each Web site and HTML page.
  • International Internet Preservation Consortium (IIPC).
    The mission of the International Internet Preservation Consortium (IIPC) is to acquire, preserve and make accessible knowledge and information from the Internet for future generations everywhere, promoting global exchange and international relations.  This Web site provides access to numerous reports issued by the IIPC.
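To illustrate one of the models listed above, the sketch below serializes a simple Dublin Core description of an archived site with Python's `xml.etree.ElementTree`. It is a sketch only. The element names come from the Dublin Core Metadata Element Set 1.1. Wrapping them in an `oai_dc:dc` element follows the OAI-PMH convention and is an assumption here, and all field values are placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace for the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
# Wrapper namespace used by OAI-PMH for simple Dublin Core records (an assumption here).
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def dublin_core_record(fields):
    """Build a simple Dublin Core record from (element name, value) pairs."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, value in fields:
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


record = dublin_core_record([
    ("title", "Example Organization web site"),
    ("creator", "Example Organization"),
    ("date", "2024-01-15"),
    ("format", "text/html"),
    ("identifier", "https://www.example.org/"),
])
print(ET.tostring(record, encoding="unicode"))
```

Simple Dublin Core covers discovery well. Richer administrative and preservation data, such as that described in the Smithsonian appendix, usually needs an additional scheme alongside it.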


Last updated on 08/26/13, 10:03 pm by callee


