
OLD: Managing Data - Description

Q. How do I describe data?

Informative descriptions are the key to accessing, understanding, and using the datasets you manage.  The best practice is to educate data creators and have them provide the necessary description of their data.  If the data you are managing lack adequate description and you are not in contact with the creators, you may have to describe the data in-house to the best of your and your staff's abilities.

 

Take action

Use the tools that work best for your project from the case studies and resources below.  Confusing description -- these aren't case studies.  Plus other pages don't have this kind of direction below the section heading. (CB)

 

Review use cases

  • DeRoure, D. and J.A. Hendler.  "E-Science: the Grid and the Semantic Web."  IEEE Intelligent Systems 19 no. 1 (Jan/Feb 2004): 65-71.  http://ieeexplore.ieee.org/iel5/5254/28315/01265888.pdf 
    Over the past few years, researchers have been treated to two visions of the Internet's future. One is the Semantic Web, the next generation of World Wide Web technology.  The second is grid computing, the next generation of internetworked processing.  The Semantic Web is described as "an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation."  Grid computing is defined as "flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources."  The article discusses their differences and, more importantly, their similarities, and explores the work needed to bring the two together.  It focuses particularly on scientists' needs, an area in which the high-power computing made possible by grid computing and the large-scale, distributed information management enabled by Semantic Web technologies will need to be integrated.  In particular, this integration enables new approaches to interdisciplinary scientific endeavors.
  • ArcGIS Data Interoperability  http://www.esri.com/software/arcgis/extensions/datainteroperability
    ArcGIS Data Interoperability eliminates barriers to data sharing by providing state-of-the-art direct data access; data translation tools; and the ability to build complex spatial extraction, transformation, and loading (ETL) processes.  ArcGIS Data Interoperability allows you to use any standard GIS data, regardless of format, within the ArcGIS for Desktop environment for mapping, visualization, and analysis.  The Workbench application, included with the extension, enables you to build complex spatial ETL tools for data validation, migration, and distribution.  This seems to be a software program rather than a use case (CB).
  • National Institute for Standards and Technology.  Last updated October 2000.  http://physics.nist.gov/cuu/Units/  
    not clear how this relates to description (CB)
  • Data Conservancy  http://dataconservancy.org/ 
    The Data Conservancy is a growing community that is developing software using industry best practices, the Open Archival Information System (OAIS) reference model, service-oriented architecture (SOA), metadata standards, and much more.

 

Watch

  • Data Conservancy: A Web Science View of Data Curation.  YouTube, 1:03.  Posted by Library of Congress.  December 17, 2010.  http://www.youtube.com/watch?v=fDOSsKQBNyg 
    The Data Conservancy embraces a shared vision: data curation is not an end, but rather a means to collect, organize, validate and preserve data to address grand research challenges.  Sayeed Choudhury provides an overview of the Data Conservancy with an emphasis on the data framework aspects of the project.

 

Read

  • Hook, Les A., Suresh K. Santhana Vannan, Tammy W. Beaty, Robert B. Cook, and Bruce E. Wilson.  "Best Practices for Preparing Environmental Data Sets to Share and Archive."  September 2010.  http://daac.ornl.gov/PI/BestPractices-2010.pdf 
    Read pages 12-21: 2.2. Use Consistent Data Organization, Use Consistent File Structure, 2.3. Assign Descriptive File Names and Stable File Formats For Tabular and Image Data, and 2.4. Assign Descriptive File Names.
  • Abbott, Daisy.  "Interoperability."  Digital Curation Centre, February 4, 2009.  http://www.dcc.ac.uk/resources/briefing-papers/introduction-curation/interoperability 
    Sections include: short-term benefits and long-term value, examples of interoperability in practice, higher/further education perspective, e-Science perspective, issues to be considered, and additional resources.
  • Borer, Elizabeth T., Eric W. Seabloom, et al.  "Some Simple Guidelines for Effective Data Management." Bulletin of the Ecological Society of America (April 2009): 205-14.  http://www.esajournals.org/doi/pdf/10.1890/0012-9623-90.2.205 
    This document provides some simple guidelines for effective data management, which, if put into practice, will benefit the original data owner as well as enhance prospects for the long-term preservation and re-use of the data by other researchers.  The rules of thumb presented here facilitate rapid and accurate interpretation of the data, and pre-dispose the data to more effective processing by computers.
  • Lord, Philip and Alison Macdonald.  "e-Science Curation Report: Data curation for e-Science in the UK: an audit to establish requirements for future curation and provision."  Joint Information Systems Committee, 2003.  http://www.jisc.ac.uk/uploaded_documents/e-ScienceReportFinal.pdf 

    This study examined the current provision and future needs of curation of primary research data in the UK, particularly within the e-Science context.  It summarises the strategic and policy analyses and outlines proposals for the organisational structuring of curation provision and provides a table showing which recommendations address the findings.

From the Organizing Data section:

I don't understand this formatting (CB)

  • Van den Eynden, Veerle, Louise Corti, et al.  "Managing and sharing data."  Colchester: UK Data Archive, 2011.  http://www.data-archive.ac.uk/media/2894/managingsharing.pdf

    Section on Organising Files and Folders (pages 13-14) summarises best practice.

  • Barkstrom, Bruce and Mike Folk.  "Attributes of file formats for long-term preservation of scientific and engineering data in digital libraries."  Joint Council on Digital Libraries, 2003.  http://www.ncsa.uiuc.edu/NARA/Sci_Formats_and_Archiving.doc -- opens a Word document 
    This paper describes the need to consider how file formats affect the ease and effectiveness with which scientific and engineering data may be stored and accessed in long-term archives.  The authors identify a number of attributes of file formats that can help or hinder them as candidates for long-term digital preservation and consider how these attributes appear to a number of different audiences for long-term archiving.
  • Digital Curation Centre.  "Standard Naming Conventions."  Last updated February 28, 2012.  http://www.dcc.ac.uk/resources/external/standard-naming-conventions-electronic-records 
    Written by the Records Management Section at the University of Edinburgh, this document provides a common set of rules to apply to the naming of electronic records.  The conventions are primarily intended for use with Windows-based software and documents such as word-processed documents, spreadsheets, presentations, e-mails and project plans.  "File names" are the names that are listed in the file directory and that users give to new files when they save them for the first time.  The conventions assume that a logical directory structure or filing scheme is in place and that similar conventions are used for naming the levels and folders within the directory structure.
  • Davidson, Joy.  "Placing our stuff so we can find it later: A meta-learning essential."  Digital Curation Centre. April 10, 2006.  http://www.dcc.ac.uk/news/placing-our-stuff-so-we-can-find-it-later-meta-learning-essential 
    Learning from our previous work is often inhibited by difficulties in finding relevant materials after a period of time and, when found, making sense of them.  Presented here are several practical approaches for alleviating this difficulty.  The suggestions are (1) craft meaningful, contextual file names; (2) place things where they can be easily found; and, (3) relentlessly discard useless items.  It is widely accepted that meta-learning is a matter of acquiring skills.  This paper suggests how three specific skills related to using computing technology can be improved, thus enabling enhanced learning for these populations.  Following this discussion, the paper will link back up with a meta-learning perspective on the value of improving these skills.

 

Last updated on 09/27/13, 3:07 pm by tlchristian

 
