What software applications are available to help manage data?

Take Action

  • Explore data management software tools
  • Review case studies that illustrate data management software applications in use



  • Colectica. “Colectica®: DDI Metadata and Survey Design Software Tools.” Accessed October 3, 2013.

Colectica® is the fastest way to design, document, and publish your statistical data and survey research using open data standards.

Data Conservancy is devoted to developing institutional solutions for the challenges of data collection, preservation and re-use.

This webpage provides links to several software platforms that support deposit, preservation, and access to digital content.

Kepler is designed to help scientists, analysts, and computer programmers create, execute, and share models and analyses across a broad range of scientific and engineering disciplines.  Kepler can operate on data stored in a variety of formats, locally and over the internet, and is an effective environment for integrating disparate software components, such as merging "R" scripts with compiled "C" code, or facilitating remote, distributed execution of models.  Using Kepler's graphical user interface, users simply select and then connect pertinent analytical components and data sources to create a "scientific workflow"—an executable representation of the steps required to generate results.  The Kepler software helps users share and reuse data, workflows, and components developed by the scientific community to address common needs.
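The workflow idea Kepler presents graphically, connecting analytical components so each step's output feeds the next, can be sketched in plain Python. All function and variable names below are illustrative; Kepler itself supplies a visual editor and its own component model:

```python
# Minimal sketch of a "scientific workflow": components connected in
# sequence so that each step's output becomes the next step's input.
# Names are illustrative, not part of Kepler's API.

def load_data():
    """Data source component: a hard-coded sample instead of a file or URL."""
    return [3.0, 5.0, 7.0, 9.0]

def compute_mean(values):
    """Analytical component: average the input values."""
    return sum(values) / len(values)

def report(mean):
    """Sink component: format the result for output."""
    return f"mean = {mean:.2f}"

# Connecting the components in order forms the executable workflow.
workflow = [load_data, compute_mean, report]

result = None
for step in workflow:
    result = step() if result is None else step(result)

print(result)  # mean = 6.00
```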

  • University of Southampton, University of Manchester, and University of Oxford. “myExperiment.” Accessed October 3, 2013.

The myExperiment Virtual Research Environment enables you and your colleagues to share digital items associated with your research; in particular, it enables you to share and execute scientific workflows.  You can use myExperiment to find publicly shared workflows.  If you want further access, and the ability to upload and share workflows, you will need to sign up.

  • University of Southern California Information Sciences Institute. “Pegasus: Workflow Management System.” Accessed October 3, 2013.

The Pegasus project encompasses a set of technologies that help workflow-based applications execute in a number of different environments including desktops, campus clusters, grids, and clouds.  Pegasus bridges the scientific domain and the execution environment by automatically mapping high-level workflow descriptions onto distributed resources.  It automatically locates the input data and computational resources necessary for workflow execution.  Pegasus enables scientists to construct workflows in abstract terms without worrying about the details of the underlying execution environment or the particulars of the low-level specifications required by the middleware (Condor, Globus, or Amazon EC2).  Pegasus also bridges the current cyberinfrastructure by effectively coordinating multiple distributed resources.
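The abstract-workflow idea behind Pegasus can be sketched as a task graph declared without any resource details, which a planner then maps onto concrete execution sites. The task names, sites, and round-robin "planner" below are all illustrative assumptions, not Pegasus's actual planning algorithm or API:

```python
# Sketch of an abstract workflow: tasks declare only their dependencies,
# and a separate planning step assigns each task to an execution site.

abstract_workflow = {
    "extract":   [],              # task -> list of prerequisite tasks
    "transform": ["extract"],
    "analyze":   ["transform"],
    "summarize": ["transform"],
}

sites = ["campus-cluster", "cloud-vm"]  # hypothetical execution sites

def topological_order(dag):
    """Order tasks so every task comes after its prerequisites."""
    ordered, seen = [], set()
    def visit(task):
        if task in seen:
            return
        for dep in dag[task]:
            visit(dep)
        seen.add(task)
        ordered.append(task)
    for task in dag:
        visit(task)
    return ordered

# "Planning": map each abstract task onto an available site (round-robin
# here; a real planner weighs data location, load, and middleware details).
plan = {task: sites[i % len(sites)]
        for i, task in enumerate(topological_order(abstract_workflow))}

for task in topological_order(abstract_workflow):
    print(f"{task} -> {plan[task]}")
```

The point of the separation is the one the entry describes: the workflow author edits only `abstract_workflow`, while the mapping to clusters, grids, or clouds is recomputed by the planner.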



  • Baxter, Susan M., Steven W. Day, Jacquelyn S. Fetrow, and Stephanie J. Reisinger. “Scientific Software Development Is Not an Oxymoron.” PLoS Computational Biology 2, no. 9 (2006): e87. doi:10.1371/journal.pcbi.0020087.

The field of computational biology crosses the span between engineering and science—a surprisingly (to some) large gulf that typically is uncovered in the process of developing scientific software. Why opine on best practices for scientific software projects now? Computational biologists are taking on increasingly important roles in this Internet-enabled, information-rich, high-throughput era of biology...Software applications are needed to aggregate, integrate, and manage data, tools, results, and discoveries.

  • Deelman, Ewa, Gurmeet Singh, Mei-Hui Su, James Blythe, Yolanda Gil, Carl Kesselman, Gaurang Mehta, et al. “Pegasus: A Framework for Mapping Complex Scientific Workflows onto Distributed Systems.” Scientific Programming Journal 13, no. 3 (July 2005): 219–237.

This paper describes the Pegasus framework that can be used to map complex scientific workflows onto distributed resources. Pegasus enables users to represent the workflows at an abstract level without needing to worry about the particulars of the target execution systems. The paper describes general issues in mapping applications and the functionality of Pegasus. We present the results of improving application performance through workflow restructuring which clusters multiple tasks in a workflow into single entities. A real-life astronomy application is used as the basis for the study.

  • Harris, Paul A., Robert Taylor, Robert Thielke, Jonathon Payne, Nathaniel Gonzalez, and Jose G. Conde. “Research Electronic Data Capture (REDCap): A Metadata-driven Methodology and Workflow Process for Providing Translational Research Informatics Support.” Journal of Biomedical Informatics 42, no. 2 (April 2009): 377–381. doi:10.1016/j.jbi.2008.08.010.

REDCap is a novel workflow methodology and software solution designed for rapid development and deployment of electronic data capture tools to support clinical and translational research. We present: 1) a brief description of the REDCap metadata-driven software toolset; 2) detail concerning the capture and use of study-related metadata from scientific research teams; 3) measures of impact for REDCap; 4) details concerning a consortium network of domestic and international institutions collaborating on the project; and 5) strengths and limitations of the REDCap system. REDCap is currently supporting 286 translational research projects in a growing collaborative network including 27 active partner institutions.
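The "metadata-driven" approach the abstract describes can be sketched as a form defined entirely by field descriptions, with one generic routine validating any record against them. The field names and rules below are illustrative, not REDCap's actual data dictionary format:

```python
# Sketch of metadata-driven data capture: the form is data, not code.
# Changing the study's fields means editing `metadata`, not the validator.

metadata = [
    {"field": "subject_id", "type": "text",    "required": True},
    {"field": "age",        "type": "integer", "required": True,
     "min": 0, "max": 120},
    {"field": "notes",      "type": "text",    "required": False},
]

def validate(record, metadata):
    """Return a list of validation errors for one record."""
    errors = []
    for spec in metadata:
        name = spec["field"]
        value = record.get(name)
        if value is None:
            if spec["required"]:
                errors.append(f"{name}: missing required field")
            continue
        if spec["type"] == "integer":
            if not isinstance(value, int):
                errors.append(f"{name}: expected an integer")
            elif not spec.get("min", value) <= value <= spec.get("max", value):
                errors.append(f"{name}: out of range")
    return errors

print(validate({"subject_id": "S001", "age": 34}, metadata))  # []
print(validate({"age": 999}, metadata))
```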

Workflow is becoming extremely important within coarse-grained distributed system models for e-Science applications. This book presents an overview of the state of the art within established projects, presenting many different aspects of workflow from network users to tool builders. It aims to provide a broad overview of active research.
