
Data Management Toolkit: Overview

Overview

The Research Data Management Consulting Services Team is here to assist you with:

  • Creating data management plans
  • Creating disciplinary metadata
  • Standardizing file formats
  • Archiving and storing data in repositories
  • Finding datasets
  • Citing datasets

Fill out this online consultation form for fast and confidential help.

 

 

Data Dose

Data Dose is a monthly LANL newsletter reminder about data-related events and information to help you manage and love your data, supporting the LANL Information, Science, and Technology Science Pillar. To subscribe to the mailing list, email listmanager@lanl.gov with “subscribe datadose_newsletter” in the body of your message.


Data Management Overview

The LANL Data Management Toolkit provides guidance, best practices, and resources for each step of the research data lifecycle. It helps LANL researchers develop a data management plan (DMP) that explains how data generated or collected on a federally funded project will be described, shared, and preserved for ongoing access. Additional information may be found on the LANL Data Management website (internal).

The toolkit is divided into the following six conceptual lifecycle sections to help address questions, navigable through the top tabs:

Funding & Agency Mandates

Most US grant funding agencies now request, and some require, data management plans to accompany proposals. You can use the DMPTool for guidance tailored to specific funder and program requirements. We recommend consulting the SPARC Data Sharing Requirements by Federal Agency to check the current data sharing requirements of US government agencies (including granting agencies such as NSF).

Data Repositories and Open Datasets

DMPs

DMP

SRO-RL Research Data Management Consultation

We are here to help you find, use, manage, visualize and share your data. Visit our website to learn more about our services. Schedule a consultation with a data expert. View and register for upcoming workshops.


DMP questions

In general, your data management plan should address the types of data, samples, physical collections, software, and other materials to be produced in the course of the project; the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies); and policies for access and sharing, including provisions for protecting privacy where appropriate.

In the majority of cases, a DMP will be required as part of the research proposal for federally funded research (NSF, NIH, etc.). The author uses the DMP to plan how data will be handled throughout its lifecycle and updates the document as the project progresses.

Getting Started

Get Started with DMPTool:

  1. Go to https://dmptool.org
  2. Click “Sign in” in the top right-hand corner 
  3. Choose Option 1 (“If your institution is affiliated with DMPTool”) by clicking “Your institution”
  4. Begin typing Los Alamos National Laboratory, select it from the list of partners that appears, and click the Go button
  5. Select the Create Account tab and fill out the requested information
  6. You should now be signed in to DMPTool


DMP Best Practices

  • Create a DMP prior to initiating research.

  • Review the DOE's Office of Science Suggested Elements for a Data Management Plan (updated 1/1/2021). Read (and re-read) the DMP requirements of your funding agency. Evaluate publicly available LANL DMP templates.

  • Write DMP content that describes the project's data acquisition, processing, analysis, preservation, publishing, and sharing (if public access applies).

  • Anticipate and identify software and storage needs by considering the types of data that will be created. Identify any proprietary or sensitive data in the DMP prior to data acquisition or collection to legally justify the need to withhold them from public access if necessary. Identify suitable repositories for the data. Create and document a data backup policy.

  • Define roles and responsibilities for the management, distribution, and ownership of data and subsequent metadata.

  • Get credit. Include your LANL@ORCID identifier on your dataset so that the work is attributed to you.

  • Share plans. In the DMPTool, you can enter the email address(es) of any collaborators you would like to invite to read or edit your plan. Set or adjust their permissions via the radio buttons and click "Add collaborator".

  • Get feedback on your DMP from the Research Library Data Research Services Team. The Research Data Service provides free, fast, and confidential feedback on draft DMPs. We work with library-based subject experts so that our feedback incorporates disciplinary and data management expertise.

  • Revisit your data management plan throughout the project life cycle.

DMP examples and templates

Guidelines for Effective Data Management (ICPSR) Includes a framework, and links to example plans
http://www.icpsr.umich.edu/icpsrweb/content/datamanagement/dmp/index.html

Documentation & Organization

Resources


Data Documentation

Documenting your data will help you keep track of what you’ve done with your data throughout a research project. Documentation provides context, methods, tools, and requirements for collaborators and for anyone who uses your data once you share and publish it. Data documentation should be in an accessible format and give enough information to understand and verify the work. Data documentation answers two of the most fundamental data questions: If someone were to look at this data in 50 years, would they be able to understand why and how it was collected? If someone wanted to reuse or repurpose my data, would they know how to replicate my findings and what software to use?

Good documentation records:

  • Who contributed to the work (PIs, co-authors, research assistants, etc.)
  • What data was used and for what purpose (software, simulations, etc.)
  • When the data was collected, when analysis was performed, and any other pertinent dates
  • Where the data was collected from
  • Why the data was collected
  • How the conclusions were reached

Documentation can take a variety of formats, though all formats should be similar in content. Every form of documentation must include basic information about the data that allows for its correct interpretation and reuse, both by your future self and by other researchers. Common documentation formats include:

  • Readme files, which are text files that provide basic information about a dataset.
  • Data dictionaries, which are often tabular files that provide critical information about a data file by describing the names, definitions, and attributes of its data elements.
  • Codebooks with full variable and value labels are detailed files that describe and document the layout and structure of a data file. A codebook may include codes and definitions to provide context for and help analyze data. See this sample codebook, and check out the Codebook Cookbook for more information.
  • Electronic Lab Notebooks (ELNs) enable researchers to digitally organize and store a detailed record of experimental materials and procedures, protocols, results, notes, and data.
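As a rough illustration of the data dictionary format described above, the sketch below drafts a dictionary skeleton from the header row of a CSV file using only the Python standard library. The function name `data_dictionary_skeleton`, the column set (`variable`, `example_value`, `missing_count`, `definition`, `units`), and the sample data are assumptions made for this example, not a LANL template; the definitions and units still have to be written by the researcher.

```python
import csv
import io
from typing import Dict, List


def data_dictionary_skeleton(csv_text: str) -> List[Dict[str, object]]:
    """Return one draft data-dictionary row per column in the CSV header.

    The field layout is a hypothetical starting point, not a standard.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    skeleton = []
    for name in reader.fieldnames or []:
        # Collect the non-empty values so we can show an example and count gaps.
        values = [row[name] for row in rows if row[name] not in (None, "")]
        skeleton.append({
            "variable": name,
            "example_value": values[0] if values else "",
            "missing_count": len(rows) - len(values),
            "definition": "",  # to be filled in by the researcher
            "units": "",       # to be filled in by the researcher
        })
    return skeleton


# Made-up sample data, for illustration only.
sample = "site_id,temp_c,notes\nA1,21.5,\nB2,,cloudy\n"
for entry in data_dictionary_skeleton(sample):
    print(entry)
```

The resulting rows can be saved as their own CSV file and kept next to the dataset, alongside a readme.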

 

Metadata and File formats

Simply put, metadata is "data about data." Various metadata standards are available for particular file formats and disciplines. General guidelines are provided below.

  • File handling (naming convention, folder structure)
  • Processing steps (how to get from point A to B)
  • Protocols (what decisions were made and why)
  • Field abbreviations/name glossary (what does ABC3130 stand for)

Recommended tips for file naming:

  • Use the YYYY-MM-DD format for dates
  • Use only letters, numbers, underscores, and hyphens; names built from these characters are tolerated by the widest variety of systems
  • Use standard file extensions that indicate the type of file (e.g., .txt); they help software handle the file correctly
  • Keep names short and consistent
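The naming tips above can be sketched as a small Python helper. The function `build_filename`, its parameters, and the particular ordering of date, project, description, and version are assumptions chosen for illustration; any consistent convention that follows the tips would serve equally well.

```python
import re
from datetime import date


def build_filename(project: str, description: str, when: date,
                   version: int, extension: str) -> str:
    """Assemble a file name from an ISO date, sanitized parts, and a version."""
    def clean(part: str) -> str:
        # Replace each run of characters other than letters and digits
        # with one hyphen, trim hyphens from the ends, and lowercase.
        return re.sub(r"[^A-Za-z0-9]+", "-", part).strip("-").lower()

    stamp = when.isoformat()              # YYYY-MM-DD
    ext = extension.lower().lstrip(".")   # standard, lowercase extension
    return f"{stamp}_{clean(project)}_{clean(description)}_v{version:02d}.{ext}"


print(build_filename("Soil Moisture", "raw sensor readings",
                     date(2021, 3, 5), 1, ".CSV"))
# prints: 2021-03-05_soil-moisture_raw-sensor-readings_v01.csv
```

Putting the date first means that files sort chronologically in a directory listing, which is one reason the YYYY-MM-DD form is commonly preferred.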

More Resources

What in the world is metadata and why should I care? This blog post provides a succinct explanation of the importance of metadata

Electronic Lab Notebooks (ELNs)

An Electronic Lab Notebook (ELN) is a software tool that in its most basic form replicates an interface much like a page in a paper lab notebook.

Organization Tips

  • Software Carpentry Data Management Tips: includes details on version control, folder naming, and directory structures

Citation & Metadata

What's an ORCID identifier?

ORCID stands for "Open Researcher and Contributor ID." LANL researchers can sign up for an ORCID iD, a unique and persistent personal identifier that disambiguates their name and tracks their scholarly output across name or institution changes.

How to get an ORCID iD

Getting a LANL ORCID iD is a simple process that lets researchers distinguish themselves from others.

What's an LA-UR?

RASSTI is a review and approval system for Scientific and Technical Information (STI) that is a product of LANL programmatic work and intended for release outside the laboratory. It assigns digital numbers for LA-UR (Los Alamos–Unlimited Release) documents and for LA-CP (Los Alamos–Controlled Publication) documents, which are unclassified but controlled. The resulting documents are collected in our internal institutional repository and are made available to appropriate audiences.

What's a DOI?

DOI stands for "Digital Object Identifier." When a digital resource such as a dataset or an article is assigned a DOI, the DOI serves as its unique and persistent identifier in perpetuity.

Citation Components

There is no one standard citation style used at LANL, but in general a data citation includes all of the same components as any other citation:

  • Creator or Author
  • Title
  • Date of publication and/or data collection
  • Publisher or data repository name (where data is housed)
  • Unique identifier to allow access to information, usually a URL, system-generated ID number, filename, or a persistent identifier (PID) such as an ORCID or DOI. An example might be the LA-UR (Los Alamos Unlimited Release) ID of a LANL technical report; LA-UR documents are unclassified documents that have been reviewed by SAFE-IP.
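The components listed above can be assembled into a citation string with a few lines of Python. Because there is no single standard citation style at LANL, the layout produced by `format()` below is only one plausible arrangement, and the class name `DataCitation`, the author names, the repository name, and the DOI `10.1234/example.5678` are all placeholders invented for this sketch. The one external fact it relies on is that a DOI can be resolved by appending it to `https://doi.org/`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """Holds the citation components listed above (hypothetical structure)."""
    creators: List[str]
    title: str
    year: int
    publisher: str
    identifier: str

    def format(self) -> str:
        authors = "; ".join(self.creators)
        ident = self.identifier.strip()
        # Turn a bare DOI into a resolvable https://doi.org/ link.
        if ident.lower().startswith("doi:"):
            ident = "https://doi.org/" + ident[4:].strip()
        elif ident.startswith("10."):
            ident = "https://doi.org/" + ident
        return f"{authors} ({self.year}). {self.title}. {self.publisher}. {ident}"


# Placeholder values, for illustration only.
citation = DataCitation(
    creators=["Doe, J.", "Roe, R."],
    title="Example Sensor Dataset",
    year=2021,
    publisher="Example Data Repository",
    identifier="doi:10.1234/example.5678",
)
print(citation.format())
```

Identifiers that are not DOIs, such as a URL or an LA-UR number, pass through unchanged.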

Data Citation

Metadata Standards

Metadata consist of information that characterizes data. In essence, metadata answer who, what, when, where, why, and how about every facet of the data that are being documented. A metadata standard is a high level document which establishes a uniform way of structuring and understanding data. Information on submitting appropriate metadata for LA-UR reports can be found online at: https://int.lanl.gov/library/rassti/metadata.shtml

There are many types of metadata standards/schemas. Some are generic, while others are domain-specific.

Science Schema
General Purpose Schema

This site provides links to information about disciplinary metadata standards. Take a look at the "General Standards" section if your discipline does not have a metadata standard.

Dublin Core is a general standard first used by libraries, and can be adapted for specific disciplines. Dryad, a digital data repository, uses Dublin Core.
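To make the idea of a general-purpose schema concrete, the following sketch writes a few Dublin Core elements as XML with Python's standard library. The namespace `http://purl.org/dc/elements/1.1/` is the published Dublin Core element set namespace; the `metadata` wrapper element, the function name `dublin_core_record`, and every field value are assumptions for this example. A real repository will define its own wrapper and required fields.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_record(fields: dict) -> str:
    """Serialize a dict of Dublin Core elements to an XML string.

    A list value produces one repeated element per item.
    The <metadata> wrapper is a hypothetical container for this sketch.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for name, value in fields.items():
        items = value if isinstance(value, list) else [value]
        for item in items:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = str(item)
    return ET.tostring(root, encoding="unicode")


# Placeholder values, for illustration only.
record = dublin_core_record({
    "title": "Example Sensor Dataset",
    "creator": ["Doe, J.", "Roe, R."],
    "date": "2021-03-05",
    "publisher": "Example Data Repository",
    "identifier": "https://doi.org/10.1234/example.5678",
})
print(record)
```

The element names used here (title, creator, date, publisher, identifier) mirror the citation components discussed earlier, which is part of why Dublin Core adapts easily to dataset description.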

Persistent Identifiers (PIDs)

The proliferation of digitally available research and technical publications has created a need for machine-readable, interoperable Persistent Identifiers (PIDs). Machine-readable PIDs such as DOIs and ORCID iDs are valuable assets in enabling information sharing across systems and are used by funding agencies, publishers, and other organizations to enable digital connections between data, objects, contributors, and organizations.

Digital Object Identifiers (DOIs) are the most widely known and used PIDs. DOIs aid in citation tracking, giving researchers accurate metrics on how and where their research outputs are being used or referenced.

Examples of persistent identifiers include:

Object Identifiers

Contributor Identifiers

Organization Identifiers

Storage and Backup

Data Tips

Save your files in a non-proprietary (open) format when possible.

The format in which your files are saved affects whether they can be opened in the future and may lead to inconsistencies in data storage. Non-proprietary formats (e.g., comma-separated values, or CSV, files) are more interoperable and better suited to long-term preservation and potential reuse than proprietary formats (e.g., .xls files).
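As a small example of saving tabular data in an open format, the sketch below serializes records to CSV text using Python's built-in `csv` module, which needs no proprietary software to read or write. The function name `to_csv_text`, the column names, and the sample values are assumptions for illustration.

```python
import csv
import io


def to_csv_text(records, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up measurements, for illustration only.
measurements = [
    {"sample_id": "S1", "mass_g": 12.5},
    {"sample_id": "S2", "mass_g": 13.1},
]
print(to_csv_text(measurements, ["sample_id", "mass_g"]))
```

The resulting text can be written to a `.csv` file and opened by spreadsheets, statistical packages, and plain text editors alike.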

Back up your data regularly to avoid risk of data loss using the "rule of 3"

Data backup best practices include keeping 3 copies of the data, geographically distributed between local and remote locations: the original on your computer, an external local copy (hard drive or tape backup), and an external remote copy (online storage service). Back up at regular intervals, when you complete your data collection activity, and after you make edits to your data.
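One way to check that backup copies are intact is to compare checksums after copying. The sketch below copies a file to each destination folder and verifies every copy against the SHA-256 digest of the original. The function names `sha256_of` and `back_up` are invented for this example, and it assumes both the local and the remote location are reachable as ordinary folder paths (for instance, an external drive and a mounted network share); an online storage service would normally be reached through its own client instead.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list) -> dict:
    """Copy source into each destination folder and verify each copy.

    Returns a mapping of copy path -> True if its checksum matches the
    original. With two destinations this yields three copies in total.
    """
    source = Path(source)
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        copy = folder / source.name
        shutil.copy2(source, copy)  # copy2 also preserves timestamps
        results[str(copy)] = sha256_of(copy) == expected
    return results


# Hypothetical usage; the paths below are placeholders:
# back_up(Path("2021-03-05_readings.csv"),
#         ["/mnt/external_drive/backups", "/mnt/remote_share/backups"])
```

Recording the checksum alongside the data also lets you detect silent corruption later, not just at copy time.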

Consider Security

Data security needs to be considered for all copies of your data: raw data, archived data, and data backups. See the LANL Computer Security website for more.

Overview

One of the questions researchers ask most frequently is where they can deposit their data. There are many different institutional and subject-specific repositories for long-term storage of data. Grant funding agencies may have specific requirements for the repositories and long-term storage plans for data collected with their funds.

General Data Repository

LANL researchers deposit their data in a variety of places: domain-specific data repositories, general-purpose data repositories, and DOE-specific institutional data repositories. The value of these repositories lies in their availability and familiarity to researchers within the discipline and in their subject-specific metadata searching capabilities and ontologies. General repositories are often recommended by publishers as well as granting agencies for the deposit of data related to research studies. Some recommended open repositories include:

Compare recommended repositories to each other in the LANL Data Repositories Matrix.

Making Data Open via a Discipline-Specific Data Repository

There are many ways to share your data openly and freely in open data repositories, which can be found via re3data, a registry of research data repositories. You can search re3data.org to find repositories appropriate to your academic discipline.

Below is a selection of DOE and NNSA data repositories:

  • The Open Energy Data Initiative (OEDI) is a free, searchable online software discovery and knowledge-sharing platform, powered by OpenEI. Sponsored by the Department of Energy and developed by the National Renewable Energy Laboratory (NREL) in support of the Open Government Initiative, OpenEI strives to make energy-related data and information searchable, accessible, and useful to both people and machines: https://openei.org
  • Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center: https://daac.ornl.gov/
  • Sandia National Laboratories: https://energy.sandia.gov/programs/energy-water/data-modeling-analysis/
  • OSTI DOE CODE Repository Service: https://www.osti.gov/doecode/ for links to various DOE code/software/data resources
  • National Energy Technology Laboratory: https://edx.netl.doe.gov/
  • Energy Information Administration: The Energy Information Administration (EIA) is the statistical agency of the U.S. Department of Energy. Find data under Sources and Uses, then by type. More open data for download is available at https://www.eia.gov/opendata/
  • Data.gov: Public access to high-value, machine-readable datasets generated by the Executive Branch of the Federal Government.
  • Nuclear Energy Agency: Contains both experimental and evaluated nuclear data, including nuclear reaction data (the properties of interacting nuclei, e.g., cross sections) and nuclear structure data (the properties of single nuclei).
  • U.S. Energy Information Administration Data: Company-level data on the supply and disposition of natural gas in the United States, electric power data collected by surveys, international energy statistics, energy country profiles for 217 countries, state and territory energy profiles for the U.S., financial data collected from major energy producers, short-term and historical energy outlook data and projections, and real energy prices.
Managed by Triad National Security, LLC for the US Department of Energy's NNSA. Copyright Triad National Security, LLC. All Rights Reserved.