The Data Life Cycle

The data life cycle starts as soon as you begin planning the other parts of your project, often as early as your research proposal. It has five phases, which will likely overlap or repeat rather than run strictly in order.

  1. PLAN
  2. COLLECT
  3. ASSURE
  4. DESCRIBE
  5. PRESERVE

As a scientist, you will likely analyze your own published data. These steps ensure that you are not the only one who can.

PLAN: create a data management plan (DMP)

This could be a formal proposal or just a guideline for you and your team. Essentially, you are outlining your plans for the remaining four steps of the cycle, so review those steps before creating a DMP. Update your DMP regularly throughout the project; it will provide most of your metadata.

COLLECT: make observations and gather data

Your goal here is to produce clean, reusable data. More organization now means less work later.

ASSURE: quality assurance / quality control (QA/QC)

Decide which quality standards your data must meet. Perform basic quality assurance and quality control during data collection, entry, and analysis. The cleaner your data collection is, the easier this will be.
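As an illustration of basic quality control at data entry, here is a minimal Python sketch. Everything specific in it is an assumption made for the example: the table, the column names (site_id, date, water_temp_c), and the plausible temperature range are hypothetical, and your own standards will differ. It flags missing values, out-of-range values, and duplicate records.

```python
import csv
import io

# Hypothetical field-data table; columns and values are invented for this example.
SAMPLE_CSV = """site_id,date,water_temp_c
A1,2021-06-01,14.2
A2,2021-06-01,
A3,2021-06-01,141.0
A1,2021-06-01,14.2
"""

# Assumed plausible range for this variable; set your own per your quality standards.
VALID_TEMP_RANGE_C = (-5.0, 40.0)


def check_rows(rows):
    """Return a list of (row_number, problem) tuples for basic QC issues."""
    problems = []
    seen = set()
    for number, row in enumerate(rows, start=1):
        # Duplicate check: the same site measured twice on the same date.
        key = (row["site_id"], row["date"])
        if key in seen:
            problems.append((number, "duplicate site/date record"))
        seen.add(key)

        # Missing-value check.
        raw = row["water_temp_c"].strip()
        if not raw:
            problems.append((number, "missing water_temp_c"))
            continue

        # Type check, then range check.
        try:
            value = float(raw)
        except ValueError:
            problems.append((number, "non-numeric water_temp_c"))
            continue
        low, high = VALID_TEMP_RANGE_C
        if not low <= value <= high:
            problems.append((number, "water_temp_c out of range"))
    return problems


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for number, problem in check_rows(rows):
    print(f"row {number}: {problem}")
```

Running checks like these as data is entered, rather than at the end of the project, keeps errors close to the person who can still correct them.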

DESCRIBE: gather metadata

Metadata (data about your data) is essential for future comprehension and repeatability of your work. Use the metadata format that corresponds to the field your research best fits into. Note that almost all formats and tools for environmental data produce an .xml (Extensible Markup Language) file. This is standard: treat the XML file as being just as important as your data table files.
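To give a sense of what such an XML file contains, here is a minimal Python sketch that writes one with the standard library. The element names (dataset, title, creator, dataTable, attribute) and the example values are hypothetical and do not follow any particular metadata standard; in practice, the format for your field defines the required structure, and a dedicated tool usually generates the file for you.

```python
import xml.etree.ElementTree as ET


def build_metadata(title, creator, columns):
    """Build a small metadata tree describing a dataset and its columns.

    The element names are illustrative only, not a real metadata schema.
    """
    root = ET.Element("dataset")
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "creator").text = creator

    # One <attribute> element per column of the data table.
    table = ET.SubElement(root, "dataTable")
    for name, unit, description in columns:
        attribute = ET.SubElement(table, "attribute", name=name, unit=unit)
        attribute.text = description
    return root


# Hypothetical dataset description.
root = build_metadata(
    title="Stream temperature survey",
    creator="Example Lab",
    columns=[
        ("site_id", "none", "Identifier of the sampling site"),
        ("water_temp_c", "celsius", "Water temperature at 10 cm depth"),
    ],
)

# Pretty-print when available (ET.indent was added in Python 3.9).
if hasattr(ET, "indent"):
    ET.indent(root)

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Describing each column's name, unit, and meaning is what lets someone else interpret your data table without contacting you.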

PRESERVE: share your data

Your data isn’t very useful if only you can see it. Use technology to make your dataset accessible, reusable, and repeatable. Once your metadata is compiled and you no longer plan to change your files, it’s time to preserve your data. Your work may provide data that the scientific community desperately needs in the future. Imagine if virologists hadn’t saved their 2019 data on coronaviruses…


Source: DataONE (Data Observation Network for Earth) Data Management Primer
