Data collection, management and sharing

Updated September 2019

 

Data collection practices

(a) If you are leading a project you must keep a field notebook in which you outline your observations and activities.  Field notebooks are available in the McEwan lab.

(b) Data collection needs to be clean, legible to others, in pencil, on waterproof paper, with dates and data collector information on each sheet.  Data collected on paper in this fashion is timeless and superior to digital methods.

(c) When possible there should be uniformity in the data collection system.  Preferably, have the same person collect the same data in the same way for the whole project, and deviate from that only if needed and with caution.

(d) QA/QC your data entry.  Errors are quite common in data collection and data entry, so you need to practice regular QA/QC.  Best is a quick scan of each sheet in the field (or lab) as the sheet is completed to see if anything absurd is written down (e.g., an oak tree with 8900 cm DBH).  Data entry is also a good time to catch mistakes.  Then, after data entry, depending on the number of sheets, it is useful to go back and re-check the entry.  At a minimum, draw a random 10% sample of the data sheets and re-check them against the entered data.  If mistakes are found, draw another random 10% sample (without replacement) and re-check it as well.
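
The QA/QC steps in (d) can be sketched in code.  The example below is only an illustration, written in Python for brevity (lab analyses themselves are done in R).  The function names, the field name `dbh_cm`, the plausibility bounds and the rounding of the 10% sample size are all assumptions made for the sketch, not lab rules.

```python
import random


def flag_implausible(records, field, low, high):
    """Return the records whose `field` value falls outside [low, high]."""
    return [r for r in records if not (low <= r[field] <= high)]


def draw_recheck_sample(sheet_ids, already_checked=(), fraction=0.10, seed=None):
    """Pick a random sample of data sheets to re-check against the entered data.

    Sheets listed in `already_checked` are excluded, so a second draw is
    made without replacement relative to the first.
    """
    sheet_ids = list(sheet_ids)
    remaining = sorted(set(sheet_ids) - set(already_checked))
    if not remaining:
        return []
    # Assumption: sample size is 10% of all sheets, rounded, and at least one.
    n = max(1, round(fraction * len(sheet_ids)))
    n = min(n, len(remaining))
    rng = random.Random(seed)
    return sorted(rng.sample(remaining, n))


# Hypothetical entered data: the second DBH value is the kind of absurd
# entry a quick scan should catch.
records = [
    {"tree": "oak-1", "dbh_cm": 42.0},
    {"tree": "oak-2", "dbh_cm": 8900.0},
]
suspect = flag_implausible(records, "dbh_cm", low=0.0, high=500.0)

# First 10% sample of 40 sheets, then a second sample if mistakes were found.
first_sample = draw_recheck_sample(range(1, 41), seed=1)
second_sample = draw_recheck_sample(range(1, 41), already_checked=first_sample, seed=2)
```

Fixing the `seed` makes the draw repeatable, so the list of re-checked sheets can be written down in the field notebook and reproduced later.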

 

Precautions against the loss of data

Computers are only vaguely reliable instruments. 

You are required to assume that your computer, or the one you are using, will fail unexpectedly.

Examples of **unacceptable** reasons for losing data:

(a) my hard drive crashed!  (of course it did!)

(b) I was really busy getting ready for “x” so I didn’t back up the data

(c) there is only one computer with a microscope camera/software package/etc., so I saved everything there and didn’t back it up.
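
One way to make backing up routine is a small script that copies a data folder to a second location and confirms that each copy matches the original.  The sketch below is an illustration only, in Python; the function names and the choice of a SHA-256 checksum are assumptions, and the source and destination paths are whatever locations you choose (for example, an external drive or a synced shared folder).

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up_folder(source, destination):
    """Copy every file under `source` into `destination` and verify each copy."""
    source, destination = Path(source), Path(destination)
    copied = []
    # List the files first so the copy loop works from a fixed set of paths.
    files = sorted(p for p in source.rglob("*") if p.is_file())
    for src_file in files:
        dst_file = destination / src_file.relative_to(source)
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also preserves timestamps
        if sha256_of(src_file) != sha256_of(dst_file):
            raise IOError(f"Backup copy of {src_file} does not match the original")
        copied.append(dst_file)
    return copied
```

A script like this does not replace the habit of backing up; it only makes the habit quicker and gives you a check that the copy is intact.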

 

McEwan lab data management practices & data sharing

Here are some rules for working in the McEwan lab:

(a) Data ownership.

At the time of graduation, or the ending of any particular project, all data and other information associated with the project must be transferred to Dr. McEwan for curation, storage, publication or sharing.

Students may be entitled to authorship on subsequent publications; however, that is determined on an individual basis depending on the efforts of the students within the context of the overall project.

As Principal Investigator in the lab, and the person responsible to the University of Dayton, funding agencies, etc., Dr. McEwan is the ultimate owner of all data collected in the lab and reserves the right to make datasets publicly available and move forward with publishing or other uses pursuant to the code of ethics surrounding scientific information put forward by the Ecological Society of America.

(b) Everyone will use Excel to enter data and create CSV files and all analysis must be done in R.

(c) We will share data sets across the lab

(d) We will create data products (data sets) that are shared publicly

(e) Everyone will help out with analysis and share in a collaborative fashion

(f) Analysis of data sets will often be open to peers, and, in some instances, will take place live in front of the lab group.  This might feel scary at times if it is your data set, but this is the way we are moving forward.

Process:

(a) Every new project in the lab should begin with the creation of a folder that is shared.  Dr. McEwan will create that folder and share it with the participants in the project.  If you are involved in a project and do not have a folder, let Dr. McEwan know via email or in person and he will create the folder.  All activities related to the project must take place within this shared folder.  The shared folder should have a logical sub-folder structure and can include “Literature,” “Analysis,” “Writing,” “Scripts,” etc.
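
Once the shared folder exists, the sub-folder structure can be set up in one step.  The sketch below is an illustration only, in Python; the function name is hypothetical, and the sub-folder names are simply the examples listed in (a), which you can change to suit the project.

```python
from pathlib import Path

# Example sub-folders taken from the list above; adjust per project.
SUBFOLDERS = ("Literature", "Analysis", "Writing", "Scripts")


def create_project_skeleton(project_root, subfolders=SUBFOLDERS):
    """Create the sub-folders inside an existing (or new) project folder.

    Folders that already exist are left untouched, so the function is
    safe to run more than once.
    """
    root = Path(project_root)
    created = []
    for name in subfolders:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Call it with the path to the shared folder on your machine, for example `create_project_skeleton("path/to/shared/project")`, where the path is a placeholder.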

(b) Data products.  The first priority for any project, once data have been collected, is to create data products for that project.  These data products will be shared in the Project Folder and consist of a written description of the methods, data entered into the appropriate format for sharing and analysis in R, and explanatory text in the form of metadata.

Features of a required data product

–  a journal-quality description of methods.  This may need to include images, etc.

–  a Final CSV folder that contains the perfectly cleaned and organized R-ready files

–  metadata for the CSV files
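
A quick check can confirm that every file in the Final CSV folder has metadata to go with it.  The sketch below is an illustration only, in Python.  It assumes a hypothetical naming convention in which `trees.csv` is described by a companion file `trees_metadata.txt`; the lab does not prescribe this convention, so adapt the suffix to however the metadata are actually stored.

```python
from pathlib import Path


def missing_metadata(final_csv_dir, suffix="_metadata.txt"):
    """Return the names of CSV files that lack a companion metadata file.

    Assumption: the metadata for `name.csv` lives next to it in a file
    called `name` + `suffix`.
    """
    folder = Path(final_csv_dir)
    missing = []
    for csv_file in sorted(folder.glob("*.csv")):
        companion = csv_file.with_name(csv_file.stem + suffix)
        if not companion.exists():
            missing.append(csv_file.name)
    return missing
```

An empty list means every CSV file has a metadata file under the assumed convention; it says nothing about whether the metadata themselves are complete.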

(c)  Following the completion of the data products, projects will advance through the normal processes, including exploratory data analysis, final analyses and writing.  This could include writing a thesis, a paper for publication, or preparing posters and other presentations.

(d) Once analyses are completed, the final, curated, annotated scripts must be shared with Dr. McEwan.  These scripts may serve as the basis for future analyses.  To determine whether you have met this lab requirement, ask: can a reasonably competent person, familiar with your project, reproduce your graphics and analyses in < 1 hr?

 

McEwan lab data archive

The McEwan lab seeks to publicly share all data collected in the lab through eCommons and the University of Dayton library.  The idea is to store curated data sets on the site in such a way that they are publicly available and have a citable DOI.  We will seek to archive the final, cleaned CSV files before the associated manuscript is published, and cite the data in the manuscripts we submit to journals.  This is a permanent archive freely available to all collaborators, agencies and other stakeholders.  Dr. McEwan will contact you about storing your data sets in the archive if appropriate.

LINK: https://ecommons.udayton.edu/mcewanlab/

 

 
