Invitation to share DNA metabarcoding data to test early pilot of data-publishing tool

tfroeslev · June 7, 2023, 1:44pm

Metabarcoding of environmental DNA samples or bulk samples is one of the major sources of new biodiversity data. GBIF is exploring ways to expand its support for communities interested in publishing DNA-derived biodiversity data, and to increase the visibility and reuse of these data beyond molecular repositories and archives.

On the occasion of updating our guide on sharing such data through biodiversity platforms, which now includes a special section on publishing marine eDNA data, GBIF invites people who hold DNA metabarcoding data to help us pilot an experimental data-publishing tool that responds to recent feedback from the omics community.

Between now and 1 November 2023, we invite volunteers to get in touch and send processed DNA metabarcoding data to GBIF’s DNA pilot team at DNA@gbif.org. Ideally, these metabarcoding / eDNA datasets will come directly from people with a firm understanding of their data structure and origins (more on data expectations below). Based on demand, we may prioritize the processing of candidate datasets from GBIF participant countries or those that address spatial and taxonomic knowledge gaps. We will use the contributed datasets to test and refine the experimental tool and develop guidelines around its future use. In addition, we will work through GBIF’s data publishing processes to ensure any contributed datasets are published to GBIF as an outcome of this pilot.

Why share DNA metabarcoding data through GBIF.org

The data will be reused for research and policymaking, increasing their impact on scientific knowledge and the understanding of global biodiversity.
GBIF's citation tracking system records every reuse of the data in scientific publications.
Data shared through GBIF.org give researchers and other data users an additional route for discovering the underlying scientific work and publications.
Data originators manage academic credit and attribution and control both the frequency and scale of edits and updates.
GBIF assigns every dataset a Digital Object Identifier (DOI), a persistent identifier and link to the dataset.

What to do if you’re interested

Consider which of your DNA metabarcoding dataset(s) could and should be discoverable by broader audiences. Send the data to dna@gbif.org, and we will contact you if we need more information or clarification. Once we have processed a dataset, we will provide you with a preview of the data for review and adjustment.

Data expectations

Pilot metabarcoding / eDNA datasets should ideally come directly from the people who know the data structure and its origins best. The following data elements are a good starting point, provided either as separate or as merged tables, as Excel spreadsheets, or as tab- or comma-delimited text:

OTU/ASV table (mandatory): an abundance table or species-by-site matrix with the number of sequence reads of each of the "molecular species" detected in each of the samples (or sites). Use sample IDs as column headers and OTU IDs as row names, or vice versa. The sequences themselves can be used as OTU IDs.
Sample/site data (mandatory): a table with metadata for each of the samples (or sites). The geographical position of each sample is required (preferably as decimal latitude and decimal longitude), and the sample IDs should correspond to those in the OTU table.
Taxonomy table (optional): a table with information on each OTU. At least the sequence of each OTU is needed if it is not used as the OTU ID itself. If taxonomy has been assigned to the OTUs, the inferred scientific name of each OTU can also be provided.
Study data (recommended): information on the study and how the data were generated, OR a link to a publication or other source from which this information (marker/gene, sequencing platform, sample type, etc.) can be extracted.
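To illustrate how the two mandatory tables fit together, here is a minimal sketch, assuming tab-delimited text with OTU IDs as row names and sample IDs as column headers. It is not GBIF's pilot tool: the example values, the column names `otu_id` and `sample_id`, and the function `otu_table_to_long` are all hypothetical. Only `decimalLatitude` and `decimalLongitude` are borrowed from Darwin Core term names. The sketch checks that every sample in the OTU table has site data and then turns the matrix into one record per OTU detected in a sample.

```python
import csv
import io

# Hypothetical example OTU/ASV table (not real data): OTU IDs (here the
# sequences themselves) as row names, sample IDs as column headers, and
# read counts in the cells.
OTU_TABLE = """otu_id\tS1\tS2\tS3
ACGTACGTAA\t120\t0\t35
TTGACCGTAG\t0\t48\t2
"""

# Hypothetical sample/site table: one row per sample, position in decimal
# degrees. Sample IDs must match the OTU table's column headers.
SAMPLE_TABLE = """sample_id\tdecimalLatitude\tdecimalLongitude
S1\t36.80\t-121.90
S2\t36.75\t-122.02
S3\t36.70\t-121.95
"""


def otu_table_to_long(otu_text, sample_text):
    """Join an OTU table with sample data: one record per nonzero OTU/sample pair."""
    sample_reader = csv.DictReader(io.StringIO(sample_text), delimiter="\t")
    samples = {row["sample_id"]: row for row in sample_reader}

    otu_reader = csv.DictReader(io.StringIO(otu_text), delimiter="\t")
    sample_ids = [name for name in otu_reader.fieldnames if name != "otu_id"]

    # Every sample column in the OTU table needs a matching row of site data.
    missing = [s for s in sample_ids if s not in samples]
    if missing:
        raise ValueError(f"Samples without site data: {missing}")

    records = []
    for row in otu_reader:
        for sample_id in sample_ids:
            reads = int(row[sample_id])
            if reads == 0:
                continue  # OTU not detected in this sample: no record
            site = samples[sample_id]
            records.append({
                "sample_id": sample_id,
                "otu_id": row["otu_id"],
                "reads": reads,
                "decimalLatitude": float(site["decimalLatitude"]),
                "decimalLongitude": float(site["decimalLongitude"]),
            })
    return records


if __name__ == "__main__":
    for record in otu_table_to_long(OTU_TABLE, SAMPLE_TABLE):
        print(record)
```

Zero counts are skipped on the assumption that only detections become records; a real workflow might handle absences, read-count thresholds, or the transposed layout (samples as rows) differently.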

Examples of eDNA metabarcoding datasets at GBIF.org

United States Geological Survey. 18S Monterey Bay Time Series: an eDNA data set from Monterey Bay, California, including years 2006, 2013-2016. Accessed via GBIF.org on 2023-06-07.
PlutoF. Global soil organisms. Occurrence dataset accessed via GBIF.org on 2023-06-07.
Atlas of Living Australia (2021). DNA metabarcoding assays reveal a diverse prey assemblage for Mobula rays in the Bohol Sea, Philippines. Occurrence dataset accessed via GBIF.org on 2023-06-07.
