Invitation to share DNA metabarcoding data to test early pilot of data-publishing tool

Metabarcoding of environmental DNA samples or bulk samples is one of the major sources of new biodiversity data. GBIF is exploring ways to expand its support for communities interested in publishing DNA-derived biodiversity data and to increase the visibility and reuse of these data beyond molecular repositories and archives.

On the occasion of updating our guide on sharing such data through biodiversity platforms (which now includes a special section on publishing marine eDNA data), GBIF invites people who hold DNA metabarcoding data to help us pilot an experimental data publishing tool that responds to recent feedback from the omics community.

Between now and 1 November 2023, we invite volunteers to get in touch and send processed DNA metabarcoding data to GBIF’s DNA pilot team. Ideally, these metabarcoding / eDNA datasets will come directly from people with a firm understanding of their data structure and origins (more on data expectations below). Based on demand, we may prioritize the processing of candidate datasets from GBIF participant countries or those that address spatial and taxonomic knowledge gaps. We will use the contributed datasets to test and refine the experimental tool and develop guidelines around its future use. In addition, we will work through GBIF’s data publishing processes to ensure any contributed datasets are published to GBIF as an outcome of this pilot.

Why share DNA metabarcoding data through GBIF?

  • The data will be reused for research and policymaking, increasing impact on scientific knowledge and understanding of global biodiversity
  • GBIF’s citation tracking system records every reuse of data in scientific publications
  • Sharing data through GBIF provides researchers and other data users with an additional route for discovering the underlying scientific work and publications
  • Data originators manage academic credit and attribution and control both the frequency and scale of edits and updates
  • GBIF assigns every dataset a Digital Object Identifier (DOI), a persistent identifier and link to the dataset

What to do if you’re interested

Consider which of your DNA metabarcoding dataset(s) could and should be discoverable by broader audiences. Send the data to GBIF’s DNA pilot team, and we will contact you if we need more information or clarification. Once we process datasets, we will provide you with a preview of the data for review and adjustment.

Data expectations

We expect pilot metabarcoding / eDNA datasets to come directly from the people who best understand the data structure and its origins. Ideally, the following data elements serve as a starting point, provided either as separate or merged tables, as Excel spreadsheets or as tab- or comma-delimited text:

  • OTU/ASV table (mandatory): an abundance table or species-to-site matrix with the number of sequence reads of each of the “molecular species” detected in each of the samples (or sites). Use sample IDs as column headers and OTU IDs as row names, or vice versa. The sequences can be used as OTU IDs.
  • Sample/site data (mandatory): a table with metadata for each of the samples (or sites). The geographical position of each sample is required (preferably as decimal latitude and decimal longitude), and the sample IDs should correspond to those in the OTU table.
  • Taxonomy table (optional): a table with information on each OTU. At least the sequence of the OTU is needed if it is not used as the OTU ID itself. If taxonomy has been assigned to the OTUs, then the inferred scientific name of each OTU can be provided.
  • Study data (recommended): information on the study and data generation, OR a link to a publication or other source from which this information (marker/gene, sequencing platform, sample type, etc.) can be extracted.
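
Before sending data, contributors may find it useful to check that the two mandatory tables agree with each other. The following is a minimal sketch in Python of such a check, not part of GBIF’s pilot tool. It assumes tab-delimited text, an OTU table with OTU IDs as row names and sample IDs as column headers, and a sample table whose columns are named sample_id, decimalLatitude and decimalLongitude. The function names and the sample_id column name are hypothetical and chosen only for illustration.

```python
import csv
import io


def read_table(text, delimiter="\t"):
    """Parse delimited text into a list of rows, skipping empty lines."""
    return [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]


def check_pilot_tables(otu_rows, sample_rows, id_field="sample_id",
                       lat_field="decimalLatitude", lon_field="decimalLongitude"):
    """Return a list of problems found; an empty list means all checks passed.

    Assumed layout (illustrative only): the first OTU header cell labels the
    OTU ID column and the remaining header cells are sample IDs.
    """
    problems = []
    header = sample_rows[0]
    for field in (id_field, lat_field, lon_field):
        if field not in header:
            problems.append(f"sample table lacks column {field!r}")
    if problems:
        return problems

    # Index the sample metadata by sample ID.
    samples = {}
    for row in sample_rows[1:]:
        record = dict(zip(header, row))
        samples[record.get(id_field, "")] = record

    # Every sample in the OTU table needs a matching metadata row.
    otu_samples = otu_rows[0][1:]
    for sid in otu_samples:
        if sid not in samples:
            problems.append(f"sample {sid!r} is in the OTU table but not in the sample table")

    # Every sample needs a numeric position within valid bounds.
    for sid, record in samples.items():
        try:
            lat = float(record.get(lat_field, ""))
            lon = float(record.get(lon_field, ""))
        except ValueError:
            problems.append(f"sample {sid!r} has missing or non-numeric coordinates")
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append(f"sample {sid!r} has coordinates out of range")

    # Read counts must be non-negative integers, one per sample.
    for row in otu_rows[1:]:
        otu_id, counts = row[0], row[1:]
        if len(counts) != len(otu_samples):
            problems.append(f"OTU {otu_id!r} has {len(counts)} counts for {len(otu_samples)} samples")
        elif not all(c.strip().isdigit() for c in counts):
            problems.append(f"OTU {otu_id!r} has a read count that is not a non-negative integer")
    return problems


# Tiny made-up tables, for illustration only.
OTU_TSV = "otu_id\tS1\tS2\nASV_1\t120\t0\nASV_2\t3\t57\n"
SAMPLE_TSV = (
    "sample_id\tdecimalLatitude\tdecimalLongitude\n"
    "S1\t55.68\t12.57\n"
    "S2\t55.70\t12.60\n"
)

problems = check_pilot_tables(read_table(OTU_TSV), read_table(SAMPLE_TSV))
print(problems or "tables look consistent")
```

If the OTU table is oriented the other way (sample IDs as row names), it would need to be transposed before running a check like this one.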

Examples of eDNA metabarcoding datasets at