Join us on the 6th of August at 4:00 pm CEST for the next GBIF Technical Support Hour for Nodes, where Tobias Frøslev from the Science team gives us an update on the work concerning the Metabarcoding Data Toolkit (MDT). Tobias presented an earlier version of the tool last year, which has since matured as part of the Metabarcoding Data Programme.
The video recording is available here: MDT - the Metabarcoding Data Toolkit on Vimeo
For any DNA- or MDT-related questions, you are welcome to email dna@gbif.org.
Here is a transcript of the Q&A:
I have a question about the Darwin Core Data Package (DwC DP) which is being developed and open for review in September. Ideally, it would be implemented in early 2026. Do you think it would be more useful to produce a DwC DP or a DwC Archive? It can take months before we can update our IPT. How should we work in the meantime?
The input file for the Metabarcoding Data Toolkit (MDT) is actually closer to the DwC DP than it is to the DwCA format. The input file will remain the same and should remain intuitive for the data providers. The MDT will be able to generate a DwCA or a DwC DP for publishing, so the change shouldn’t affect the data providers. Once GBIF is able to index the DwC DP, it would make sense for it to do so, as it is a much more condensed and lighter format.
Do I understand correctly that publishers can either use the MDT to register datasets directly on GBIF, or download the archive and upload it to an IPT?
Yes. Note that publishing from the MDT has the advantage of including the BIOM file at the endpoints, so users would be able to access the BIOM file from a GBIF dataset. If the dataset has been published via an IPT, no BIOM file would be included. In addition, the MDT allows you to reinterpret and update the data directly, which would be more complicated with the IPT.
Would the DwCA export function of the MDT still be available even if the DwC DP were implemented at GBIF?
Yes.
Will GBIF.org offer a DwC DP occurrence download format? If publishers switch to using DwC DP for their data, it would be difficult for users to access the data if they had to work dataset by dataset. It would be easier to have the data accessible in a general occurrence download using DwC DP.
We don’t know the answer to this question yet. Please keep an eye on the DwC DP-related news and check the DwC DP repository here. You are also welcome to write to the DwC Data Packages Helpdesk at dwcdp@gbif.org.
None of the publishers that we have worked with had data compliant with the MDT input format. They often have different tables, so we always need to reformat or tweak the data. Have any other countries or Nodes encountered the same situation?
It is true that the formats aren’t standardised across the world. The format chosen was a middle ground between the types of data that are generated by metabarcoding analysis. Most people working with metabarcoding data would be familiar with formats similar to the MDT input format. For example, the phyloseq tool generates files that are close to the MDT input file format.
If you see the same formats come up regularly, please let us know. We might be able to help create converters, for example.
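To illustrate what a simple converter of this kind could look like, here is a minimal Python sketch. It is not part of the MDT and makes several assumptions: that a publisher delivers read counts in a "long" layout with one (sample, OTU, reads) record per line, and that the target is a "wide" OTU table with OTUs as rows and samples as columns, a layout common in metabarcoding pipelines. The function name `long_to_otu_table` and the example identifiers are hypothetical; the actual MDT input template should be checked before adapting this.

```python
from collections import defaultdict


def long_to_otu_table(records):
    """Pivot (sample_id, otu_id, reads) records into a wide OTU table.

    Hypothetical helper, not an MDT function. Returns (sample_ids, rows),
    where each row is [otu_id, reads_in_sample_1, ..., reads_in_sample_n].
    """
    counts = defaultdict(lambda: defaultdict(int))
    sample_ids = []  # keeps samples in order of first appearance

    for sample_id, otu_id, reads in records:
        if sample_id not in sample_ids:
            sample_ids.append(sample_id)
        # Sum reads in case the same sample/OTU pair appears more than once.
        counts[otu_id][sample_id] += int(reads)

    # Fill in 0 where an OTU was not detected in a sample.
    rows = [
        [otu_id] + [per_sample.get(s, 0) for s in sample_ids]
        for otu_id, per_sample in counts.items()
    ]
    return sample_ids, rows


# Made-up example records, for illustration only.
records = [
    ("S1", "OTU_1", 12),
    ("S2", "OTU_1", 3),
    ("S1", "OTU_2", 7),
]
sample_ids, rows = long_to_otu_table(records)
```

The resulting `sample_ids` list can serve as the header row and `rows` as the table body when writing the result out as a CSV or spreadsheet.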
If a publisher comes with data that don’t match the MDT input file format, would you recommend that they convert them for the MDT or work straight with Darwin Core?
We cannot make general recommendations. It has to be decided on a case-by-case basis.