Darwin Core Data Package - A new publishing format for biodiversity data (technical support hour for GBIF nodes)

cecsve · August 15, 2025, 9:48am

The next technical support hour for GBIF nodes will be on September 3rd at 4:00 pm CEST.

The topic will be an introduction to the ongoing work on developing a new publishing format, the Darwin Core Data Package (DwC-DP) (DwC-DP Quick Reference Guide), which will allow publishers to share more complex and detailed information than is currently possible in a Darwin Core Archive.

The Data Products team will give an overview of the main components of the publishing model and how the GBIF Secretariat expects the new format may affect the work of the GBIF nodes when providing publisher support.

Please note that the publishing model will also be part of a three-month TDWG ratification process starting on September 1st, 2025. To learn more, see: GitHub - gbif/dwc-dp: Darwin Core Data Package schemas. For examples of mapping data to the new format, see: GitHub - gbif/dwc-dp-examples: Reference datasets for the DwC DP schemas.

We will be happy to answer any questions, whether or not they relate to the topic. Please feel free to post questions in advance in this thread or write to helpdesk@gbif.org.

mgrosjean · September 15, 2025, 11:37am

The video recording is available here: Technical support hour for GBIF Nodes on Vimeo

Here is a transcript of the Q&A:

Looking back at the dinosaur example in the presentation, should we expect to have different types of values in the same fields? Like the date of a geological period in the eventDate field?

No, you would use different fields to convey a collection event and the geological period associated with the specimen. In the example presented, we are missing the geologicalContext table. You would have two rows in the event table (one for the collection event and one for the time when the specimen likely died) but would not use the same fields to convey the different types of dates. We left the eventDate blank where there is no date.

In this model, the occurrence doesn’t have to be the collecting event. It would be particularly relevant for the paleontology and archaeology communities as well as people working with eDNA.
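As a rough illustration of the answer above, the two event rows could look like the sketch below. The column names eventID, eventDate and earliestPeriodOrLowestSystem are existing Darwin Core terms, but the eventType values, the identifiers, the example date and period, and the way the geological context row points at its event are all assumptions made for illustration. Please check the quick reference guide for the actual table layout.

```python
# Hypothetical rows; identifiers and values are made up for illustration.
events = [
    # The collection event: it has a real calendar date.
    {"eventID": "evt-collect", "eventType": "collection", "eventDate": "1998-07-14"},
    # The event in which the animal likely died: no calendar date, so eventDate stays blank.
    {"eventID": "evt-death", "eventType": "death", "eventDate": ""},
]

# The geological period lives in its own table instead of being squeezed into eventDate.
# Linking the two tables by eventID is an assumption of this sketch.
geological_contexts = [
    {"eventID": "evt-death", "earliestPeriodOrLowestSystem": "Cretaceous"},
]


def describe(event_id):
    """Return the calendar date and geological period known for one event."""
    event = next(e for e in events if e["eventID"] == event_id)
    period = next(
        (g["earliestPeriodOrLowestSystem"] for g in geological_contexts
         if g["eventID"] == event_id),
        None,
    )
    return {"eventDate": event["eventDate"] or None, "period": period}


print(describe("evt-collect"))
print(describe("evt-death"))
```

The point of the sketch is only that each kind of date has its own field: the collection row carries an eventDate and no period, while the other row carries a period and a blank eventDate.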

I have a question on how extensions fit into the DwC Data Package (DwC-DP). For example, we have a dataset about plant traits and had to use a lot of extensions to model the data with the Darwin Core Archive model. The new model seems more flexible and better suited for this type of data. I wanted to know if I will still be using extensions when modeling the data to the DwC-DP.

You can keep working with extensions in Darwin Core Archives. If you want to use the Darwin Core Data Package, you will have to find an equivalent among the tables available for the data package. Just like the DwC-A, the DwC-DP allows you to map tables that go beyond those for which we have created schemas. However, we don’t have a mechanism to register those for reuse by others at this point.

The quick reference guide should help you find the relevant tables: https://gbif.github.io/dwc-dp/qrg/

It would be helpful if you could try mapping your dataset to DwC-DP during the review period. If, during the exercise, you find it difficult to map some of the information captured in the extensions to the DwC-DP, you would be able to highlight the issue during the review.

We believe the DwC-DP should be able to support the same capabilities as DwC Archives and their extensions for occurrence and sampling event datasets. What have been known as checklist datasets to date (i.e. Taxon core) aren’t currently supported by the DwC-DP.

There isn’t a Taxon core in DwC-DP because the modelling of taxa isn’t settled in our community at the moment, as we have to review ColDP and TCS2. Once these standards are reconciled, we can start working on bringing them into DwC-DP.

Which DwC-DP terms will be indexed on GBIF.org?

It is too early to answer this. It is very likely that all the existing terms that are searchable in the current API and web interface (and their equivalent from DwC-DP) would remain searchable. We don’t know about the other fields and terms.

If you would like to know more, please join the Datos Vivos session https://www.livingdata2025.com/program.html?session=6798937-2_2025-10-21_Room+A where Federico Mendez will present the roadmap for the implementation of the DwC-DP in the GBIF infrastructure.

Do you know how many terms are in the DwC-DP?

When this public review comes out, we expect to have 50 new terms which appear in several places.

If you count every possible slot of every field in every table in the DwC-DP schemas, there would be 11,192.

The question is really about the type of data one has. For example, if you don’t have survey data, you can ignore several tables in the DwC-DP. As another example, if you don’t have any agents, you can ignore 40 tables.

One challenge for implementing the change in the IPT is to make it easy for people to decide what they have and don’t have.

Should we start thinking ahead about translating material and preparing training materials?

Please note that all the material that you currently have for supporting data publishers will still be valid. The new model offers additional flexibility, but the current workflow will remain.

Don’t start updating your materials for the new model yet. The model needs to be ratified first. The GBIF Secretariat will also produce documentation and training.

We anticipate that it will take time, most likely some years, before the DwC-DP is fully adopted, although early adopters will emerge in the near future.

Note that the MDT produces DwC-DP exports. Many of the datasets produced in the first phase of adoption of DwC-DP will likely come neither from the IPT nor from manually mapped datasets, but from tools like the MDT or collection management systems. Such tools could implement the DwC-DP as an export format for richer data. People wouldn’t be involved in the direct mapping; it would be something implemented in the data exports of such tools.

This will be a first phase of adoption while we work on implementing features in the IPT to facilitate the mapping experience for users.

I mapped a very old dataset to the DwC-DP. I noticed that many fields were not recorded at the time because there wasn’t a way to publish them.

For upcoming expeditions, I would like to ask people to at least record those fields so they can publish the information in the DwC-DP. I can implement this myself as I am the data manager for these expeditions.

One of the questions I get from researchers is where to put their sample identifiers. There is no ideal place for them in a DwC archive, and shoehorning everything into occurrence isn’t ideal either. We are looking forward to adopting DwC-DP.

Thank you for sharing your experience.

Will the datasets published in the context of BID projects be able to use the DwC-DP?

If the DwC-DP is implemented by the time the BID projects start sharing their data, they would be able to use the new standard. The BID training might not include anything detailed about the DwC-DP. The timeline is unclear at the moment, as it will depend on the public review and the progress we make in adding the capabilities to the tools.

If DwC-DP isn’t included in the training for BID, the BID support team should still be able to help BID participants map their data.

Should we start using the DwC-DP and converting DwC-A to DwC-DP as soon as the ratification happens?

The DwC-DP implements event hierarchy the same way as the DwC-A so there should be little friction to go from DwC-A to DwC-DP.
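For readers less familiar with that hierarchy: in Darwin Core Archives it is usually expressed by giving each event an eventID and pointing to its parent through parentEventID. The sketch below walks such a chain. The rows and identifiers are invented for illustration, and it assumes (following the answer above) that the same two terms carry over to the DwC-DP event table.

```python
# Hypothetical event rows: a survey containing a transect containing a plot.
events = [
    {"eventID": "survey-1", "parentEventID": ""},
    {"eventID": "transect-1", "parentEventID": "survey-1"},
    {"eventID": "plot-1", "parentEventID": "transect-1"},
]


def ancestors(event_id, rows):
    """List the parents of an event, nearest first, by following parentEventID."""
    parent_of = {r["eventID"]: r["parentEventID"] for r in rows}
    chain = []
    seen = {event_id}
    current = parent_of.get(event_id, "")
    while current:
        if current in seen:  # guard against accidental cycles in the data
            raise ValueError(f"cycle detected at {current}")
        chain.append(current)
        seen.add(current)
        current = parent_of.get(current, "")
    return chain


print(ancestors("plot-1", events))
```

Because the parent link is expressed the same way in both formats, a helper like this would not need to change when moving a dataset from DwC-A to DwC-DP, under the assumption stated above.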

There will be tools to help convert datasets from DwC-A to DwC-DP.

You should consider using DwC-DP if you have additional data to share (survey scope, for example) or if you would like to represent your data better.

Note that publishing tools (like the MDT or the IPT) might have the option to generate DwC-DP automatically.

Most of our datasets are checklists and ColDP didn’t seem like a better alternative to DwC archives, so we kept the Taxon core. Would we eventually be able to do everything in DwC-DP?

It is too early to answer this.

To be able to publish checklists with the DwC-DP, ColDP and TCS2 should be reconciled, and we should have well-defined use cases that would be solved by implementing checklists in DwC-DP.

Note that ColDP now supports the distribution extension. See: https://github.com/CatalogueOfLife/coldp/issues/81

Is EML the standard for metadata in the DwC-DP?

Yes, EML is still used to format metadata in the DwC-DP.
