Switching GBIF’s taxonomic backbone to the Catalogue of Life extended release (x-release) (GBIF technical support hour for nodes)

cecsve · October 14, 2024, 10:26am

Join us for the next session of the technical support hour for GBIF nodes on November 6th, 2024, at 4 pm CET (please note we are switching back to standard time), where the topic is “switching GBIF’s taxonomic backbone to the Catalogue of Life extended release (x-release)”.

Catalogue of Life is preparing an extended version of the Catalogue of Life Checklist which integrates additional taxonomic and nomenclatural data sources. This includes recently described species, additional synonyms, as well as other names originating from molecular and sequence registers. In this session, we are joined by Diana Hernández and Camila Plata from COL, who will describe the progress made and the challenges they faced when constructing the xrelease. They will also show the provenance of the selected taxonomic data sources and their relevance for closing data gaps to better represent endemic species, synonyms, and less-known taxa.

We will be happy to answer any question, whether or not it relates to the topic. Please feel free to post questions in advance in this thread or write to helpdesk@gbif.org.

larsgw · October 16, 2024, 10:49am

Is it possible to join this session if I am associated with a node?

cecsve · October 16, 2024, 11:25am

Yes, but we usually leave it to the discretion of the node to invite people associated with them. Could you coordinate with @nl-bif?

larsgw · November 4, 2024, 12:35pm

I see my typo now, I meant to write “not associated”.

cecsve · November 5, 2024, 1:41pm

I see. It is unfortunately not possible to join if you are not affiliated with a node. However, the recording of the presentation will be shared after the session and you are welcome to post any questions you may have here or to helpdesk@gbif.org.

cecsve · November 13, 2024, 11:08am

Here is the video recording for the session: Switching GBIF’s taxonomic backbone to the Catalogue of Life extended release (x-release)

Here is the transcript of the Q&A:

Can associate participants apply for the Metabarcoding Data Programme?

Yes.

If a checklist includes national red list categories, is it included when viewed in Checklistbank?

No, we are not including that data in the xrelease.

How do you manage the differences between higher taxonomy from global checklist and regional/local checklist?

From the red lists, we are adding some names, synonyms, or complementary information that is lacking in COL, but no threat categories are added. The xrelease only adds names at family level and below, so we don’t modify the higher classification provided by the global sources.

Is there a summary infographic (e.g. a Sankey diagram) that we could share with people to help them understand the xCOL creation process? And would a preliminary version of the infographic, covering everything up to the final step (building into GBIF), be feasible?

COL has not prepared anything yet but will publish one eventually.

AlgaeBase is integrated into WoRMS and WoRMS is a checklist used for the GBIF backbone, but AlgaeBase data is excluded from the checklist shared with GBIF. Why - is it due to copyright?

AlgaeBase has a very clear license and only shares the full dataset with WoRMS. Only genus-level names are included in the WoRMS checklist shared with GBIF, not species-level names.

Could others generate an algae checklist that covers the same information as AlgaeBase and share that as a checklist with GBIF and COL?

You could not download the data from AlgaeBase and share it with GBIF, as this is against copyright law. But you could curate your own checklist based on other sources and share that, even if it ends up looking identical or close to identical to the information found in AlgaeBase.

Is COL talking with SILVA and PR2 to integrate those resources as checklists? Can others engage with them to include the reference database?

COL is currently testing PR2 and if you have any input on how to approach other reference databases, please contact dna@gbif.org.

Is the DwC term license mandatory at a record level or do you only need to declare it in the dataset metadata?

The license declared in the metadata of the dataset is inherited at record level when data is ingested into GBIF, so the license does not need to be declared at record level.
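As a rough illustration of the inheritance described above, the following Python sketch copies a dataset-level license onto records that do not carry one. The function name and the dictionary keys are assumptions made for this example, and leaving an existing record-level value untouched is also an assumption; this is not GBIF's ingestion code.

```python
def inherit_license(records: list[dict], dataset_metadata: dict) -> list[dict]:
    """Return copies of the records with the dataset license filled in."""
    dataset_license = dataset_metadata["license"]  # hypothetical metadata key
    resolved = []
    for record in records:
        updated = dict(record)  # copy so the input records are not mutated
        # Assumption: only records without a license value are filled in.
        if not updated.get("license"):
            updated["license"] = dataset_license
        resolved.append(updated)
    return resolved


dataset = {"title": "Example dataset", "license": "CC-BY-4.0"}
records = [
    {"occurrenceID": "a1"},                 # no record-level license
    {"occurrenceID": "a2", "license": ""},  # empty value, treated as missing
]
resolved_records = inherit_license(records, dataset)
```

With this kind of fallback, a publisher only needs to maintain the license in one place, the dataset metadata.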

Can you really license data or license information? Is there any example of publishers or nodes who have tried to enforce it?

The rationale behind implementing licensing in GBIF was to give users a way to know how publishers would like their data to be used and acknowledged. GBIF has stated from the beginning that the Secretariat does not police or enforce how the licenses are used. We are not aware of cases where publishers tried to enforce the license with data users, and we usually recommend against using CC-BY-NC unless it is very important since it limits the use of data rather than protecting the data.

COL wants to have a long-term relationship with the checklist publishers, so they strictly adhere to the licensing.

Are there any plans to enable downloads of checklists in Checklistbank in a de-normalized format?

Currently, there are no plans, but please see the following comment and associated discussion: Add flat classification to exports · Issue #1080 · CatalogueOfLife/backend · GitHub.
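Until such an export exists, a flat classification can be derived locally from a parent-child checklist. The Python sketch below assumes each row has `ID`, `parentID`, `rank` and `scientificName` fields and that rank values match the chosen column names; both the field names and the rank list are assumptions for illustration, not a description of any Checklistbank export.

```python
# Ranks that become columns in the flat output (an illustrative choice).
RANK_COLUMNS = ["kingdom", "phylum", "class", "order", "family", "genus"]


def flatten(taxa: list[dict]) -> list[dict]:
    """Add one column per higher rank by walking up the parentID chain."""
    by_id = {t["ID"]: t for t in taxa}
    flat_rows = []
    for taxon in taxa:
        row = {key: taxon[key] for key in ("ID", "scientificName", "rank")}
        row.update({column: None for column in RANK_COLUMNS})
        current, seen = taxon, set()
        # Walk towards the root; `seen` guards against cyclic parent links.
        while current is not None and current["ID"] not in seen:
            seen.add(current["ID"])
            if current["rank"] in RANK_COLUMNS:
                row[current["rank"]] = current["scientificName"]
            current = by_id.get(current.get("parentID"))
        flat_rows.append(row)
    return flat_rows


taxa = [
    {"ID": "k1", "parentID": None, "rank": "kingdom", "scientificName": "Animalia"},
    {"ID": "f1", "parentID": "k1", "rank": "family", "scientificName": "Felidae"},
    {"ID": "g1", "parentID": "f1", "rank": "genus", "scientificName": "Panthera"},
    {"ID": "s1", "parentID": "g1", "rank": "species", "scientificName": "Panthera leo"},
]
flat = {row["ID"]: row for row in flatten(taxa)}
```

Ranks missing from the parent chain (here phylum, class and order) stay empty rather than being guessed.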

Will it be possible to search on COL taxonIDs in GBIF once the xrelease replaces the backbone? It would be nice to have persistent identifiers from COL searchable in GBIF.

The species keys in GBIF differ from the taxon identifiers in the xrelease, so the plan is to make both available during a transition period when GBIF switches to the xrelease, and eventually to keep only the identifiers as they are in COL so that the two are aligned. COL identifiers are stable for a given name, as in the backbone, and will eventually be searchable on GBIF.

About the 50,000 digitized records added in the xrelease: are these from new sources? How much do they overlap with those already in Jstor or other global platforms? Do they have any restrictions, if that is the case?

The 50,000 digitized records are digitized papers from PLAZI and include prioritized data on insects, geographical areas, and specific journals. COL does not know how this overlaps with records in Jstor. There is no restriction on those resources as they are published as CC0.

Records published by PLAZI are expected to increase with every xrelease.

Do regional or national checklists contribute with more data in the xrelease than they did in base COL?

They do not account for the majority of names in the xrelease, but they add information here and there and contribute synonyms as well. The main checklist used is TaxRef.

If you have information on local vernacular names you want to share, how do you best share this with COL? The vernacular information feels a bit untethered so I am not sure how to add that to a scientificName based on my own knowledge. Should I share such names with a group-specific database, or could I just share a checklist myself?

A good example of vernacular names included in a checklist is FishBase (GBIF Checklist dataset).

You can either share the information with the group-specific database or share it as a checklist uploaded to Checklistbank, and then COL prioritizes what is included or not. It is always good to include bibliographic references for the names you include.

Here is an example of what the information could look like for a species when shared with GBIF in a checklist (see also the verbatim tab for all values shared).

All checklists published to GBIF are automatically shared with COL via Checklistbank. However, not all Checklistbank checklists are automatically shared with GBIF.

Will there be any changes to the occurrence API when the xrelease is implemented? Does it change the way we work?

In the transition period, it should not affect the way you work since it is only possible to query occurrences by species keys in the v1 API. GBIF will keep the current keys and, whether or not new identifiers are created, the plan is to provide a resolution service that will redirect the old keys to the corresponding new identifiers.

Eventually, it will be possible to query based on COL identifiers, but this requires changes to the API.
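For reference, querying occurrences by species key in the v1 API currently means passing the key as the `taxonKey` parameter of the occurrence search endpoint. The Python sketch below only builds such a URL; the helper name and the example key are illustrative assumptions, and it says nothing about how COL identifiers will be exposed later.

```python
from urllib.parse import urlencode

GBIF_API_V1 = "https://api.gbif.org/v1"


def occurrence_search_url(taxon_key: int, limit: int = 20, offset: int = 0) -> str:
    """Build a v1 occurrence search URL filtered by a backbone species key."""
    query = urlencode({"taxonKey": taxon_key, "limit": limit, "offset": offset})
    return f"{GBIF_API_V1}/occurrence/search?{query}"


# 212 is used here purely as an example backbone key.
url = occurrence_search_url(212, limit=5)
```

Keeping the key handling in one helper like this means only one place needs to change if a different identifier parameter becomes available.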

How many datasets are currently published with camtrapDP? And can we have a node support hour where we see a demonstration of how to publish a camtrap dataset?

There are currently three datasets published with camtrapDP:

With a story map describing the data and project behind those datasets.

We will have a node support hour on camtrapDP publishing in the new year, where we will go through the publishing process from start to finish. There is also a camtrapDP guide that you may find helpful: Best Practices for Managing and Publishing Camera Trap Data.

Will the species keys on GBIF be replaced because you want to reuse the identifiers from other sources? Or at least will the identifiers change?

As mentioned above, GBIF will keep the current backbone keys. Whether new identifiers are created or not, the plan is to provide a resolution service that will redirect the old keys to the corresponding new identifiers. Please be aware that the old backbone and keys will no longer be maintained.

Eventually, it will be possible to query based on COL identifiers, but this requires changes to the API.
