Overview of the technical components of GBIF

cecsve · January 11, 2023, 1:10pm

This thread will capture the questions that arose during and after the second support hour for Nodes.

Watch the recording, which describes the technical components of GBIF step by step.

[Image: GBIF ecosystem overview]

Question 1: Are the flags on the GBIF validator the same as the flags on GBIF.org?

As mentioned in the presentation, the validator uses the same processing pipeline as GBIF.org. However, flags relating to the ingestion process (for example, metadata validation errors or duplicate occurrence flags) are shown only in the validator. In other words, the validator shows all the flags that will appear on GBIF.org, and more.

Question 2: Does the validator have an API?

No.

Question 3: Datasets get new DOIs for major new versions, is that correct?

The publisher decides what constitutes a major version. So if a given publisher considers that there is no major change to a dataset, the DOI will remain the same.

Question 4: Can the validator identify records where coordinates are missing?

Records with missing coordinates are not flagged, because the absence of coordinates is not an issue per se: it is fine to publish records without coordinates. As a result, the validator will not identify records where coordinates are missing.
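Because the validator does not report missing coordinates, a publisher who wants that check could run it locally on the occurrence file before publishing. The sketch below assumes a tab-delimited Darwin Core `occurrence.txt` with a header row using the `occurrenceID`, `decimalLatitude` and `decimalLongitude` terms; the helper name is hypothetical and not part of any GBIF tool.

```python
import csv
import io


def records_missing_coordinates(occurrence_txt: str, delimiter: str = "\t") -> list:
    """Return the occurrenceID of every row lacking decimalLatitude or decimalLongitude.

    Hypothetical helper, not a GBIF tool: assumes a delimited Darwin Core
    occurrence file with a header row.
    """
    reader = csv.DictReader(io.StringIO(occurrence_txt), delimiter=delimiter)
    missing = []
    for row in reader:
        lat = (row.get("decimalLatitude") or "").strip()
        lon = (row.get("decimalLongitude") or "").strip()
        # A record counts as "missing coordinates" if either value is empty.
        if not lat or not lon:
            missing.append(row.get("occurrenceID", ""))
    return missing


sample = (
    "occurrenceID\tscientificName\tdecimalLatitude\tdecimalLongitude\n"
    "occ-1\tPuma concolor\t-12.5\t-70.1\n"
    "occ-2\tPuma concolor\t\t\n"
)
print(records_missing_coordinates(sample))  # ['occ-2']
```

Such records remain valid to publish; a check like this only tells you which ones will lack a map location.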

Question 5: Could there be some GBIF technical Support to help with updating IPTs?

We don’t have access to your servers, so we can’t make the update for you. That being said, we are happy to make it the topic of our next technical support hour for Nodes and give a demonstration. We can also consider arranging a call later if needed.

Question 6: Does checklistbank.org correspond to the GBIF backbone and is it related to the Catalogue of Life changes?

Checklistbank.org was developed by GBIF for the Catalogue of Life. It does more than host checklists: it is also a tool to compare checklists, build new checklists from existing ones, etc. The Catalogue of Life (CoL) uses checklistbank.org to work on the CoL taxonomy as well as on the Extended Catalogue of Life, which will include more checklist sources and some modifications. Right now, GBIF has its own backbone taxonomy, built and maintained in-house. In the future, we would like to replace the GBIF backbone with the Extended Catalogue of Life. We do not have a timeline for this change.

Note that you can ask for an account on checklistbank.org to download and upload checklists.

Question 7: Can you search occurrences based on eventIDs with the GBIF API?

Yes. eventID is a parameter of the occurrence API (for both search and download). Here is an example: https://api.gbif.org/v1/occurrence/search?dataset_key=372ac467-9b19-4373-8a0f-85fb3a91b6ca&event_id=e14_2018-02-18
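The same query can be scripted. This is a minimal sketch using only the Python standard library: the parameter names `dataset_key` and `event_id` are copied from the example URL, while the `limit`/`offset` paging and the `results` and `endOfRecords` fields in the JSON response are assumptions about the paged search response, so check them against the API documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the example URL in this answer.
SEARCH_URL = "https://api.gbif.org/v1/occurrence/search"


def occurrence_search_url(dataset_key: str, event_id: str, limit: int = 20, offset: int = 0) -> str:
    """Build an occurrence search URL filtered by dataset and eventID.

    Parameter names follow the example URL; limit/offset paging is assumed.
    """
    params = {"dataset_key": dataset_key, "event_id": event_id, "limit": limit, "offset": offset}
    return f"{SEARCH_URL}?{urllib.parse.urlencode(params)}"


def iter_event_occurrences(dataset_key: str, event_id: str, page_size: int = 300):
    """Yield occurrence records page by page (requires network access).

    Assumes each JSON page has a "results" list and an "endOfRecords" flag,
    and that 300 is an acceptable page size.
    """
    offset = 0
    while True:
        url = occurrence_search_url(dataset_key, event_id, limit=page_size, offset=offset)
        with urllib.request.urlopen(url, timeout=30) as response:
            page = json.load(response)
        yield from page.get("results", [])
        if page.get("endOfRecords", True):
            break
        offset += page_size


print(occurrence_search_url("372ac467-9b19-4373-8a0f-85fb3a91b6ca", "e14_2018-02-18"))
```

Separating URL construction from fetching keeps the query easy to inspect in a browser before looping over pages with `iter_event_occurrences`.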

Question 8: Could you advise on the best way to model the following in Darwin Core: a study where samples correspond to several limbs of the same organisms, as well as to their microbiomes?

The answer below summarizes a few points discussed during the call:

You would probably want to publish the microbiome as a separate dataset.
You could try the Preservation Extension (https://rs.gbif.org/extension/ggbn/preservation.xml) with an event or occurrence core to convey the different limb preservations.
You could use the Resource Relationship extension to capture the relationships between the different components.
You could publish one occurrence per limb but use the same organismID, as this allows users to retrieve all the occurrences of a given organism in the web interface. See this example: Search
Note that the GBIF “clustering” function only checks records across datasets, so it would not group records published in the same dataset.
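To make the organismID and Resource Relationship suggestions more concrete, here is a minimal sketch that generates an occurrence table with one row per limb sharing an organismID, plus matching Resource Relationship rows. The identifiers, the species, the use of occurrenceRemarks to record which limb was sampled, and the "part of" relationship value are all illustrative assumptions, not a GBIF-endorsed model.

```python
import csv
import io


def build_limb_occurrences(organism_id: str, limbs: list, scientific_name: str) -> list:
    """One occurrence row per sampled limb; all rows share the same organismID."""
    rows = []
    for i, limb in enumerate(limbs, start=1):
        rows.append({
            "occurrenceID": f"{organism_id}-limb-{i}",  # hypothetical ID scheme
            "organismID": organism_id,                   # links all limbs to one organism
            "basisOfRecord": "MaterialSample",
            "scientificName": scientific_name,
            "occurrenceRemarks": limb,                   # assumption: which limb was sampled
        })
    return rows


def build_relationships(occurrences: list) -> list:
    """Resource Relationship rows stating that each limb sample is part of the organism."""
    return [
        {
            "resourceID": occ["occurrenceID"],
            "relatedResourceID": occ["organismID"],
            "relationshipOfResource": "part of",  # illustrative value
        }
        for occ in occurrences
    ]


def to_tsv(rows: list) -> str:
    """Serialize rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


occurrences = build_limb_occurrences("org-001", ["left forelimb", "right hindlimb"], "Mus musculus")
print(to_tsv(occurrences))
print(to_tsv(build_relationships(occurrences)))
```

In a real archive, the Resource Relationship rows would sit in an extension file linked to the core through the archive's meta.xml; that wiring is left out here.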

andre · January 12, 2023, 3:05pm

Hi Annie,
Already a question from my side:
Could you clarify the ingestion path for checklists with and without occurrences?
Will they go through the Hive database, Checklist Bank, or both?
At the moment, CLB does not offer DOI generation. Will that change soon?
Thanks,

Annie · January 13, 2023, 8:00am

Thanks Andre for raising the questions. I will leave it to @mgrosjean and @cecsve to reply.

cecsve · January 13, 2023, 10:40am

Hi Andre,

Thank you for your questions! I conferred with @trobertson and @markus to answer your questions:

Checklists always go into checklistbank. Checklists that include occurrences (i.e. those with a Taxon core and an Occurrence extension in the DwC-A) are additionally “flattened” and fed into the occurrence store, which is the Hive database (and related indexes). Please note that the component in this image is the “GBIF Checklistbank”, which has been in operation since around 2013, powering the /species API. As you are likely aware, we are collaborating with Catalogue of Life on an evolution of it, known as “checklistbank.org”, which is maturing well. When ready, this will completely replace the existing “GBIF Checklistbank” and will provide richer capabilities relating to checklist comparisons, etc. Today, though, things operate as they always have.
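The routing described in this answer can be summarized as a small decision function. This is an illustrative model of the description, not GBIF's actual pipeline code: the function and constant names are hypothetical, other cores (such as Event) are deliberately ignored, and only the Darwin Core row type URIs are standard.

```python
# Standard Darwin Core row type URIs, as declared in an archive's meta.xml.
TAXON = "http://rs.tdwg.org/dwc/terms/Taxon"
OCCURRENCE = "http://rs.tdwg.org/dwc/terms/Occurrence"

# Hypothetical labels for the two stores named in this thread.
CHECKLISTBANK = "GBIF Checklistbank"
OCCURRENCE_STORE = "occurrence store (Hive database and related indexes)"


def ingestion_targets(core_row_type: str, extension_row_types: list) -> list:
    """Return where a dataset ends up, following the description in this thread.

    Simplified, hypothetical model: cores other than Taxon and Occurrence are ignored.
    """
    targets = []
    if core_row_type == TAXON:
        # Checklists always go into Checklistbank.
        targets.append(CHECKLISTBANK)
        if OCCURRENCE in extension_row_types:
            # Occurrences in the extension are also flattened into the occurrence store.
            targets.append(OCCURRENCE_STORE)
    elif core_row_type == OCCURRENCE:
        targets.append(OCCURRENCE_STORE)
    return targets


print(ingestion_targets(TAXON, [OCCURRENCE]))
print(ingestion_targets(TAXON, []))
```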

Presuming you mean the checklistbank.org system, then yes: DOIs are generated for projects and their releases. Right now, this largely applies only to the COL Checklist. The plan, though, is to also create DOIs for every changed version of each dataset in CLB to allow stable citations.

For example, the latest COL release has its own DOI (ChecklistBank), and each source in the release likewise has a DOI, such as the ReptileDB in COL 22.12 (The Reptile Database | COL).

I hope this answers your questions - otherwise, let us know!

DavidFichtmueller · January 13, 2023, 11:02am

I think it would be helpful to add the specific date/time and how to attend.

As for the date, I assume it would be 1 February 2023 at 16:00 CET, but, as with the previous post, I haven’t seen any information on how to attend. Do I need to register via the helpdesk email, or is there some other way that I am not aware of?

Annie · January 13, 2023, 12:09pm

Hi David,

Thank you for pointing this out.

The events in this series are by invitation only, through our Nodes mailing list. If you are not on that list by mistake, please ask your Node Manager to contact the GBIF Secretariat (info@gbif.org) with any staff updates in the team, so we can formally add you to the mailing list. If, however, you do not have an official role in the Node but would like to attend the next Node support hour anyway, please send a line or two about your role and what you would like to get out of the session to helpdesk@gbif.org.

The themes for the upcoming support hours are not yet set, as they will be based on the feedback and input received here on the forum and during the sessions. All dates are listed here: Resources

mgrosjean · January 13, 2023, 12:13pm

@DavidFichtmueller Keep in mind that the edited recordings are public and will all be made available here: Technical support hour for GBIF Nodes on Vimeo (the video for the first session is already available).

Related topics
GBIF technical support hour for Nodes
GBIF's data quality workflow (GBIF technical support hour for nodes)
April technical support hour for GBIF nodes
Deep Dive: Date-related issues and flags
A general introduction to GBIF's technical documentation (GBIF technical support hour for nodes)