Publishing marine data to OBIS and GBIF - Technical Support Hour for Nodes

Join us on December 3rd at 4:00 pm CET for the technical node support hour on publishing marine data to the Ocean Biodiversity Information System (OBIS) and GBIF. Elizabeth Lawrence will join us from the OBIS Secretariat, and the presentation will cover relevant resources - such as the OBIS manual, OBIS data quality requirements, and controlled vocabularies.

We will be happy to answer any questions, whether or not they relate to the topic. Please feel free to post questions in advance in this thread or write to helpdesk@gbif.org.

The video recording is available here: Technical support hour for GBIF Nodes on Vimeo

Here is a transcript of the Q&A:

Why make data available both on GBIF and OBIS?

OBIS is marine-focused and reaches different communities, so pulling GBIF’s marine datasets into
OBIS (and vice versa) helps build a more complete picture of biodiversity in the marine environment across both networks.

Some users and services interested only in marine data tend to go directly to OBIS and may never search GBIF.
When done via IPT and networks, the data stay in one place (the same IPT), but are exposed in multiple systems (GBIF and OBIS).
This avoids duplication, maximizes visibility and strengthens collaboration between GBIF and OBIS.


When you get a request to harvest new datasets on OBIS, how do you handle those that don’t fulfill the OBIS data quality requirements?

When a GBIF dataset is added to the OBIS network in IPT, OBIS automatically creates a GitHub issue in the OBIS network datasets repository. An OBIS node (often the regional/national node) is then responsible for reviewing and endorsing the dataset.

The node checks the dataset against OBIS requirements and expectations. If everything looks fine, the node endorses the dataset, and it is ingested into OBIS.

If quality problems are found, the node identifies the critical issues and usually contacts the data publisher (typically by email) with specific feedback and requests for corrections.

Note that nodes may have different thresholds for what they accept. A dataset one node considers unacceptable might be fine for another, reflecting network diversity. This is part of why OBIS is interested in the upcoming TDWG biodiversity data quality standard, to make expectations more consistent.


Here at GBIF Norway, we are also the OBIS Node. We detected some data quality issues that would have gone unnoticed if it weren’t for the OBIS data quality requirements. For example, people sometimes confuse decimal degrees with degrees and minutes, which leads to data points falling on land. This is an issue we have encountered several times now. We usually email data publishers when we notice it.

To comment on this: OBIS flags records that fall on land. Note that the bathymetry layer we use has a coarser resolution than the real world, so records can sometimes be flagged as “on land” even though they are coastal.

It is interesting to hear that the flag also helps you identify this type of coordinate issue.
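The degrees-versus-minutes confusion mentioned above is easy to illustrate. Below is a minimal Python sketch (not part of any OBIS tooling, just an illustration) showing how a latitude recorded as 59° 30′ but typed as if it were the decimal value 59.30 shifts the point by roughly 22 km, which is often enough to land a coastal record on shore:

```python
def dm_to_decimal(degrees: int, minutes: float) -> float:
    """Convert degrees and decimal minutes to decimal degrees."""
    return degrees + minutes / 60.0

# A latitude recorded in the field as 59° 30' N:
correct = dm_to_decimal(59, 30)  # 59.5 decimal degrees

# The same value mistyped as if it were decimal degrees:
mistyped = 59.30

# Offset in degrees and approximate kilometres (1° of latitude ≈ 111 km):
offset_deg = correct - mistyped
offset_km = offset_deg * 111

print(f"correct={correct}, mistyped={mistyped}, offset ≈ {offset_km:.0f} km")
```

The error grows with the minutes value: 59° 59′ entered as 59.59 is off by about 0.39°, while 59° 05′ entered as 59.05 happens to be nearly correct, which is why this mistake is so easy to miss without automated flags.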


We are a registered GBIF publisher. Do we have to register with OBIS as well? Or with an OBIS node?

OBIS doesn’t have ‘publishers’ the way GBIF does; that is a piece of metadata that lives in the GBIF registry. But if your publishing endpoint (e.g. an IPT) is strongly associated with a national or thematic node and you intend to publish to OBIS often, we could whitelist the IPT so that every dataset is automatically harvested by OBIS.


Is there a quick way to know which IPTs registered on GBIF are also OBIS IPTs?

OBIS doesn’t have a public registry, so you would have to write to the OBIS helpdesk and ask.

Note that an IPT whitelisted by OBIS isn’t necessarily hosted by OBIS. It means that all the content of that IPT is meant to be shared with OBIS, as opposed to other IPTs that might only have some datasets meant to be shared with OBIS (in which case the whole IPT wouldn’t be whitelisted).


How do you know that you are on an OBIS IPT? Do the IPTs have the OBIS logo?

There isn’t a single way to recognize OBIS IPTs (although the information might be available on the IPT “about” page).


What if an IPT hosts both marine and non-marine datasets, how does it work?

It isn’t a problem for a whitelisted IPT to also contain non-marine datasets. All non-marine species (according to the World Register of Marine Species, WoRMS) are filtered out of OBIS.

Are the AphiaIDs not important anymore?

The integration with OBIS is easier if you already provide the AphiaID.

Providing the Catalogue of Life identifiers should also be an option if they have a corresponding WoRMS entry.

Note that https://www.checklistbank.org has a service to compare two checklists or to match a set of names against a reference checklist. Both the Catalogue of Life and WoRMS are available on ChecklistBank, so you can use these tools to obtain AphiaIDs, for example. Learn more about ChecklistBank here: ChecklistBank tutorial.
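Besides ChecklistBank, WoRMS itself exposes a REST API that resolves a scientific name directly to an AphiaID. A minimal Python sketch is below; the endpoint path (`/rest/AphiaIDByName/{name}`) and `marine_only` parameter follow the public WoRMS REST documentation, but verify the details against that documentation before relying on them:

```python
from urllib.parse import quote

WORMS_API = "https://www.marinespecies.org/rest"

def aphia_id_url(scientific_name: str, marine_only: bool = True) -> str:
    """Build the WoRMS REST URL that resolves a scientific name to an AphiaID."""
    flag = "true" if marine_only else "false"
    # The name is percent-encoded so spaces in binomials are handled correctly.
    return f"{WORMS_API}/AphiaIDByName/{quote(scientific_name)}?marine_only={flag}"

# Fetching this URL (requires network access) returns the AphiaID as JSON:
#   import urllib.request, json
#   aphia_id = json.load(urllib.request.urlopen(aphia_id_url("Solea solea")))

print(aphia_id_url("Solea solea"))
```

For bulk matching of whole species lists, the ChecklistBank name-matching service mentioned above is the more practical route.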


We have been having issues with obtaining data using the occ_download function with our current log-in information and can only get data using the less comprehensive occ_search function.

I’m wondering if someone on the GBIF team can point me in the right direction for obtaining access to the occ_download function data or provide some information on how to proceed.

Do you receive any error message? Please write to helpdesk@gbif.org and send us the query that you made. Thank you!

Note that you can only have three downloads running at the same time.
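Under the hood, `occ_download` submits a JSON “predicate” to the authenticated GBIF download API (`POST /v1/occurrence/download/request`), which is documented in the GBIF API reference. A minimal Python sketch of building such a request body is below; the `taxon_key` and credentials shown are placeholders, not real values:

```python
import json

def download_request(username: str, taxon_key: int, country: str) -> dict:
    """Build the JSON body for a POST to /v1/occurrence/download/request."""
    return {
        "creator": username,
        "format": "SIMPLE_CSV",
        "predicate": {
            "type": "and",
            "predicates": [
                # Predicate keys follow the GBIF occurrence search parameters.
                {"type": "equals", "key": "TAXON_KEY", "value": str(taxon_key)},
                {"type": "equals", "key": "COUNTRY", "value": country},
            ],
        },
    }

# The request is authenticated with your GBIF username and password (the same
# credentials occ_download() uses), e.g. with the requests library:
#   requests.post("https://api.gbif.org/v1/occurrence/download/request",
#                 json=download_request("myuser", 12345, "NO"),
#                 auth=("myuser", "mypassword"))

print(json.dumps(download_request("myuser", 12345, "NO"), indent=2))
```

If a request like this fails with your credentials while `occ_search` works, the error body returned by the API is exactly what the helpdesk will want to see.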


When will the GBIF Backbone taxonomy be updated?

The upcoming version of the GBIF portal (presented during this previous technical support hour for nodes: Technical support hour for GBIF Nodes on Vimeo) uses a new backbone taxonomy for its occurrence search.

This portal is not in production but available for testing.

See an example here: https://demo.gbif.org/occurrence/search?occurrenceStatus=PRESENT&taxonKey=69G8&checklistKey=7ddf754f-d193-4cc9-b351-99906754a03b

The new taxonomy is also available for browsing and downloading on checklistbank: https://www.checklistbank.org/dataset/313531/about.

In addition to that, rgbif has been updated to enable occurrence searches based on this new backbone.

The new portal is still in the testing phase; until it is in production, searching occurrences based on the new taxonomy won’t be possible on GBIF.org.


When will the new portal go live?

We hope for early 2026, but it is difficult to give a precise estimate while the portal is in testing. If you would like to help, please check https://demo.gbif.org and report issues.


I would like to share our experience during our latest data mobilization workshop. We used chatIPT (https://chatipt.svc.gbif.no) and found that it worked better for eDNA data than the MDT (Metabarcoding Data Toolkit https://mdt.gbif.org).

Thank you for sharing.


One database that we have encountered is FISHGLOB – Fish biodiversity under global change, which has its data on Zenodo. Has anyone worked with that database before?

Response from one of the other participants:

Among other things, FISHGLOB is a group of scientists connected to fishery trawl data, which historically have been very heterogeneous for several reasons, including the use of different methodologies.

They are working on producing high-quality data which can be used to model fisheries, stock assessments, etc.

They have good insights on who the players are and why the data were collected a certain way.

One of the developers of FISHGLOB is also one of the developers of the Humboldt Extension. There is a real effort to describe inventories in surveys more explicitly.

Fishery data can be sensitive and data providers have a strong sense of ownership of these data.

It is important to fishery data owners that the data are shared in a way which makes the limitations associated with the data explicit and clear. This is to avoid the data being used for misinformation and to avoid litigation.

That’s one of the things that the Humboldt Extension is helping to solve.

FISHGLOB aims to format the data to the Darwin Core (DwC) standard and share them with OBIS and GBIF. In the meantime, they have shared the data on Zenodo to make them accessible to all.


Thank you, I have a follow-up question. We in the Norwegian node would be able to publish the Norwegian part of FISHGLOB. However, we cannot reuse the Zenodo DOI, as the Norwegian data are only a subset of the whole dataset.

From participant:

Good to know, let’s get in touch with FISHGLOB and discuss it together.


Does FISHGLOB maintain a catalogue of fish species?

I don’t think so. They use resources like FishBase, but they work on assessing fish populations.