The new features of IPT Version 3.0 (GBIF technical support hour for Nodes)

In the December session, we will explain the new features of the Integrated Publishing Toolkit version 3.0 (IPT3). For some background information, you can read our release information here. Mikhail Podolskiy from the Informatics team will join the session. We will be happy to answer any question, whether or not it relates to the topic.

The next session is on December the 6th at 4pm CET (UTC+1).

The invitation with registration link will be sent to the GBIF Nodes. If you are interested in attending, you can reach out to your local node.

The edited recording and the transcript of the questions will be made available here.

The video is available here: New features of the Integrated Publishing Toolkit version 3.0 (IPT3) on Vimeo

Here is the transcript of the questions during the session.

Is there any way to apply the JSON metadata format to the DwC-A? This seems more intuitive than using the XML files.
CameraTrapDP is based on the Frictionless Data Package standard, in which JSON files describe the structure of the data. For now, we convert all those frictionless data packages to Darwin Core Archives. In the future, we would like to have a dedicated ingestion for frictionless.
You cannot upload metadata in JSON for any dataset, only for the frictionless data package datasets.
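To give an idea of what such a JSON descriptor looks like, here is a minimal Python sketch that builds a `datapackage.json` in the general Frictionless Data Package layout (a package with named resources, each with a table schema). The dataset name, resource names, and fields are illustrative assumptions; this is heavily simplified and is not a complete or valid CameraTrapDP descriptor.

```python
import json

# Hypothetical, heavily simplified descriptor. Names and fields are
# illustrative only; a real CameraTrapDP descriptor requires more properties.
descriptor = {
    "name": "example-camera-trap-dataset",
    "title": "Example camera trap dataset",
    "resources": [
        {
            "name": "deployments",
            "path": "deployments.csv",
            "schema": {
                "fields": [
                    {"name": "deploymentID", "type": "string"},
                    {"name": "latitude", "type": "number"},
                    {"name": "longitude", "type": "number"},
                ],
                "primaryKey": "deploymentID",
            },
        },
        {
            "name": "observations",
            "path": "observations.csv",
            "schema": {
                "fields": [
                    {"name": "observationID", "type": "string"},
                    {"name": "deploymentID", "type": "string"},
                    {"name": "scientificName", "type": "string"},
                ],
                "primaryKey": "observationID",
                # Each observation points back to one deployment.
                "foreignKeys": [
                    {
                        "fields": "deploymentID",
                        "reference": {
                            "resource": "deployments",
                            "fields": "deploymentID",
                        },
                    }
                ],
            },
        },
    ],
}

# Serialize the descriptor as it would appear in a datapackage.json file.
datapackage_json = json.dumps(descriptor, indent=2)
print(datapackage_json)
```

Unlike a Darwin Core Archive, where the file structure is described in XML, here the tables and the links between them are declared in a single JSON file.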

Do you plan to keep supporting datasets already published with the star schema?
Yes, what is already supported in the IPT 2 will keep being supported.

I’d be interested in JSON files and/or Frictionless DP example packages if you have some.
You can find information about the CamtrapDP here https://camtrap-dp.tdwg.org/metadata/. See also this example: https://ipt3.gbif-uat.org/resource?r=mica-full-dataset.

Where can one test the IPT3?
You can download and install your own version. You can also contact helpdesk@gbif.org and we can give you an account on: https://ipt3.gbif-uat.org. Note that the IPT available at that address is used for development testing and you might encounter breaking changes every now and then.

I was reading through the IPT 3 release information (Releases :: GBIF IPT User Manual (UAT)) and I noticed that the IPT 3 requires Tomcat 9 (Servlet 4.0) or later. Is this version accessible to people who don’t have cutting-edge resources?
There is not a big difference with the current version, IPT 2.7 (since it only supports Tomcat 7 or later).

How difficult is it to migrate to Tomcat 9?
It depends on your system. Please refer to the Tomcat documentation: Apache Tomcat® - Apache Tomcat 9 Software Downloads.

Could we have a manual for making the update to the IPT3?
Please check the instructions here: Release Notes :: GBIF IPT User Manual (UAT). We aim to improve the instructions with the feedback we receive, and would consider writing a more detailed tutorial if needed.

My understanding is that every new data model will have a new data package. Is that the case?
The data package approach goes hand in hand with the new data model. The idea is that we want to make it as easy as possible for data publishing communities to recognise themselves in their specific data needs. There will still be Darwin Core Archives just as before; not everybody has to switch to a new data package.
Some types of data need to be shoehorned into the current Darwin Core Archive. For some communities, the data packages are a way to share data in a more intuitive way (something recognized by the thematic communities).
Note that we are still in the early stages of this approach. CameraTrapDP is the first one and we don’t have a data package for interaction data as the model is in the design phase. We are working with the communities to develop that approach as they know their data best and can best find a way to model and format them.

When do we expect the various data packages to be available for publishing? What is the general timeline for new Data Packages?
We don’t have specific dates. Some data packages are available in the IPT 3 for testing, but the development of data packages varies a lot and depends on specific communities. We don’t have a general timeline.
If you know of any ongoing effort that you think could lead to the development of a specific data package in the future, please contact us at the Secretariat.

Does the material entity core make it possible to publish multi-item specimens (e.g. same catalogue number for a specimen’s skin, skeleton, blood sample)?
Yes, you can have multiple physical items that correspond to the same organism. See also: Darwin Core Quick Reference Guide - Darwin Core

Can the material entity core be used in combination with the DNA extension?
Yes, you could use the core with any extension.

Will the datasets based on frictionless packages also have the option to be downloaded as frictionless packages? Or only the current download options?
Yes, on the IPT 3, you should be able to choose your download format: data package or Darwin Core Archive.

Can you explain how the converter works between DP and DwC-a?
There are different solutions depending on the data package. For example, for CameraTrapDP, we use camtraptor, which was developed by Peter Desmet. It reads CameraTrapDP datasets and produces a Darwin Core Archive. Currently, we have four schemas in implementation, and each has its own converter. For the Catalogue of Life data package (colDP), we use checklistbank. For the other data packages, we don’t have a solution yet.

Will all the extensions currently available on the TEST IPT 2 also be available on the TEST IPT 3?
All the existing extensions should be available.

Can we map data that don’t follow the star schema in the IPT 3, for example, having a taxon file with an extension and an occurrence file with an extension as well?
In the current version of the IPT 3, there is only:

  • The Darwin Core Archive (e.g. star schema)
  • The Data packages developed (e.g. frictionless)

There isn’t anything in between, where we would have a Darwin Core Archive with several “cores” (taxon and occurrence, for example). Such a schema would have to be specifically developed and implemented as a data package.
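As a rough illustration of the star schema: there is one core table, and every extension row points back to a single core record. The Python sketch below uses made-up data; `attach_extensions` and the `id` column name are hypothetical and are not part of the IPT.

```python
from collections import defaultdict


def attach_extensions(core_rows, extension_rows, id_field="id"):
    """Group extension rows under the core row whose id they point to."""
    by_core_id = defaultdict(list)
    for row in extension_rows:
        by_core_id[row[id_field]].append(row)
    return [
        {**core, "extensions": by_core_id.get(core[id_field], [])}
        for core in core_rows
    ]


# Made-up occurrence core with a multimedia-like extension.
occurrences = [
    {"id": "occ-1", "scientificName": "Vulpes vulpes"},
    {"id": "occ-2", "scientificName": "Meles meles"},
]
multimedia = [
    {"id": "occ-1", "identifier": "https://example.org/fox-1.jpg"},
    {"id": "occ-1", "identifier": "https://example.org/fox-2.jpg"},
]

star = attach_extensions(occurrences, multimedia)
for record in star:
    # Print each core id with the number of extension rows attached to it.
    print(record["id"], len(record["extensions"]))
```

A second core, such as a taxon table with its own extension, has no place in this structure, which is why it would have to be modelled as a dedicated data package.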

Will the eDNA publishing tool (https://edna-tool.gbif-uat.org) be integrated in the IPT 3?
At the moment, the tool is being tested. It could be integrated in the IPT 3 in the future but no decision has been made.

How does using the Catalogue of Life data package to publish data differ from making a Checklist dataset in the IPT?
The Catalogue of Life data package (colDP) was developed as a data format for uploading and downloading data to and from https://www.checklistbank.org. In its simplest form, it is similar to Darwin Core Archive checklists. However, it offers a lot more flexibility and allows you to supply parsed references for almost anything. You can find a comparison table of the two formats here and a general description of colDP here.
Correction (updated February 2024): an earlier version of this answer stated that the IPT 3 currently converts colDP to Darwin Core Archives in order to share the data on GBIF.org. Apologies for the mistake: colDP archives aren’t ingested by GBIF at the moment.

Happy to read that IPT 3.0 has been officially released today!
It’s a great achievement for the Secretariat and a big relief for all those who were stuck with the Darwin Core star schema limitations.
Also good to see that Data Packages such as ColDP and CamtrapDP are now fully supported.
Backward compatibility is also essential for our community.
Back in October 2019, in Leiden, I had the pleasure of building a first bridge between Darwin Core and Frictionless Data with a tiny Python library called Frictionless DarwinCore.
I’m really delighted to see it become a reality for the entire community in the Integrated Publishing Toolkit.
Congratulations to the IPT team!

Does this mean that IPT 3.0 currently does not allow publishing a full-featured ColDP to ChecklistBank?

Very good question. The IPT 3.0 creates colDP archives, which you can then import into checklistbank.

The conversion to Darwin Core Archive happens on the GBIF side.

Just to clarify, colDP isn’t in the production version of the IPT3, and colDP datasets aren’t registered on GBIF (my apologies for the earlier misinformation, I got confused).