@jhpoelen I’m happy to try to answer your questions. These are from my perspective working on the project. All opinions are mine and may or may not represent the views of GBIF.
From the pages, I understood the following.
- GBIF Unified Model is a conceptual model that has been designed by @tuco John, Tim @trobertson and colleagues.
- GBIF Unified Model is primarily designed to help ingest diverse biodiversity data beyond DwC-A.
- GBIF Unified Model is a GBIF internal product and is not (yet?) part of TDWG or any other standards body, but may help create new ways of publishing data that can be translated into the GBIF Unified Model.
- Use Cases help test the GBIF Unified Model and help to reach out to non-GBIF actors.
- So far, 23 Use Cases have been identified, and 3 have been presented.
- Use Case 2 “Camera Traps (Camtrap DP)” references “Camtrap DP: Data exchange format for camera trap data” and describes a way to map Camtrap DP into the GBIF Unified Model.
The following responses refer to the 6 observations above:
- Correct, with the help of everyone working to develop and review use cases.
- It would be easy to say, “Yes”, but I fear that might be misleading. I would say the Unified Model is primarily to support new usage capabilities (richer data, views from distinct data perspectives, more questions that can be answered) in response to the GBIF 20-year review. I would say that the Unified Model is not actually designed to help ingest. That is, it isn’t necessarily meant as a model for data publishers to follow, because it is fairly complex and will certainly differ in many respects from natively stored data from distinct sources. It will probably also differ from the final native implementation GBIF develops. To help ingest, GBIF expects to support multiple new and existing publishing models tailored to families of use cases, with the hope of re-using data sharing standards developed by specialized communities whenever those are viable.
- Correct. GBIF is approaching the work by field-testing real-world data in an attempt to find a comprehensive model that minimizes complications and redundancy while supporting the richness and depth inherent in the use cases developed through interested communities. A stable production implementation, or parts of it, plus any publishing models that prove successful, might well serve to enhance or at least inform existing standards, including those of TDWG.
- Correct. The idea is to provide solutions to existing data publishing and data integration challenges that help existing data publishers to more faithfully and fully share their data and that help would-be data publishers to contribute by providing solutions for new kinds of data that are not currently supported.
- Correct. There are 23 currently listed use cases. Each of these is meant to present at least one data sharing challenge that hasn’t been met by existing paradigms or by other use cases on the list. Sometimes the use cases cover multiple example data sets with slight nuances. The three “presented” use cases were shared in international forums with wide audiences. Many of the other use cases have been presented for study and feedback in the groups working on them.
- Correct. This is a very mature use case demonstrating the re-use of a standard developed within a specialized community, to the point where developments to the Integrated Publishing Toolkit have demonstrated the viability of allowing Camtrap DP Frictionless Data Packages to be published natively. Work on this use case served both to develop the Unified Model and to refine Camtrap DP.
What is the status of the “GBIF Unified Model”?
It is under development. I was able to find a “basic” diagram of the GBIF Unified Model in Appendix II: The Unified Model in [1], but am not quite sure whether other (versioned) specifications of the model are available. Please do share if a more extensive description of the model is available somewhere; I probably missed it.
Yes, the diagram in Appendix II is a minimalist view of the latest version of the Unified Model. Older versions can be seen by tracking back through the document history (see the “Previous version of this document” link in the References section of any version of the document “Diversifying the GBIF Data Model” [1]). Having said that, no version is suitable for anything more than experimentation. With each new use case, we find refinements that can be made, and we do not see the Unified Model as stable yet.
Is there a reference implementation?
Not sure, I wasn’t able to tell from the documentation. @tuco @trobertson can you help answer this question?
Speaking of experimentation, for those interested, there is a reference PostgreSQL database schema [3] matching the version of the Unified Model [1] in support of the “Call for proposals to help mature and test how specimens are handled in GBIF’s emerging unified data model” [4], with a recommended approach [5] to map data to the target version of the Unified Model.
Is it intended to unify GBIF’s models or all biodiversity data models?
As far as I understand, the purpose of the GBIF Unified Model is for GBIF to ingest biodiversity data beyond DwC-A.
I hope the answers in 1-6 above address this question.
Do you expect others to adopt this model to describe biodiversity knowledge, or is it merely a community consultation to help develop GBIF’s internal infrastructure?
As far as I can tell, others are not expected to adopt/implement the GBIF Unified Model.
I would say that GBIF is trying to find solutions that will help them and others to solve existing data sharing and aggregation issues. GBIF is working with others to accomplish this collaboratively. As seen already with Camtrap DP, Event-based data publishing in ALA, site-species matrices with OBIS, eBird and others, the benefits apply all around. The “Diversifying the GBIF Data Model” project has a scope that includes any community that wants to share biodiversity data; most of the others in these collaborations have some part of that scope. In these respects I would say GBIF has no expectations of others, but is keen to know what is needed and keen to make sure that all of the hard work has the broadest possible application and impact, so that it is useful anywhere it makes sense to adopt or implement any part of it.
The GBIF technical team will use (or already has used?) the model as a way to index more kinds of data in their infrastructure.
The expectation is that some future stable version of the Unified Model will inform a GBIF implementation that supports the views and aggregated data access that the use cases were meant to highlight. There have only been stand-alone experimental implementations thus far, similar to the PostgreSQL schema mentioned above [3].
To help develop the GBIF Model and verify its use, use cases have been developed to verify whether the GBIF Unified Model is able to capture biodiversity data beyond those captured in DwC-A. And, if needed, new data exchange formats (e.g., Camera Trap Data Package) may be developed to facilitate the exchange of biodiversity data beyond DwC-A.
Correct.
Curious to hear your thoughts on whether my notes align with the actual activities and purpose of the GBIF Unified Model initiative.
Thanks for helping to encourage discussion on ways to exchange biodiversity data,
I hope this has helped to clarify things.
John
References
[1] Wieczorek, John, & Robertson, Tim. (2023). Diversifying the GBIF Data Model (0.1). Zenodo. hash://sha256/2ba382dee5eb3b7f86f93fa0e56d16e4897919afeb78907319e6af44824852d7 hash://md5/132bed5ac7dd5102f16dd78f3a57ab0c
[2] Robertson, Tim, Wieczorek, John, & Raymond, Mélianie. (2022). Diversifying the GBIF Data Model. Biodiversity Information Science and Standards, 6, e94420.
[3] https://github.com/gbif/model-material/blob/master/schema.sql
[4] Call for proposals to help mature and test how specimens are handled in GBIF’s emerging unified data model
[5] https://github.com/gbif/model-material/blob/master/data-mapping.md