Biotic Relationships - Are reciprocal relations needed?

This topic is related to the discussion on the Biotic Interactions Use Case.

This topic is meant to pose a challenge in pursuit of a broader discussion, “Are ‘reciprocal’ relationships needed, or is it always possible to provide just one side of a relationship and thereby understand the reciprocal one?”

Of course there is much more to the story of sharing biotic relationships. So much depends on what the vocabularies for the relationships are. And so much else depends on whether you can relate two things that you know a lot more about. For example, do both sides (the subject and object) have persistent, resolvable, globally unique identifiers that we or machines can use to look up other things, such as the identification or sex?

The vocabularies part is particularly tricky. Suppose I have a parasitoid A in my collection. Suppose I have the host, B, too. I could relate the two by saying “A parasitoidOf B”. I could also say “B hostOf A”. Great, but it isn’t exactly reciprocal. “hostOf” has too many types of possibilities. It isn’t as specific as the relationship in the other direction. OK, how about “B parasitizedBy A”? Still not the same level of information, because there is more than one way to parasitize. Fine then, “B hadParasitoid A”. Almost good. Do we know that the original relationship “parasitoidOf” always means that A was caught in the act of being a parasitoid? Or could it have been falsely accused by virtue of the accepted nefarious customs of its species? Also, do we know that a relationship “parasitoidOf” always refers to a relationship between two biological individuals? Or could it also refer to one biological individual on one side of the relationship and a taxon on the other side? Or taxa on both sides of the relationship?

Why does it matter? It matters because it makes it very challenging to actually find data that fit the question being asked. This challenge suggests that vocabularies, to be rigorously useful, would have to be a) vast and b) mapped to consistent reciprocals, the latter so that you could find what you are looking for no matter which side of the relationship is provided.
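A minimal Python sketch of point b), assuming a hypothetical vocabulary in which each predicate is mapped only to its exact reciprocal. The predicate names come from the parasitoid example above; the `Assertion` type, the `EXACT_INVERSE` table and the `matches` function are illustrative and are not part of any GBIF vocabulary. A query for “A parasitoidOf B” then finds a record published only as “B hadParasitoid A”, while “hostOf” stays unmapped because it is less specific than “parasitoidOf”.

```python
from typing import List, NamedTuple


class Assertion(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical vocabulary: only exact reciprocals are mapped.
# "hostOf" is deliberately NOT listed as the inverse of "parasitoidOf",
# because it covers many more kinds of relationship.
EXACT_INVERSE = {
    "parasitoidOf": "hadParasitoid",
    "hadParasitoid": "parasitoidOf",
}


def matches(assertions: List[Assertion], subject: str, predicate: str, obj: str) -> List[Assertion]:
    """Find assertions stating (subject, predicate, obj) directly or via its exact reciprocal."""
    inverse = EXACT_INVERSE.get(predicate)
    found = []
    for a in assertions:
        if (a.subject, a.predicate, a.obj) == (subject, predicate, obj):
            found.append(a)
        elif inverse is not None and (a.subject, a.predicate, a.obj) == (obj, inverse, subject):
            found.append(a)
    return found


data = [
    Assertion("B", "hadParasitoid", "A"),  # only the host side was published
    Assertion("C", "hostOf", "D"),         # too broad to imply parasitoidOf
]

print(matches(data, "A", "parasitoidOf", "B"))  # found through the reciprocal
print(matches(data, "D", "parasitoidOf", "C"))  # not found: hostOf is not an exact reciprocal
```

The design choice here is conservative: a broader term is never treated as an inverse, so a search trades some recall for not over-interpreting what the publisher said.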

All of this arose in reassessing the advice that was offered to participants in the Call for proposals to help mature and test how specimens are handled in GBIF’s emerging unified data model. That advice, given in the section on constructing EntityRelationships, was “We leave it to your discretion which relationships to capture from your original data, but be aware that the semantics are tied up entirely in the predicates, and care should be taken when developing these vocabulary terms.” As we move forward, is it reasonable to try to do better than that, or is it premature to worry about it?


Thanks for raising this issue, John!

I’ve come across this question when entering my data into GBIF’s new data model: voucher specimens, tissues and DNA samples, which are in a clear parent-child relationship. For a “DNA extracted from” relation there is a definite reverse relation, “DNA sample exists”. But since there is currently no way of stating that the latter relation is the reverse of the former, it’s probably useful to provide both. The same holds true for symmetric properties like “same specimen”, since currently only a human reader can tell that such a relation is symmetric.

In these cases, unlike the examples given above, the relationship types can be defined precisely as being each other’s reverse or as being symmetric. But as long as there are no vocabularies that define semantics between relationships, providing reverse relations probably adds information.
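To illustrate what such semantics would buy, here is a small Python sketch that assumes a hypothetical shared declaration of inverse and symmetric relationship types (similar in spirit to `owl:inverseOf` and `owl:SymmetricProperty` in OWL). With the declarations in place, the reverse relations can be derived instead of published; the identifiers `DNA-1`, `tissue-7`, `specimen-X` and `specimen-Y` are made up.

```python
# Hypothetical shared declarations; no such vocabulary exists yet.
INVERSE_OF = {"DNA extracted from": "DNA sample exists"}
INVERSE_OF.update({v: k for k, v in INVERSE_OF.items()})  # make the mapping two-way
SYMMETRIC = {"same specimen"}


def with_reverses(triples):
    """Return the given (subject, predicate, object) triples plus the reverses they imply."""
    result = set(triples)
    for s, p, o in triples:
        if p in SYMMETRIC:
            result.add((o, p, s))
        elif p in INVERSE_OF:
            result.add((o, INVERSE_OF[p], s))
    return result


asserted = {
    ("DNA-1", "DNA extracted from", "tissue-7"),
    ("specimen-X", "same specimen", "specimen-Y"),
}

for triple in sorted(with_reverses(asserted)):
    print(triple)
```

Without such declarations, a consumer cannot perform this step, which is why publishing both directions is currently the safer choice.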


@tuco @j.holetschek Thanks for publicly sharing your thoughts on ways to describe interaction claims.

As I was reading through your posts, I was reminded of outcomes of discussions held related to the workshops ([1], [2], [3]) made possible in part by NSF through initiatives like https://idigbio.org , https://parasitetracker.org and https://globalbioticinteractions.org and facilitated by individuals including, but not limited to, Erika Tucker, Katja Seltmann, Kathryn Sullivan, Jennifer Zaspel, Jorrit Poelen, and Erika Krimmel. Much of the heavy lifting was done by workshop participants: they took time from their busy schedules to extract species interaction data from existing specimens and their labels, as well as to participate in the group discussions that followed.

My impression of the workshop outcomes was that associations, or biotic interactions, are recorded and interpreted in many different ways, and care should be taken to record the verbatim (or original) description of the interactions and, separately, to link to one or more (subjective) interpretations of these interaction records.

So, to fully understand the meaning of a specific biotic interaction/relation claim, the origin (or provenance) of underlying data provides the context of who recorded the claim and what their supporting evidence was.

And, perhaps frustratingly, the interpretation (and subsequent recording) of a biotic association will vary depending on the context in which this interpretation came about. Similarly, the selection of which interpretation to use will likely depend on the context in which the claims are (re-)used.

For instance, you might find a parasitologist describe the relationship of an ectoparasite to its host as “ex.” (aka from, or on) while implicitly assuming that readers will interpret the interaction as an “isEctoparasiteOf” relation.

Similarly, you might find researchers reduce specific interaction claims like “isPollinatorOf” to “visitsFlowerOf” or even “interactsWith” because they are aware that pollination claims are much harder to support than flower visitation observations.

So, if you asked me, I would favor a mechanism that helps to record biotic associations as records of opinionated claims of known origin (or provenance). I’d say it should be up to the author of these claims to include reciprocal relations, or perhaps even to select multiple associations (e.g., ex, hasHost, ectoParasiteOf), to clarify the relationship between two organisms.
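One possible shape for this, sketched in Python under the assumption that claims are stored as simple records: the verbatim interaction and its provenance are kept unchanged, and each interpretation is attached separately along with who made it. The class names, field names and example values are hypothetical and do not reflect GloBI’s actual data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Provenance:
    recorded_by: str  # who made the original claim
    source: str       # where the claim was found (label, dataset, publication)


@dataclass(frozen=True)
class Interpretation:
    predicate: str       # an interpreted relationship term
    interpreted_by: str  # who made this (subjective) interpretation


@dataclass
class InteractionClaim:
    subject: str
    verbatim_interaction: str  # kept exactly as recorded, e.g. "ex."
    obj: str
    provenance: Provenance
    interpretations: List[Interpretation] = field(default_factory=list)


# Illustrative values only.
claim = InteractionClaim(
    subject="ectoparasite specimen 42",
    verbatim_interaction="ex.",
    obj="host specimen 17",
    provenance=Provenance(recorded_by="label author", source="specimen label"),
)
claim.interpretations.append(Interpretation("isEctoparasiteOf", "parasitologist"))
claim.interpretations.append(Interpretation("hasHost", "data curator"))

print(claim.verbatim_interaction, [i.predicate for i in claim.interpretations])
```

Keeping interpretations in a separate list means a new reading of “ex.” can be added later without overwriting the original claim or anyone else’s interpretation.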

But, I am sure that if you ask someone else, you might get a different answer. And, I am curious to hear about those answers.

-jorrit

References

[1] Tucker, Erika, Poelen, Jorrit, & Seltmann, Katja. (2022, June 23). A lesson plan for better understanding entomological specimen interaction data in collections by NSF funded Terrestrial Parasite Tracker Thematic Collection Network. Zenodo. See also GitHub, globalbioticinteractions/ecm-workshop: Workshop pages for ECM Association Data Workshop, Wed 22 June 2022.

[2] Poelen, Jorrit H. (2022, June 22). On Interpreting Biotic Association Records. Zenodo.

[3] Seltmann, Katja, & Poelen, Jorrit. (2021). A Practical Exploration of Biotic Interaction Data Management and Information Retrieval through Terrestrial Parasite Tracker (TPT) and Global Biotic Interactions (GloBI) [Workshop]. Zenodo. Terrestrial Parasite Tracker / Global Biotic Interactions: 28 April 2021.


What you say about verbatim original assertions with provenance is very interesting. In the Unified, both those assertions and any interpretations of them could be captured as separate assertions with provenance: the usual idea of “annotations” beyond the source.

@tuco Thanks for sharing.

What do you mean by “the Unified” ? Do you have a reference?

Also, I’d be curious to see some explicit examples of how you’d write down the cases I shared in “the Unified”...

Sorry, the GBIF Unified Model.

There is a Biotic Interactions Use Case document with comments in it, but I do not see any examples you presented there. Can you point us to the examples? I can work up some tabular examples of the tables directly involved.

Thanks for clarifying and for pointing to documentation re: the Unified.

As for examples of use cases, you can find millions of examples of species interaction claims sourced from hundreds of data sources on GloBI’s data page. Also, the workshop links I shared focus on specific aspects of this wealth of existing species interaction data and how they are transcribed and interpreted. Finally, you’ll find tons of examples in the issues of globalbioticinteractions/globalbioticinteractions on GitHub, as well as in the many posts on the GloBI blog, in which folks describe how they publish and/or use species interaction data.

Unfortunately, at this point I cannot commit to making additional, more specific contributions to your project. In my experience, the development of these documents, examples and reference implementations requires considerable attention, time, and effort. I hope the contributions I have made so far (in the form of comments, email exchanges and references) will be somewhat helpful in shaping your narrative.

thx,
-jorrit

PS What is the status of the “GBIF Unified Model”? Is there a reference implementation? Is it intended to unify GBIF’s models or all biodiversity data models? Do you expect others to adopt this model to describe biodiversity knowledge, or is it merely a community consultation to help develop GBIF’s internal infrastructure?

@jhpoelen you can learn more about the initiative here: Data Model principal composition. This page links to some recorded webinars and advertises upcoming ones. It also links to the case studies used in this work.

@mgrosjean Thanks for sharing a link to the overview page, GBIF New Data Model, for the GBIF Unified Model initiative [1,2].

I hope you can help me start to understand the wealth of information presented/linked on this page.

From the pages, I understood the following.

  1. GBIF Unified Model is a conceptual model that has been designed by @tuco John, Tim @trobertson and colleagues.
  2. GBIF Unified Model is primarily designed to help ingest diverse biodiversity data beyond DwC-A.
  3. GBIF Unified Model is a GBIF internal product and is not (yet?) part of TDWG or any other standards body, but may help create new ways of publishing data that can be translated into the GBIF Unified Model.
  4. Use Cases help test the GBIF Unified Model and help to reach out to non-GBIF actors.
  5. So far, 23 Use Cases have been identified, and 3 have been presented.
  6. Use Case 2 “Camera Traps (Camtrap DP)” references Camtrap DP (“Camtrap DP: Data exchange format for camera trap data”) and describes a way to map Camtrap DP into the GBIF Unified Model.

With this, I took some time to attempt to answer my earlier questions:

What is the status of the “GBIF Unified Model”?

It is under development. I was able to find a “basic” diagram of the GBIF Unified Model in Appendix II: The Unified Model in [1], but am not quite sure whether other (versioned) specifications of the model are available. Please do share if a more extensive description of the model is available somewhere; I probably missed it.

Is there a reference implementation?

Not sure, I wasn’t able to tell from the documentation. @tuco @trobertson can you help answer this question?

Is it intended to unify GBIFs models or all biodiversity data models?

As far as I understand, the purpose of the GBIF Unified Model is for GBIF to ingest biodiversity data beyond DwC-A.

Do you expect others to adopt this model to describe biodiversity knowledge or it is merely a community consultation to help develop GBIF’s internal infrastructure?

As far as I can tell, others are not expected to adopt/implement the GBIF Unified Model.

The GBIF technical team will use (or already uses?) the model as a way to index more kinds of data in their infrastructure.

To help develop the GBIF Unified Model and verify its use, use cases have been developed to test whether the model is able to capture biodiversity data beyond those captured in DwC-A. And, if needed, new data exchange formats (e.g., Camera Trap Data Package) may be developed to facilitate the exchange of biodiversity data beyond DwC-A.

Curious to hear your thoughts on whether my notes align with the actual activities and purpose of the GBIF Unified Model initiative.

Thanks for helping to encourage discussion on ways to exchange biodiversity data,

-jorrit

References

[1] Wieczorek, John, & Robertson, Tim. (2023). Diversifying the GBIF Data Model hash://sha256/2ba382dee5eb3b7f86f93fa0e56d16e4897919afeb78907319e6af44824852d7 hash://md5/132bed5ac7dd5102f16dd78f3a57ab0c (0.1). Zenodo.

[2] Robertson, Tim, Wieczorek, John, & Raymond, Mélianie. (2022). Diversifying the GBIF Data Model. Biodiversity Information Science and Standards, 6, e94420.

PS I took the liberty of versioning the main document [1] because I was unable to find a versioned copy. Please let me know if you’d like me to point to another (versioned) copy of this publication.


btw - @tuco If you’d like to do a little mapping exercise between the GloBI model and the GBIF model, please feel free to open an issue at globalbioticinteractions/globalbioticinteractions on GitHub and outline your proposed integration workflow.

Thanks for all your work in facilitating these important integration, discovery, and re-use discussions.


With much anticipation, I opened a new issue to explore mapping GloBI to the Unified Model.

@jhpoelen I’m happy to try to answer your questions. These are from my perspective working on the project. All opinions are mine and may or may not represent the views of GBIF.

From the pages, I understood the following.

  1. GBIF Unified Model is a conceptual model that has been designed by @tuco John, Tim @trobertson and colleagues.
  2. GBIF Unified Model is primarily designed to help ingest diverse biodiversity data beyond DwC-A.
  3. GBIF Unified Model is a GBIF internal product and is not (yet?) part of TDWG or any other standards body, but may help create new ways of publishing data that can be translated into the GBIF Unified Model.
  4. Use Cases help test the GBIF Unified Model and help to reach out to non-GBIF actors.
  5. So far, 23 Use Cases have been identified, and 3 have been presented.
  6. Use Case 2 “Camera Traps (Camtrap DP)” references Camtrap DP (“Camtrap DP: Data exchange format for camera trap data”) and describes a way to map Camtrap DP into the GBIF Unified Model.

The following responses refer to the 6 observations above:

  1. Correct, with the help of everyone helping to develop and review use cases.
  2. It would be easy to say, “Yes”, but I fear that might be misleading. I would say the Unified Model is primarily to support new usage capabilities (richer data, views from distinct data perspectives, more questions that can be answered) in response to the GBIF 20-year review. I would say that the Unified Model is not actually designed to help ingest. That is, it isn’t necessarily meant as a model for data publishers to follow, because it is fairly complex and will certainly differ in many respects from natively stored data from distinct sources. It will probably also differ from the final native implementation GBIF develops. To help ingest, GBIF expects to support multiple new and existing publishing models tailored to families of use cases, with the hope of re-using data sharing standards developed by specialized communities whenever those are viable.
  3. Correct. GBIF is approaching the work by field-testing real-world data in an attempt to find a comprehensive model that minimizes complications and redundancy while supporting the richness and depth inherent in the use cases developed through interested communities. A stable production implementation, or parts of it, plus any publishing models that prove successful, might well serve to enhance or at least inform existing standards, including those of TDWG.
  4. Correct. The idea is to provide solutions to existing data publishing and data integration challenges that help existing data publishers to more faithfully and fully share their data and that help would-be data publishers to contribute by providing solutions for new kinds of data that are not currently supported.
  5. Correct. There are 23 currently listed use cases. Each of these is meant to present at least one data sharing challenge that hasn’t been met by existing paradigms or by other use cases on the list. Sometimes the use cases cover multiple example data sets with slight nuances. The three “presented” use cases were shared in international forums with wide audiences. Many of the other use cases have been presented for study and feedback in the groups working on them.
  6. Correct. This is a very mature use case demonstrating the re-use of a standard developed within a specialized community, to the point where developments to the Integrated Publishing Toolkit have demonstrated the viability of allowing Camtrap DP Frictionless Data Packages to be published natively. Work on this use case served both to develop the Unified Model and to refine Camtrap DP.

What is the status of the “GBIF Unified Model”?

It is under development. I was able to find a “basic” diagram of the GBIF Unified Model in Appendix II: The Unified Model in [1], but am not quite sure whether other (versioned) specifications of the model are available. Please do share if a more extensive description of the model is available somewhere; I probably missed it.

Yes, the diagram in Appendix II is a minimalist view of the latest version of the Unified Model. Older versions can be seen by tracking back through the document history (see the “Previous version of this document” link in the References section of any version of the document “Diversifying the GBIF Data Model” [1]). Having said that, no version is suitable for anything more than experimentation. With each new use case, we find refinements that can be made, and we do not see the Unified Model as stable yet.

Is there a reference implementation?

Not sure, I wasn’t able to tell from the documentation. @tuco @trobertson can you help answer this question?

Speaking of experimentation, for those interested, there is a reference PostgreSQL database schema [3] matching the version of the Unified Model [1] in support of the “Call for proposals to help mature and test how specimens are handled in GBIF’s emerging unified data model” [4], with a recommended approach [5] to map data to the target version of the Unified Model.

Is it intended to unify GBIFs models or all biodiversity data models?

As far as I understand, the purpose of the GBIF Unified Model is for GBIF to ingest biodiversity data beyond DwC-A.

I hope the answers in 1-6 above address this question.

Do you expect others to adopt this model to describe biodiversity knowledge or it is merely a community consultation to help develop GBIF’s internal infrastructure?

As far as I can tell, others are not expected to adopt/implement the GBIF Unified Model.

I would say that GBIF is trying to find solutions that will help them and others to solve existing data sharing and aggregation issues. GBIF is working with others to accomplish this collaboratively. As seen already with Camtrap DP, Event-based data publishing in ALA, site-species matrices with OBIS, eBird and others, the benefits apply all around. The “Diversifying the GBIF Data Model” project has a scope that includes any community that wants to share biodiversity data; most other collaborators share some part of that scope. In these respects, I would say GBIF has no expectations of others, but is keen to know what is needed and to make sure that all of the hard work has the broadest possible application and impact, so that it is useful anywhere it makes sense to adopt or implement any part of it.

The GBIF technical team will use (or already uses?) the model as a way to index more kinds of data in their infrastructure.

The expectation is that some future stable version of the Unified Model will inform a GBIF implementation that supports the views and aggregated data access that the use cases were meant to highlight. There have only been stand-alone experimental implementations thus far, similar to the PostgreSQL schema mentioned above [3].

To help develop the GBIF Unified Model and verify its use, use cases have been developed to test whether the model is able to capture biodiversity data beyond those captured in DwC-A. And, if needed, new data exchange formats (e.g., Camera Trap Data Package) may be developed to facilitate the exchange of biodiversity data beyond DwC-A.

Correct.

Curious to hear your thoughts on whether my notes align with the actual activities and purpose of the GBIF Unified Model initiative.

Thanks for helping to encourage discussion on ways to exchange biodiversity data,

I hope this has helped to clarify things.

John

References

[1] Wieczorek, John, & Robertson, Tim. (2023). Diversifying the GBIF Data Model hash://sha256/2ba382dee5eb3b7f86f93fa0e56d16e4897919afeb78907319e6af44824852d7 hash://md5/132bed5ac7dd5102f16dd78f3a57ab0c (0.1). Zenodo.

[2] Robertson, Tim, Wieczorek, John, & Raymond, Mélianie. (2022). Diversifying the GBIF Data Model. Biodiversity Information Science and Standards, 6, e94420.

[3] https://github.com/gbif/model-material/blob/master/schema.sql

[4] Call for proposals to help mature and test how specimens are handled in GBIF’s emerging unified data model

[5] https://github.com/gbif/model-material/blob/master/data-mapping.md


@tuco Thanks for taking the time to reply and for expressing your interest in the GloBI issue tracker.

Please see notes on my first mapping attempt below. I am curious to hear your thoughts. See also Mapping Exercise - GloBI to GBIF Unified Model (Issue #870, globalbioticinteractions/globalbioticinteractions on GitHub).


@tucotuco I took a little time to review the material you shared.

Some observations:

  1. Time Investment - a full mapping from GloBI to the “GBIF Unified Model” would take a significant amount of time: I’d estimate at least 40 hours to build a prototype translation from GloBI land to GBIF land. And this assumes that the GBIF engineering team and/or their consultants are available to provide feedback and to integrate the produced data into the “GBIF Unified Model.” I am assuming I am not the only one to realize the effort it takes to do this mapping exercise. How have others been able to free up this chunk of time to take on this mapping exercise? How do you imagine specialized projects, running a tight ship, engaging, participating, and contributing to your efforts?
  2. Suspected Gaps in Provenance Modelling - one of the core design ideas behind GloBI is to establish the provenance of digital records. This came from the desire to be able to systematically trace the resulting index record (e.g., a row in one of your database tables) back to the original dataset or process that it was sourced from. Or, phrased as a question: how are you keeping track of the specific source data and the transformation processes that lead to data records showing up in the GBIF index? And how are you keeping track of the various versions of the indexed datasets and their associated indexes? How do you cite a specific version of a record such that it can be retrieved at some time (months, years) in the future?

Thanks @jhpoelen

How have others been able to free up this chunk of time to take on this mapping exercise? How do you imagine specialized projects running a tight ship, to engage, participate, and contribute to your efforts?

It’s very helpful to hear an estimate of the effort it would take, which looks similar to my own experience with the data I’ve worked on.

The focus at the moment is on the specimen data aspects, where we’re working with the group who applied for the small funded contracts. There are a few participating without funds as it’s in their interest (e.g. some GBIF Nodes).

As this is the first group we’ve run we’re learning a lot, but we anticipate there will be more open-funded calls in the future that focus on different aspects of the model. We’ll make sure you are aware of those when they are announced if you wish to apply for them. In the meantime you are of course welcome to explore and contribute if it is of interest.

To your second question about provenance - the processing pipelines are not really the focus of our work at the moment, but rather we are focused on the shape of the overall data model.

@trobertson thanks for sharing that you are open to compensating others to review your data model.

To your second question about provenance - the processing pipelines are not really the focus of our work at the moment, but rather we are focused on the shape of the overall data model.

I’d say that a data model without explicit provenance is like having a scientific paper without references. To me, the origin of indexed (or modeled) data defines the context in which the data came about. And without that context, I do not have access to the chain of evidence that led up to the derived (or processed) content.
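As a minimal Python sketch of one way to keep that chain of evidence (an assumption, not a description of how GBIF or GloBI actually implement it): each indexed record carries a content-based identifier of the exact source dataset version it was derived from, written in the same `hash://sha256/...` form used for the “Diversifying the GBIF Data Model” reference earlier in the thread. The `IndexedRecord` type, the `process` label and the example data are hypothetical.

```python
import hashlib
from dataclasses import dataclass


def content_id(data: bytes) -> str:
    """Content-based identifier in the hash://sha256/<hex digest> form."""
    return "hash://sha256/" + hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class IndexedRecord:
    row: dict             # the indexed (derived) record
    source_version: str   # content id of the exact source bytes it came from
    process: str          # hypothetical label for the transformation applied


source_v1 = b"subject,interaction,object\nCtenocephalides felis,ex.,Felis catus\n"
record = IndexedRecord(
    row={"subject": "Ctenocephalides felis", "verbatim": "ex.", "object": "Felis catus"},
    source_version=content_id(source_v1),
    process="csv-to-index (hypothetical)",
)
print(record.source_version)

# A later version of the source has different bytes and hence a different id,
# so the record above still points at the exact version it was derived from.
source_v2 = source_v1 + b"Pulex irritans,ex.,Homo sapiens\n"
print(content_id(source_v2) == record.source_version)  # False
```

Because the identifier is derived from the bytes themselves, any change to the source yields a new identifier; a record cited together with its source hash can therefore be traced back to the exact version, provided that content is archived somewhere.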

Looking forward to making sure that the model accommodates the kinds of provenance concerns raised here.