Webinar 2: Entity relationships and attributes (Francesca Jaroszynska)

The following question(s) were asked in the Collection Management Systems Webinar and will be answered here.

Francesca Jaroszynska: how can the entityRelationship table handle partial information when the entity observation is a collection of individuals of a given taxon? Could nesting observations in the Entity table be more efficient? For example, when information on lifeStage, sex or ID tags are available for only some of the individuals observed.

Response:
This is an interesting question whose solution could also apply to lots in collections where individuals are not identified or tracked separately, but subsets of which do have distinct attributes. An example is the best way to demonstrate one way to deal with this.

Let the scenario be a monitoring Event targeting a flock of pigeons at a particular city site. One of the pigeons has a band and so can be identified as a specific individual with known sex and life stage. The goal is to do as well as possible characterizing the population structure in terms of sex and life stage.

First we need the Event:
eventID: event3
eventType: dwc:HumanObservation
locationID: pigeon_site1
eventDate: 2022-07-17
habitat: city

The Entities that can be instantiated are the pigeon population “PigeonPopulation1”, and the marked pigeon “Pigeon1”, which is a member of that population. Both are dwc:Organisms, though one has a dwc:organismScope of “population” and the other has a dwc:organismScope of “individual”.
entityID: PigeonPopulation1
entityType: dwc:Organism
entityID: Pigeon1
entityType: dwc:Organism

To capture the scope of the Organisms we can use EntityAssertions:
entityAssertionID: ea6
entityID: PigeonPopulation1
entityAssertionType: dwc:organismScope
entityAssertionValue: population

entityAssertionID: ea7
entityID: Pigeon1
entityAssertionType: dwc:organismScope
entityAssertionValue: individual

In order to connect the Entities with the monitoring event in which they were observed, we need EntityEvents, one for the population:
entityID: PigeonPopulation1
eventID: event3

and one for the marked individual:
entityID: Pigeon1
eventID: event3

We need to show that “Pigeon1” was a part of “PigeonPopulation1” on the date observed. We’ll do that with an EntityRelationship:
entityRelationshipID: er_id16
subjectEntityID: Pigeon1
entityRelationshipType: member of
objectEntityID: PigeonPopulation1
entityRelationshipDate: 2022-07-17

The property entityRelationshipDate is not yet in the Unified Model, but this mini use case highlights the need for it. The complementary EntityRelationship is:
entityRelationshipID: er_id17
subjectEntityID: PigeonPopulation1
entityRelationshipType: has member
objectEntityID: Pigeon1
entityRelationshipDate: 2022-07-17
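
As a minimal sketch of how the complementary record could be derived rather than typed by hand, the Python below assumes that records are plain dictionaries keyed by the property names used above. The `INVERSE_TYPES` mapping and the `inverse_relationship` helper are hypothetical names introduced for illustration; they are not part of the Unified Model.

```python
# Hypothetical mapping between a relationship type and its complement.
INVERSE_TYPES = {"member of": "has member", "has member": "member of"}


def inverse_relationship(rel, new_id):
    """Return the complementary EntityRelationship for `rel`.

    Subject and object are swapped, the type is replaced by its
    complement, and the date is carried over unchanged.
    """
    return {
        "entityRelationshipID": new_id,
        "subjectEntityID": rel["objectEntityID"],
        "entityRelationshipType": INVERSE_TYPES[rel["entityRelationshipType"]],
        "objectEntityID": rel["subjectEntityID"],
        "entityRelationshipDate": rel["entityRelationshipDate"],
    }


er_id16 = {
    "entityRelationshipID": "er_id16",
    "subjectEntityID": "Pigeon1",
    "entityRelationshipType": "member of",
    "objectEntityID": "PigeonPopulation1",
    "entityRelationshipDate": "2022-07-17",
}

er_id17 = inverse_relationship(er_id16, "er_id17")
for key, value in er_id17.items():
    print(f"{key}: {value}")
```

Deriving one direction from the other keeps the two records from drifting apart if the relationship is later edited.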

Now we can model the attributes of “Pigeon1” with EntityAssertions. Let’s say the marked pigeon is an adult female:
entityAssertionID: ea8
entityID: Pigeon1
entityAssertionType: dwc:lifeStage
entityAssertionValue: adult
entityAssertionDate: 2022-07-17

entityAssertionID: ea9
entityID: Pigeon1
entityAssertionType: dwc:sex
entityAssertionValue: female
entityAssertionDate: 2022-07-17

Now we can model the attributes of “PigeonPopulation1”, also with EntityAssertions. The flock had 13 individuals on the day they were observed, including the banded individual:
entityAssertionID: ea10
entityID: PigeonPopulation1
entityAssertionType: dwc:organismQuantity
entityAssertionValueNumeric: 13
entityAssertionUnit: individuals
entityAssertionDate: 2022-07-17

It was easy enough to distinguish the juveniles from the adults:
entityAssertionID: ea11
entityID: PigeonPopulation1
entityAssertionType: juvenile count
entityAssertionValueNumeric: 6
entityAssertionUnit: individuals
entityAssertionDate: 2022-07-17

entityAssertionID: ea12
entityID: PigeonPopulation1
entityAssertionType: adult count
entityAssertionValueNumeric: 7
entityAssertionUnit: individuals
entityAssertionDate: 2022-07-17

But the sex of the adults could only be divined by their behavior, which 4 of the unmarked adult population exhibited:
entityAssertionID: ea13
entityID: PigeonPopulation1
entityAssertionType: minimum adult male count
entityAssertionValueNumeric: 2
entityAssertionUnit: individuals
entityAssertionDate: 2022-07-17
entityAssertionRemark: determined by behavior

entityAssertionID: ea14
entityID: PigeonPopulation1
entityAssertionType: minimum adult female count
entityAssertionValueNumeric: 3
entityAssertionUnit: individuals
entityAssertionDate: 2022-07-17
entityAssertionRemark: determined by behavior for two individuals, the third was a marked individual of confirmed sex
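
The counts asserted for “PigeonPopulation1” can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes the numeric values of ea10 to ea14 have been collected into a dictionary keyed by entityAssertionType, and the helper name `check_population_counts` is hypothetical.

```python
# Numeric values of EntityAssertions ea10 to ea14 for PigeonPopulation1.
counts = {
    "dwc:organismQuantity": 13,
    "juvenile count": 6,
    "adult count": 7,
    "minimum adult male count": 2,
    "minimum adult female count": 3,
}


def check_population_counts(c):
    """Summarize whether the life stage and sex counts are consistent."""
    total = c["dwc:organismQuantity"]
    adults = c["adult count"]
    staged = c["juvenile count"] + adults
    sexed_adults = c["minimum adult male count"] + c["minimum adult female count"]
    return {
        # Juveniles plus adults should account for the whole flock.
        "stages_sum_to_total": staged == total,
        # Minimum male and female counts cannot exceed the adult count.
        "sexed_adults_within_adults": sexed_adults <= adults,
        # Adults whose sex could not be determined.
        "adults_of_unknown_sex": adults - sexed_adults,
    }


summary = check_population_counts(counts)
print(summary)
```

With the values above, the 6 juveniles and 7 adults account for all 13 individuals, and 2 of the 7 adults remain of unknown sex.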

Hi John,

I have difficulties with the way you handle the issue, but I think there are two topics here: the way to handle relationships between entities, and the use of the entityAssertionType vocabulary.

Let’s start with the second, using your example:

entityAssertionID: ea14
entityID: PigeonPopulation1
entityAssertionType: minimum adult female count
entityAssertionValueNumeric: 3
entityAssertionUnit: individuals

Here, you tell us within one single assertionType that you saw at least 3 adult females. I am afraid this will lead to an enlargement of the assertionType vocabulary (@abentley topic), either by flipping words or by adding new elements (minimum white adult female count?).

I would advocate separating the assertions, in the same way you did for the single pigeon:
sex: female
life stage: adult
organism quantity: 3
However, doing this means that the described entity is no longer PigeonPopulation1, but a subpart of it.

entityAssertionID: ea10
entityID: PigeonPopulation1

entityAssertionType: dwc:organismQuantity
entityAssertionValueNumeric: 13
entityAssertionUnit: individuals

entityAssertionType: juvenile count
entityAssertionValueNumeric: 6
entityAssertionUnit: individuals

entityAssertionType: minimum adult male count
entityAssertionValueNumeric: 2
entityAssertionUnit: individuals

entityAssertionType: minimum adult female count
entityAssertionValueNumeric: 3
entityAssertionUnit: individuals

Here, you tell us that in PigeonPopulation1 you saw:
13 individuals,
6 juveniles,
7 adults,
at least 2 adult males,
at least 3 adult females.

From the structure of the data, I am not sure how many individuals you saw, as all of these assertions describe PigeonPopulation1:

  • 13 (I guess it was your value)
  • 26 = 13 undetermined + 6 juveniles + 7 adults (including 2 males and 3 females)
  • 31 = 13 undetermined + 6 juveniles sex undetermined + 7 adults sex undetermined + 2 adult males + 3 adult females
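
The three readings can be reproduced with simple arithmetic. This is only a sketch, assuming the five asserted values are held in a dictionary keyed by entityAssertionType; the variable names are hypothetical.

```python
# The five counts asserted directly on PigeonPopulation1.
asserted = {
    "dwc:organismQuantity": 13,
    "juvenile count": 6,
    "adult count": 7,
    "minimum adult male count": 2,
    "minimum adult female count": 3,
}

# Reading 1: only dwc:organismQuantity is the flock size,
# and every other count is a subset of it.
total_only = asserted["dwc:organismQuantity"]

# Reading 2: the life stage counts are additional to the 13,
# while the sex counts are subsets of the adults.
stages_additional = (
    total_only + asserted["juvenile count"] + asserted["adult count"]
)

# Reading 3: every assertion describes a separate group of individuals.
all_additional = sum(asserted.values())

print(total_only, stages_additional, all_additional)
```

Nothing in the flat list of assertions tells a consumer which of the three sums is the intended one.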

Here, I would advocate identifying two kinds of entities: the flock itself, with its 13 individuals and likely other assertions specific to the flock (area covered, speed and direction…); and subparts of the flock, modeled as new entities related to the flock. This will help keep the vocabulary as controlled as possible while being clear about the components. This argues again for adding new entities.

This brings us to the first topic:

By increasing the number of entities, we increase the number of entity relationships. Those “member of / has member” relationships, or any kind of “parent/child” relationship, are not of much interest for biological purposes, as they are here only to indicate a hierarchical relation in the database. They are, in addition, quite heavy to fill in, whether by script or by hand.

If we had a parentEntityID field, we could manage that more easily. Interestingly, @DavidFichtmueller used a diagram including this parentEntityID on April 20 (topic)

entityID: PigeonPopulation1
entityType: dwc:Organism
entityAssertionType: dwc:organismQuantity
entityAssertionValue: 13

entityID: Pigeon1
parentEntityID: PigeonPopulation1
entityType: dwc:Organism*
entityAssertionType: dwc:sex
entityAssertionValue: female
entityAssertionType: dwc:lifeStage
entityAssertionValue: adult

entityID: PigeonPopulation1_1
parentEntityID: PigeonPopulation1
entityType: dwc:Population
entityAssertionType: dwc:organismQuantity
entityAssertionValue: 6
entityAssertionType: dwc:lifeStage
entityAssertionValue: juvenile

entityID: PigeonPopulation1_2
parentEntityID: PigeonPopulation1
entityType: dwc:Population
entityAssertionType: dwc:organismQuantity
entityAssertionValue: 7
entityAssertionType: dwc:lifeStage
entityAssertionValue: adult

entityID: PigeonPopulation1_2_1
parentEntityID: PigeonPopulation1_2
entityType: dwc:Population
entityAssertionType: dwc:organismQuantity
entityAssertionValue: 2
entityAssertionType: dwc:sex
entityAssertionValue: male

entityID: PigeonPopulation1_2_2
parentEntityID: PigeonPopulation1_2
entityType: dwc:Population
entityAssertionType: dwc:organismQuantity
entityAssertionValue: 3
entityAssertionType: dwc:sex
entityAssertionValue: female

This approach would also make it technically easier to keep the original large observation, the flock of 13 individuals (i.e. the entity with no parentEntityID), and it would allow the entityRelationship table to be cleared of the least relevant information.

The addition of “minimal” could be handled as an estimated value (discussion):
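
To illustrate how such a hierarchy could be navigated without an entityRelationship table, here is a small Python sketch. It assumes that parentEntityID is stored directly on each entity record and that every parentEntityID matches the entityID of another record; the dictionary layout and the helpers `lineage` and `children` are hypothetical.

```python
# Entities keyed by entityID. parentEntityID is None for the original
# observation (the whole flock). Assertions are flattened onto each
# record for brevity.
entities = {
    "PigeonPopulation1": {
        "parentEntityID": None, "dwc:organismQuantity": 13},
    "Pigeon1": {
        "parentEntityID": "PigeonPopulation1",
        "dwc:sex": "female", "dwc:lifeStage": "adult"},
    "PigeonPopulation1_1": {
        "parentEntityID": "PigeonPopulation1",
        "dwc:organismQuantity": 6, "dwc:lifeStage": "juvenile"},
    "PigeonPopulation1_2": {
        "parentEntityID": "PigeonPopulation1",
        "dwc:organismQuantity": 7, "dwc:lifeStage": "adult"},
    "PigeonPopulation1_2_1": {
        "parentEntityID": "PigeonPopulation1_2",
        "dwc:organismQuantity": 2, "dwc:sex": "male"},
    "PigeonPopulation1_2_2": {
        "parentEntityID": "PigeonPopulation1_2",
        "dwc:organismQuantity": 3, "dwc:sex": "female"},
}


def lineage(entity_id):
    """Return the chain from an entity up to the entity with no parent."""
    chain = [entity_id]
    while entities[chain[-1]]["parentEntityID"] is not None:
        chain.append(entities[chain[-1]]["parentEntityID"])
    return chain


def children(entity_id):
    """Return the entityIDs whose parentEntityID is `entity_id`."""
    return [eid for eid, rec in entities.items()
            if rec["parentEntityID"] == entity_id]


print(lineage("PigeonPopulation1_2_2"))
print(children("PigeonPopulation1_2"))
```

Because the sex-specific subparts hang under the adult subpart, their life stage can be read from the parent instead of being encoded in a compound assertion type.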
assertionID: x
parentAssertionID: cf. assertionID of the “organismQuantity: 2 individuals”
assertionType: minimal
assertionValueNumeric: 2
assertionUnit: individuals

Wouldn’t that be a nice improvement, and perfectly in line with parentEventID, parentAssertionID, parentTaxonID, and every dependsOn element?