Absences and how they fit in the new model

To capture the nuances of the different “not found” situations, we could add a “not found” term to the occurrenceStatus vocabulary and pair it with a second term that records how confident we are that the organism is really absent. Something like:

  1. occurrenceStatus: A vocabulary consisting of “present”, “absent”, and “not found”.
  2. absenceConfidence: A vocabulary representing the confidence in the absence of the organism, with terms like “high”, “medium”, and “low”.

For each example given by @datafixer:

(1) You looked for elephants by day in a 2ha patch of African savanna. You didn’t see any.

  • occurrenceStatus: “not found”
  • absenceConfidence: “high”

(2) You searched for an uncommon plant or animal on a sampling plot. Your search might have turned one up after many hours of looking, but you only had one hour on the sample plot, and you didn’t find the target.

  • occurrenceStatus: “not found”
  • absenceConfidence: “medium”

(3) You searched for a plant or animal that’s small and cryptic, or seasonal, or condition-dependent. You looked very carefully and didn’t find any. You are far from confident that “not found” here means “absent”, because the target could be well-hidden, at the wrong life stage for an ID, or waiting for the next rain to emerge.

  • occurrenceStatus: “not found”
  • absenceConfidence: “low”

Combining occurrenceStatus with absenceConfidence keeps two things separate: what was observed (“not found”) and how strongly that observation suggests the target organism is actually absent.
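As a rough illustration, the two vocabularies could be sketched in Python like this. The class and field names (SearchRecord, as_fields, target) are made up for the example, and treating absenceConfidence as optional is my assumption, not something settled in this thread.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OccurrenceStatus(Enum):
    PRESENT = "present"
    ABSENT = "absent"
    NOT_FOUND = "not found"


class AbsenceConfidence(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass(frozen=True)
class SearchRecord:
    """Hypothetical record type; not part of any existing standard."""
    target: str
    occurrence_status: OccurrenceStatus
    # Assumption: confidence is optional, e.g. left empty for "present".
    absence_confidence: Optional[AbsenceConfidence] = None

    def as_fields(self) -> dict:
        """Return the record as term/value pairs using the proposed term names."""
        fields = {"occurrenceStatus": self.occurrence_status.value}
        if self.absence_confidence is not None:
            fields["absenceConfidence"] = self.absence_confidence.value
        return fields


# The three examples from the thread, in order.
examples = [
    SearchRecord("elephants, 2 ha savanna, by day",
                 OccurrenceStatus.NOT_FOUND, AbsenceConfidence.HIGH),
    SearchRecord("uncommon species, one hour on plot",
                 OccurrenceStatus.NOT_FOUND, AbsenceConfidence.MEDIUM),
    SearchRecord("cryptic or seasonal species",
                 OccurrenceStatus.NOT_FOUND, AbsenceConfidence.LOW),
]

for record in examples:
    print(record.target, record.as_fields())
```

Using enums means a value outside the vocabulary (say, “probably absent”) fails as soon as the record is built, which is roughly what a controlled vocabulary is supposed to guarantee.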

Alternatively, for something a bit more quantitative, here’s a suggested data model with a focus on sampling effort:

  1. occurrenceStatus: A vocabulary consisting of “present”, “absent”, and “not found”.
  2. samplingEffort: A quantitative measure of the effort put into searching for the organism, expressed in the units given by samplingEffortUnits. Depending on the context, this could be time spent, area covered, or number of samples.
  3. samplingEffortUnits: A vocabulary of the units used for the samplingEffort measurement, such as “person-hours”, “square meters”, or “number of samples”.

For each example:

(1) You looked for elephants by day in a 2ha patch of African savanna. You didn’t see any.

  • occurrenceStatus: “not found”
  • samplingEffort: 2 (assuming you spent 2 person-hours searching)
  • samplingEffortUnits: “person-hours”

(2) You searched for an uncommon plant or animal on a sampling plot. Your search might have turned one up after many hours of looking, but you only had one hour on the sample plot, and you didn’t find the target.

  • occurrenceStatus: “not found”
  • samplingEffort: 1 (assuming you spent 1 person-hour searching)
  • samplingEffortUnits: “person-hours”

(3) You searched for a plant or animal that’s small and cryptic, or seasonal, or condition-dependent. You looked very carefully and didn’t find any. You are far from confident that “not found” here means “absent”, because the target could be well-hidden, at the wrong life stage for an ID, or waiting for the next rain to emerge.

  • occurrenceStatus: “not found”
  • samplingEffort: 3 (assuming you spent 3 person-hours searching)
  • samplingEffortUnits: “person-hours”

This data model lets you quantify the sampling effort put into searching for the organism, which gives a more objective basis for judging how much confidence to place in a “not found” result. The choice of allowed units for the samplingEffort measurement would need some thought.
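A minimal sketch of the effort-based variant, again in Python. The EffortRecord class and the two ALLOWED_* sets are hypothetical; in particular, the unit list just copies the three examples above and is not a proposed final vocabulary.

```python
from dataclasses import dataclass

# Assumed vocabularies, copied from the examples in this comment.
ALLOWED_STATUSES = {"present", "absent", "not found"}
ALLOWED_EFFORT_UNITS = {"person-hours", "square meters", "number of samples"}


@dataclass(frozen=True)
class EffortRecord:
    """Hypothetical record type pairing a status with a quantified effort."""
    occurrence_status: str
    sampling_effort: float
    sampling_effort_units: str

    def __post_init__(self):
        # Reject values outside the assumed vocabularies.
        if self.occurrence_status not in ALLOWED_STATUSES:
            raise ValueError(f"unknown occurrenceStatus: {self.occurrence_status!r}")
        if self.sampling_effort_units not in ALLOWED_EFFORT_UNITS:
            raise ValueError(f"unknown samplingEffortUnits: {self.sampling_effort_units!r}")
        if self.sampling_effort <= 0:
            raise ValueError("samplingEffort must be positive")


# The three examples, with the person-hours assumed above.
records = [
    EffortRecord("not found", 2, "person-hours"),  # (1) elephants
    EffortRecord("not found", 1, "person-hours"),  # (2) uncommon species
    EffortRecord("not found", 3, "person-hours"),  # (3) cryptic or seasonal species
]

for r in records:
    print(r.occurrence_status, r.sampling_effort, r.sampling_effort_units)
```

Keeping the number and its unit in separate fields means efforts can only be compared directly when the units match, which is one reason the unit vocabulary needs some care.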

I should mention that these suggestions came from GPT-4: I fed it comments from this thread and asked it for a solution.