Definitions for concepts or vocabulary related to absences

Last year the OBIS community noted that, with the new model being developed by GBIF, it might not be possible to record absences explicitly. I posted here to seek feedback [Topic 3593]. A vibrant discussion took place, and several of us have been meeting approximately monthly to try to define what we mean when we say “absence” and how those definitions relate to organism observations. We have finally settled on four concepts and would like to get your feedback on them.

True Absence

The definitive absence of any organisms of a specified scope (e.g., combinations of taxon, life stage, morph, etc.) within a specific area and time period. This is usually unknowable. At best, the probability of true absence can be inferred under certain conditions (typically from repeated inventory events) given knowledge or an assumption of sampling effort. True absence is not a measured quantity and therefore is not primary data, but is instead inferred by downstream users of the data. Because true absence is a conclusion drawn from available information and cannot be measured, it should never be presented or treated (e.g., in a database) as equivalent information to non-detection or presence records.

Non-detection

Failure to detect organisms in the specified scope (e.g., combinations of taxonomic units, life stages, morphs, etc.) at a site. When an organism is detected, then true presence is known. However, when an organism within the scope is not detected we do not know whether: (1) the organism was truly absent or (2) the organism was present but merely not detected. The probability of detection will vary, for example with organism crypticity, sampling protocol, and sampling effort. Non-detection observations are sometimes imprecisely referred to as “absence” data.
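
As a rough illustration of how repeated non-detections bear on the probability of true absence, the sketch below applies Bayes' rule under simplifying assumptions that are not part of the definitions above: a prior probability of presence (`prior_presence`), a constant per-survey detection probability (`detection_prob`), independent surveys, and no false detections. The function name and its parameters are hypothetical.

```python
def prob_absent_given_non_detections(prior_presence, detection_prob, n_surveys):
    """Posterior probability of true absence after n_surveys non-detections.

    Assumes independent surveys, a constant detection probability,
    and no false detections (all assumptions made for this sketch).
    """
    if not 0.0 <= prior_presence <= 1.0:
        raise ValueError("prior_presence must be between 0 and 1")
    if not 0.0 <= detection_prob <= 1.0:
        raise ValueError("detection_prob must be between 0 and 1")
    if n_surveys < 0:
        raise ValueError("n_surveys must be non-negative")

    # Probability that the organism is present but missed in every survey.
    present_but_missed = prior_presence * (1.0 - detection_prob) ** n_surveys
    # Probability that the organism is truly absent (never detected by assumption).
    truly_absent = 1.0 - prior_presence

    total = present_but_missed + truly_absent
    if total == 0.0:
        # Certain presence with perfect detection cannot yield a non-detection.
        raise ValueError("non-detection is impossible under these inputs")
    return truly_absent / total


if __name__ == "__main__":
    for n in (1, 3, 10):
        print(n, round(prob_absent_given_non_detections(0.5, 0.5, n), 4))
```

With a prior of 0.5 and a detection probability of 0.5, one non-detection gives a probability of absence of about 0.67 and three give about 0.89. The value approaches 1 with more surveys but only reaches it if detection is perfect, which is why true absence remains an inference rather than an observation.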

Background points

Also sometimes referred to as pseudo-absences, background points are artificially created points generated to complement “presence-only” data (i.e., data in which only detections of organisms are recorded as presence, with no information about non-detections) for running models designed for presence-absence data (see Brotons et al. 2004, VanDerWal et al. 2009, Barbet-Massin et al. 2012, Valavi et al. 2021). Pseudo-absences are often used in species distribution modeling to provide a baseline for comparison with occurrence data. The methods used when generating a model should be available. Generated pseudo-absences should never be presented or treated (e.g., in a database) as equivalent information to non-detection or presence records.
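
To make the idea concrete, here is a minimal sketch of one simple way background points might be generated: uniform random sampling inside a bounding box. This is only one of many strategies discussed in the literature cited above, and the function name, the `kind` and `method` keys, and the bounding box are all hypothetical. Each point is explicitly labelled and carries a note on how it was generated, so that it cannot be mistaken for a presence or non-detection record.

```python
import random


def generate_background_points(bbox, n_points, seed=None):
    """Draw n_points uniformly at random inside bbox.

    bbox is (min_lon, min_lat, max_lon, max_lat) in decimal degrees.
    Sampling is uniform in longitude/latitude, which is not equal-area;
    that simplification is an assumption of this sketch.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    if min_lon > max_lon or min_lat > max_lat:
        raise ValueError("bbox must be (min_lon, min_lat, max_lon, max_lat)")

    rng = random.Random(seed)  # seeded so the points can be regenerated
    method = f"uniform random in bbox {bbox}, seed={seed}"
    return [
        {
            "lon": rng.uniform(min_lon, max_lon),
            "lat": rng.uniform(min_lat, max_lat),
            "kind": "background",  # never "present" or "non-detection"
            "method": method,      # documents how the point was made
        }
        for _ in range(n_points)
    ]


if __name__ == "__main__":
    for point in generate_background_points((-10.0, 40.0, 5.0, 55.0), 3, seed=42):
        print(point)
```

Fixing the seed and storing the method string alongside each point is one way to keep the generation procedure available to anyone reusing the model inputs.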

Reporting completeness

Complete reporting means that the data recorded from a survey event include the complete list of all organisms that the survey detected and that are within the specified scope (e.g., combinations of taxonomic units, life stage, morph, etc.) that the survey was designed to detect. Reporting completeness has the following characteristics:

  • it is a logical variable (i.e., it can be either true or false)
  • it requires the scope to be specified, almost certainly in advance of the collection of data (e.g., a complete report from a survey designed to detect female individuals of all possible species of ichneumonid wasps — the scope in this example — contains no information about the presence or non-detection of juvenile ungulate mammals)
  • it has spatial and temporal scopes
  • it is a characteristic of two processes: (1) the design of the sampling used in a survey and (2) the subsequent recording of information collected during this survey

If completeness is not reported for a survey, then the absence of an organism from the data could be due to any of three conditions: (1) the organism is truly absent, (2) the organism is truly present but was not detected, or (3) the organism is truly present and detected but not reported. Without reporting completeness, it is impossible to distinguish among these three possibilities. However, if a survey has reporting completeness, it is not necessary for the data to contain records of zero detected individuals for the organism within the designated scopes. This is because the zero-count for an organism can be inferred from knowing the scope(s) of a survey and the list of organism types that were reported as being detected. Established statistical methods exist for estimating the probability of occurrence when absence of reporting is only the result of either true absence or failure to detect an organism that is truly present (i.e., when there is reporting completeness).
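
The inference described here can be sketched in a few lines. The example assumes the taxonomic scope can be written as an explicit list of names, which will not always be practical, and the function name, argument names, and placeholder taxa are hypothetical. When reporting is complete, everything in scope that is missing from the detections is a non-detection; when it is not, nothing can be inferred.

```python
def infer_non_detections(scope, detected, reporting_complete):
    """Return the in-scope taxa that can be treated as non-detections.

    Returns None when reporting completeness is false or unknown,
    because a missing taxon could then be absent, undetected, or
    detected but simply not reported.
    """
    if not reporting_complete:
        return None
    # Detections outside the scope carry no information about the scope,
    # so only in-scope names are compared.
    return sorted(set(scope) - set(detected))


if __name__ == "__main__":
    scope = ["Species A", "Species B", "Species C"]  # placeholder names
    detected = ["Species B"]
    print(infer_non_detections(scope, detected, reporting_complete=True))
    print(infer_non_detections(scope, detected, reporting_complete=False))
```

Note that the result is a list of non-detections, not of true absences; estimating the probability of absence still requires information about detection probability and sampling effort.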

@abbybenson, many thanks for a very sensible discussion. What do you think should be done with occurrenceStatus in Darwin Core? Two possibilities for a controlled vocabulary would be:

Allowed entries are “present” and “not recorded”. “absent” is not a valid entry.

Allowed entries are “present”, “not recorded” and “looked for, not found”. “absent” is not a valid entry.

I would favour the first, but the whole topic deserves a couple of pages in Darwin Core: the Missing Manual.

@datafixer, the likely answer to your question is that this issue will be addressed with an extension (Humboldt Extension) to Darwin Core in GBIF’s data model, which contains multiple fields for defining scope and reporting completeness. See: “Humboldt Extension quick reference guide - Humboldt Extension for Ecological Inventories”. The Humboldt Extension is currently under public review in TDWG: “Public review of Humboldt Extension to Darwin Core - TDWG”.

@abbybenson, I think that there is one sentence in the Background points section that could use a few additional words to make its meaning clearer. The current sentence is: “The methods used when generating a model should be available.” Perhaps change this to “The methods used for generating background points should be available as part of the description of creating a model.”
To me, the word “generating” feels more appropriate for describing the construction of the background data, while “creating” feels like a more general action that is related to construction of a model.

@WHochachka, I’m not sure I understand. Are you suggesting that new sampling datasets will use the HE and omit occurrenceStatus (or leave it blank), and that in existing datasets occurrenceStatus = “absent” should be understood as “not recorded”?

I agree that the controlled vocabulary for occurrenceStatus would be better as something like “present” and “not recorded”.
The “looked for, not found” entry has more to do with the observation process and is perhaps less of an occurrence status?