Data models and standards for improved usability

A key consideration is that the primary mechanisms for collecting useful biodiversity data have changed over time. I see three overlapping eras:

  1. SPECIMENS - Prior to the middle of the 20th century, the vast majority of the information we have on biodiversity came from the work of collectors. A small workforce delivered data gathered primarily from the most accessible locations (with no sampling methodology) but with very broad taxonomic scope. This model could never scale to support planetary-scale modelling, but it gives us our earliest useful data.
  2. HUMAN OBSERVATIONS - From the 20th century onwards, the vast bulk of our data has come from field observations, either by professional scientists (ecologists, etc.) or through volunteer efforts (bird atlases, bird banding/ringing, citizen science, etc.). The taxonomic scope is often narrower than with specimen collection, but, for taxa that amateur naturalists can record, data volumes can be very large (though often still with insufficient attention to sampling methodology).
  3. MACHINE OBSERVATIONS - We are near the beginning of a third era, in which the simplest, most cost-effective, and most scalable way to collect biodiversity data will be through machine solutions: eDNA, AI processing of webcam, UAV, and satellite imagery or of acoustic recordings, etc. Such methods are much more amenable to broad-scale sampling designs and can (at least with eDNA) cover most organism groups.

The coverage and quality profiles of these three categories (and of the associated recording eras) are fundamentally different. Successful integration will require us to find ways to cross-calibrate these diverse signals.
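One reason integration is tractable at all is that the three eras map cleanly onto the Darwin Core `basisOfRecord` vocabulary (`PreservedSpecimen`, `HumanObservation`, `MachineObservation`), so records from very different pipelines can at least share one schema. As a minimal sketch (the source record formats here are hypothetical; only the Darwin Core term names are standard):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Occurrence:
    """Minimal occurrence record using Darwin Core term names."""
    scientificName: str
    eventDate: str
    decimalLatitude: Optional[float]
    decimalLongitude: Optional[float]
    basisOfRecord: str  # PreservedSpecimen | HumanObservation | MachineObservation

def from_specimen_label(label: dict) -> Occurrence:
    # Historical specimen labels often lack coordinates, so they stay optional.
    # The input keys ("taxon", "collected", ...) are illustrative, not a standard.
    return Occurrence(
        scientificName=label["taxon"],
        eventDate=label["collected"],
        decimalLatitude=label.get("lat"),
        decimalLongitude=label.get("lon"),
        basisOfRecord="PreservedSpecimen",
    )

def from_sensor_detection(detection: dict) -> Occurrence:
    # e.g. output of a hypothetical acoustic or camera-trap classifier,
    # which typically carries precise coordinates and a timestamp.
    return Occurrence(
        scientificName=detection["predicted_taxon"],
        eventDate=detection["timestamp"],
        decimalLatitude=detection["lat"],
        decimalLongitude=detection["lon"],
        basisOfRecord="MachineObservation",
    )

records = [
    from_specimen_label({"taxon": "Parus major", "collected": "1903-06-14"}),
    from_sensor_detection({"predicted_taxon": "Parus major",
                           "timestamp": "2024-05-01T04:32:00Z",
                           "lat": 51.5, "lon": -0.1}),
]
```

A shared schema only solves the structural side; the `basisOfRecord` field is what lets downstream analyses treat the era of origin as an explicit covariate when cross-calibrating signals.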