A major use, if not the major use, of GBIF occurrence data is the building of species distribution models (SDMs). These SDMs are overwhelmingly the correlative kind. They correlate gridded climate variables with gridded presence data, with or without pseudo-absences.
In this post I outline a skeptical view of SDMing and the thousands of research papers based on it. Comments are welcome, but please look first at the references listed at the end of this post. There is a large literature on SDM methodology, and I’ve selected some recent critical evaluations.
To be fair, the statistical methods used to generate SDMs have improved greatly since ecological niche modelling (ENM) became popular in the 1990s. The interpretation of the results, however, hasn’t changed. SDMs do not predict where a species might or might not be found (binary prediction). They don’t even predict the probability of a species’ occurring in a particular grid cell. No knowledgeable ecologist or biogeographer could possibly claim that SDM does these things, because species occurrences are determined by many factors other than climatic variables, and these factors are ignored in climate-based models. All the modelling generates is a spatial correlation.
Like any statistical correlation, an SDM is not a hypothesis and it cannot be tested. What people do with correlations is think up causal hypotheses that might explain the correlation. They can then decide between the competing hypotheses, or throw all of them out, but to do that testing they need new and different evidence.
Please note that I’m not arguing that spatial correlations between occurrences and climate variables are spurious. I’m pointing out that they’re incomplete explanations, and how incomplete they are is largely unknown.
In the 1990s, ENM maps were described as “climatic envelopes”. Their interpretation in plain language was “the species prefers to live in places where the climate parameters have these values”. Even back then, some interpretations went further and said “this map shows the predicted range of the species”, which was clearly nonsense in the 1990s and is still nonsense today. You cannot even say “this map shows the climate parameter values which constrain the distribution of the species”, because non-climate factors may be important constraints on the distribution, and if those other constraints weren’t operating, the species might be able to persist in places with very different climate parameter values.
You cannot test an SDM because there is nothing to test, but you can validate it by using alternative statistical methods to generate independently derived spatial correlations. If you do this and find that your particular SDM has results very similar to the other correlative maps, then you have validated your model: that is, it’s no worse than the others.
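As a rough illustration of that kind of comparison, here is a minimal Python sketch that measures cell-by-cell agreement between two model output grids with a Pearson correlation. The function name `map_agreement`, the nested-list grid layout and the use of `None` for no-data cells are assumptions made for illustration, not part of any SDM package.

```python
import math


def map_agreement(grid_a, grid_b):
    """Pearson correlation between two equally shaped grids, cell by cell.

    Each grid is a list of rows; a cell holds a number (e.g. a model's
    suitability score) or None for no data. Cells that are None in either
    grid are skipped. This layout is an assumption for illustration.
    """
    pairs = [
        (a, b)
        for row_a, row_b in zip(grid_a, grid_b)
        for a, b in zip(row_a, row_b)
        if a is not None and b is not None
    ]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two cells with data in both grids")

    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n

    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("a grid with constant values has no defined correlation")

    return cov / math.sqrt(var_a * var_b)
```

A value near 1 says only that the two methods agree with each other. It says nothing about whether either map describes where the species actually lives.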
So why is SDMing so popular, and why is SDM interpretation so often wrong?
I can answer the first question as “ease of doing”. Download a mass of GBIF occurrence records for taxon A. Filter out any records that look dubious for that taxon. Filter out any records without coordinates or with doubtful georeferencing. Filter out (if necessary) any records before a threshold date. Now discard all fields in the remaining records except latitude and longitude. Reduce spatial autocorrelation by spatially “thinning” the dataset to 1 record per relevant grid cell. Feed the result into your SDM software pipeline and paste the resulting maps into your draft SDM paper. Time required: 1-2 days. Time required for fieldwork: 0 days. Time required to investigate non-climate constraints on A’s distribution: 0 days.
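The coordinate, date and thinning steps above fit in a few lines of Python, which is part of why they take so little time. This is a minimal sketch, not a recommended workflow. The field names follow the Darwin Core terms used in GBIF downloads (`decimalLatitude`, `decimalLongitude`, `year`, `coordinateUncertaintyInMeters`); the function name `thin_occurrences`, the 0.5-degree cell size and the thresholds are hypothetical choices. Records are assumed to be already parsed into dictionaries with numeric values, and the taxonomic check for dubious records is not covered.

```python
import math


def thin_occurrences(records, cell_deg=0.5, min_year=None, max_uncertainty_m=None):
    """Filter occurrence records and keep one (lat, lon) pair per grid cell.

    records: iterable of dicts with Darwin Core style keys (assumed already
    parsed to numbers). cell_deg, min_year and max_uncertainty_m are
    illustrative parameters, not values taken from any published study.
    """
    kept = {}
    for rec in records:
        lat = rec.get("decimalLatitude")
        lon = rec.get("decimalLongitude")

        # Drop records without coordinates or with impossible coordinates.
        if lat is None or lon is None:
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue

        # Optional date threshold: drop undated records and those before it.
        year = rec.get("year")
        if min_year is not None and (year is None or year < min_year):
            continue

        # Optional georeferencing threshold: drop records whose stated
        # uncertainty is too large (records with no stated value are kept).
        unc = rec.get("coordinateUncertaintyInMeters")
        if max_uncertainty_m is not None and unc is not None and unc > max_uncertainty_m:
            continue

        # Spatial thinning: the first record seen in each cell is retained,
        # and every other field of the record is discarded.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        kept.setdefault(cell, (lat, lon))

    return list(kept.values())
```

Calling `thin_occurrences(records, cell_deg=0.5, min_year=1970)` returns a list of coordinate pairs ready to feed into SDM software. Note how much information the function throws away: collector, date, habitat notes and everything else that might bear on non-climate constraints.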
Many SDM studies are “predictive”: given these future climate grids, where will a species be found in 20 years? 50 years? This is extraordinary nonsense, but no more extraordinary than professional economists predicting what interest rates will be like in 12 months’ time. The economists won’t lose their jobs if they’re wrong, and the same is true for publishers of predictive SDMs.
The second question is harder to answer, but I suspect it’s simple ignorance. Everyone else seems to be building SDMs and publishing the results, so the prospective SDMer just copies the methods and the interpretations seen in published SDM studies. Expect to see many more copycat SDM papers in future.
Robert Mesibov (“datafixer”); mesibov@datafix.com.au
Martínez-Minaya, J., Cameletti, M., Conesa, D. et al. Species distribution modeling: a statistical review with focus in spatio-temporal issues. Stoch Environ Res Risk Assess 32, 3227–3244 (2018) DOI
Gardner AS, Maclean IMD, Gaston KJ. Climatic predictors of species distributions neglect biophysiologically meaningful variables. Divers Distrib. 2019; 25: 1318–1333. DOI
Araújo, Miguel B. et al. Standards for distribution models in biodiversity assessments. 2019. Science Advances eaat4858 5(1) DOI
Rocchini, D., Tordoni, E., Marchetto, E. et al. A quixotic view of spatial bias in modelling the distribution of species and their diversity. npj biodivers 2, 10 (2023) DOI
Frans, V.F., Liu, J. Gaps and opportunities in modelling human influence on species distributions in the Anthropocene. Nat Ecol Evol 8, 1365–1377 (2024) DOI
Tsiftsis, S., Štípková, Z., Rejmánek, M. et al. Predictions of species distributions based only on models estimating future climate change are not reliable. Sci Rep 14, 25778 (2024) DOI