Why don't African countries feature in the top countries using data from GBIF?


The release of our bimonthly update to GBIF’s summary slide deck (see also #data-use topic 31) prompted three questions from Faustin Gashakamba of ARCOS:

  • Why does the USA stay at the top of the list of users of the website?
  • Why don’t African countries feature in the top countries using data from GBIF?
  • Have we conducted surveys to know who’s accessing the website and what they are doing with the data they are downloading?

These are all interesting questions, and the answers—or, especially for the first two, the factors that contribute to something like answers—are more nuanced than one might expect.

First, for anyone who doesn’t know, the GBIF Secretariat does maintain a range of global and national statistics for web traffic, data publication, downloads and research publications—each of the categories that appear in these top ten lists in the slides. Many of these appear in our annual country reports (2017 versions of which are coming soon).

Before I asked to move our conversation to this new forum, Scott Miller from the Smithsonian hypothesized that we might be ‘capturing the country of the internet service provider, not the country of the ultimate customer’. He noted that, given the prevalence of satellite-based internet systems in Africa, a user’s location might be recorded as the country where the connection to the internet exists, rather than the one where the user is accessing it by satellite. He recalled the first Internet boom, when web analytics showed all of the traffic channelled through AOL as coming from US-based addresses.

As someone who witnessed firsthand the dot-com rise and dot-bomb detonation, I can attest: first-wave web analytics did persist in showing Herndon, Virginia, as the web’s early global centre. IP addresses do still appear to play a significant role in Google Analytics (as described in this 2016 article). Others note that accurately geolocating satellite-internet provider IP addresses is extremely challenging ‘because the satellites cover such extensive geographic areas’. Current, reliable statistics on satellite Internet users would be interesting to see—in their absence perhaps our colleagues in Africa can share what they know of such trends.

But let’s come back to Faustin’s questions:

  • Why does the USA stay at the top of the list of users of the website?

The U.S. remains at the top of the charts for both users and numbers of peer-reviewed papers. But after four years at the Secretariat compiling these and other statistics, what I find interesting is that American dominance of these trends is not quite what it once was. That is, even as totals have continued to rise, the proportion of U.S.-based users and research authors has actually declined relative to overall traffic, from around 15% to a little more than 12%. This alone might not seem like much, but the gap between positions #1 and #10 is decreasing, suggesting that the distribution of use of GBIF’s infrastructure is continuing to flatten out, even if the upper echelon appears static.

Signs of this are present elsewhere, too. For instance, user downloads from the U.S. and Mexico are consistently neck and neck, even though Mexico still has far fewer users (though this year their ranks appear to be steadily closing the gap with #2 India).

It also took an unusually large tranche of papers by U.S.-based authors last December to nudge North America past the Latin America and the Caribbean region in peer-reviewed uses of GBIF-mediated data.

Even the U.S.'s lead in occurrence records gives a somewhat false impression, given that annual updates to the largest dataset in GBIF.org, the eBird Observational Dataset, typically account for the majority of new records published by all U.S. institutions each year, including 2017.

  • Why don’t African countries feature in the top countries using data from GBIF?

I hope you’ll see based on the above that the situation is not quite so simple. And given the differences in infrastructure and resources, maybe the gap between African countries’ use of the GBIF network infrastructure and that of other regions (at least as measured by web traffic, data publication, downloads and research publications) should be less surprising.

Still, Faustin was responding to the April 2018 update, which doesn’t show the fact that South African-based users made the 8th-most downloads in the world in 2017 (see ‘Data download requests’ slide above).

We can also see that the BID programme is having an extremely positive impact on each of these measures. Africa—and, indeed, the Caribbean and the Pacific—are growth areas for the GBIF network. For the first time in 2017, authors based at African institutions published the same number of peer-reviewed papers using GBIF-mediated data as those from the Oceania region (72). If they appear to be lagging in 2018, remind yourself that the sample size is still somewhat small (and the Aussies seem to be having a banner year!).

  • Have we conducted surveys to know who’s accessing the website and what they are doing with the data they are downloading?

Despite the interest several of us have in conducting user surveys, the answer is, no, we have not. To be honest, our perception remains that some significant portion of the network could see this kind of survey as intrusive. Many will recall that securing agreement to require email registration as a precondition for downloading open data was contentious as recently as 2013. Ironically, it’s those user registrations that serve as our best source of location information on downloads.

We at the Secretariat would be eager to gauge whether the community is indeed still reluctant to have us canvass users about these and many other kinds of questions. The intelligence they provide could help target and prioritize improvements of all kinds.


@kcopas I wonder whether the explanation is that, as usual, resources explain many patterns of internet use. For example, it might be useful to plot data downloads against visits to gbif.org; I expect the latter is a good predictor of the former. If you compare country ranks across other biodiversity sites, I suspect you’ll get similar patterns (the top countries here are pretty much those using my biostor.org, for example). If the GBIF patterns are replicated across, say, EOL and BHL, then there’s not really a GBIF-specific issue here; it simply reflects a broader pattern. Perhaps what you are seeing is what you’d predict from (a) the distribution of the total number of people online, and (b) the fraction of those interested in biodiversity.
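A rough sketch of how the visits-versus-downloads comparison could be checked, assuming per-country totals are available as two dictionaries keyed by country code. The codes and numbers below are invented placeholders, not GBIF statistics, and the function names are hypothetical. It uses a Spearman rank correlation, since the question is about country rankings rather than raw counts:

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ascending ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(visits, downloads):
    """Spearman rank correlation over countries present in both dictionaries."""
    countries = sorted(set(visits) & set(downloads))
    v = [visits[c] for c in countries]
    d = [downloads[c] for c in countries]
    return pearson(average_ranks(v), average_ranks(d))


# Placeholder values only: replace with real per-country totals.
visits = {"AA": 1000, "BB": 800, "CC": 300, "DD": 120, "EE": 50}
downloads = {"AA": 90, "BB": 110, "CC": 20, "DD": 15, "EE": 2}

rho = spearman(visits, downloads)
print(f"Spearman rank correlation: {rho:.2f}")
```

A value near 1 would be consistent with visits predicting downloads; running the same calculation on traffic from other biodiversity sites would show whether the pattern is GBIF-specific.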


That said, according to Google Trends, Africa is a hotbed of queries using the term “biodiversity”, see https://trends.google.com/trends/explore?date=2017-01-01%202017-12-31&q=biodiversity


Yeah, this would be my hypothesis going into an analysis like visits to downloads by country, as you suggest. Maybe I’ll find the time (or an intern) to pick it up sometime :grin:


Thanks for sharing this, too.

Of course, the raw numbers for this are depressingly low—even without comparing it to searches for, say, ‘Kardashian’.


I could take a look at this if you send me the data


I posted some stats I did for Africa here:

The story seems to be that while most of Africa is not a top downloader, South Africa is a significant downloader, and the continent as a whole ranks fairly highly among regions with similar population sizes, like India.