Another expert review of iNaturalist IDs in GBIF

GBIF mediates almost 60 million “Research Grade” occurrences from the iNaturalist platform. Some of those occurrences may have incorrect identifications, and the same can be said for the ca 200 million identifications in GBIF for preserved specimens in museums and herbaria.

However, there’s an important difference between citizen-science IDs on iNaturalist and collections IDs. The evidence for iNaturalist IDs consists of images that can be examined 24/7 by anyone with an Internet connection. The evidence for collections IDs consists of specimens, which may only be available for examination through collection visits or loans. For this reason it is much harder for an expert to review and correct collections IDs than it is to review and correct iNaturalist IDs.

On the other hand, iNaturalist images may not show the taxonomically diagnostic characters that make an ID possible. This is usually the case for millipedes, my specialty group in Australia. Unless the genitalia of a mature male can be closely examined, it may not be possible to ID the millipede to species, to genus, to family or in some cases to order.

To do a check on IDs, on 24 March 2023 I downloaded from GBIF (DOI) the 2029 “Research Grade” iNaturalist records for millipedes from Australia.

Please note that I’m not an observer or commenter on the iNaturalist platform. I assisted with IDs on BowerBird, an Australian predecessor of iNaturalist, and some of BowerBird’s records have been incorporated into iNaturalist.


Duplicates. iNaturalist observers are sometimes responsible for duplicate or near-duplicate occurrence records in GBIF. This happens when different images of the same natural object are uploaded as separate observations. The Australian millipede dataset contains 13 pairs of records with the same observer, location, date and taxon ID. Four of these pairs show the same image or images and verbatimEventDate but have different iNaturalist observation numbers and different GBIF ID codes:

| gbifID | catalogNumber | recordedBy | verbatimEventDate | decimalLatitude | decimalLongitude |
|---|---|---|---|---|---|
| 3760331880 | 111388026 | Neil Ralph Tucker | 2022/03/06 9:16 AM AEDT | -38.3906555 | 144.2503509444 |
| 3712730727 | 108210050 | Neil Ralph Tucker | 2022/03/06 9:16 AM AEDT | -38.3906555 | 144.2503509444 |
| 2563502995 | 37708748 | Adam Edmonds | 2019/06/04 4:51 PM AEST | -37.7692311795 | 144.9847573258 |
| 2265824279 | 26600536 | Adam Edmonds | 2019/06/04 4:51 PM AEST | -37.7692548282 | 144.9849033976 |
| 2237524799 | 23038069 | QuestaGame | 2019-04-14T06:54 | -35.0162353 | 138.6115912 |
| 3398712411 | 23269691 | QuestaGame | 2019-04-14T06:54 | -35.015423 | 138.60667 |
| 2237412699 | 11486465 | teeks | 2018-04-16T15:18 | -34.8033490885 | 138.7889716287 |
| 2237412698 | 11486457 | teeks | 2018-04-16T15:18 | -34.8101294965 | 138.757842927 |

In another two pairs of records with image sets, the paired records shared some but not all of their images: 1 of 2 images in one pair and 2 of 3 images in the other.

In each of these duplicate pairs the images (or image sets) were uploaded at different times. I was surprised to find these duplicates, because I assumed iNaturalist would screen for duplicate images. Because I was interested in how IDs were done by the iNaturalist community (see below) image by image, I retained all duplicates in my ID review.
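
A minimal sketch of how candidate duplicates like these could be flagged in a GBIF download. It groups records on recordedBy plus verbatimEventDate only, which is an assumption made for illustration: the 13 pairs mentioned above were matched on observer, location, date and taxon ID, and confirming a true duplicate still means looking at the images. The rows are the eight from the table of duplicate pairs, and the function name `candidate_duplicates` is hypothetical.

```python
from collections import defaultdict

# Rows copied from the table of duplicate pairs:
# (gbifID, catalogNumber, recordedBy, verbatimEventDate)
records = [
    ("3760331880", "111388026", "Neil Ralph Tucker", "2022/03/06 9:16 AM AEDT"),
    ("3712730727", "108210050", "Neil Ralph Tucker", "2022/03/06 9:16 AM AEDT"),
    ("2563502995", "37708748", "Adam Edmonds", "2019/06/04 4:51 PM AEST"),
    ("2265824279", "26600536", "Adam Edmonds", "2019/06/04 4:51 PM AEST"),
    ("2237524799", "23038069", "QuestaGame", "2019-04-14T06:54"),
    ("3398712411", "23269691", "QuestaGame", "2019-04-14T06:54"),
    ("2237412699", "11486465", "teeks", "2018-04-16T15:18"),
    ("2237412698", "11486457", "teeks", "2018-04-16T15:18"),
]


def candidate_duplicates(rows):
    """Group gbifIDs by (recordedBy, verbatimEventDate); keep groups of 2 or more."""
    groups = defaultdict(list)
    for gbif_id, _catalog_number, recorded_by, event_date in rows:
        groups[(recorded_by, event_date)].append(gbif_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


dupes = candidate_duplicates(records)
for (observer, when), ids in dupes.items():
    print(observer, when, ids)
```

Grouping on the verbatim timestamp rather than on coordinates is deliberate here, because the coordinates in some of the pairs differ slightly even though the images are the same.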


ID review. I examined each of the 2029 images or image sets and annotated the record. Only one of the images showed the genitalia of a mature male, and unfortunately the image wasn’t clear enough for a species-level ID. My review is therefore based on whole-animal characters.

I classed the “Research Grade” (RG) IDs as follows:

  • correct - The image clearly shows the diagnostic characters of the taxon

  • likely - The ID is probably correct, but I can’t be sure because diagnostic characters aren’t clearly visible in the image

  • possible - The ID might be correct, but the image isn’t good enough to distinguish the identified taxon from another, similar taxon

  • unlikely - The ID is probably incorrect because the image appears to show a different taxon, although diagnostic characters aren’t clearly visible

  • incorrect - The image clearly shows the diagnostic characters of a different taxon

This classification is more nuanced than “correct/incorrect” or “correct/doubtful”. I might trust correct and likely, but I would disregard possible, unlikely and incorrect, and I’ve annotated those negative judgments with explanations in my review. Here are the results:

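| scientificName | correct | likely | possible | unlikely | incorrect | total |
|---|---|---|---|---|---|---|
| Native | | | | | | |
| Antichiropus variabilis | 0 | 0 | 4 | 0 | 0 | 4 |
| Australeuma jeekeli | 7 | 0 | 0 | 0 | 0 | 7 |
| Australocricus perditus | 0 | 0 | 3 | 1 | 0 | 4 |
| Australocricus sennae | 0 | 0 | 8 | 0 | 0 | 8 |
| Brochopeltis mjoebergi | 8 | 0 | 0 | 0 | 0 | 8 |
| Cladethosoma toowoomba | 1 | 0 | 0 | 0 | 0 | 1 |
| Cyliosoma excavatum | 0 | 0 | 0 | 1 | 0 | 1 |
| Cynotelopus notabilis | 0 | 5 | 0 | 0 | 0 | 5 |
| Heterocladosoma | 0 | 0 | 3 | 0 | 0 | 3 |
| Heterocladosoma bifalcatum | 0 | 219 | 20 | 1 | 0 | 240 |
| Hoplatessara pugiona | 0 | 18 | 0 | 0 | 0 | 18 |
| Isocladosoma maculatum | 0 | 2 | 0 | 0 | 1 | 3 |
| Lissodesmus martini | 2 | 0 | 0 | 0 | 0 | 2 |
| Notodesmus scotius | 1 | 0 | 0 | 0 | 0 | 1 |
| Peterjohnsia titan | 1 | 0 | 0 | 0 | 0 | 1 |
| Phryssonotus australis | 1 | 0 | 0 | 0 | 0 | 1 |
| Phyllocladosoma broelemanni | 0 | 1 | 0 | 0 | 0 | 1 |
| Phyllocladosoma dorrigense | 0 | 1 | 0 | 0 | 0 | 1 |
| Pogonosternum nigrovirgatum | 0 | 0 | 3 | 0 | 0 | 3 |
| Propolyxenus australis | 1 | 0 | 0 | 0 | 0 | 1 |
| Propolyxenus forsteri | 1 | 0 | 0 | 0 | 0 | 1 |
| Solaenodolichopus | 0 | 1 | 0 | 0 | 0 | 1 |
| Somethus | 1 | 0 | 0 | 0 | 0 | 1 |
| Somethus castaneus | 0 | 8 | 0 | 0 | 0 | 8 |
| Somethus tasmani | 1 | 0 | 0 | 0 | 0 | 1 |
| Tholerosoma monteithi | 1 | 0 | 0 | 0 | 0 | 1 |
| all native | 26 | 255 | 41 | 3 | 1 | 326 |
| Introduced | | | | | | |
| Asiomorpha coarctata | 0 | 6 | 0 | 0 | 0 | 6 |
| Blaniulus guttulatus | 1 | 0 | 0 | 0 | 0 | 1 |
| Ommatoiulus moreleti | 1258 | 332 | 93 | 2 | 4 | 1689 |
| Oxidus gracilis | 0 | 0 | 0 | 2 | 1 | 3 |
| Polyxenus lagurus | 1 | 0 | 0 | 1 | 1 | 3 |
| Trigoniulus corallinus | 0 | 0 | 0 | 1 | 0 | 1 |
| all introduced | 1260 | 338 | 93 | 6 | 6 | 1703 |
| all taxa | 1286 | 593 | 134 | 9 | 7 | 2029 |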

As might be expected for a citizen-science project that encourages observers to record natural objects in their backyards and in local parks and nature reserves, 84% of RG millipede IDs were of introduced species. Of those IDs, 99% were of the Black Portuguese Millipede, Ommatoiulus moreleti.

Dominating the native millipede IDs was Heterocladosoma bifalcatum, a species native to southeast Queensland. H. bifalcatum has become established and superabundant in the Sydney metropolitan area, ca 700 km to the south, and this area was the source of most of the observations identified as this species.

Reliability of the IDs was fairly high: I classed 86% of the native millipede observations and 94% of the introduced millipede observations as either correct or likely.
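
For anyone who wants to re-derive the percentages quoted in this section, here is a short sketch that recomputes them from the "all native", "all introduced" and Ommatoiulus moreleti rows of the results table. The variable and function names are my own and purely illustrative; the counts are copied from the table.

```python
# Totals copied from the "all native" and "all introduced" rows of the results table
totals = {
    "native": {"correct": 26, "likely": 255, "possible": 41, "unlikely": 3, "incorrect": 1},
    "introduced": {"correct": 1260, "likely": 338, "possible": 93, "unlikely": 6, "incorrect": 6},
}
moreleti_total = 1689  # total for Ommatoiulus moreleti


def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


n_native = sum(totals["native"].values())          # 326
n_introduced = sum(totals["introduced"].values())  # 1703
n_all = n_native + n_introduced                    # 2029

introduced_share = pct(n_introduced, n_all)        # 84
moreleti_share = pct(moreleti_total, n_introduced) # 99


def reliable_share(group):
    """Share of a group's records classed as either correct or likely."""
    counts = totals[group]
    return pct(counts["correct"] + counts["likely"], sum(counts.values()))


print(introduced_share, moreleti_share)
print(reliable_share("native"), reliable_share("introduced"))  # 86 94
```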


Following. I also recorded as annotations the iNaturalist username of the first person to assign the image to its RG ID, the second person to do so, the third person to do so etc. (Only the first identifier is given in the identifiedBy field in the Darwin Core version of an iNaturalist record, so my information came from iNaturalist.) “Research Grade” status on iNaturalist is assigned by a fairly complex algorithm, but in practice if two IDs at the lowest taxonomic level agree and none disagree, the ID is accepted as the RG one. The tallies for the Australian millipede dataset were:

| No. of RG IDs | No. of records |
|---|---|
| 2 | 1629 |
| 3 | 372 |
| 4 | 25 |
| 5 | 3 |
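
The "in practice" rule described above can be sketched as follows. This is only the simplified rule as I have stated it (at least two IDs at the lowest taxonomic level, all agreeing); it is not iNaturalist's actual algorithm, which is more complex and is not reproduced here. The function name `reaches_rg_simplified` is hypothetical, and the taxon names are taken from the results table purely as example inputs.

```python
def reaches_rg_simplified(ids):
    """Simplified rule: two or more lowest-level IDs, and none disagree.

    `ids` is a list of taxon names suggested for one observation.
    Assumption: IDs made at coarser taxonomic levels are not modelled.
    """
    return len(ids) >= 2 and len(set(ids)) == 1


# Two agreeing IDs are enough under the simplified rule
print(reaches_rg_simplified(["Ommatoiulus moreleti", "Ommatoiulus moreleti"]))  # True
# A single ID is not enough
print(reaches_rg_simplified(["Ommatoiulus moreleti"]))  # False
# A disagreeing ID blocks the simplified rule
print(reaches_rg_simplified(["Ommatoiulus moreleti", "Oxidus gracilis"]))  # False

# Tallies from the table: number of RG IDs -> number of records
rg_id_tallies = {2: 1629, 3: 372, 4: 25, 5: 3}
print(sum(rg_id_tallies.values()))  # 2029, the size of the dataset
```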

I looked to see where in the sequence of identifiers the original observer appeared in the 1708 of 2029 records where the original observer suggested an ID. The original observer was first to suggest the RG ID in 1432 of 1587 correct or likely records (90%), and in 97 of 121 possible, unlikely or incorrect records (80%).

The original observer “followed” the suggestion of another iNaturalist member in 155 of 1587 correct or likely records (10%), and in 24 of 121 possible, unlikely or incorrect records (20%).

It isn’t good practice if an observer “follows” someone else’s suggestion simply because the observer thinks the other iNaturalist member knows best, but this undoubtedly happens. When only two IDs are suggested, the result is that an observation achieves “Research Grade” on the strength of a single ID.

The “only two IDs” component of the 1708-record set with an original observer ID contains 1346 records. Of these 1346, the original observer “followed” another suggestion in 102 correct or likely records (8% of the 1346) and 19 possible, unlikely or incorrect records (1% of the 1346). “Following” may not be good practice, but its incidence in the Australian millipede dataset is low.
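
A rough sketch of how the "first" versus "followed" distinction could be derived from an ordered list of identifiers, together with a check of the percentages quoted in this section. The function `observer_role` and the usernames "alice" and "bob" are hypothetical; my annotations were made by hand from the iNaturalist pages, not with this code. The counts are the ones given in the text.

```python
def observer_role(observer, identifiers):
    """Classify the observer's place among those who suggested the RG ID.

    `identifiers` lists usernames in the order they assigned the RG ID.
    """
    if observer not in identifiers:
        return "no ID"
    return "first" if identifiers[0] == observer else "followed"


print(observer_role("alice", ["alice", "bob"]))  # first
print(observer_role("alice", ["bob", "alice"]))  # followed
print(observer_role("alice", ["bob", "carol"]))  # no ID


def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


# Records where the original observer suggested an ID (1708 in total)
good_first, good_followed = 1432, 155  # correct or likely: 1587 records
poor_first, poor_followed = 97, 24     # possible, unlikely or incorrect: 121 records

good = good_first + good_followed
poor = poor_first + poor_followed
print(pct(good_first, good), pct(poor_first, poor))        # 90 80
print(pct(good_followed, good), pct(poor_followed, poor))  # 10 20

# "Only two IDs" subset: 1346 records, percentages taken of the whole subset
two_id_records = 1346
print(pct(102, two_id_records), pct(19, two_id_records))   # 8 1
```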


Conclusion. This review should not be seen as a fair evaluation of the correctness or otherwise of iNaturalist IDs. It examined only 2029 of ca 60 million records, representing one taxon and one geographical area.

From my point of view as a millipede specialist, however, the review confirmed that iNaturalist records in GBIF are of limited usefulness. Of the ca 2000 native Australian millipede species, only a small number can be positively identified from a whole-animal image, and distribution data for these and all other natives are better obtained from specimens in collections. Records for introduced species are valuable if IDs are reliable, but (again) only a small number of introduced species are identifiable from whole-animal images.

iNaturalist also holds observations that have not reached “Research Grade” and are not mediated by GBIF. Mining these “Casual” and “Needs ID” observations for new millipede records would hardly be worth the effort, to judge from this RG review.


Following email exchanges with Thomas Mesaglio and James K. Douch, some of the IDs in the dataset downloaded on 2023-03-24 have now been changed on the iNaturalist platform. My annotated version of the GBIF records from the 2023-03-24 download has been archived in Zenodo.

Robert Mesibov (“datafixer”); robert.mesibov@gmail.com
