Another expert review of iNaturalist IDs in GBIF

datafixer · April 2, 2023, 6:54am

GBIF mediates almost 60 million “Research Grade” occurrences from the iNaturalist platform. Some of those occurrences may have incorrect identifications, and the same can be said for the ca 200 million identifications in GBIF for preserved specimens in museums and herbaria.

However, there’s an important difference between citizen-science IDs on iNaturalist and collections IDs. The evidence for iNaturalist IDs is images that can be examined 24/7 by anyone with an Internet connection. The evidence for collections IDs is specimens, which may only be available for examination through collection visits or loans. For this reason it is much harder for an expert to review and correct collections IDs than it is to review and correct iNaturalist IDs.

On the other hand, iNaturalist images may not show the taxonomically diagnostic characters that make an ID possible. This is usually the case for millipedes, my specialty group in Australia. Unless the genitalia of a mature male can be closely examined, it may not be possible to ID the millipede to species, to genus, to family or in some cases to order.

To do a check on IDs, on 24 March 2023 I downloaded from GBIF (DOI) the 2029 “Research Grade” iNaturalist records for millipedes from Australia.

Please note that I’m not an observer or commenter on the iNaturalist platform. I assisted with IDs on BowerBird, an Australian predecessor of iNaturalist, and some of BowerBird’s records have been incorporated into iNaturalist.

Duplicates. iNaturalist observers are sometimes responsible for duplicate or near-duplicate occurrence records in GBIF. This happens when different images of the same natural object are uploaded as separate observations. The Australian millipede dataset contains 13 pairs of records with the same observer, location, date and taxon ID. Four of these pairs show the same image or images and verbatimEventDate but have different iNaturalist observation numbers and different GBIF ID codes:

gbifID	catalogNumber	recordedBy	verbatimEventDate	decimalLatitude	decimalLongitude
3760331880	111388026	Neil Ralph Tucker	2022/03/06 9:16 AM AEDT	-38.3906555	144.2503509444
3712730727	108210050	Neil Ralph Tucker	2022/03/06 9:16 AM AEDT	-38.3906555	144.2503509444
2563502995	37708748	Adam Edmonds	2019/06/04 4:51 PM AEST	-37.7692311795	144.9847573258
2265824279	26600536	Adam Edmonds	2019/06/04 4:51 PM AEST	-37.7692548282	144.9849033976
2237524799	23038069	QuestaGame	2019-04-14T06:54	-35.0162353	138.6115912
3398712411	23269691	QuestaGame	2019-04-14T06:54	-35.015423	138.60667
2237412699	11486465	teeks	2018-04-16T15:18	-34.8033490885	138.7889716287
2237412698	11486457	teeks	2018-04-16T15:18	-34.8101294965	138.757842927

In another 2 pairs of records with image sets, 1 of 2 images and 2 of 3 images were identical in the pair.

In each of these duplicate pairs the images (or image sets) were uploaded at different times. I was surprised to find these duplicates, because I assumed iNaturalist would screen for duplicate images. Because I was interested in how the iNaturalist community did IDs image by image (see below), I retained all duplicates in my ID review.
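As a rough illustration of how candidate duplicates like these could be flagged, here is a minimal Python sketch. It is not the method used in this review: it groups only on recordedBy and verbatimEventDate, the two matching fields visible in the table above, and the function name candidate_duplicates is hypothetical. With a full download, taxon and location fields could be added to the grouping key.

```python
from collections import defaultdict

# (gbifID, catalogNumber, recordedBy, verbatimEventDate), copied from the table above.
ROWS = [
    ("3760331880", "111388026", "Neil Ralph Tucker", "2022/03/06 9:16 AM AEDT"),
    ("3712730727", "108210050", "Neil Ralph Tucker", "2022/03/06 9:16 AM AEDT"),
    ("2563502995", "37708748", "Adam Edmonds", "2019/06/04 4:51 PM AEST"),
    ("2265824279", "26600536", "Adam Edmonds", "2019/06/04 4:51 PM AEST"),
    ("2237524799", "23038069", "QuestaGame", "2019-04-14T06:54"),
    ("3398712411", "23269691", "QuestaGame", "2019-04-14T06:54"),
    ("2237412699", "11486465", "teeks", "2018-04-16T15:18"),
    ("2237412698", "11486457", "teeks", "2018-04-16T15:18"),
]


def candidate_duplicates(rows):
    """Group records by observer and verbatim date (an assumed key, see above).

    Returns only the groups that contain more than one gbifID.
    """
    groups = defaultdict(list)
    for gbif_id, _catalog_number, recorded_by, verbatim_date in rows:
        groups[(recorded_by, verbatim_date)].append(gbif_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


duplicates = candidate_duplicates(ROWS)
for (observer, date), ids in duplicates.items():
    print(observer, date, ids)
```

Records flagged this way are only candidates. Deciding whether two observations really show the same animal still means looking at the images.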

ID review. I examined each of the 2029 images or image sets and annotated the record. Only one of the images showed the genitalia of a mature male, and unfortunately the image wasn’t clear enough for a species-level ID. My review is therefore based on whole-animal characters.

I classed the “Research Grade” (RG) IDs as follows:

correct - The image clearly shows the diagnostic characters of the taxon
likely - The ID is probably correct, but I can’t be sure because diagnostic characters aren’t clearly visible in the image
possible - The ID might be correct, but the image isn’t good enough to distinguish the identified taxon from another, similar taxon
unlikely - The ID is probably incorrect because the image appears to show a different taxon, although diagnostic characters aren’t clearly visible
incorrect - The image clearly shows the diagnostic characters of a different taxon

This classification is more nuanced than “correct/incorrect” or “correct/doubtful”. I might trust correct and likely, but I would disregard possible, unlikely and incorrect, and I’ve annotated those negative judgments with explanations in my review. Here are the results:

scientificName	correct	likely	possible	unlikely	incorrect	total
Native
Antichiropus variabilis	0	0	4	0	0	4
Australeuma jeekeli	7	0	0	0	0	7
Australocricus perditus	0	0	3	1	0	4
Australocricus sennae	0	0	8	0	0	8
Brochopeltis mjoebergi	8	0	0	0	0	8
Cladethosoma toowoomba	1	0	0	0	0	1
Cyliosoma excavatum	0	0	0	1	0	1
Cynotelopus notabilis	0	5	0	0	0	5
Heterocladosoma	0	0	3	0	0	3
Heterocladosoma bifalcatum	0	219	20	1	0	240
Hoplatessara pugiona	0	18	0	0	0	18
Isocladosoma maculatum	0	2	0	0	1	3
Lissodesmus martini	2	0	0	0	0	2
Notodesmus scotius	1	0	0	0	0	1
Peterjohnsia titan	1	0	0	0	0	1
Phryssonotus australis	1	0	0	0	0	1
Phyllocladosoma broelemanni	0	1	0	0	0	1
Phyllocladosoma dorrigense	0	1	0	0	0	1
Pogonosternum nigrovirgatum	0	0	3	0	0	3
Propolyxenus australis	1	0	0	0	0	1
Propolyxenus forsteri	1	0	0	0	0	1
Solaenodolichopus	0	1	0	0	0	1
Somethus	1	0	0	0	0	1
Somethus castaneus	0	8	0	0	0	8
Somethus tasmani	1	0	0	0	0	1
Tholerosoma monteithi	1	0	0	0	0	1
all native	26	255	41	3	1	326
Introduced
Asiomorpha coarctata	0	6	0	0	0	6
Blaniulus guttulatus	1	0	0	0	0	1
Ommatoiulus moreleti	1258	332	93	2	4	1689
Oxidus gracilis	0	0	0	2	1	3
Polyxenus lagurus	1	0	0	1	1	3
Trigoniulus corallinus	0	0	0	1	0	1
all introduced	1260	338	93	6	6	1703
all taxa	1286	593	134	9	7	2029

As might be expected for a citizen-science project that encourages observers to record natural objects in their backyards and in local parks and nature reserves, 84% of RG millipede IDs were of introduced species. Of those IDs, 99% were of the Black Portuguese Millipede, Ommatoiulus moreleti.

Dominating the native millipede IDs was Heterocladosoma bifalcatum, a species native to southeast Queensland. H. bifalcatum has become established and superabundant in the Sydney metropolitan area, ca 700 km to the south, and this area was the source of most of the observations identified as this species.

Reliability of the IDs was fairly high: I classed 86% of the native millipede observations and 94% of the introduced millipede observations as either correct or likely.
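For anyone wanting to check those percentages, the arithmetic can be sketched in a few lines of Python. The counts are copied from the "all native" and "all introduced" rows of the table above. The function name trusted_share is just a label chosen for this example.

```python
# Totals copied from the "all native" and "all introduced" rows of the results table.
TOTALS = {
    "native": {"correct": 26, "likely": 255, "possible": 41, "unlikely": 3, "incorrect": 1},
    "introduced": {"correct": 1260, "likely": 338, "possible": 93, "unlikely": 6, "incorrect": 6},
}


def trusted_share(counts):
    """Percentage of records classed as either correct or likely."""
    total = sum(counts.values())
    return 100 * (counts["correct"] + counts["likely"]) / total


for group, counts in TOTALS.items():
    print(f"{group}: {trusted_share(counts):.0f}% of {sum(counts.values())} records")
# native: 86% of 326 records
# introduced: 94% of 1703 records
```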

Following. I also recorded as annotations the iNaturalist username of the first person to assign the image to its RG ID, the second person to do so, the third person to do so etc. (Only the first identifier is given in the identifiedBy field in the Darwin Core version of an iNaturalist record, so my information came from iNaturalist.) “Research Grade” status on iNaturalist is assigned by a fairly complex algorithm, but in practice if two IDs at the lowest taxonomic level agree and none disagree, the ID is accepted as the RG one. The tallies for the Australian millipede dataset were:

No. of RG IDs	No. of records
2	1629
3	372
4	25
5	3
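The "in practice" rule described above can be written as a very small Python sketch. This is only the simplification stated in the text, not iNaturalist's actual algorithm: it treats each ID as a plain taxon-name string and ignores coarser-level IDs, withdrawn IDs and any other quality checks. The function name reaches_rg_simplified is hypothetical.

```python
def reaches_rg_simplified(taxon_ids):
    """Simplified rule from the text: at least two IDs, and all of them agree.

    Assumption: each ID is given as a taxon-name string at the same rank.
    This is NOT the real iNaturalist "Research Grade" algorithm.
    """
    return len(taxon_ids) >= 2 and len(set(taxon_ids)) == 1


print(reaches_rg_simplified(["Ommatoiulus moreleti", "Ommatoiulus moreleti"]))  # True
print(reaches_rg_simplified(["Ommatoiulus moreleti"]))                          # False
print(reaches_rg_simplified(["Ommatoiulus moreleti", "Oxidus gracilis"]))       # False
```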

I looked to see where in the sequence of identifiers the original observer appeared in the 1708 of 2029 records where the original observer suggested an ID. The original observer was first to suggest the RG ID in 1432 of 1587 correct or likely records (90%), and in 97 of 121 possible, unlikely or incorrect records (80%).

The original observer “followed” the suggestion of another iNaturalist member in 155 of 1587 correct or likely records (10%), and in 24 of 121 possible, unlikely or incorrect records (20%).

It isn’t good practice if an observer “follows” someone else’s suggestion simply because the observer thinks the other iNaturalist member knows best, but this undoubtedly happens. When only two IDs are suggested, the result is that an observation achieves “Research Grade” on the strength of a single independent ID.

The “only two IDs” component of the 1708-record set with an original observer ID contains 1346 records. Of these 1346, the original observer “followed” another suggestion in 102 correct or likely records (8% of the 1346) and in 19 possible, unlikely or incorrect records (1% of the 1346). “Following” may not be good practice, but its incidence in the Australian millipede dataset is low.
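The “following” rates quoted above can be recomputed with a short Python sketch. All counts are copied from the text. The variable and function names are labels chosen for this example only.

```python
# Records where the original observer suggested an ID (1708 in total),
# split by whether the observer was first to suggest the RG ID or followed someone else.
OBSERVER_POSITION = {
    "correct or likely": {"first": 1432, "followed": 155},
    "possible, unlikely or incorrect": {"first": 97, "followed": 24},
}

# Subset with only two IDs: 1346 records, with the "followed" counts from the text.
TWO_ID_TOTAL = 1346
TWO_ID_FOLLOWED = {"correct or likely": 102, "possible, unlikely or incorrect": 19}


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


for label, counts in OBSERVER_POSITION.items():
    class_total = counts["first"] + counts["followed"]
    print(f"{label}: followed in {percent(counts['followed'], class_total):.0f}% of {class_total}")
# correct or likely: followed in 10% of 1587
# possible, unlikely or incorrect: followed in 20% of 121

for label, followed in TWO_ID_FOLLOWED.items():
    print(f"two IDs only, {label}: {percent(followed, TWO_ID_TOTAL):.0f}% of {TWO_ID_TOTAL}")
# two IDs only, correct or likely: 8% of 1346
# two IDs only, possible, unlikely or incorrect: 1% of 1346
```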

Conclusion. This review should not be seen as a fair evaluation of the correctness or otherwise of iNaturalist IDs. It examined only 2029 of ca 60 million records, representing one taxon and one geographical area.

From my point of view as a millipede specialist, however, the review confirmed that iNaturalist records in GBIF are of limited usefulness. Of the ca 2000 native Australian millipede species, only a small number can be positively identified from a whole-animal image, and distribution data for these and all other natives are better obtained from specimens in collections. Records for introduced species are valuable if IDs are reliable, but (again) only a small number of introduced species are identifiable from whole-animal images.

iNaturalist also holds observations that have not reached “Research Grade” and are not mediated by GBIF. Mining these “Casual” and “Needs ID” observations for new millipede records would hardly be worth the effort, to judge from this RG review.

Following email exchanges with Thomas Mesaglio and James K. Douch, some of the IDs in the dataset downloaded on 2023-03-24 have now been changed on the iNaturalist platform. My annotated version of the GBIF records from the 2023-03-24 download has been archived in Zenodo.

Robert Mesibov (“datafixer”); robert.mesibov@gmail.com
