Third-party data services explained with a shanty

datafixer · September 4, 2022, 8:41am

For some years now I’ve been arguing for third-party services to be interposed between collection datasets and aggregators like GBIF.

The services would be human, not coded, because people are the best data cleaners and because there are data problems in collection management systems (CMSes) that are impractical to fix with code.

Not many in the biodiversity informatics community have looked kindly on this suggestion, so to advance the idea (or maybe to kill it with a single post) I offer the following song to explain how a third-party data-cleaning service would work. The tune is The Wellerman.

There once was a coll with a CMS
Whose records were a dreadful mess
Sadly did the staff confess
“They’re really not ready to go”

Chorus
Soon may the Dataman come
To fix our records for a modest sum
Then, when the tidying’s done,
We’ll share them in Darwin Core!

Formatting dates is most accursed
Should day or month or year go first?
At times one way, at times reversed
That’s just the status quo

Chorus

When not sure where a place is at
We have a way to deal with that
A zero long and zero lat
Is how we make that show

Chorus

Free-text fields allow for staff who’d
Enter things to be reviewed
So many entries now are queued
And checking is quite slow

Chorus

When typing name or place or date
These things we oft abbreviate
They’re hard to disambiguate
Unless you’re in the know

Chorus

Look-up tables aren’t our norm
We enter names in many a form
In lists each name will have a swarm
Of variants below
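
As a rough illustration of the problems in the verses, here is a minimal Python sketch that only flags suspect entries so a person can review them. It does not attempt to fix anything. The function names, the date pattern and the name-matching rule are all assumptions made for this example, not part of any CMS or GBIF tool.

```python
import re
from collections import defaultdict


def ambiguous_date(text):
    """Return True if a numeric date could be day-first or month-first.

    Assumes (hypothetically) dates typed as N/N/YYYY, N-N-YYYY or N.N.YYYY.
    """
    m = re.fullmatch(r"(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})", text.strip())
    if not m:
        return False
    a, b = int(m.group(1)), int(m.group(2))
    # Both parts could be a month, and swapping them changes the date.
    return 1 <= a <= 12 and 1 <= b <= 12 and a != b


def zero_coordinates(lat, lon):
    """Return True for the 'zero long and zero lat' placeholder."""
    return float(lat) == 0.0 and float(lon) == 0.0


def name_variants(names):
    """Group names that differ only in case, spacing or punctuation.

    A crude key (letters only, lower-cased) is an assumption of this sketch;
    it will miss real variants and may merge distinct names.
    """
    groups = defaultdict(set)
    for name in names:
        key = re.sub(r"[^a-z]", "", name.lower())
        groups[key].add(name)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


if __name__ == "__main__":
    print(ambiguous_date("03/04/2019"))
    print(zero_coordinates("0", "0.0"))
    print(name_variants(["A. Smith", "A Smith", "a.smith", "B. Jones"]))
```

Checks like these can find the records that need attention, but deciding what "03/04/2019" or an abbreviated place name actually means still takes someone who is "in the know".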

Robert Mesibov (“datafixer”); robert.mesibov@gmail.com
