Over the past four years I’ve offered data training to biologists and biodiversity informatics enthusiasts. The methods came from A Data Cleaner’s Cookbook, adapted for Darwin Core datasets.
The training materials now have their own website, Darwin Core table checker. After a “settling in” period I’ll deposit the whole site in Zenodo so it can be downloaded for offline use, and the Introduction page will have an all-versions Zenodo DOI link.
The new website is not Darwin Core: The Missing Manual, which remains missing. Instead I show how command-line tools can be used to tidy and correct a Darwin Core dataset.
The tools demonstrated on the new site are largely GNU/Linux programs running in a BASH shell. Absent are spreadsheet tricks, OpenRefine, R libraries and Python modules. The tools (especially GNU AWK) are fast, easy to use and reliable. They operate on Darwin Core datasets of any size. Most importantly, they’re flexible: they can be quickly adapted to answer almost any question you might have about data structure, formatting, content or quality.
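To give a flavour of this kind of check, here is a minimal sketch. It is not taken from the training site. It assumes the Darwin Core table is a tab-separated text file, and the function name `field_counts` and the file name `occurrence.txt` are hypothetical. The function tallies how many lines have each number of fields, which is a quick way to spot broken records in a table.

```shell
# Hypothetical helper: tally the number of tab-separated fields per line.
# Assumes the table is a tab-separated text file passed as the first argument.
field_counts() {
    # NF is the number of fields on the current line; n[NF] counts lines with that many fields.
    # AWK's for-in order is unspecified, so the tallies are sorted numerically afterwards.
    awk -F'\t' '{ n[NF]++ } END { for (k in n) print k "\t" n[k] }' "$1" | sort -n
}

# Example use (hypothetical file name):
#   field_counts occurrence.txt
```

A well-formed table gives a single output line: the number of fields, then the number of lines (header included). More than one output line means some records have too many or too few fields and need a closer look.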
Comments and questions about the new site are welcome, but please email me directly if your comment, question or suggested correction is a technical one. I’m happy to edit the training materials accordingly over the coming months (the “settling in” period!).
Please note that future trainings will be based on the Darwin Core table checker, rather than on specially prepared documents distributed by email as in the past.
UPDATE (17 January 2024): The Darwin Core table checker website has been updated and the current version is archived in Zenodo.
Robert Mesibov (“datafixer”); firstname.lastname@example.org