DwC-A / DwC-DP CLI validator?

Are there any command line validators for DarwinCore Archives and/or DarwinCore Data Packages? A GBIF-specific one would be cool, but I’m mostly interested in one that verifies the structure of the archive, e.g. do the metadata files conform to their declared schemas, do the files declared in the metadata exist, do those files have the declared columns, etc.


@kueda I looked for one late in 2024 when trying to work with the perfectly awful ALA archives:

Getting ALA data to the usable stage

but didn’t find one. It would be pretty easy to write a checking function with AWK and other shell tools.
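As a rough illustration of what such a checking function might look like, here is a minimal sketch in Python rather than the AWK and shell tools mentioned above. It assumes the archive is already unzipped, that meta.xml uses the standard Darwin Core text namespace, and that the delimiter and quote attributes are single characters. The function name `check_dwca_dir` is made up for this example. It only checks that the declared files exist and that each row has enough columns for the highest declared index.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace used by meta.xml in the Darwin Core text guide.
NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}


def check_dwca_dir(archive_dir):
    """Return a list of problem strings for an unzipped DwC-A directory."""
    archive_dir = Path(archive_dir)
    meta = archive_dir / "meta.xml"
    if not meta.is_file():
        return ["meta.xml is missing"]

    problems = []
    root = ET.parse(meta).getroot()
    tables = root.findall("dwc:core", NS) + root.findall("dwc:extension", NS)
    for table in tables:
        # Highest column index declared by <id>, <coreid> or <field> elements.
        indexes = [int(el.get("index")) for el in table.iter()
                   if el.get("index") is not None]
        min_cols = max(indexes) + 1 if indexes else 0

        # The delimiter is stored escaped in the attribute, e.g. "\t".
        reader_args = {
            "delimiter": table.get("fieldsTerminatedBy", ",").replace("\\t", "\t")
        }
        quote = table.get("fieldsEnclosedBy", '"')
        if quote:
            reader_args["quotechar"] = quote
        else:
            reader_args["quoting"] = csv.QUOTE_NONE

        for loc in table.findall("dwc:files/dwc:location", NS):
            name = (loc.text or "").strip()
            path = archive_dir / name
            if not path.is_file():
                problems.append(f"{name}: declared in meta.xml but not found")
                continue
            encoding = table.get("encoding", "utf-8")
            with open(path, newline="", encoding=encoding) as fh:
                for n, row in enumerate(csv.reader(fh, **reader_args), start=1):
                    if len(row) < min_cols:
                        problems.append(
                            f"{name}: line {n} has {len(row)} columns, "
                            f"expected at least {min_cols}"
                        )
                        break  # report only the first bad line per file
    return problems
```

Calling `check_dwca_dir("path/to/unzipped-archive")` returns an empty list when nothing was flagged. It does not validate meta.xml or eml.xml against their schemas.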

I wonder if the new DwC-A data package edition could be validated with the frictionless validator. I don’t see why not.


@Kate and @trobertson, either of you know of such a tool? I posted this in anticipation of getting https://cuanto.bio to the point of needing data export, and now I’m there, so an “official” way to validate the DwC-DPs that I export would be helpful. Maybe that’s still pending locking down the standard?

@pieter, the frictionless validate command looks useful, but it’s not immediately clear to me that it will check the DwC-specific rules. I’ll check it out. Thanks!

FWIW, I had Claude make one: https://github.com/kueda/dwc-dp-validate. I’m not actually sure it does a good job so if anyone finds problems, please let me know. As far as I can tell frictionless validate checks for basic CSV validity and that datapackage.json’s requirements are met by the content, but it doesn’t ensure that datapackage.json actually conforms to https://raw.githubusercontent.com/gbif/dwc-dp/0.1/dwc-dp/dwc-dp-profile.json. It does get you most of the way there, though.
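To make the content-level part concrete, here is a minimal sketch of that kind of check: do the files and columns declared in datapackage.json actually exist? It uses only the Python standard library. It is not how frictionless or dwc-dp-validate work internally, and it does not check the descriptor against the DwC-DP profile. The function name `check_datapackage_dir` is made up for this example, and it assumes each resource has a single local CSV path, an inline schema, and a header row.

```python
import csv
import json
from pathlib import Path


def check_datapackage_dir(package_dir):
    """Return a list of problem strings for a data package directory."""
    package_dir = Path(package_dir)
    descriptor_path = package_dir / "datapackage.json"
    if not descriptor_path.is_file():
        return ["datapackage.json is missing"]

    descriptor = json.loads(descriptor_path.read_text(encoding="utf-8"))
    problems = []
    for resource in descriptor.get("resources", []):
        name = resource.get("name", "<unnamed>")
        path = resource.get("path")
        # Assumption: one local file per resource (no URL or list of paths).
        if not isinstance(path, str):
            problems.append(f"{name}: expected a single string path")
            continue
        data_file = package_dir / path
        if not data_file.is_file():
            problems.append(f"{name}: {path} is declared but not found")
            continue

        schema = resource.get("schema")
        if not isinstance(schema, dict):
            continue  # schema absent or given by reference: skipped here
        declared = [f["name"] for f in schema.get("fields", []) if "name" in f]

        # Compare declared field names with the first row of the CSV.
        with open(data_file, newline="", encoding="utf-8") as fh:
            header = next(csv.reader(fh), [])
        missing = [col for col in declared if col not in header]
        if missing:
            problems.append(
                f"{name}: columns missing from {path}: {', '.join(missing)}"
            )
    return problems
```

An empty list means nothing was flagged. Checking datapackage.json itself against dwc-dp-profile.json would be a separate JSON Schema validation step that this sketch leaves out.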

Hi @kueda good timing.

Our work on Darwin Core Data Packages is still in progress, but we have made a small CLI analyser that covers both the validation aspects you describe and some statistics.

I have made a release on GitHub, so you can fetch it: Release dwc-dp-analyser-0.0.6 - Runner CLI · gbif/dwc-dp-analyser · GitHub

The README.md in the repository should describe how to run it (requires Java).


Perfect, exactly what I was looking for. Thank you!