This post is for biodiversity data workers who use the Darwin Core table checker or other command-line data checkers.
Darwin Core archives are ZIP files, usually either dwca-[subject]-[version].zip from an IPT or [lots of numbers].zip from GBIF. Let’s assume you have just one such archive to be checked. Navigate in the shell to the directory with that one archive.
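Because everything below assumes a single archive in the working directory, it may be worth confirming that before going further. The following is only a minimal sketch: the function name "one_zip" is hypothetical (it is not part of any data checker) and it assumes the archive name ends in lowercase ".zip".

```shell
# Hypothetical guard: succeed only if the current directory
# contains exactly one .zip file, and print its name.
one_zip() {
    # Load the matching file names into the function's positional parameters
    set -- *.zip
    # If nothing matched, "$1" is the literal pattern and the -e test fails
    if [ "$#" -eq 1 ] && [ -e "$1" ]; then
        printf '%s\n' "$1"
    else
        echo "Expected exactly one .zip archive in $(pwd)" >&2
        return 1
    fi
}
```

Running "one_zip" prints the archive name if it is the only ZIP file present, and otherwise prints a message to stderr and returns a non-zero status.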
The following shell function relies on the unzip utility and on bash-style substring expansion. It extracts just the .txt data files into the current directory (not the .xml metadata files), renames each file to the first two letters of its name, then lists the contents of the current directory. Renaming makes the files much easier to work with (less typing!).
dz() { unzip -q ./*.zip '*.txt' && for j in *.txt; do mv "$j" "${j:0:2}"; done && ls; }
In the screenshot below, “dz” is at work on the archive “dwca-betel_nut_diurnal_avian-v1.5.zip”:
I haven’t yet seen any 2-letter name collisions when renaming the usual Darwin Core files, but if one turns up in future, the length in "${j:0:2}" can be increased in the function (for example, to 3).
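If collisions are a concern, the prefixes could be checked before any renaming is done. The sketch below is an illustration only: the function name "dz_collisions" is hypothetical, and it assumes the .txt files have already been extracted (for example with unzip) but not yet renamed. It prints each 2-letter prefix that is shared by more than one of the file names it is given, and prints nothing if every prefix is unique.

```shell
# Hypothetical helper: report 2-letter prefixes shared by two or more
# of the file names passed as arguments.
dz_collisions() {
    for f in "$@"; do
        # Strip any leading directory so only the file name is considered
        printf '%s\n' "${f##*/}"
    done |
        cut -c1-2 |   # keep the first two characters of each name
        sort |
        uniq -d       # print only the prefixes that occur more than once
}
```

A typical call would be "dz_collisions *.txt" in the directory holding the extracted files. Any prefix it prints marks a pair of files that would overwrite one another when shortened to two letters.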
Robert Mesibov (“datafixer”); mesibov@datafix.com.au