This is my first post to the GBIF Community, so please be kind if I have missed any of the rules.
I have a large collection of datasets that I maintain in our own IPT and publish to GBIF. Some of the datasets grow and others may change (attributes added or deleted, format changes, etc.). According to the post "When to assign a new DOI to an existing dataset?", a DOI is assigned when a dataset is registered, and it does not change when the dataset is updated or changed.
So, I wonder what the best practice for version control is. Is there a version number in the metadata of a GBIF dataset, similar to the one within IPT (emlVersion; IPT can also hold versionHistory)? How should researchers cite a particular version of a GBIF dataset? Does GBIF archive old versions and make them accessible?
A typical situation I have in mind is this: a dataset is registered with GBIF on, say, May 1st, 2021. I add new records to the same dataset and publish it in IPT every month, and each update is automatically reflected in GBIF (but the DOI remains the same). A researcher downloads the dataset at some point, say, June 1st, 2021, uses it for their analysis, and publishes a journal article. The journal requires open access to the dataset, so the researcher wants to provide a URL to the June 1st version. The DOI alone can’t satisfy this, because the latest dataset has more records than the June 1st version.
As in the post I mentioned above, you may suggest registering the dataset as a new dataset, so that it gets a new DOI every time it grows. But that is very difficult to maintain, and it would register redundant, duplicate records, which would confuse users and could lead to invalid results such as an overestimate of species abundance. So, I don’t think it’s a good idea.
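To make the duplication concern concrete, here is a minimal Python sketch with made-up record IDs. The IDs, variable names and the helper function are all hypothetical, and this is not how GBIF indexes data internally; it only counts records under the two approaches.

```python
# Hypothetical record IDs, not real GBIF data.
may_version = {"occ-001", "occ-002", "occ-003"}
# The same dataset one month later, grown by two records.
june_version = may_version | {"occ-004", "occ-005"}

# Approach A: one dataset updated in place, so only the latest version is visible.
indexed_in_place = sorted(june_version)

# Approach B: every version registered as a separate dataset,
# so records from both versions are visible side by side.
indexed_as_new = sorted(may_version) + sorted(june_version)


def count_duplicates(records):
    """Return how many entries repeat an ID that is already in the list."""
    return len(records) - len(set(records))


print(len(indexed_in_place), count_duplicates(indexed_in_place))  # 5 0
print(len(indexed_as_new), count_duplicates(indexed_as_new))      # 8 3
```

With approach B, a user who simply counts records sees 8 instead of 5, because the three May records appear twice.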