Preferences or recommended best practices for granularity of data

I am a data manager on a large multidisciplinary marine research project.

We collect data from many stations, separated by tens of kilometres in most cases. For our physical oceanography data, we work a lot with NetCDF-CF files. I usually recommend that researchers create individual files per station, and publish them together in a single data collection. This increases flexibility, since a data user may only be interested in a subset of the data.

We also collect a lot of biodiversity data. I was wondering how this is usually handled for Darwin Core Archives, and what the recommended best practice is.

Should the data be published together in a single Darwin Core Archive, or divided up into one Darwin Core Archive per station?

Assuming you’re intending to publish said archives to GBIF: if you’re interested in knowing how much data from each individual station is being downloaded, used and cited by GBIF users, it might make sense to divide the data up into one dataset per station.

I’m sure there are many other considerations, but perhaps one of my colleagues can chime in.

I think this is also a question for the wider community to get an overview of what data providers do. Perhaps some of the GBIF Nodes would be interested in sharing their experiences?

For this specific case, we recommend publishing a single dataset structured with the Event Core, following the OBIS data format, with every station described as an event, and using the Occurrence extension for the records. You can add tables for abiotic data with the ExtendedMeasurementOrFact extension.

You can keep station-level granularity with the Darwin Core terms parentEventID and eventID: parentEventID identifies the station, and eventID identifies each specific event within it, such as a particular point or transect at the station on a given date.

e.g.:
parentEventID:Station[x]
eventID:Station[x]_Point[x]_Date[x]

You can always filter data by parentEventID and eventID to get individual information for each station and also keep the data together.
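As a rough illustration of that filtering, here is a minimal Python sketch using only the standard library. The station names, dates, occurrence IDs, and species rows are made up for the example, and the helper functions are hypothetical, not part of any GBIF or OBIS tool. It assumes one level of nesting (station, then point), with a tab-separated event table and occurrence table as they might appear inside a Darwin Core Archive.

```python
import csv
import io

# Hypothetical event table: all IDs and dates are invented for illustration.
EVENT_TSV = """eventID\tparentEventID\teventDate
Station1\t\t2021-07-14
Station1_Point1_2021-07-14\tStation1\t2021-07-14
Station1_Point2_2021-07-14\tStation1\t2021-07-14
Station2\t\t2021-07-16
Station2_Point1_2021-07-16\tStation2\t2021-07-16
"""

# Hypothetical occurrence table linked to events through eventID.
OCCURRENCE_TSV = """occurrenceID\teventID\tscientificName
occ1\tStation1_Point1_2021-07-14\tCalanus finmarchicus
occ2\tStation1_Point2_2021-07-14\tCalanus glacialis
occ3\tStation2_Point1_2021-07-16\tCalanus finmarchicus
"""


def read_tsv(text):
    """Parse tab-separated text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def events_for_station(events, station_id):
    """Return the station event plus every event whose parentEventID is the station."""
    return [
        e for e in events
        if e["eventID"] == station_id or e["parentEventID"] == station_id
    ]


def occurrences_for_station(events, occurrences, station_id):
    """Return occurrences recorded at the station or at any of its child events."""
    event_ids = {e["eventID"] for e in events_for_station(events, station_id)}
    return [o for o in occurrences if o["eventID"] in event_ids]


events = read_tsv(EVENT_TSV)
occurrences = read_tsv(OCCURRENCE_TSV)

station1 = occurrences_for_station(events, occurrences, "Station1")
print([o["occurrenceID"] for o in station1])
```

With deeper hierarchies (for example station, transect, point), the lookup would need to walk parentEventID repeatedly rather than match a single level.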
