Sibling datasets to overcome DwCArchive star schema limitation(2)

andre · July 25, 2022, 8:43am

Dear GBIF fellows,

As announced before, I have just published DNA-derived data as two datasets:

The first one contains the Antarctic samples and lake’s physico-chemical measurements.
The second one contains the DNA-derived occurrences of Cyanobacteria.

I find this an elegant solution, and I hope it will not make it harder for data users to retrieve data from both datasets.
Other occurrences, from the very same samples but from different authors working on different taxonomic groups, will follow.
Maybe this practice should be encouraged until the new data model is in place.

Best regards

andre · July 26, 2022, 8:27am

This entity-relationship diagram better explains the relations between the published data.

Entities are published as follows:

Sampling event core (first dataset)
Measurement or Fact (first dataset)
Occurrence core (second dataset)
DNA extension (second dataset)

The Event-Occurrence relation is implicit: the eventID of each occurrence in the second dataset refers to an event described in the first dataset.
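
For data users, here is a minimal sketch of how the two datasets could be recombined on eventID once both archives are downloaded. It is only an illustration: the rows, identifiers and the `join_on_event_id` helper are made up for the example, and real data would be read from the event and occurrence files of the two archives rather than from inline strings.

```python
import csv
import io

# Invented placeholder rows standing in for the two published datasets.
# Only the column name eventID comes from the discussion above.
EVENTS_TSV = (
    "eventID\tlocality\n"
    "EVT-1\tLake A\n"
    "EVT-2\tLake B\n"
)
OCCURRENCES_TSV = (
    "occurrenceID\teventID\tscientificName\n"
    "OCC-1\tEVT-1\tTaxon one\n"
    "OCC-2\tEVT-2\tTaxon two\n"
    "OCC-3\tEVT-9\tTaxon three\n"
)


def read_tsv(text):
    """Parse tab-separated text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def join_on_event_id(events, occurrences):
    """Attach event fields to each occurrence that shares its eventID.

    Returns (joined, orphans): orphans are occurrences whose eventID
    does not appear in the event dataset.
    """
    events_by_id = {event["eventID"]: event for event in events}
    joined, orphans = [], []
    for occurrence in occurrences:
        event = events_by_id.get(occurrence["eventID"])
        if event is None:
            orphans.append(occurrence)
        else:
            # Occurrence fields win if a column name exists in both.
            joined.append({**event, **occurrence})
    return joined, orphans


events = read_tsv(EVENTS_TSV)
occurrences = read_tsv(OCCURRENCES_TSV)
joined, orphans = join_on_event_id(events, occurrences)

for row in joined:
    print(row["occurrenceID"], row["eventID"], row["locality"])
```

Because the link is not enforced by the archive itself, reporting the orphans is a cheap way to check that every occurrence points to an event that really exists in the sibling dataset.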

cjk · July 29, 2022, 9:22am

Thanks very much, @andre, for sharing this example. It is a very useful model for BIFA project teams to follow so that they can share more complete information within the current DwC-A structure.
