Sibling datasets to overcome DwCArchive star schema limitation

Dear GBIF fellows,

As announced before, I just published DNA-derived data as two datasets:

The first one contains the Antarctic samples and lake’s physico-chemical measurements.
The second one contains the DNA-derived occurrences of Cyanobacteria.

I find this an elegant solution, and I hope it will not make it harder for data users to retrieve data from both datasets.
Other occurrences from the very same samples, contributed by different authors working on different taxonomic groups, will follow.
Maybe this practice should be encouraged until the new data model is in place.




This entity-relationship diagram better explains the relationships between the published data.

Entities are published as follows:

  • Sampling event core (first dataset)
  • Measurement or Fact (first dataset)
  • Occurrence core (second dataset)
  • DNA extension (second dataset)

The Event-Occurrence relation is implicit, with eventID referring to the events described in the first dataset.
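
For data users who want to put the two datasets back together, here is a minimal sketch of the join on eventID, using only the Python standard library. The rows are made-up placeholders, not values from the published datasets, and the helper names (`read_tsv`, `join_on_event_id`) are hypothetical; only the column names are Darwin Core terms. It assumes both archives have already been downloaded and their tab-delimited core files read as text.

```python
import csv
import io

# Made-up rows standing in for the two downloaded archives.
# Only the column names (eventID, occurrenceID, ...) are Darwin Core terms.
EVENTS_TSV = (
    "eventID\teventDate\tlocality\n"
    "lake-A:2018\t2018-01-15\tLake A\n"
    "lake-B:2018\t2018-01-20\tLake B\n"
)
OCCURRENCES_TSV = (
    "occurrenceID\teventID\tscientificName\n"
    "occ-1\tlake-A:2018\tNostoc\n"
    "occ-2\tlake-A:2018\tLeptolyngbya\n"
    "occ-3\tlake-B:2018\tPhormidium\n"
    "occ-4\tlake-C:2018\tNostoc\n"
)


def read_tsv(text):
    """Parse tab-delimited text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def join_on_event_id(events, occurrences):
    """Attach event fields to each occurrence sharing the same eventID.

    Returns (joined_rows, orphan_occurrence_ids). An orphan is an
    occurrence whose eventID has no match in the event dataset.
    """
    events_by_id = {event["eventID"]: event for event in events}
    joined, orphans = [], []
    for occ in occurrences:
        event = events_by_id.get(occ["eventID"])
        if event is None:
            orphans.append(occ["occurrenceID"])
        else:
            # Occurrence fields win on a name clash; eventID is identical anyway.
            joined.append({**event, **occ})
    return joined, orphans


joined, orphans = join_on_event_id(read_tsv(EVENTS_TSV), read_tsv(OCCURRENCES_TSV))
for row in joined:
    print(row["occurrenceID"], row["scientificName"], row["locality"], row["eventDate"])
print("occurrences without a matching event:", orphans)
```

Because the link between the datasets is implicit, nothing enforces it at publication time, so listing the unmatched eventIDs as above is a cheap sanity check. The same lookup would work for the Measurement or Fact rows of the first dataset, which also carry the eventID.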


Thanks very much, @andre, for sharing this example. It is a very useful model for BIFA project teams to follow so that they can share more complete information within the current DwC-A structure.

