Summary: 6. Robust access points and data infrastructure alignment

This page compiles the daily summaries (most recent first) written by the topic facilitators. Its goal is to orient new readers to the topic. Please go to the thread to comment.

Go to Robust access points and data infrastructure alignment

Summary 1 (June 15 – July 2)
Our first week on robust access points and data infrastructure alignment started with discussions on the role of storage infrastructures. Two types of infrastructure were mentioned: 1) systems that contain dynamic data (such as collection management systems, GBIF, Wikidata) and 2) repositories (such as Zenodo) that contain data snapshots and long-term archives. Both are integral: the first supports day-to-day workflows, while the second supports long-term data access (and related services such as referential integrity and content verification). We also need to be aware of use cases with specific storage needs, such as storing sequence or media files.
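To make the content-verification service concrete, here is a minimal Python sketch of fixity checking for an archived snapshot: the archive records a checksum at deposit time and can later confirm the bytes are unchanged. All file names and function names here are illustrative, not drawn from any particular repository's API.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file in streaming fashion."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_snapshot(path: Path, expected: str) -> bool:
    """Return True if the archived snapshot still matches its recorded checksum."""
    return sha256_of(path) == expected

# Illustrative usage: write a tiny "snapshot" and verify it.
snapshot = Path("specimen_snapshot.txt")
snapshot.write_text("occurrenceID,scientificName\nurn:uuid:1,Puma concolor\n")
recorded = sha256_of(snapshot)  # stored alongside the deposit at archive time
print(verify_snapshot(snapshot, recorded))
```

In practice the recorded checksum would live in the repository's metadata (Zenodo, for instance, stores file checksums with each deposit), so verification can run independently of the system that produced the data.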

Two other cross-cutting themes emerged: 1) should such infrastructures be centralised or distributed, and 2) who should take the lead in building and maintaining them? We have several examples: national projects such as NFDI in Germany, regional initiatives (DiSSCo, iDigBio, ALA), global organisations like GBIF, large research institutions such as CERN, and large-scale international non-profit projects (Wikidata). There could even be collaboration with companies such as Amazon (Amazon Open Datasets) and Microsoft (Microsoft Planetary Computer) to create platforms for open data and analysis.

These questions need to be evaluated on a case-by-case basis – against specific collection management needs (JACQ and Arctos were mentioned), data size and scalability (can Wikidata or RDF stores handle 1.8 billion records?), and value-added services (annotation, citation tracking, machine learning). Work on standards and protocols is essential, as they not only provide data harmonisation paths but also robust access points that are immune to underlying changes in technology over long periods. We should also look beyond our immediate community and learn from and collaborate with initiatives such as UniProt and IIIF.
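As a small illustration of what a "data harmonisation path" looks like in practice, the Python sketch below renames fields from ABCD-style element names to Darwin Core terms. The crosswalk shown is a deliberately simplified example for illustration only; real ABCD-to-DwC mappings are far richer and context-dependent.

```python
# Simplified, illustrative crosswalk from ABCD-style element names to
# Darwin Core terms. A real mapping is larger and often conditional on
# record context; this only demonstrates the mechanism.
ABCD_TO_DWC = {
    "UnitID": "catalogNumber",
    "FullScientificNameString": "scientificName",
    "ISODateTimeBegin": "eventDate",
    "LocalityText": "locality",
}

def harmonise(abcd_record: dict) -> dict:
    """Rename ABCD-style keys to Darwin Core terms, keeping unmapped keys as-is."""
    return {ABCD_TO_DWC.get(key, key): value for key, value in abcd_record.items()}

record = {
    "UnitID": "B 10 0000123",
    "FullScientificNameString": "Quercus robur L.",
    "ISODateTimeBegin": "1998-06-15",
}
print(harmonise(record))
```

The value of an agreed standard is that every consumer can target the Darwin Core side of such a mapping, regardless of which collection management system produced the record.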

The new DES ecosystem might be a hybrid solution: alignment around the FAIR principles, elements of federated data governance, standards (openDS, MIDS, DwC, ABCD+EFG, GGBN and more), open protocols, and networks of distributed data facilities and services that maintain domain (e.g., specimens, observations, publications, sequences) and service ownership but can interoperate and link across domain and contextual boundaries. We will also need central discoverability, access control, and alignment with the governance and harmonisation of distributed domains and datasets.
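One way to picture "central discoverability over distributed domains" is a thin resolver: a central registry maps an identifier's domain prefix to the distributed service that owns the data, while each domain keeps ownership of its own records. The Python sketch below is a toy illustration; every prefix, endpoint, and identifier in it is a hypothetical placeholder, not a real service.

```python
# Toy federated resolver. The central component only knows which domain
# service owns which identifier space; the data itself stays distributed.
# All domain names and URLs below are hypothetical placeholders.
DOMAIN_REGISTRY = {
    "specimen": "https://specimens.example.org/api/",
    "sequence": "https://sequences.example.org/api/",
    "publication": "https://publications.example.org/api/",
}

def resolve(pid: str) -> str:
    """Map a prefixed persistent identifier, e.g. 'specimen:20.5000/abc',
    to an access point at the domain service that owns it."""
    domain, _, local_id = pid.partition(":")
    if domain not in DOMAIN_REGISTRY or not local_id:
        raise ValueError(f"No registered domain service for {pid!r}")
    return DOMAIN_REGISTRY[domain] + local_id

print(resolve("specimen:20.5000/abc"))
```

The design choice this illustrates is that only the registry is central; adding a new domain (or moving a service) means updating one mapping, while existing identifiers keep resolving – one reading of "robust access points immune to underlying changes in technology".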