Extending, enriching and integrating data

@dshorthouse, @abentley. It’s proven by Crossref and DataCite in the scholarly sector, by EIDR in the film/tv industry, and in other sectors that when data connections based on PIDs are made and the graph grows, new values accrue. What was not possible before becomes possible. Examples of this include Crossref’s Similarity Check plagiarism service for publishers and its Cited-by service.

If you want to begin to see what this could look like for extended digital specimens, explore the EIDR registry. Search for the title of your favourite film. Or for something a little more interesting, have a look at the filmography of Hollywood director, John Ford and see how for any of his films the EIDR registry can reveal much that is databased about them, including where you can watch, such as Amazon, Netflix, etc.

Most of what EIDR enables is exclusive to the workings of the film/TV production and distribution industry so we can’t see it. But we do see the benefit as consumers. EIDR helps the sector supply chain to function more effectively in a world that is now entirely digital and no longer reliant on celluloid - although, of course there are still many thousands of celluloid masters locked away in film vaults. EIDR supports accurate rights tracking and reporting down to the level of clips and composites across multiple languages/geographies, universal search and discovery, and detailed consumption metrics. It helps to ensure audiences see the correct language version in their local cinema or on their mobile phone, and that the right people get paid the correct amounts for their work. It’s easy to see parallels based on a similar registry for digital specimens in the natural science collections sector. Whilst we don’t broadcast specimens, we do carry out integrated analyses and synthesis based on the data they contain.

But as @abentley says, it needs everyone to play the same game. The rules and mechanics must be as simple as possible so players can engage easily and cheaply, building up their responsibility over time as it becomes apparent how their ROI increases.

A transactional method of publication, which the openDS concept represents can achieve this. Open Digital Specimens are mutable objects to which operations (transactions) can be applied. Some of those operations attach things to the DS, like annotations while others can modify/improve the object content itself or make links between the DS and other DSs or to third-party data, such as sequence data or trait data. As the number of openDS increases, services that assist linking and exploit it increase in value to the community as a whole.

Whether a blockchain approach is appropriate or helpful depends on multiple design factors. The most important ones are governance model and storage/implementation model. What is needed to govern and implement a ‘cloud of digital specimens’? We should think about and discuss these first, probably in the Making FAIR data for specimens accessible topic.