Extending, enriching and integrating data

hardistyar · February 19, 2021, 12:22pm

@dshorthouse, @abentley. It’s proven by Crossref and DataCite in the scholarly sector, by EIDR in the film/tv industry, and in other sectors that when data connections based on PIDs are made and the graph grows, new values accrue. What was not possible before becomes possible. Examples of this include Crossref’s Similarity Check plagiarism service for publishers and its Cited-by service.

If you want to begin to see what this could look like for extended digital specimens, explore the EIDR registry. Search for the title of your favourite film. Or for something a little more interesting, have a look at the filmography of Hollywood director, John Ford and see how for any of his films the EIDR registry can reveal much that is databased about them, including where you can watch, such as Amazon, Netflix, etc.

Most of what EIDR enables is exclusive to the workings of the film/TV production and distribution industry so we can’t see it. But we do see the benefit as consumers. EIDR helps the sector supply chain to function more effectively in a world that is now entirely digital and no longer reliant on celluloid - although, of course there are still many thousands of celluloid masters locked away in film vaults. EIDR supports accurate rights tracking and reporting down to the level of clips and composites across multiple languages/geographies, universal search and discovery, and detailed consumption metrics. It helps to ensure audiences see the correct language version in their local cinema or on their mobile phone, and that the right people get paid the correct amounts for their work. It’s easy to see parallels based on a similar registry for digital specimens in the natural science collections sector. Whilst we don’t broadcast specimens, we do carry out integrated analyses and synthesis based on the data they contain.

But as @abentley says, it needs everyone to play the same game. The rules and mechanics must be as simple as possible so players can engage easily and cheaply, building up their responsibility over time as it becomes apparent how their ROI increases.

A transactional method of publication, which the openDS concept represents can achieve this. Open Digital Specimens are mutable objects to which operations (transactions) can be applied. Some of those operations attach things to the DS, like annotations while others can modify/improve the object content itself or make links between the DS and other DSs or to third-party data, such as sequence data or trait data. As the number of openDS increases, services that assist linking and exploit it increase in value to the community as a whole.

Whether a blockchain approach is appropriate or helpful depends on multiple design factors. The most important ones are governance model and storage/implementation model. What is needed to govern and implement a ‘cloud of digital specimens’? We should think about and discuss these first, probably in the Making FAIR data for specimens accessible topic.

Topic		Replies	Views
Summaries - 2. Extending, enriching and integrating data Digital/Extended Specimen	4	1411	February 25, 2021
Background and context for phase 2 Digital/Extended Specimen	0	1087	June 8, 2021
Analyzing/mining specimen data for novel applications Digital/Extended Specimen	43	2896	April 4, 2021
Summaries - 1. Making FAIR data for specimens accessible Digital/Extended Specimen	2	1562	February 26, 2021
Summaries - 5. Analyzing/mining specimen data for novel applications Digital/Extended Specimen	0	1151	February 16, 2021

Extending, enriching and integrating data

Related topics