1.7. Foundation for new and enriched services (USE)

I think it is important to look ahead to multiple downstream uses of, and synergies with, the catalogue, but for the purposes of sustainability this may actually be a distraction, judging from the experiences of a couple of other data infrastructure initiatives. In particular, there could be major downsides to entangling the most basic functions of a collections catalogue with broader aims for an innovative new platform for biodiversity knowledge integration. The experience of the TAIR database, as analysed by Sabina Leonelli (2013), may be informative here. “One early strategy adopted by curators was to create several different search engines within TAIR, each of which would provide a different perspective on Arabidopsis biology… Not all of these tools have been found to be equally valuable and accessible by plant researchers, and TAIR curators have reduced their ambitions over time, focusing increasingly on updating sequence and functional data on Arabidopsis rather than including new data types and tools for comparison across plant species (which might be viewed as one reason for their loss of funding)” (Leonelli 2013). That last comment on TAIR’s loss of NSF funding notwithstanding, the database remains one of the greatest success stories in biology in terms of sustained funding and research impact.

In contrast, Leonelli describes how the Cancer Biomedical Informatics Grid (caBIG) sought to address the widest range of possible user goals by “pushing the databases collected under its purview to adopt common formats and follow basic structural rules enabling basic interoperability across different databases,” with the consequence that the resource “is not supposed to operate on a shared, unified understanding of what it could be used for” (Leonelli 2013, 457). Unfortunately, interoperability and a wide range of potential use cases did not drive strong engagement with, or adoption of, caBIG. The conclusion, I think, is not to abandon the bigger goals for data integration that will leverage the catalogue, but to focus narrowly on the queries that the catalogue can best support in the short to medium term and that serve a sufficiently important audience (e.g. one that is large, high-impact, or well-resourced).

Leonelli, Sabina. 2013. “Global Data for Local Science: Assessing the Scale of Data Infrastructures in Biological and Biomedical Research.” BioSocieties 8 (4). Nature Publishing Group: 449–65. doi:10.1057/biosoc.2013.23.