Collections catalogue (GRBio)

Hi Neil, yes, some communities have active resources (like IH), and the plan is certainly to link to those through automated APIs. The new resource will add functions such as the use of ORCIDs and direct linking of metadata about collections to the specimens each collection holds. In addition, it will make possible metrics that are not available on sites that are currently set up as simple lists.

I would argue that if the collections themselves saw some benefit arising from such a list - attribution metrics, publications, etc. - they would be only too happy to ensure that the data contained therein is accurate and up to date. I know I would!!

There is also the ASIH symbolic code list for ichthyology and herpetology collections (https://asih.org/standard-symbolic-codes), which could be added as a resource to populate the new GRBio. It is not as comprehensive in terms of fields as IH or others, but it is a start.

Yes, to reiterate, this provides a convenient mechanism for exposing not only undigitized collections but also the undigitized portions of partly digitized collections. It is also an important mechanism for collections to provide metrics about our community that we have, until now, been unable to produce: how many collections, from how many institutions, with how many specimens? It is also important to note that we need a one-to-many relationship between institution and collection, with independent records for both and the ability to nest multiple collections within an institution. This was originally part of GRBio’s data model. I think the TDWG NCD standard being developed provides a great basis for the descriptive information needed for these records.
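To make the one-to-many relationship concrete, here is a minimal sketch in Python. It is not GRBio's actual schema: the class names, field names and example records are all assumptions made up for illustration. It keeps independent records for institutions and collections, nests collections within an institution, and derives the kind of community metrics mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    code: str                  # collection code (hypothetical value below)
    name: str
    specimen_count: int        # total specimens held
    digitized_count: int = 0   # how many of those are digitized


@dataclass
class Institution:
    code: str                  # institution code (hypothetical value below)
    name: str
    # One institution can hold many collections.
    collections: list[Collection] = field(default_factory=list)


def community_metrics(institutions: list[Institution]) -> dict[str, int]:
    """Count institutions, collections and specimens across the catalogue."""
    collections = [c for inst in institutions for c in inst.collections]
    total = sum(c.specimen_count for c in collections)
    digitized = sum(c.digitized_count for c in collections)
    return {
        "institutions": len(institutions),
        "collections": len(collections),
        "specimens": total,
        "undigitized_specimens": total - digitized,
    }


# Made-up example data: one institution holding two collections,
# one partly digitized and one not digitized at all.
example = [
    Institution("XYZ", "Example Museum", [
        Collection("Fish", "Ichthyology", specimen_count=1000, digitized_count=400),
        Collection("Herps", "Herpetology", specimen_count=500),
    ])
]
print(community_metrics(example))
```

Because the undigitized count is derived per collection, a partly digitized collection still contributes its undigitized portion to the totals.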

Apologies for just finding this thread and for having only looked quickly through it. I don’t mean to cast any shade on GRBio or GBIF, but you all do know, I hope, that there is another, fully functional, public database of biorepositories hosted at NCBI. Ensconced in the golden cocoon of NIH, it has never gone dark and never will (at least until the day everything else does). It is continuously updated by GenBank Taxonomy staff. The NCBI BioCollections Database predates GRBio, although until 2018 it was an in-house-only resource. Seems to me the world needs only one database of this kind to set standards for and disambiguate collection codes. Since this one is extensive, conscientiously curated, paid for and permanent, why is it necessary to resurrect GRBio? https://www.researchgate.net/publication/323507244_The_NCBI_BioCollections_Database
https://ftp.ncbi.nih.gov/pub/taxonomy/biocollections/

John.

Yes, I am aware of the NCBI list but, as with all the other lists around, I suspect that it is not complete. It may be more complete than GRBio, the iDigBio list or the many discipline-specific lists (Index Herbariorum, ASIH codes, Mammal list, etc.), and I suspect biorepositories may be another subset of the whole, but we need to amalgamate them into a single community-wide resource that is the go-to source for this information. Where that lives, I don’t really care, as long as it is accessible (both through a web portal and through an API) and permanently funded for the community.

We should be able to bring together all of these repositories, check them against each other, disambiguate any duplicates, synonyms, etc. to create a holistic list. We also need to use the NCD standards being developed by TDWG to articulate as much information about the collections at that level for all parties – aggregators, collections, users, etc.

Andy

I don’t know anything about the NCD standards you mention, but my suggestion is to take seriously the possibility that the NCBI BioCollections database could serve as the definitive resource for collection codes, if not for detailed information on collections. It has an efficient protocol for disambiguating every collection code, creating unique collection ID numbers and creating synonyms when multiple codes/acronyms are associated with single collections. Although it currently has 8800+ collection codes registered, I am sure it is not complete, as until now it has only been updated for collections from which sequences have been accessioned. However, I’m pretty certain that there’d be willingness to add collections from which no accessions exist currently if GBIF or others provided the data. NCBI is not going to switch to another format for collection codes at some point in the future, so why create a competing system? If there’s a need for a database with detailed information on collections, then fine, build it. But why generate separate collection codes?
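As a rough illustration of the idea described here (one unique ID per collection, several codes or acronyms pointing at the same ID, and ambiguous codes made visible), a toy registry could look like the sketch below. This is not the NCBI protocol or its data format; the class, its methods and the example names are hypothetical.

```python
class CodeRegistry:
    """Toy registry: one numeric ID per collection, many codes per ID."""

    def __init__(self) -> None:
        self._next_id = 1
        self._names: dict[int, str] = {}       # collection ID -> collection name
        self._codes: dict[str, set[int]] = {}  # normalized code -> collection IDs

    @staticmethod
    def _normalize(code: str) -> str:
        # Assumption: codes are compared case-insensitively, ignoring padding.
        return code.strip().upper()

    def register(self, name: str, code: str) -> int:
        """Create a new collection record and return its unique ID."""
        cid = self._next_id
        self._next_id += 1
        self._names[cid] = name
        self.add_synonym(cid, code)
        return cid

    def add_synonym(self, cid: int, code: str) -> None:
        """Attach an additional code or acronym to an existing collection."""
        if cid not in self._names:
            raise KeyError(f"unknown collection ID {cid}")
        self._codes.setdefault(self._normalize(code), set()).add(cid)

    def resolve(self, code: str) -> list[tuple[int, str]]:
        """Return every collection a code could refer to (empty if unknown)."""
        ids = self._codes.get(self._normalize(code), set())
        return [(cid, self._names[cid]) for cid in sorted(ids)]


# Made-up records: two different collections share the code "EMA",
# and the first is also known by the synonym "ExMA".
registry = CodeRegistry()
first = registry.register("Example Museum A, Fish Collection", "EMA")
registry.add_synonym(first, "ExMA")
second = registry.register("Example Marine Archive", "ema")

print(registry.resolve("ExMA"))  # one match: the synonym is unambiguous
print(registry.resolve("EMA"))   # two matches: the code needs disambiguating
```

Resolving to a list rather than a single record is a deliberate choice in this sketch: a code shared by two collections is reported as ambiguous instead of being silently assigned to one of them.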

John

Here is more about the NCD (now called just CD) standard that is being worked on by TDWG - https://www.tdwg.org/community/cd/. The aim is to formalize and standardize the fields and vocabularies used to describe a collection to provide common terminology for the publishing and retrieval of collections information.

I am not saying that the NCBI resource is not a great one (although it is interesting that this list is not then used to give structure to the currently unstructured voucher/tissue information associated with individual GenBank sequences by those submitting them). There are many good ones out there but, by your own admission, it is not complete, and none of them are: some are taxon-specific, some are geographically specific and yet others are project-specific (like NCBI). This is the major problem: there is not one authoritative source for all of it. As such, we should use all of the resources that are out there already to work on an authoritative list that is permanently stored and stable, and that can be used by the community to codify the distributed network of collections as an infrastructure resource.

No one is advocating changing or replacing the existing institutional and collection codes, but simply building on the great work done by individual entities to produce a global authoritative list that can serve as the one resource for all collection information. Also, the collection codes and institutional codes are only one element of describing a collection for efficient use, attribution and advocacy. Take a look at the CD standard and the fields envisaged.

Andy

It is used to structure the specimen_voucher fields of new accessions as they come in. There is no automated process of structuring the huge volume of already accessioned unstructured specimen_voucher codes. I’ve been urging collection managers to think of this as a potential museum studies project for undergraduates: have them find and resubmit specimen voucher data from old records to GenBank in the structured format. But most collections are using collections software that doesn’t let them take advantage of voucher linking, anyway.
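For anyone unsure what "structured" means here, the sketch below assumes the INSDC qualifier form `[<institution-code>:[<collection-code>:]]<specimen_id>` for specimen_voucher. The helper names and example values are made up, and real records will need more careful handling than this simple colon split.

```python
from typing import NamedTuple, Optional


class Voucher(NamedTuple):
    institution_code: Optional[str]
    collection_code: Optional[str]
    specimen_id: str


def parse_voucher(value: str) -> Voucher:
    """Split a specimen_voucher value into its colon-separated parts.

    Simplification: anything after the second colon is kept as the specimen ID.
    """
    parts = [p.strip() for p in value.split(":", 2)]
    if len(parts) == 3:
        return Voucher(parts[0], parts[1], parts[2])
    if len(parts) == 2:
        return Voucher(parts[0], None, parts[1])
    # No colon at all: an unstructured, free-text voucher.
    return Voucher(None, None, parts[0])


def format_voucher(institution_code: str, specimen_id: str,
                   collection_code: Optional[str] = None) -> str:
    """Build a structured voucher string from its parts."""
    fields = [institution_code, collection_code, specimen_id]
    return ":".join(f for f in fields if f)


# Hypothetical values, for illustration only.
print(parse_voucher("XYZ:Fish:12345"))
print(parse_voucher("field no. 1234"))
print(format_voucher("XYZ", "12345", collection_code="Fish"))
```

A resubmission project of the kind suggested above would amount to finding records that parse with no institution code and rebuilding them with `format_voucher` once the holding collection has been identified.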

I wish the NCD initiative luck. I just wished to point out that an actively updated and staffed collection code database exists right now that won’t blink off like GRBio did. I would hope this community would take advantage of it. I’d be happy to help open a communication channel with the relevant people at NCBI (I’m not directly involved with it).

John, I would be interested in doing this for the 16,000+ GenBank sequences I have attached to tissues in my tissue database. How would I go about doing that? Is there a batch way that this can be done? I am using the LinkOut system, but I suspect that it is too hidden in the right-hand menu to be used effectively by those using the sequences.

Andy

Thank you @Halooie1 for these very valuable comments.

I was not aware that the NCBI DB had become more open. GBIF was asked to resurrect GRBio primarily because there are many identifiers linking to it that had become dead. Since resurrecting it, we have gained a small group of editors, synchronize content from IH and will shortly merge in the iDigBio catalogue, all with the aim of reducing duplication in content curation. Could you please introduce me to the NCBI group so we can discuss a sensible way to proceed? trobertson@gbif.org

Hi Andy,

I’ll contact you by email on this.

~ John

Hi. Emailing you off this thread.
