June 2020 community webinar

Post your own follow-ups for the June 2020 community webinar here.

The following is a transcript of the webinar Q&A, lightly edited for clarity.

Anna Sandionigi: Are all the BINs of BOLD integrated in GBIF?
A: You can see the BIN dataset on GBIF here.

Pier Luigi Buttigieg: Is there guidance on using MIxS and/or other GSC standards along with DwC?
A: We still need to bridge GSC and TDWG standards.
Pier Luigi Buttigieg: Does this mean the MIxS extension will move out of the Sandbox and become ratified by TDWG?
A: It would be interesting to know. There were teams working on this, but I’m not sure where we are with it. With MIxS getting IRIs and becoming expressible as RDF, it may be easier to embed or link in both directions.

Anabela Plos: About the guide and the workshop… are there any plans for translation into Spanish and other languages?
A: As with any other GBIF-produced documentation, we welcome translations to increase its usefulness for the broader community. We are very grateful to the GBIF community of volunteer translators. Our model has been to encourage and enable translation based on the needs and interests of our language communities. Our digital documentation fully supports this using the Crowdin tool, which many in the community now know well. If any of the language communities take an interest in translating the sequencing guide, we will be happy to accommodate and support that effort once the text is complete and has gone through the community review process.
Anabela Plos: Thanks for the reply. I think we need to start working on this, and try to add more translators to the community for all the languages (and all the backgrounds). :slight_smile:
A: Yes please! See information on translation community here.

Camila Plata: Greetings from SiB Colombia. What plans do you have for translating the guide and course into other languages after its release?
A: Thanks for this question; see the answer given to Anabela under Answered.

Evgeniy Meyke: Is the idea that it would be possible to publish sequences to GBIF directly? Or is the preferred way via existing communities/platforms such as GenBank, BOLD, UNITE? Apologies if this was asked already.
A: There are two different tasks and audiences here: one is to publish and archive sequences in the specific genetic portals, such as GenBank, BOLD and UNITE; the other is to resurface these data in the context of the general biodiversity discovery platforms. Whether to publish the sequence behind the occurrence will be decided by the publisher and likely influenced by country policies. I think it’s absolutely essential to use an INSDC resource. This international framework is where most other communities with scattered data are striving to be.

Pier Luigi Buttigieg: Something for Dmitry to react to: Conceptually, I don’t think there’s anything “dark” about the biodiversity we find through sequencing (accounting for errors etc.), we just haven’t named it in the traditional way and it targets a particular dimension of biodiversity.
Dmitry Schigel: I’d argue it’s still dark if we cannot confidently assign it to any major lineage (e.g. phylum) and therefore have little to no natural history knowledge about what it is. I agree, they shine very brightly to me. This is exactly why we started to include global OTUs in the taxonomic backbone.
@Chris: yes, that’s a special case that’s dark, but only to our taxonomy systems.
Dmitry Schigel: Glad they shine! If we can progressively remove artifacts from clustering algorithms etc., then we’ll be much more robust. Pier, denoising etc. will be covered, though lightly, in the guide, but careful documentation of steps will be encouraged.

Evgeniy Meyke: If the publisher decides to and country policy allows, does GBIF accept direct sequence publication?
A: I think this point will collect many opinions during the consultation, but if the publisher / country decides to publish an ASV alongside the observation, in many cases it will be only a short barcode fragment serving as the basis for the name / OTU. GBIF itself does not currently plan to archive sequences; as answered orally, GenBank would still be the go-to place for sequence archival. We aim to resurface and actualize the sequence-derived data in the context of general biodiversity discovery platforms alongside the museum, citizen science, and other data.
GenBank only or any INSDC resource?

Gabi Droege: Hi Dmitry, nice talk! Do you index all sequence records that have coordinates from ENA? If so I have two questions: a) how do you connect sequences from the same specimen_voucher but different loci? b) how do you make sure the sequence-based record is not duplicating the original specimen occurrence? I might contact you soon for a chat with some other GGBN and GSC colleagues to continue this conversation :slight_smile:
Dmitry Schigel: Thanks Gabi, yes please, let’s resume the discussion like in the good old days! We would like you to read the draft for consultation. Multilocus barcoding and potential duplication are something that requires a much better persistent identifier world than the one we live in, and suggestions will be very welcome during the feedback collection period. GBIF itself does not currently plan to archive sequences; as answered orally, GenBank would still be the go-to place for sequence archival. We aim to resurface and actualize the sequence-derived data in the context of general biodiversity discovery platforms alongside the museum, citizen science, and other data. If one Occurrence has basisOfRecord = PreservedSpecimen and the other Occurrence has basisOfRecord = MaterialSample, it is clear that they are not the same specimen published twice, but different tokens or pieces of evidence… Our plan has anyway been to engage with GGBN very closely again once the first draft is ready. Copy editing and formatting, and we are ready.
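
To make the basisOfRecord point above concrete, here is a minimal Python sketch. It is only an illustration and not how GBIF indexes data: the record layout, the `voucher` and `locus` keys, and both function names are hypothetical. Only `basisOfRecord`, `occurrenceID` and the values `PreservedSpecimen` and `MaterialSample` are Darwin Core vocabulary. The sketch groups records that share a voucher identifier and flags a pair for manual review only when both records have the same basisOfRecord.

```python
from collections import defaultdict
from itertools import combinations


def group_by_voucher(records):
    """Group occurrence records by a shared voucher identifier.

    The 'voucher' key is a hypothetical stand-in for whatever identifier
    links a sequence record back to its specimen.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["voucher"]].append(rec)
    return dict(groups)


def review_candidates(records):
    """Return pairs of occurrenceIDs that share a voucher AND a basisOfRecord.

    Records with the same voucher but a different basisOfRecord are treated
    as different pieces of evidence and are not flagged.
    """
    pairs = []
    for recs in group_by_voucher(records).values():
        for a, b in combinations(recs, 2):
            if a["basisOfRecord"] == b["basisOfRecord"]:
                pairs.append((a["occurrenceID"], b["occurrenceID"]))
    return pairs


# Made-up example records: one specimen and two sequences from different loci.
records = [
    {"occurrenceID": "occ-1", "voucher": "V-001",
     "basisOfRecord": "PreservedSpecimen"},
    {"occurrenceID": "occ-2", "voucher": "V-001",
     "basisOfRecord": "MaterialSample", "locus": "COI"},
    {"occurrenceID": "occ-3", "voucher": "V-001",
     "basisOfRecord": "MaterialSample", "locus": "ITS"},
]

print(review_candidates(records))
```

In this made-up data the specimen record is never paired with the sequence records, while the two MaterialSample records from different loci are flagged for review. That is the multilocus case from question a), which, as the answer notes, really needs better persistent identifiers than a simple voucher string.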

Andre Heughebaert: Are dark biodiversity projects eligible for BID2? Are they encouraged in some ways?
A: Hi André, all types of biodiversity data that can be published through GBIF can be mobilized in BID-funded projects. The main focus is on demonstrating the potential value of the data to be mobilized to decision making on conservation and sustainable development. We are not promoting any particular data type at this stage.

Eric Crandall: Hi Dmitry - I enjoyed your talk. Is the GBIF effort focused on barcoding-type loci, or does it encompass any kind of sequence data?
Dmitry Schigel: Hi Eric, eDNA and metabarcoding are the main target for the occurrences, with commonly used barcodes as the means of detection and identification, but the guide will cover more, e.g. qPCR detection for presence/absence.

Jean-François MOUSSA: Is it possible to use our CESP account to submit the project to the grants portal?
A: The account you have in the GBIF Grants Portal is not restricted to one programme only. You can use your account to submit proposals for the BID programme as well. Thanks.

Joel Sachs: Dmitry and others: Given the prevalence of lateral gene transfer in prokaryotes, genes can sometimes be looked at as properties of a locality, as opposed to properties of a species. (Bob Robbins elaborates on this PoV.) How hard would it be to build a gene-centric (as opposed to taxon-centric) interface off of the way GBIF sequence data is structured?
A: That’s a powerful idea. Given the non-prokaryote origins of GBIF, names play a pretty central role in the GBIF index. A quick answer is that I suspect it would be quite hard (but that is for the GBIF governing board to decide), and I am not sure how hard from the informatics point of view, but new ingestion and indexing pipelines at GBIF allow for much greater flexibility in how the backend and portal work. Cool idea, I am certainly with you that there is more than one answer about what is a "biodiversity grain".