Today @vijay.barve, @Lily, @melissa0520 and I started the Asia Office Hour as a casual space for BIFA teams to have LIVE Q&A sessions on (mainly) data publishing.
For our first session, I prepared a flowchart trying to answer the qualification part of the most frequently asked question: How to publish data with GBIF?
Yes, that question needs a four-day workshop to cover properly, but in the office hours we will fill gaps and explain things in other words, and we hope some live interaction will improve knowledge retention.
I shared this slide with participants today. Hope it’s self-explanatory! Please comment if you find anything incorrect or unclear, so I can revise it.
Thanks again to those who joined today! Special thanks to Choki from Bhutan for sharing the biodiversity information portal for Bhutan and its role in supporting domestic citizen science activities. We also learned how the underlying data have been published to GBIF.
Remember to add the information in “acceptedNameUsageID” about which taxon and which taxonomic source the species taxon ID refers to (e.g. Species2000, Catalogue of Life).
What to do if a specimen has no exact date information: provide a date or year range rather than leaving the field blank, e.g. 2007-03/09, 2007-05-20/25, 1900/1909 (some time during the interval between the beginning of the year 1900 and the end of the year 1909).
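To make the range notation concrete, here is a minimal Python sketch that expands the abbreviated ranges above into an explicit start and end. It assumes the values follow the ISO 8601 interval style used in the examples, where a shortened end inherits the leading parts of the start; the function name `expand_date_range` is hypothetical, chosen only for illustration.

```python
def expand_date_range(event_date: str) -> tuple[str, str]:
    """Split an interval such as '2007-05-20/25' into explicit start and end dates."""
    start, sep, end = event_date.partition("/")
    if not sep:
        # A single date or year is its own start and end.
        return start, start
    start_parts = start.split("-")
    end_parts = end.split("-")
    # An abbreviated end inherits the leading components it omits from the start.
    missing = len(start_parts) - len(end_parts)
    if missing > 0:
        end_parts = start_parts[:missing] + end_parts
    return start, "-".join(end_parts)


for value in ["2007-03/09", "2007-05-20/25", "1900/1909"]:
    print(value, "->", expand_date_range(value))
```

This only illustrates how the ranges read; it does not validate that the dates themselves are real calendar dates.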
You are welcome to join us every Thursday, or keep following the posts here!
Thanks to Dr Lu, Dr Pujary and Riya for joining us today! We enjoyed having the opportunity to elaborate on the structure of the Darwin Core Archive, the connection between GBIF data and citizen science activities, the value of having a GBIF data publishing badge, and how to meet the data publishing requirement for the BIFA midterm report. It was also good to explain why GBIF only allows institutions to be data publishers. Hope you find it useful, too!
Thanks to the participants today. It’s great to learn about the efforts of the Botanical Survey of India, which has its herbaria resources catalogued. NGCPR is another CSR contribution from India that awaits further engagement with GBIF data publishing and domestic conservation experts. Hope we will hear more from them.
For those who are busy preparing the first dataset for the midterm: if you can’t figure out why certain fields are not showing up, we have a hint for you. On your dataset page at GBIF.org, look for the “DOWNLOAD” tab near the title, and choose “GBIF annotated archive”:
What you download is what GBIF.org sees as the result of your data publication. By examining the text files, you may therefore be able to spot what went wrong, for example a wrong mapping, where the header in the CSV file doesn’t match the values in the same column.
But don’t forget, depending on the scale of the potential issue, sometimes it’s still easier to see the comparison if you examine each occurrence record. There you have the “Interpreted” VS “Original” to understand how your dataset has been, eh, interpreted.
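For the archive-level check, a quick way to see which headers ended up in each file is to list them with a short script. The sketch below is only an illustration under stated assumptions: that the downloaded archive is a zip file, that its data files end in `.txt` (tab-delimited) or `.csv` (comma-delimited), and that the first row of each file is the header. The function name `list_archive_headers` is hypothetical.

```python
import csv
import io
import zipfile


def list_archive_headers(archive):
    """Return {file name: list of column headers} for each data file in the zip."""
    headers = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            lower = name.lower()
            # Assumption: only .txt and .csv members are data files.
            if not lower.endswith((".txt", ".csv")):
                continue
            # Assumption: .csv is comma-delimited, .txt is tab-delimited.
            delimiter = "," if lower.endswith(".csv") else "\t"
            with zf.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reader = csv.reader(text, delimiter=delimiter)
                # The first row is taken as the header; empty files give [].
                headers[name] = next(reader, [])
    return headers


# Usage (the file name is a placeholder for your own download):
# for name, columns in list_archive_headers("my-dataset.zip").items():
#     print(name, columns)
```

Comparing the printed headers with the columns you mapped in the IPT can make a shifted or mislabelled column easier to spot.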
In the Asia office hour last week, we introduced the basic concepts of data mobilization and clarified the differences between “GBIF data publisher” and “IPT individual account”. Here is the summary:
GBIF data publisher status is only available to organizations, not to individuals.
You need to apply for an IPT account to manage and publish your dataset. The IPT account makes you a user of the IPT.
Whenever you publish a new dataset to GBIF, you need to choose a GBIF data publisher from the organization list in the IPT metadata section.
During a recent conversation with a GBIF data publisher, I noticed that he said “…we regularly upload our data to GBIF…”. That reminded me of a common misconception among publishers newly engaged with GBIF data publishing.
The fact is, in the GBIF data publishing model, we don’t upload data to GBIF. Instead, we make data public and register it with GBIF, or, we publish data through GBIF.
GBIF implements a distributed model for data hosting. In this model, data publishers are required to keep their data online and thus assume part of the responsibility for keeping the infrastructure up. This also allows organizations to have full control over their data. Data publishers who have limited capacity to keep data online can have their data hosted on technical installations (e.g. IPTs) operated by other capable organizations.
When using other services to publish data, one may indeed be required to upload a copy to a designated server or website. However, in the GBIF model, data publishers only upload their data to the technical installations, or IPTs. And each installation is fully managed by its host, not GBIF.
This slight difference in wording highlights the ownership and responsibility of data publishers, especially since “upload” is often interpreted as “hand over” or “give”, a mindset that may have contributed to the many orphan datasets we have in GBIF today.
Of course, people can always use upload loosely, as long as we take care of our data published out there by keeping contact information up-to-date and responding to quality inquiries.
As it has never appeared in this thread, I am posting the time of the Asia Office Hours so whoever drops by this thread knows where and when to join.
We run the session for an hour every Thursday at:
16:00 Tokyo/Seoul (GMT+9)
15:00 Manila/Taipei/Singapore (GMT+8)
14:00 Hanoi/Phnom Penh/Jakarta (GMT+7)
12:45 Kathmandu (GMT+5:45)
12:30 Mumbai (GMT+5:30)
12:00 Islamabad (GMT+5:00)
How do you convert degree-minute-second coordinates to the decimal degrees required by Darwin Core? Many ask this question, and depending on the volume of your dataset, there are solutions using different tools. In most cases, Excel should handle the task fine, as long as the format of all values in a column is consistent and you don’t get lost in repeating the steps.
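Essentially, one uses the degree (°), minute ('), second (") and space ( ) characters as delimiters to separate the values and notations into their own columns, then uses the values to calculate the decimal value following this formula: decimal degrees = degrees + minutes/60 + seconds/3600, with a negative sign for southern latitudes and western longitudes.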
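For larger datasets, the same split-and-calculate steps can be scripted instead of repeated in Excel. Here is a minimal Python sketch, assuming each value looks like 25°02'30"N (degrees, then optional minutes and seconds, then an optional N/S/E/W letter). It divides minutes by 60 and seconds by 3600 and makes southern and western values negative. The function name `dms_to_decimal` is hypothetical, and other input formats would need their own handling.

```python
import re


def dms_to_decimal(dms: str) -> float:
    """Convert a string like 25°02'30"N to decimal degrees."""
    text = dms.strip()
    # Split on the degree, minute, second and space characters used as delimiters
    # (straight and typographic minute/second marks are both accepted).
    parts = [p for p in re.split(r"[°'\"′″\s]+", text) if p]
    hemisphere = ""
    if parts and parts[-1].upper() in ("N", "S", "E", "W"):
        hemisphere = parts.pop().upper()
    if not parts:
        raise ValueError(f"no numeric values found in {dms!r}")
    # Missing minutes or seconds are treated as zero.
    values = [float(p) for p in parts] + [0.0, 0.0]
    degrees, minutes, seconds = values[:3]
    decimal = abs(degrees) + minutes / 60 + seconds / 3600
    # South and west (or a leading minus sign) give negative coordinates.
    if hemisphere in ("S", "W") or text.startswith("-"):
        decimal = -decimal
    return decimal


print(dms_to_decimal("121° 30' 0\" E"))
```

It is worth spot-checking a few converted values against a map before publishing, since a swapped hemisphere letter silently moves the point to the other side of the globe.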