Where can we publish images to be linked in GBIF/OBIS datasets?

Hello, may I know where we can publish a large number (>1k) of images that can be linked to Occurrence records which will be published to GBIF/OBIS? This question is being asked a lot in the marine/OBIS community. I have personally also encountered datasets where the number of specimen photos far exceeds Zenodo’s limit (see Zenodo FAQ - What are the size limitations of Zenodo?).

Curious if anyone has the answer to this question please? Thanks a lot!

Dear @ymgan, we usually tell our publishers that they can use the Internet Archive. It is free, you can upload as many images/sounds/videos as you want, and there is no limit on the size of the media.
You can create a collection when the images are grouped (for example, Colección de Microorganismos de la Pontificia Universidad Javeriana - CMPUJ) or upload them under an individual profile (for example, INCIVA).

I know that you can also use Wikimedia, which is free as well, but I do not have any experience with it or an example from our publishers. Maybe someone else has experience with it and can share information.

Thank you so so much Esteban!! This is so helpful! I believe this is what we are looking for, I will try it out!!

@EstebanMH-SiB @ymgan This seems like a good solution, thanks for identifying it! It’s not clear if they mint persistent identifiers for the individual files, or the collections, of the things they archive. Do either of you know if that’s possible?

The persistent URI is for the file. For example, the dataset "Registro de varamientos de megafauna marina en Ecuador continental" has this archive.org item: "Varamientos Ecuador 2023 : Ministerio del Ambiente, Agua y Transición Ecológica" (Internet Archive), with 58 files, and you can add more files to an item later.
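
As a rough sketch of how per-file links could be turned into rows for a multimedia extension: archive.org serves item pages under `/details/<identifier>` and individual files under `/download/<identifier>/<filename>`. The item identifier, file names, and occurrence IDs below are hypothetical placeholders, and the column names (`occurrenceID`, `identifier`, `references`) are an assumption about how the extension file would be laid out.

```python
from urllib.parse import quote

IA_BASE = "https://archive.org"


def item_url(identifier: str) -> str:
    """Landing page of an archive.org item."""
    return f"{IA_BASE}/details/{quote(identifier)}"


def file_url(identifier: str, filename: str) -> str:
    """Direct link to one file inside an item."""
    # quote() percent-encodes spaces and non-ASCII characters but keeps "/".
    return f"{IA_BASE}/download/{quote(identifier)}/{quote(filename)}"


def multimedia_rows(identifier: str, files_by_occurrence: dict) -> list:
    """One row per (occurrence, file) pair, ready to be written to a CSV."""
    rows = []
    for occurrence_id, filenames in files_by_occurrence.items():
        for name in filenames:
            rows.append(
                {
                    "occurrenceID": occurrence_id,
                    "identifier": file_url(identifier, name),
                    "references": item_url(identifier),
                }
            )
    return rows


# Hypothetical item identifier and file names, for illustration only.
rows = multimedia_rows(
    "example-item",
    {"occ-001": ["foto 01.jpg", "foto 02.jpg"], "occ-002": ["foto 03.jpg"]},
)
for row in rows:
    print(row["occurrenceID"], row["identifier"])
```

Because files can be added to an item later, new rows can be generated the same way without changing the links already published.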

Got it. Thanks @vechocho !

The limitations for Zenodo in size and file number are only per record. You can make a community and add all images to it as separate records, like this example with ca. 270k images we did as a pilot for the ICEDIG project years ago: Search Belgium Herbarium of Meise Botanic Garden

You can find more info on the process in this report: Digitisation infrastructure design for Zenodo. Deliverable D6.3
The tech specs and the Python scripts used for these pilots are out of date by now. I believe lycophron (GitHub - plazi/lycophron: Batch uploader to Zenodo) is a more up-to-date tool.
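
To give an idea of what "one record per image in a community" means in practice, here is a minimal sketch that only prepares one metadata payload per image and sends nothing. It is not the ICEDIG pilot code and not how lycophron works. The field names (`upload_type`, `image_type`, `creators`, `communities`) follow Zenodo's legacy deposit REST API as I understand it, so check them against the current API documentation. The community identifier, creator name, catalog numbers, and file names are hypothetical.

```python
import json


def image_record_metadata(filename: str, catalog_number: str,
                          community: str, creator: str) -> dict:
    """Metadata payload for a single-image record (nothing is uploaded here)."""
    return {
        "metadata": {
            "title": f"Specimen image {catalog_number}",
            "upload_type": "image",
            "image_type": "photo",
            "description": f"Image file {filename} of specimen {catalog_number}.",
            "creators": [{"name": creator}],
            # Assumed: the community the record should be submitted to.
            "communities": [{"identifier": community}],
        }
    }


# Hypothetical specimens: catalog number -> image file.
specimens = {"EX0000001": "EX0000001.jpg", "EX0000002": "EX0000002.jpg"}

payloads = [
    image_record_metadata(fname, catalog, "example-herbarium", "Example Institution")
    for catalog, fname in specimens.items()
]

# Each payload would become the JSON body of one record-creation request.
print(json.dumps(payloads[0], indent=2))
```

Since every image gets its own record, the per-record limits apply to each image separately rather than to the whole collection.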

A cautionary note here: the IA is battling serious litigation. As a result, the Biodiversity Heritage Library, whose page scans have been in IA for decades, is exploring alternatives like AWS.

Thanks for the useful insights @dshorthouse and @MatDillen. We will explore Zenodo a little bit so we can recommend it to our publishers, and keep an eye on IA, hoping they continue working without much trouble. They have been a great resource so far!

Thank you very much for this Mat! The Herbarium specimen records are nice. From what I understood, the examples are 1 image = 1 occurrence. I am wondering if you have any recommendations/examples on how to do this for images that can have multiple occurrences please?

Example: Submersible Gathered Evidence of a Vulnerable Marine Ecosystem at the Melchior Islands, Western Antarctic Peninsula (Subarea 48.1) - multimedia. There can be sea stars, sponges and other organisms within the same picture.

Thanks a lot!

I’m not sure what else you need? That Zenodo record you linked to already does the job of hosting (multiple) multi-specimen images. You can enrich the record with a more standardized data file, like a Darwin Core archive that lists all identified observations from the images and video as occurrences. You can then use the Audiovisual Core extension to link the occurrences to the images they occur in. You could even link the occurrence to a region of interest within the image if you know where each of the organisms can be spotted: Audiovisual Core Term List - Audiovisual Core
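
A minimal sketch of the linking idea, assuming a flat table where each row ties one occurrence to the image it appears in, plus a region of interest. The `ac:xFrac`, `ac:yFrac`, `ac:widthFrac` and `ac:heightFrac` terms come from Audiovisual Core, but putting them on the same row as `ac:accessURI` is a simplification, and whether a given publishing tool accepts these columns would need checking. The image URL, occurrence IDs, and pixel boxes are made up.

```python
import csv
import io


def roi_fractions(box, image_width, image_height):
    """Convert a pixel box (x, y, width, height) to fractions of the image size."""
    x, y, w, h = box
    return {
        "ac:xFrac": round(x / image_width, 4),
        "ac:yFrac": round(y / image_height, 4),
        "ac:widthFrac": round(w / image_width, 4),
        "ac:heightFrac": round(h / image_height, 4),
    }


def media_rows(image_uri, image_size, detections):
    """One row per organism seen in the image; all rows share the same image URI."""
    width, height = image_size
    rows = []
    for occurrence_id, box in detections:
        row = {
            "occurrenceID": occurrence_id,
            "ac:accessURI": image_uri,
            "dc:type": "StillImage",
        }
        row.update(roi_fractions(box, width, height))
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical frame with a sea star and a sponge in it.
rows = media_rows(
    "https://example.org/media/frame_0001.jpg",
    (1000, 500),
    [("occ-seastar-1", (100, 50, 200, 100)), ("occ-sponge-1", (600, 250, 100, 50))],
)
print(to_csv(rows))
```

The same image URI simply repeats for every occurrence found in that picture, so a multi-organism frame needs no special treatment on the hosting side.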

What you probably can’t do is embed the data in the record as subjects, making them easily accessible through the Zenodo API, like we did in this example.

Thank you Mat!

I showed the Zenodo record as an example of the type of images that I have. It works for that dataset because there are <100 files, within Zenodo’s size limitations (per record).

I have another new dataset that I have not published yet, with >700 images of the same type as in the Zenodo record, which far exceeds Zenodo’s size limitations (per record), and I am curious how to do it like the example you shared.

I think you answered my question: we can’t.

Why can’t you put each image in a separate record (i.e. >700 similar records), or bundle them in smaller batches? That would require splitting up the data, or making a separate record for the overarching data and linking that one to each image record (like this one), but other than that I don’t see why it wouldn’t work?

At some point, you might exceed Zenodo’s Fair Usage policy. But that mainly depends on your total data volume (how many such datasets you have and how many gigabytes their media files amount to). Individual cases like this should be fine.
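
For the "smaller batches" option, a rough sketch of splitting a file list so that each batch stays under a per-record file count and size limit, and of adding up the total volume. The limits used in the example (100 files, 50 GB) and the file sizes are assumptions for illustration; take the real values from the Zenodo FAQ.

```python
def batch_files(files, max_files, max_bytes):
    """Greedily split (name, size_in_bytes) pairs into batches, keeping their order."""
    batches, current, current_bytes = [], [], 0
    for name, size in files:
        if size > max_bytes:
            raise ValueError(f"{name} alone exceeds the per-record size limit")
        # Start a new batch when adding this file would break either limit.
        if current and (len(current) >= max_files or current_bytes + size > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append((name, size))
        current_bytes += size
    if current:
        batches.append(current)
    return batches


def total_gigabytes(files):
    """Total volume of all files, in decimal gigabytes."""
    return sum(size for _, size in files) / 1e9


# Hypothetical dataset: 700 images of 40 MB each.
files = [(f"img_{i:04d}.jpg", 40_000_000) for i in range(700)]

# Assumed per-record limits; verify them before relying on this.
batches = batch_files(files, max_files=100, max_bytes=50_000_000_000)

print(len(batches), "records needed")
print(total_gigabytes(files), "GB in total")
```

Each batch would become one record, with a separate overarching record linking to all of them.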
