This thread will capture the questions that arose during and after the support hour for Nodes.
Question 1: Can you set up a backup for both the current IPT folder and its previous version?
There are many ways to set up an automated backup system.
- Many online backup systems (like Backblaze, Box, DropBox, etc.) can synchronise your data automatically and some may store various versions of your data.
- With rclone (https://rclone.org), you could have several scripts running. For example, before your weekly backup, another script could store a copy of the current IPT folder on your backup system (see the sketch after this list). This is pretty flexible; do what works for you.
- If you are operating your IPT in “Archival” mode (see the IPT settings in the Administration menu), there may be less of a need to keep multiple backed-up versions of the IPT directory, because in this mode the IPT keeps all previously published dataset versions.
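For illustration, here is a minimal Python sketch of that kind of pre-backup script. The IPT data directory, the local folder for dated copies, and the rclone remote name are all assumptions; adjust them to your own setup.

```python
#!/usr/bin/env python3
"""Minimal sketch: keep a dated copy of the IPT data directory, then run the
rclone sync. All paths and the rclone remote name are assumptions."""
import datetime
import shutil
import subprocess

IPT_DATA_DIR = "/srv/ipt-data"           # hypothetical IPT data directory
LOCAL_VERSIONS_DIR = "/srv/ipt-backups"  # hypothetical folder for dated copies
RCLONE_REMOTE = "backup:ipt"             # hypothetical rclone remote and path

# 1. Snapshot the current IPT directory into a dated folder.
stamp = datetime.date.today().isoformat()
shutil.copytree(IPT_DATA_DIR, f"{LOCAL_VERSIONS_DIR}/ipt-{stamp}")

# 2. Sync the live directory and the dated copies to the remote.
subprocess.run(["rclone", "sync", IPT_DATA_DIR, f"{RCLONE_REMOTE}/current"], check=True)
subprocess.run(["rclone", "sync", LOCAL_VERSIONS_DIR, f"{RCLONE_REMOTE}/versions"], check=True)
```

Run weekly from cron (or your scheduler of choice) just before the backup window.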
Question 2: How can we customize our IPT?
The latest IPT has a UI management page that allows you to choose colors for your interface as well as upload a logo: Administration Menu :: GBIF IPT User Manual. If this UI doesn’t allow the changes you would like to make, please log an issue on the IPT GitHub repository: Issues · gbif/ipt · GitHub
Question 3: Is there a way to batch edit dataset metadata for multiple datasets without having to go through the web interface?
If you have access to your IPT data directory, you can update or copy files directly in the resource folders. This can be done for EML files (which contain the metadata) or for source files. After you restart the IPT, the new data will be displayed. Note that this isn’t best practice, and you make these kinds of changes at your own risk.
If you really want to handle publishing programmatically, you should consider the GBIF Registry API instead of the IPT, as it will give you more flexibility.
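As an illustration of the file-editing approach above, here is a minimal Python sketch that updates a contact email address across every resource’s EML file. The data directory path, the `resources/<shortname>/eml.xml` layout and the addresses are assumptions; back up the directory first and restart the IPT afterwards.

```python
#!/usr/bin/env python3
"""Minimal sketch: replace a contact email address in every resource's EML file.
Paths, layout and addresses are assumptions -- back up before running."""
import pathlib

IPT_DATA_DIR = pathlib.Path("/srv/ipt-data")   # hypothetical IPT data directory
OLD_EMAIL = "old.name@example.org"             # hypothetical old address
NEW_EMAIL = "new.name@example.org"             # hypothetical new address

for eml in IPT_DATA_DIR.glob("resources/*/eml.xml"):
    text = eml.read_text(encoding="utf-8")
    if OLD_EMAIL in text:
        # Overwrite the EML in place; the IPT picks it up after a restart.
        eml.write_text(text.replace(OLD_EMAIL, NEW_EMAIL), encoding="utf-8")
        print(f"Updated {eml}")
```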
Question 4: Is there a way to have a contact stored that can be reused for several resources? The idea is that if an email address changes for a person, we don’t have to update all the datasets with that person set up as contact.
This isn’t possible in the current IPT, but keep an eye on the following related GitHub issues:
- Orcid integration and contact "store" · Issue #1439 · gbif/ipt · GitHub
- [EML edition] Use orcid API to automatically fill contact information · Issue #1563 · gbif/ipt · GitHub
- Repeat roles, not people in metadata editor interface · Issue #1166 · gbif/ipt · GitHub
Question 5: How to modify the IPT “about” page?
In the latest IPT versions, you can update the “about” page by going to the Administration Menu and updating the GBIF registration options. The information will be updated on the IPT “about” page as well as on the GBIF installation page.
Question 6: Can you advise on the following case:
- A publisher uploaded a single large file to a test IPT and mapped it to a core with 6 extensions.
- They would like to now upload the data in a production IPT and ideally would like to reuse the work they did in the test IPT.
- The problem is that downloading the archive from the test IPT and re-uploading it to the production IPT means that the production IPT would have 7 files, which isn’t practical to update.
- Is there a way to set it up so that only one file can be updated without having to redo all the mapping in the production IPT?
This is one of those cases where it might be a good idea to go to the test IPT data directory, take the resource folder of the dataset, and put it into the production IPT data directory. When you restart the production IPT, you will have exactly the same configuration as on the test IPT (with one file mapped to the core and extensions). This might not work for database connections used as sources, but it would work for a big file upload.
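A minimal sketch of that copy step in Python, assuming hypothetical test and production data directory paths and a resource short name of `my_dataset`; stop (or be ready to restart) the production IPT around the copy.

```python
#!/usr/bin/env python3
"""Minimal sketch: copy one resource folder from a test IPT data directory into a
production IPT data directory. Paths and the short name are assumptions."""
import shutil

TEST_RESOURCE = "/srv/ipt-test-data/resources/my_dataset"   # hypothetical source
PROD_RESOURCE = "/srv/ipt-prod-data/resources/my_dataset"   # hypothetical target

# Copies the EML, resource configuration, mappings and source files in one go.
shutil.copytree(TEST_RESOURCE, PROD_RESOURCE)
```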
Question 7: How does publishing datasets with a database as the source work? How is the data updated?
Once you have set up a database as a source for your dataset, every time you click the publish button (or every time the dataset is auto-published), the IPT creates a JDBC (Java Database Connectivity) connection, streams the data through this connection, then closes it. This is what generates a new dataset archive ready to be ingested by GBIF. If you have many datasets connected to big databases, you might want to avoid publishing them all at the same time.
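For illustration only, here is a minimal Python sketch of that connect/stream/close pattern, using the built-in sqlite3 module as a stand-in for the IPT’s JDBC connection; the database file, table and column names are made up.

```python
#!/usr/bin/env python3
"""Minimal sketch of the connect/stream/close pattern described above.
sqlite3 stands in for JDBC; all names are invented for illustration."""
import csv
import sqlite3

conn = sqlite3.connect("specimens.db")   # the IPT opens a JDBC connection instead
try:
    cursor = conn.execute("SELECT occurrence_id, scientific_name FROM occurrences")
    with open("occurrence.txt", "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["occurrenceID", "scientificName"])
        # Rows are streamed and written as they arrive, not loaded all at once.
        for row in cursor:
            writer.writerow(row)
finally:
    conn.close()                          # the connection is closed after each publication
```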
Question 8: Would you recommend using Docker for the IPT? How should we set it up?
You should choose whatever is the most convenient for you. If you are used to Docker, it seems like using an IPT Docker container makes a lot of sense. You are very welcome to create your own image or you can check what is already available: Installation :: GBIF IPT User Manual