GBIF backbone - long format to wide format

Hi there,
I am trying to convert the GBIF backbone (Taxon.tsv) from its long format to a wide format that is more conducive to analysis. By wide format I mean that each rank is a column and each species is a row, e.g.:

"kingdom", "phylum", "class", "order", "family", "species"
animalia, arthropoda, insecta, odonata, coenagrionidae, ischnura elegans

So far, I have been using a quite involved script to map “child” taxa to their “parents”, including some backward-recursive logic to find a parent at a higher level if a direct merge fails. This procedure is slow and seems clumsy, so I wanted to check: is there a pipeline available to generate what I want from the GBIF backbone?

Maybe this is a question for @mgrosjean?

Hi @mluerig,

The Taxon.tsv file has the fields kingdom, phylum, class, order, family and genus, which contain names. If you select the lines where the rank is “species”, you should get the higher taxonomy of each species in those fields.
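A minimal sketch of that selection, assuming Taxon.tsv is tab-separated with a header row and that, besides the rank columns named above, it uses Darwin Core-style columns called `taxonID`, `taxonRank` and `canonicalName` (those three names, and the lowercase rank value `species`, are assumptions, not taken from this thread). The helper name `species_rows` is hypothetical.

```python
import csv
from typing import Dict, Iterable, Iterator

# Higher-rank columns named in the reply above.
RANK_COLUMNS = ["kingdom", "phylum", "class", "order", "family", "genus"]

def species_rows(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one wide-format row per species-rank record (hypothetical helper).

    Assumed columns: taxonID, taxonRank, canonicalName plus RANK_COLUMNS.
    """
    # QUOTE_NONE treats quote characters as literal text, so a stray quote
    # inside a name does not swallow the following fields.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row["taxonRank"] != "species":
            continue  # keep only species-rank records
        wide = {col: row[col] for col in RANK_COLUMNS}
        wide["species"] = row["canonicalName"]
        wide["taxonID"] = row["taxonID"]
        yield wide
```

Any iterable of lines works, e.g. an open file: `with open("Taxon.tsv", encoding="utf-8", newline="") as f: rows = list(species_rows(f))`. Because no lookups between records are needed, this streams the file in a single pass.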
One thing you would still have to do is find the accepted species for the species that don’t have the taxonomic status “accepted”, using the acceptedNameUsageID field.
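The synonym step could be sketched as below, under the same assumptions as before (tab-separated, header row, assumed column names `taxonID`, `taxonRank`, `taxonomicStatus`, `canonicalName`, plus the acceptedNameUsageID and rank columns mentioned in the reply). It follows a single acceptedNameUsageID pointer and does not check the accepted record's rank. It also loads every record into memory to make lookups possible, which may be heavy for the full backbone. The function name is hypothetical.

```python
import csv
from typing import Dict, Iterable, List

RANK_COLUMNS = ["kingdom", "phylum", "class", "order", "family", "genus"]

def accepted_species_table(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Map every species-rank record to the higher taxonomy of its accepted name.

    Hypothetical helper; column names other than acceptedNameUsageID and the
    rank columns are assumptions.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Index all records by taxonID so acceptedNameUsageID can be resolved.
    by_id = {row["taxonID"]: row for row in reader}
    table = []
    for row in by_id.values():
        if row["taxonRank"] != "species":
            continue
        target = row
        if row["taxonomicStatus"] != "accepted":
            # Follow one pointer to the accepted record; skip dangling pointers.
            target = by_id.get(row["acceptedNameUsageID"])
            if target is None:
                continue
        wide = {col: target[col] for col in RANK_COLUMNS}
        wide["providedName"] = row["canonicalName"]
        wide["acceptedName"] = target["canonicalName"]
        table.append(wide)
    return table
```

Keeping both `providedName` and `acceptedName` lets you join the table back onto data that still uses the synonym.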

Does it make sense?

thanks, this is of course the solution - should have checked before throwing it into my pipeline

very cool - is this a recent implementation?

Hey @mluerig -

Long-form taxonomies are supported out of the box by Nomer (globalbioticinteractions/nomer on GitHub), a tool that “maps identifiers and names to other identifiers and names”.

To generate the GBIF long form, type the following in your terminal/command line:

nomer ls --include-header gbif 

to produce output like this:

providedExternalId providedName relationName resolvedExternalId resolvedName resolvedAuthorship resolvedRank resolvedCommonNames resolvedPath resolvedPathIds resolvedPathNames resolvedPathAuthorships resolvedExternalUrl
GBIF:1423395 Ischnura elegans HAS_ACCEPTED_NAME GBIF:1423395 Ischnura elegans (Vander Linden, 1820) species Animalia | Arthropoda | Insecta | Odonata | Coenagrionidae | Ischnura | Ischnura elegans GBIF:1 | GBIF:54 | GBIF:216 | GBIF:789 | GBIF:8577 | GBIF:1423281 | GBIF:1423395 kingdom | phylum | class | order | family | genus | species | | | | | Charpentier, 1840 | (Vander Linden, 1820) Ischnura elegans (Vander Linden, 1820)
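The pipe-separated resolvedPath and resolvedPathNames fields already carry everything needed for the wide format asked about at the top of the thread: pairing them position by position gives one column per rank. A minimal sketch, assuming the output was saved with the header (e.g. `nomer ls --include-header gbif > gbif.tsv`) and that its columns are tab-separated (an assumption; the flattened sample above does not show the delimiter). `nomer_to_wide` is a hypothetical name.

```python
import csv
from typing import Dict, Iterable, Iterator

def nomer_to_wide(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Turn each Nomer row into {rank: name} using resolvedPath/resolvedPathNames."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        names = [s.strip() for s in row["resolvedPath"].split("|")]
        ranks = [s.strip() for s in row["resolvedPathNames"].split("|")]
        wide = {"providedName": row["providedName"]}
        # Pair the i-th rank label with the i-th name in the path.
        for rank, name in zip(ranks, names):
            if rank:  # skip path entries that carry no rank label
                wide[rank] = name
        yield wide
```

Ranks that do not occur in a given path are simply absent from that row's dict, so a downstream table (e.g. `csv.DictWriter` with a fixed field list and `restval=""`) decides which ranks become columns.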

Other supported taxonomies include, but are not limited to, ITIS, World of Flora Online, Catalogue of Life, DiscoverLife, Paleobiology Database, Mammal Diversity Database and NCBI Taxonomy.

And Nomer uses versioned snapshots of taxonomic resources to reduce variability due to upgrades or internet outages [1].
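The snapshots are referred to by hash:// identifiers (see [1]). Assuming a `hash://sha256/<hex>` URI names the SHA-256 digest of the exact bytes of a file, a local copy could be checked against it as sketched below. Whether a particular URI names a single file or something like a provenance log is not stated in this thread, and both helper names are hypothetical.

```python
import hashlib
from pathlib import Path

def sha256_uri(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return a hash://sha256/<hex> URI for the bytes of a local file."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large taxonomy dumps need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return "hash://sha256/" + digest.hexdigest()

def matches_snapshot(path: Path, expected_uri: str) -> bool:
    """True if the file's content hash equals the expected hash:// URI."""
    return sha256_uri(path) == expected_uri.strip().lower()
```

Because the identifier depends only on content, the same file fetched from any mirror yields the same URI, which is what makes such snapshots reproducible.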

See also the name alignment template (globalbioticinteractions/name-alignment-template on GitHub: “align names with known taxonomic resources”) and the associated reusable workshop held with the Big-Bee-Network on 18 January 2023. The name alignment template is configured to produce name alignments like:

providedExternalId          
providedName                Adoretus
parseRelation               SAME_AS
parsedExternalId            
parsedName                  Adoretus
parsedAuthority             
parsedRank                  
parsedCommonNames           
parsedPath                  
parsedPathIds               
parsedPathNames             
parsedPathAuthorships       
parsedNameSource            gbif-parse
parsedNameSourceUrl         https://linker.bio,https://zenodo.org/records/10810821/files,https://zenodo.org/records/10045382/files,https://zenodo.org/records/10037817/files,https://zenodo.org/records/8327611/files
parsedNameSourceAccessedAt  hash://sha256/d2903d0384a8b8193819b8061c8c4e6fec8cc2f7fe72dc0e91c90c07ba2fe15e
alignRelation               HAS_ACCEPTED_NAME
alignedCatalogName          itis
alignedExternalId           http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=187484
alignedName                 Adoretus
alignedAuthorship           Dejean, 1833
alignedRank                 genus
alignedCommonNames          
alignedKingdomName          Animalia
alignedKingdomId            ITIS:202423
alignedKingdomAuthorship    
alignedPhylumName           Arthropoda
alignedPhylumId             ITIS:82696
alignedPhylumAuthorship     
alignedClassName            Insecta
alignedClassId              ITIS:99208
alignedClassAuthorship      
alignedOrderName            Coleoptera
alignedOrderId              ITIS:109216
alignedOrderAuthorship      Linnaeus, 1758
alignedFamilyName           Scarabaeidae
alignedFamilyId             ITIS:114493
alignedFamilyAuthorship     Latreille, 1802
alignedSubfamilyName        Rutelinae
alignedSubfamilyId          ITIS:678509
alignedSubfamilyAuthorship  MacLeay, 1819
alignedTribeName            Adoretini
alignedTribeId              ITIS:926256
alignedTribeAuthorship      Burmeister, 1844
alignedSubtribeName         
alignedSubtribeId           
alignedSubtribeAuthorship   
alignedGenusName            Adoretus
alignedGenusId              ITIS:187484
alignedGenusAuthorship      Dejean, 1833
alignedSubgenusName         
alignedSubgenusId           
alignedSubgenusAuthorship   
alignedSpeciesName          
alignedSpeciesId            
alignedSpeciesAuthorship    
alignedSubspeciesName       
alignedSubspeciesId         
alignedSubspeciesAuthorship 
alignedPath                 Animalia | Bilateria | Protostomia | Ecdysozoa | Arthropoda | Hexapoda | Insecta | Pterygota | Neoptera | Holometabola | Coleoptera | Polyphaga | Scarabeiformia | Scarabaeoidea | Scarabaeidae | Rutelinae | Adoretini | Adoretus
alignedPathIds              ITIS:202423 | ITIS:914154 | ITIS:914155 | ITIS:914158 | ITIS:82696 | ITIS:563886 | ITIS:99208 | ITIS:100500 | ITIS:563890 | ITIS:914213 | ITIS:109216 | ITIS:112747 | ITIS:678302 | ITIS:114486 | ITIS:114493 | ITIS:678509 | ITIS:926256 | ITIS:187484
alignedPathNames            kingdom | subkingdom | infrakingdom | superphylum | phylum | subphylum | class | subclass | infraclass | superorder | order | suborder | infraorder | superfamily | family | subfamily | tribe | genus
alignedPathAuthorships      |  |  |  |  |  |  |  |  |  | Linnaeus, 1758 | Emery, 1886 | Crowson, 1960 | Latreille, 1802 | Latreille, 1802 | MacLeay, 1819 | Burmeister, 1844 | Dejean, 1833
alignedNameSource           itis
alignedNameSourceUrl        https://linker.bio,https://zenodo.org/records/10810821/files,https://zenodo.org/records/10045382/files,https://zenodo.org/records/10037817/files,https://zenodo.org/records/8327611/files
alignedNameSourceAccessedAt hash://sha256/d2903d0384a8b8193819b8061c8c4e6fec8cc2f7fe72dc0e91c90c07ba2fe15e
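The alignment record above already has one column per rank (alignedKingdomName, alignedPhylumName, and so on). A minimal sketch that reduces such records to a compact wide table, assuming the template's output is a tab-separated file whose header uses the field names shown above (the listing looks like one record transposed; the file layout is an assumption). `alignment_to_wide` is a hypothetical name.

```python
import csv
from typing import Dict, Iterable, Iterator

# Rank slots visible in the alignment record above.
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Subfamily",
         "Tribe", "Subtribe", "Genus", "Subgenus", "Species", "Subspecies"]

def alignment_to_wide(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Keep providedName, the catalog, and aligned<Rank>Name for each rank."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        wide = {"providedName": row["providedName"],
                "alignedCatalogName": row["alignedCatalogName"]}
        for rank in RANKS:
            # Missing columns and empty cells both become "".
            wide[rank.lower()] = row.get(f"aligned{rank}Name") or ""
        yield wide
```

Because one provided name can align to several catalogs, keeping alignedCatalogName in each row makes it easy to filter to a single taxonomy (e.g. `itis`) afterwards.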

Big thanks to GBIF, ITIS, World of Flora Online, Catalogue of Life, DiscoverLife, Paleobiology Database, Mammal Diversity Database, NCBI Taxonomy and many other projects for making their taxonomic resources available for re-use online in bulk. With these comprehensive, structured taxonomic resources, (fast!) tools like Nomer (globalbioticinteractions/nomer), taxadb (ropensci/taxadb) [2] and Global Names Resolver [3] can be developed independently of the specific tools and services provided by the taxonomic projects themselves.

Hope this helps provide some perspective,
-jorrit

References

[1] Poelen, J. H. (Ed.). (2024). Nomer Corpus of Taxonomic Resources hash://sha256/83617875e84bb8ae7ac2a257ad50eb8e82d8935d975f465b8ee8f3a803f72b48 hash://md5/c639d7e3fcd5603f6c48e9d5e6c49672 (0.24) [Data set]. Zenodo.

[2] Norman, K. E. A., Chamberlain, S., & Boettiger, C. (2020). taxadb: A high-performance local taxonomic database interface. Methods in Ecology and Evolution, 11, 1153–1159. https://doi.org/10.1111/2041-210X.13440

[3] Mozzherin, D. (2023). gnames/gnverifier: v1.1.5 (v1.1.5). Zenodo. See also https://globalnames.org.

cool, this is really helpful for all sorts of quantitative research on these datasets. thanks for taking the time to elaborate, much appreciated!
