This press release has been published by GBIF, EurekAlert!, WIT News and AlphaGalileo.
With support from the EU-funded B-Cubed project, GBIF introduces a new summary format suited to biodiversity modelling and indicators.
GBIF has enabled a simple, easy-to-use service for creating and downloading species occurrence cubes based on GBIF-mediated data. The service enables users to select their geographic, temporal and taxonomic dimensions of interest to generate customised reports summarising occurrences in a format suitable for use in biodiversity models and indicators. Its implementation is part of B-Cubed's Work Package 2, led by the Research Institute for Nature and Forest (INBO).
The new data cube service standardises access to biodiversity data for producing Essential Biodiversity Variables (EBVs) as well as indicator workflows and models for climate and land use change created by B-Cubed partners. Specific case studies will test the capacity of indicator workflows to capture relevant aspects of biodiversity change based on different policy targets.
By providing species occurrence measures across user-defined dimensions and resolutions, cubes significantly expand the usability of GBIF-mediated data. The ability for users to define biodiversity dimensions across time, space and taxa and obtain species occurrence counts at those resolutions will improve integration with other data sources.
For example, species occurrences play an important role in calibrating and validating biodiversity models derived from satellite imagery, but to date, differences in temporal and spatial resolution have required considerable data processing. By better matching scale and resolution, data cubes can help users build more robust models of species distribution and biodiversity change more efficiently, improve biodiversity baselines and monitor changes over time.
Technical background
An occurrence cube is a tab-separated CSV file containing species occurrence measures (e.g. a count) summarized by taxonomic, temporal and/or spatial dimensions (e.g. a given year or a specific taxonomic rank). A simple example is a cube summarizing the records of European lagomorph species by year. For this query, the resulting cube reduces more than 1.5 million occurrences to a 23 KB summary of just over 1,000 lines, each representing the count of a given species in a given year, e.g.:
species | year | occurrences |
---|---|---|
Prolagus sardus | 1830 | 8 |
Anoema oeningensis | 1963 | 1 |
Anoema oeningensis | 1966 | 7 |
Prolagus sardus | 1966 | 4 |
Lepus granatensis | 1975 | 121 |
… | … | … |
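To illustrate the kind of aggregation such a cube represents, the following minimal Python sketch counts occurrence records per species and year and writes them out as tab-separated text. It is not the GBIF implementation: the records are invented for illustration, and the helper names `build_cube` and `to_tsv` are hypothetical.

```python
from collections import Counter

# Invented occurrence records (species, year) for illustration only;
# these are NOT real GBIF-mediated data.
records = [
    ("Lepus europaeus", 2001),
    ("Lepus europaeus", 2001),
    ("Lepus europaeus", 2002),
    ("Oryctolagus cuniculus", 2001),
]


def build_cube(occurrences):
    """Count occurrences per (species, year), sorted by species then year."""
    counts = Counter(occurrences)
    return [(species, year, n) for (species, year), n in sorted(counts.items())]


def to_tsv(rows):
    """Render cube rows as tab-separated text with a header line."""
    header = "species\tyear\toccurrences"
    lines = [f"{species}\t{year}\t{n}" for species, year, n in rows]
    return "\n".join([header, *lines])


cube = build_cube(records)
print(to_tsv(cube))
```

Each output row corresponds to one cell of the cube: a unique combination of the chosen dimensions (here species and year) with its occurrence count.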
As with other GBIF occurrence download formats, all successful queries are assigned unique and permanent DOIs to enable FAIR citation, which supports attribution of credit to data publishers and aids reproducibility of downstream analyses.
Based on an extension of the GBIF occurrence download API that allows queries written in Structured Query Language (SQL), the cube download service can be accessed within the GBIF.org interface, where users select focus data using the usual filters for taxonomy, geography and dates, among others. Selecting Cube as the desired download format presents the user with parameters to define the cube's taxonomic, temporal and spatial dimensions. Before initiating the download, the user can also inspect the SQL query that builds the cube, enabling advanced users to further customize the request.
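As a rough idea of what such a query might look like, the Python sketch below assembles a species-by-year SQL statement and wraps it in a JSON request body. Everything specific here is an assumption rather than a confirmed detail of the service: the column names (`species`, `"year"`, `orderKey`, `continent`), the `SQL_TSV_ZIP` format value, the payload field names and the placeholder order key should all be checked against the GBIF API documentation. The sketch only builds the request and does not send it.

```python
import json


def build_cube_sql(order_key, continent):
    """Return SQL counting occurrences per species and year (illustrative)."""
    # Coerce and validate inputs so arbitrary SQL cannot be injected.
    order_key = int(order_key)
    if not continent.replace("_", "").isalpha():
        raise ValueError(f"unexpected continent value: {continent!r}")
    return (
        'SELECT species, "year", COUNT(*) AS occurrences '
        "FROM occurrence "
        f"WHERE orderKey = {order_key} AND continent = '{continent}' "
        'GROUP BY species, "year"'
    )


def build_request_body(sql):
    """Wrap the SQL in a JSON body; field names are assumptions."""
    return json.dumps(
        {"sendNotification": False, "format": "SQL_TSV_ZIP", "sql": sql}
    )


# 123 is a placeholder taxon key, not the real key for any order.
sql = build_cube_sql(123, "EUROPE")
body = build_request_body(sql)
print(body)
```

Adding further columns to both the `SELECT` and `GROUP BY` clauses (for example a grid-cell code) would add dimensions to the resulting cube.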