Take a look at the projects selected for B-Cubed’s hackathon

21 December 2023

Eleven innovative projects have been selected for B-Cubed’s hackathon, which will take place in Brussels this April. Participants can join whichever project they would like to contribute to and develop the idea further with the rest of the team. To make an informed decision before registration opens, take a look at all accepted projects and their goals:

Project 1: Data cubes for landscape approach in biodiversity conservation across farmers

Summary: In the Netherlands, farmers have been contracted collectively since 2014 to make biodiversity and habitat conservation at the landscape level more ecologically efficient. Multidimensional data cubes are a promising vehicle for integrating data from individual contributors across farmlands and, eventually, for monitoring conservation efforts at the landscape level. A landscape-level approach is crucial for conserving biodiversity and habitats because farmland birds and other species do not stay within individual farm boundaries, so their conservation requires cooperation across farmers.

Needed skills: Python, EBVs, biodiversity monitoring experience in an agricultural setting, handling and processing ecological long-tail data, species distributions.

Project 2: Unveiling Ecological Dynamics Through Simulation and Visualization of Biodiversity Data Cubes

Summary: Simulation studies are valuable because they can mimic real-world scenarios in controlled, customizable environments. Ecosystems and biodiversity data are highly complex and involve a multitude of interacting factors. Simulations allow researchers to model and understand this complexity by varying parameters such as spatial and/or temporal clustering, species prevalence, etc.

In this project, we aim to create a simulation framework for biodiversity data cubes using the R programming language. This could start by simulating multiple species distributed across a landscape over a given time period. In a second phase, simulating a variety of observation processes and sampling efforts can generate occurrence datasets. Based on their (simulated) spatial uncertainty, occurrences can then be assigned to grid cells to form a data cube; R code for this occurrence-to-cube designation is already available. That said, we encourage you to think outside the box and bring new ideas as well.
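
The occurrence-to-cube step can be made concrete with a minimal Python sketch. It uses one common way of handling spatial uncertainty. A random point is drawn within each occurrence's uncertainty circle, and occurrences are then counted per grid cell. This is not the project's existing R code. The function names, the square extent, the 1 km cell size and the uniform placement of occurrences are all illustrative assumptions.

```python
import math
import random
from collections import Counter


def simulate_occurrences(n, extent, max_uncertainty, rng):
    """Simulate n occurrences placed uniformly in a square extent (metres),
    each with a random coordinate-uncertainty radius (metres)."""
    return [
        (rng.uniform(0, extent), rng.uniform(0, extent), rng.uniform(0, max_uncertainty))
        for _ in range(n)
    ]


def designate_cell(x, y, uncertainty, cell_size, rng):
    """Draw one random point inside the uncertainty circle and return the
    (column, row) index of the grid cell containing that point."""
    angle = rng.uniform(0, 2 * math.pi)
    # Taking the square root makes the draw uniform over the disc's area.
    radius = uncertainty * math.sqrt(rng.random())
    px = x + radius * math.cos(angle)
    py = y + radius * math.sin(angle)
    return math.floor(px / cell_size), math.floor(py / cell_size)


def build_cube(occurrences, cell_size, rng):
    """Count occurrences per grid cell: a one-species, one-time-step cube.
    Points near the edge may land in cells just outside the extent."""
    return Counter(designate_cell(x, y, u, cell_size, rng) for x, y, u in occurrences)


rng = random.Random(42)
occurrences = simulate_occurrences(n=500, extent=10_000, max_uncertainty=500, rng=rng)
cube = build_cube(occurrences, cell_size=1_000, rng=rng)
print(f"{sum(cube.values())} occurrences in {len(cube)} cells")
```

Rerunning `build_cube` with different seeds shows how much the cube varies purely because of coordinate uncertainty. This is one of the effects the simulation framework is meant to explore.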

The simulation framework can be used to assess multiple research questions under different parameter settings, such as the effect of clustering on occurrence-to-grid designation and the effect of different patterns of missingness on data quality and indicator robustness. Simulation studies can incorporate scenarios with missing data, enabling researchers to assess the impact of data gaps on analyses. Understanding how missing data influences results is crucial for improving data collection strategies and addressing potential biases.
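
A small Python sketch can illustrate how different missingness patterns affect an indicator. It assumes a hypothetical grid in which presence probability rises from west to east. The proportion of occupied cells serves as a stand-in indicator. The helper names and the 30% gap size are illustrative, not part of the project.

```python
import random


def simulate_presence(n_rows, n_cols, rng):
    """Presence/absence per grid cell; presence probability rises from
    west (column 0) to east (last column)."""
    return {
        (r, c): rng.random() < (c + 1) / n_cols
        for r in range(n_rows)
        for c in range(n_cols)
    }


def random_missing(cells, frac, rng):
    """Missing completely at random: drop a random fraction of cells."""
    keys = sorted(cells)
    dropped = set(rng.sample(keys, round(frac * len(keys))))
    return {k: v for k, v in cells.items() if k not in dropped}


def clustered_missing(cells, frac, n_cols):
    """Spatially clustered gap: drop the easternmost columns entirely."""
    n_drop = round(frac * n_cols)
    return {k: v for k, v in cells.items() if k[1] < n_cols - n_drop}


def occupancy(cells):
    """A simple indicator: proportion of observed cells where the species is present."""
    return sum(cells.values()) / len(cells)


rng = random.Random(1)
truth = simulate_presence(20, 20, rng)
print("complete data:     ", round(occupancy(truth), 3))
print("30% random gaps:   ", round(occupancy(random_missing(truth, 0.3, rng)), 3))
print("30% clustered gaps:", round(occupancy(clustered_missing(truth, 0.3, 20)), 3))
```

The clustered gap removes the cells where the species is most common, so occupancy is underestimated. Random gaps of the same size leave the estimate roughly unbiased.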

A secondary objective of the simulation study is to develop a visualisation tool for the simulated cubes. This tool aims to improve understanding of data clustering and missingness within the simulated environment. A visual representation will help researchers analyse and interpret patterns of clustered data and identify areas where data are missing. This visualisation capability supports a more comprehensive exploration of the simulated scenarios and deeper insights into the behaviour of the data within the study.

Skills needed: R programming, algorithm development, creative thinking, problem solving, efficient coding, collaboration (Git).

Project 3: Intelligent Nature Positive Impact Data for the Financial Services

Summary: The challenge: for the finance and insurance sectors to benefit from the opportunities provided by nature and to mitigate nature-related risks, reliable biodiversity monitoring data are a must. Current state of the art: to meet the need for simple, decision-ready metrics for biodiversity footprinting and for measuring Nature Positive Impact, NatureMetrics has developed core functionality to be delivered through the NatureMetrics digital dashboard. To encompass the many dimensions of nature and serve the Financial Services market, this framework combines biodiversity footprint, biodiversity impact and biodiversity trend metrics, thereby measuring both biodiversity pressures and responses. Here, we propose incorporating standardized biodiversity data from B-Cubed within the Intelligent Nature Positive Impact (iNPI) framework. One example is designing open-source connectivity algorithms at high spatial resolution that link to the framework’s restoration tracker element. The aim is to maximize the efficiency and return on investment of biodiversity management interventions, and to demonstrate the high resolution required by the Financial Services market.

Needed skills: Data science, bioinformatics, statistical ecology, mathematical modelling, machine learning.

Project 4: Irokube: Deep-SDM and Critical Habitats Mapping Empowered by Data Cubes 

Summary: Species Distribution Models (SDMs) play a key role in conservation, offering a predictive framework to assess and proactively address changes in species distributions. By anticipating environmental shifts, SDMs enable targeted conservation strategies, crucial for effective biodiversity preservation.

This project explores the added value of biodiversity data cubes for identifying critical habitats using Deep-Learning Species Distribution Models (Deep-SDMs). The conventional Deep-SDM approach often relies on observation point data, which may fail to capture the temporal dynamics of species distributions. Furthermore, training Deep-SDMs with multiple observation sources requires manually compiling and ordering observations beforehand, which is time-consuming.

We will integrate biodiversity data cubes with environmental and remote sensing data to improve Deep-SDMs and thus better describe habitats. To illustrate this approach, we will explore a use case based on detailed floristic data from France and Belgium. It will demonstrate how data cubes, combined with environmental and remote sensing data, can contribute to modelling species distributions over an extended geographical scale. This case study will highlight the potential benefits of using data cubes in biodiversity modelling.

Improved species distribution modelling using biodiversity data cubes could offer a significant contribution to public policy, providing essential information for conservation and environmental management decision-making. These models make it possible to identify crucial areas for biodiversity, facilitating the design of targeted conservation policies. By anticipating the potential impacts of environmental changes, such as climate change, public policies can be proactively adapted to mitigate threats to flora and fauna. 

Needed skills: Knowledge of GBIF data and data cube extraction, data visualisation and machine/deep learning algorithms.

Project 5: Tackling Ocean Biodiversity Crisis with Marine Data B-Cubes

Summary: The introduction of non-indigenous species is one of the major drivers of ocean biodiversity loss, and its rate is growing exponentially every year (IPBES 2019; UN World Ocean Assessment II, 2021). Agile tools for assessing the potential risk posed by non-indigenous species are key to identifying the native species and ecosystems at risk. Threats can then be controlled through early detection and rapid mitigation responses (UN World Ocean Assessment II, 2021). Data-driven assessments such as Species Distribution Models (SDMs) make use of FAIR biogeographic and environmental data. However, these data typically come from heterogeneous sources and follow different standards. Wrangling them into a modelling-ready dataset is a time-consuming task that delays their integration into machine learning and modelling workflows.

To speed up this stage of modelling pipelines, and ultimately enable faster assessment of species invasion risks, we aim to bring model-ready data closer to SDM pipelines. We will funnel relevant biodiversity databases such as OBIS, GBIF or ETN, and environmental data sources such as the Copernicus Marine Service or Bio-Oracle, into a SpatioTemporal Asset Catalog (STAC). This OGC-compliant specification aligns with the GBIF API. The STAC back-end will serve as an open-access entry point to ecological modelling-ready data cubes following the B-Cubed software specification. These B-Cubes will be ready to plug into machine learning and modelling workflows. We will showcase their potential for early risk assessment of non-indigenous species invasions, using SDMs to calculate the probability that potentially invasive species will colonise an area. We will build upon existing tools and develop new ones in well-known data science programming languages such as Python or R. This will allow reuse by other marine scientists, policy-makers and the wider community.
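
For readers unfamiliar with STAC, the sketch below builds a minimal STAC 1.0.0 Item in plain Python. An Item is the GeoJSON Feature that describes one asset in a catalog. The item id, bounding box, dates and `example.org` URL are hypothetical, and the item is not meant to follow any B-Cubed specification. A production catalog would typically add `self`/`parent` links and validate items against the STAC JSON schemas.

```python
import json


def make_stac_item(item_id, bbox, start, end, href, media_type="text/csv"):
    """Build a minimal STAC 1.0.0 Item (a GeoJSON Feature) describing one
    data-cube file that covers a bounding box and a time interval."""
    west, south, east, north = bbox
    # Exterior ring listed counter-clockwise and closed (first point == last point).
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "bbox": [west, south, east, north],
        # datetime may be null when start_datetime and end_datetime are given.
        "properties": {"datetime": None, "start_datetime": start, "end_datetime": end},
        "links": [],
        "assets": {"data": {"href": href, "type": media_type, "roles": ["data"]}},
    }


item = make_stac_item(
    item_id="hypothetical-nis-cube-2020",
    bbox=(-10.0, 35.0, 5.0, 45.0),
    start="2020-01-01T00:00:00Z",
    end="2020-12-31T23:59:59Z",
    href="https://example.org/cubes/hypothetical-nis-cube-2020.csv",
)
print(json.dumps(item, indent=2))
```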

Needed skills: Policy and communication.

Project 6: Phenological Diversity trends by remote sensing-related datacubes

Summary: Spectral diversity is a proxy for vegetation diversity, but it is also a measure of the trait diversity of the plant community and of its geological surroundings. From a Critical Zone perspective, it describes the trait expression of the plant community, taking into account both its taxonomic composition and the abiotic factors linked to nutrient and water availability. To best capture how plant-community traits respond to changes in abiotic factors, it is important to include the temporal dimension and take a phenological stance.

The R package rasterdiv supports spectral diversity analysis of satellite-derived vegetation indices (NDVI, MCARI, ...), in particular with Rao's Q. This diversity index takes into account both species abundances and the trait distances between them. The package can handle multidimensional traits but assumes they are uncorrelated, which is not the case for vegetation index time series. We therefore propose to implement a dedicated method in rasterdiv that handles time series in order to extract phenological diversity. We plan two alternative methods:

  1. Dynamic Time Warping (DTW) of time-series trajectories. Rao's Q requires a distance between traits; here the output metric is the sum of the residual differences and the penalty for the time-warping changes. DTW has an exact but slow algorithm and a faster one.
  2. Summary statistics approach. Time series are summarized using phenological statistics, for which several options are possible. The simplest solution is to use the HR-VPP (High Resolution Vegetation Phenology and Productivity) products available on WEkEO.
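
The DTW option can be sketched in Python. The sketch computes a dynamic-programming DTW distance with an optional per-step warping penalty and uses it as the trait distance between pixel time series in Rao's Q. This is not rasterdiv's implementation or API (rasterdiv is an R package). The equal weighting of pixels, the absolute-difference residual and the penalty formulation are assumptions. Some formulations of Rao's Q include a factor of 1/2. DTW is also not a true metric, because it can violate the triangle inequality.

```python
def dtw_distance(a, b, penalty=0.0):
    """Dynamic Time Warping distance between two 1-D series.
    Cost = sum of absolute residuals along the best alignment path, plus
    `penalty` for every warping (non-diagonal) step."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            residual = abs(a[i - 1] - b[j - 1])
            cost[i][j] = residual + min(
                cost[i - 1][j - 1],        # diagonal step: no warping
                cost[i - 1][j] + penalty,  # advance in a only: warping
                cost[i][j - 1] + penalty,  # advance in b only: warping
            )
    return cost[n][m]


def rao_q(series, distance=dtw_distance):
    """Rao's Q = sum_i sum_j d_ij * p_i * p_j, with every pixel's series
    weighted equally (p = 1/N)."""
    n = len(series)
    p = 1.0 / n
    return sum(
        distance(series[i], series[j]) * p * p
        for i in range(n)
        for j in range(n)
        if i != j  # d_ii is zero, so the diagonal can be skipped
    )


# Hypothetical NDVI-like curves: the second peaks one step later.
early = [0, 1, 2, 1, 0, 0]
late = [0, 0, 1, 2, 1, 0]
print(dtw_distance(early, late))               # → 0.0 (shift fully absorbed)
print(dtw_distance(early, late, penalty=0.5))  # → 1.0 (two warping steps)
print(rao_q([early, late], lambda a, b: dtw_distance(a, b, penalty=0.5)))  # → 0.5
```

A lockstep comparison of the two example curves would report a difference at most time steps. DTW absorbs the one-step phenological shift, and only the warping penalty remains.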

Use cases: we plan to work on three small (≈30 hectares) grassland areas along an elevation gradient, where in situ biodiversity information is available: 1) Gran Paradiso National Park at 2200 m; 2) Sila National Park at 1700 m; 3) Alta Murgia grasslands at 500 m.

Needed skills: Knowledge of DTW libraries; phenology data, algorithms and resources to obtain them; knowledge of the rasterdiv package; scripting in R/Python; knowledge of the use case data; knowledge of community ecology to correctly interpret the results.

Project 7: Interoperable eLTER Standard Observation variables for Biosphere

Summary: The project is part of a doctoral thesis entitled “Improving Semantic Interoperability of observational research through data FAIRification”. As part of our work for eLTER PLUS (GA.Nr. 871128), we aim to represent eLTER Standard Observations (SOs) in a machine-interpretable way. The SOs are based on eLTER’s whole-system approach, the WAILS concept. They will include the minimum set of variables, as well as the associated method protocols, that adequately characterize the state and future trends of Earth systems. We intend to semantically enhance the variable descriptions in EnvThes, used in the site and dataset registry DEIMS-SDR, with the I-ADOPT Framework endorsed by the RDA. I-ADOPT enables semantically precise, machine-readable descriptions of variables by decomposing them into atomic components, each described with concepts from semantic artefacts. It acts as a semantic broker between different variable representations, enabling interoperability across annotated datasets. To capture the methods and protocols for applicable EUNIS habitat types, including the automatic SO classification of sites based on their observation capabilities, we are also developing a supporting observation ontology. For the B-Cubed project we focus on biodiversity variables observed at the eLTER site “Zöbelboden”, Austria. These include flying insects, vegetation composition and bird occurrences from acoustic recordings. The same data are also collected for the LIFEPLAN project and the SoilBON initiative. To keep our approach from being tailored solely to eLTER requirements, we want to build on top of existing specifications such as the species occurrence cube. We also seek active collaboration in their further development to contribute to FAIR, semantically rich representations.
Through our involvement in the GO FAIR Foundation, we will help further disseminate these FAIR data practices, published as nanopublications via the FAIR Connect search engine (available in 2024).
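
The following minimal Python sketch illustrates what decomposing a variable into atomic components means. It uses the I-ADOPT component roles: property, object of interest, matrix, context object and constraint. The class, the `describe` helper and the example decomposition of flying-insect abundance are hypothetical. A real annotation would reference concept IRIs from a semantic artefact such as EnvThes rather than plain labels.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Variable:
    """A variable decomposed into I-ADOPT-style atomic components.
    Plain labels stand in for concept IRIs from a semantic artefact."""
    label: str
    property: str
    object_of_interest: str
    matrix: Optional[str] = None
    context_objects: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)

    def describe(self):
        """Reassemble a human-readable description from the components."""
        text = f"{self.property} of {self.object_of_interest}"
        if self.constraints:
            text += " (" + ", ".join(self.constraints) + ")"
        if self.matrix:
            text += f" in {self.matrix}"
        if self.context_objects:
            text += " at " + ", ".join(self.context_objects)
        return text


# Hypothetical decomposition, for illustration only.
insects = Variable(
    label="flying insect abundance",
    property="abundance",
    object_of_interest="insect",
    constraints=["flying"],
    context_objects=["eLTER site Zöbelboden"],
)
print(insects.describe())
```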

Needed skills: FAIR data management; data modelling; knowledge engineering; semantic modelling; FAIR metadata schema development; Jupyter Notebook & Python.

Project 8: Virtual B3 - How reliability and uncertainty of biodiversity sampling affect SDM: build virtual species to cope with real problems!

Summary: In most cases, there is no complete information about the ‘reality’ of the focal species’ distribution beyond the data collected in situ. This is partly because the completeness of survey data (recorded in situ) is difficult to measure. For instance, occurrence data from natural history collections, such as museum or herbarium collections, tend to be very incomplete. They contain a relatively high number of false absences, i.e. occurrences missed by the observer in the field, as often happens with rare or difficult-to-identify species. Such incompleteness limits our ability to detect the true spatial coverage and the niche of a species from the available records. These limitations, in turn, can seriously distort the final results of species distribution models by skewing the relationship between species occurrences and the underlying environmental patterns. Quantifying sources of error is therefore essential for proper descriptive or mechanistic modelling of species distributions, especially for conservation purposes. Examples include studying the distribution of rare and/or endangered species, planning the expansion of protected areas, and investigating the factors that may influence a species’ future distribution under global climate change. In this context, using simulated (in-silico) datasets, the so-called ‘virtual ecologist’ approach, makes it possible to generate distribution data with known ecological characteristics. Because the configuration of factors constraining species distributions is fully controlled, this helps to simulate, and thus account for, spatio-temporal and taxonomic noise. This approach could help refine modelling techniques and lead to further recommendations for conservation planning.
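
The ‘virtual ecologist’ idea can be sketched in a few lines of Python. The sketch defines a virtual species with a known response to an environmental gradient, surveys it imperfectly, and measures how many true presences were missed. The logistic response, the single environmental variable, the coefficients and the 0.6 detection probability are illustrative assumptions, not the project's design.

```python
import math
import random


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate_virtual_species(env, beta0, beta1, rng):
    """True presence/absence drawn from a known logistic response to a
    single environmental variable: the 'reality' we normally never see."""
    return [rng.random() < logistic(beta0 + beta1 * e) for e in env]


def survey(true_presence, detection_prob, rng):
    """Imperfect survey: a present species is recorded only with probability
    detection_prob. No false presences are simulated here."""
    return [present and rng.random() < detection_prob for present in true_presence]


def false_absence_rate(true_presence, observed):
    """Share of truly occupied sites where the survey missed the species."""
    occupied = sum(true_presence)
    missed = sum(t and not o for t, o in zip(true_presence, observed))
    return missed / occupied if occupied else 0.0


rng = random.Random(7)
env = [rng.uniform(-2.0, 2.0) for _ in range(1000)]  # e.g. a scaled temperature gradient
truth = simulate_virtual_species(env, beta0=0.0, beta1=2.0, rng=rng)
observed = survey(truth, detection_prob=0.6, rng=rng)
print(f"true prevalence:     {sum(truth) / len(truth):.2f}")
print(f"observed prevalence: {sum(observed) / len(observed):.2f}")
print(f"false-absence rate:  {false_absence_rate(truth, observed):.2f}")
```

One experiment this approach enables is fitting an SDM to `observed` rather than `truth` and comparing the estimated response with the known coefficients.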

Needed skills: Knowledge of R, programming/modelling, and ecological/conservation skills. 

Project 9: Species occurrence cube portability

Summary: Species occurrence cubes will be available in a limited range of formats, which may hinder their reuse in the future. This project aims to increase the range of formats in which species occurrence cubes are available and to make them as ready-to-use as possible.
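
The Python sketch below shows the kind of conversion this project targets. It turns a tab-separated cube export into JSON Lines using only the standard library. The column names and values are hypothetical, and JSON Lines is just one possible target. Formats such as Parquet or NetCDF would need third-party libraries.

```python
import csv
import io
import json


def tsv_to_jsonl(tsv_text, integer_columns=()):
    """Convert a tab-separated cube export into JSON Lines (one JSON object
    per cube row), casting the listed columns to integers."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = []
    for row in reader:
        for column in integer_columns:
            row[column] = int(row[column])
        out.append(json.dumps(row, ensure_ascii=False))
    return "".join(line + "\n" for line in out)


# Hypothetical column names and values, for illustration only.
sample = (
    "year\tcellcode\tspecieskey\toccurrences\n"
    "2020\tcell-001\t1000001\t3\n"
    "2021\tcell-001\t1000001\t5\n"
)
print(tsv_to_jsonl(sample, integer_columns=("year", "specieskey", "occurrences")), end="")
```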

Needed skills: Knowledge of existing conversion tools and/or scripting with libraries in common programming languages such as Python, C, Node, Java, etc. The choice of tools and scripting language is up to participants: the more the merrier!

Project 10: Effects of integrating species occurrences from different sources into Data Cubes: facilitating detection of data balance, biases, scale and other effects

Summary: Traditional biodiversity monitoring programmes usually hold well-curated species occurrence data with high spatial and temporal resolution, but often only for restricted areas. In recent years, technological developments and online digital platforms have increased the amount of information available on species occurrences. This information comes both from the scientific community and from community-based contributions such as citizen science. This increase in data availability and accessibility poses the challenge of integrating highly heterogeneous occurrence datasets. Yet, when implementing biodiversity policy objectives at wide scales, making the most of all available information is crucial. Integrating species occurrences from different data sources cannot be dismissed. However, we can expect variability in sampling methods, in spatial and temporal resolution and coverage, and in data confidence levels to introduce bias into the assessments.
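
One simple way to start detecting data imbalance is to check, for each grid cell, how much of the data comes from a single source. The Python sketch below does this. The record layout, source labels and the 0.8 dominance threshold are hypothetical choices for illustration.

```python
from collections import Counter, defaultdict


def source_balance(records, threshold=0.8):
    """For each grid cell, compute the share of records contributed by the
    most common source and flag cells dominated by a single source."""
    per_cell = defaultdict(Counter)
    for cell, source in records:
        per_cell[cell][source] += 1
    report = {}
    for cell, counts in per_cell.items():
        top_source, top_count = counts.most_common(1)[0]
        share = top_count / sum(counts.values())
        report[cell] = (top_source, share, share >= threshold)
    return report


# Hypothetical records: (grid cell, data source)
records = [
    ("cell-A", "monitoring"), ("cell-A", "citizen science"),
    ("cell-B", "citizen science"), ("cell-B", "citizen science"),
    ("cell-B", "citizen science"), ("cell-B", "citizen science"),
    ("cell-B", "monitoring"),
]
for cell, (source, share, dominated) in sorted(source_balance(records).items()):
    print(cell, source, round(share, 2), "dominated" if dominated else "balanced")
```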

Needed skills: Programming, EBVs particularly on species distributions, biodiversity monitoring experience, indicators development and use.

Project 11: GUI for general biodiversity indicators

Summary: Biodiversity indicators provide essential information for policy-makers, but they are often accessible only to research scientists, who understand coding and have the time to learn a new package. As a result, many users must rely on ready-made or commissioned analyses. Within the EU project B-Cubed, we are providing flexible workflows that align with policy objectives. To be truly user-friendly, however, these should be offered in a graphical, interactive format that does not require the user to understand coding. The goal of this hackathon project is to develop a front end for a general biodiversity indicators package, e.g. in the form of a Shiny app.

Needed skills: Shiny and R, ideally with a bit of S3 knowledge.