BIOMATHEMATICS & APPLIED STATISTICS

Rationale

With the development of increasingly accelerated technology, mathematics is nowadays often regarded as a complex and abstract science of little practical use. Yet mathematics, through the development of theories and methodological tools, has unquestionably done more than any other discipline to advance human knowledge of our environment. Developing practical applications of mathematics in different scientific fields should help renew interest in the discipline. Biomathematics can be viewed as the combination of two sciences, biology and mathematics, and is concerned with the applications of mathematics to biology. The research unit on Biomathematics and Applied Statistics falls within this perspective. The unit is interested not only in the use of mathematical theories in biology but especially in publishing scientific notes that describe the application of different mathematical tools in the life sciences.

Foci

Biostatistics
Quantitative genetics
Agricultural and health econometrics
Bioinformatics

Topics and Abstracts of Research Projects by PhD Students in Biometry at FSA

1. Leul Mekonnen (2023-2025). Modelling of Cholera Epidemics in Ethiopia.

Abstract: Cholera is epidemic throughout developing regions such as Africa, Asia, the Middle East, South and Central America, and the Caribbean, and outbreaks can be extensive because of poor sanitation and the use of untreated water [Beryl et al., 2016; Harris et al., 2012]. The disease has caused several outbreaks across Africa, the Americas, and Asia, including the recent devastating outbreaks in Zimbabwe (2008-2009), Haiti (2010-2012), and Yemen (2016-2020), the last being the largest documented cholera outbreak in history [Camach et al., 2018]. In 2015, five African countries accounted for 80% of reported cholera cases [WHO, 2016a]. Cholera is now endemic in Africa, where at least 20 countries report outbreaks every year; in 2015, Sub-Saharan Africa accounted for 72% of the cholera deaths reported worldwide and recorded the highest case fatality rate, 1.3% [WHO, 2016a]. This research addresses the following main questions: (a) What modelling techniques are available in the literature on cholera epidemiology? (b) How do climate factors explain the dynamics of cholera epidemics in climatically distinct regions? (c) How can vaccination interventions curtail cholera epidemics in a stratified population? (d) How do different control measures curb the spread of cholera? (e) What is the role of houseflies and other vectors in the transmission of cholera? The dissertation aims to understand the course of cholera epidemics in Ethiopia, paying attention to the factors that sustain them, by means of deterministic and stochastic models.
Specifically, the work is meant to (i) review the existing literature on the different modelling techniques used in cholera epidemiology, (ii) develop cholera models to study the impact of the climate variables temperature and rainfall in epidemic regions of Ethiopia, (iii) develop and validate a mathematical model of the potential impact of vaccination in a stratified population, (iv) investigate the impact of the available types of interventions that may curb the spread of cholera, and (v) develop and validate a housefly-cholera model that includes both the housefly and human populations.

Keywords: Infectious disease, mathematical modeling, vectors, interventions, vaccination.
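
As an illustration of the deterministic side of such work, the sketch below integrates a basic SIR-B cholera model with forward Euler steps: susceptible people (S) are infected through contact with a water reservoir of Vibrio cholerae (B), infected people (I) shed bacteria into it and recover (R), and bacteria decay at a constant rate. This generic formulation, the function name `simulate_sirb`, and the parameter names `beta`, `kappa`, `gamma`, `xi`, `delta` are assumptions for illustration, not the model developed in this project; climate forcing, vaccination, and houseflies are omitted, and all parameter values are hypothetical placeholders.

```python
def simulate_sirb(n_days=200, dt=0.1, N=10_000.0, I0=10.0,
                  beta=0.5, kappa=1e5, gamma=0.2, xi=10.0, delta=0.33):
    """Forward-Euler integration of a basic SIR-B cholera model (illustrative).

    S, I, R: susceptible, infected, recovered humans (closed population).
    B: concentration of V. cholerae in the water reservoir.
    beta: maximum exposure rate; kappa: half-saturation concentration;
    gamma: recovery rate; xi: shedding rate; delta: bacterial decay rate.
    All values are hypothetical placeholders.
    """
    S, I, R, B = N - I0, I0, 0.0, 0.0
    traj = [(0.0, S, I, R, B)]
    steps = int(round(n_days / dt))
    for k in range(1, steps + 1):
        force = beta * B / (kappa + B)   # per-capita infection rate from water
        new_inf = force * S              # new infections per day
        recov = gamma * I                # recoveries per day
        dS = -new_inf
        dI = new_inf - recov
        dR = recov
        dB = xi * I - delta * B          # shedding minus bacterial decay
        S += dt * dS
        I += dt * dI
        R += dt * dR
        B += dt * dB
        traj.append((k * dt, S, I, R, B))
    return traj


if __name__ == "__main__":
    traj = simulate_sirb()
    t_peak, _, i_peak, _, _ = max(traj, key=lambda row: row[2])
    print(f"peak of {i_peak:.0f} infected at day {t_peak:.1f}")
```

Because the human compartments only exchange individuals, S + I + R stays equal to N, which is a quick sanity check on any implementation. Climate effects could, for instance, be explored by making `xi` or `delta` time-dependent; this too is only a suggestion under the same assumptions.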

2. Kassifou Traore (2023-2025). A mathematical model for analyzing malaria incidence and mortality in sub-Saharan Africa: Accounting for population opinion on modern and traditional treatments and prevention methods.

Abstract: Malaria incidence and mortality have been significantly reduced in recent years thanks to improved treatment and prevention strategies. Despite this reduction, malaria remains one of the deadliest diseases in the world. Mathematical models have been useful for understanding malaria transmission. Several studies have evaluated the impact of treatment and preventive methods on malaria transmission, but few have taken into account population attitudes towards these methods, in particular vaccination, which WHO recommended in 2021 and which began to be rolled out in 2023. To address this gap, this doctoral research project develops mathematical models that couple malaria dynamics with opinion dynamics in order to quantify how public opinion and behavior towards malaria prevention and treatment approaches influence malaria transmission and burden. This research will advance infectious disease modeling by identifying key factors for drastically lowering malaria transmission and burden, with a view to eliminating malaria globally.

Keywords: Mathematical models, malaria transmission, Malaria prevention, Malaria treatment, Dynamics of opinion, Vaccination, Malaria burden.

3. Mathilde Adeoti (2022-2024). Bayesian nonlinear modelling for correlated epidemic data using flexible distributions: A case study with COVID-19 data, Université d’Abomey-Calavi.

Abstract: Because of several inherent features, such as similarly shaped profiles with different decay patterns, unexplained variation among repeated measurements within each country, and skewness, outliers, or skewed heavy-tailed noise possibly embodied in the response variables, the analysis of complex longitudinal infectious disease data remains challenging in epidemiology. Finding appropriate approaches to model and predict the spread of infectious diseases at different geographical resolutions and levels of detail, while taking the considerable heterogeneity involved into account, is therefore increasingly important. The nonlinear mixed model (NLMM) has been proposed as a very useful tool for modelling and analysing repeated measures or clustered outcomes in a variety of application fields. Building on it, this thesis proposes and evaluates a new approach for robust modelling of correlated, non-homogeneous epidemic data based on an NLMM within a Bayesian framework. Specifically, the model designed in this study will be implemented using COVID-19 data from the West African (WA) population. This study will help control such a pandemic by accounting for the existing heterogeneity between countries and the correlation within countries. The latter is especially important because the pandemic did not spread homogeneously across the countries of the region. The proposed approaches will also serve as reference tools for other infectious diseases and, where possible, for other study areas, demonstrating the usefulness of the work.

Keywords: Bayesian nonlinear mixed model; semi-nonparametric distributions; scale mixture of skew-normal distributions; infectious diseases modelling; heterogeneous data; pandemic dynamic.

4. Idelphonse Sode (2024-2026). Bayesian spatial fusion modeling framework for joint analysis of multi-source geospatial data: ecological and epidemiological applications

Abstract: The availability of spatial data has increased dramatically in recent years in many research fields, including epidemiology, ecology, environmental science, remote sensing, and economics, motivating the use of spatial modeling approaches. The three types of spatial data (geostatistical, areal, and point pattern data) usually come from multiple sources, which motivates the use of a spatial fusion modeling framework for their joint analysis to improve parameter estimation and prediction accuracy. Even though some studies have explored this area in recent years, unresolved methodological issues remain regarding potential confounding factors, including sampling bias and dependence between the conditional distributions of different target outcomes. In this PhD research, we propose to extend the spatial fusion modeling framework to account for these confounding factors. Under the first objective, we will analyze real datasets from multiple sources using existing spatial modeling approaches. Under the second objective, we will propose a spatial fusion model that accounts for the dependence between target outcomes when analyzing misaligned data, using shared component modeling approaches. Under the third objective, we will propose a spatial fusion modeling framework that accounts for both the dependence structure among spatial processes and preferential sampling. Models will be implemented using Bayesian inference based on the Integrated Nested Laplace Approximation (INLA) and Stochastic Partial Differential Equation (SPDE) techniques. The performance of our methodologies will be evaluated via extensive simulations under various scenarios of joint dependence and preferential sampling. The last research objective will be to apply the novel methodologies to real-world data collected in ecological and epidemiological settings in West Africa.

Keywords: Spatial fusion, joint spatial models, Bayesian inference, SPDE, misaligned data.

5. Luc Zinzinhedo (Beninese) (2023-2025). Combined approach of Machine Learning and spatiotemporal models for cassava (Manihot esculenta C.) yield prediction in Benin under pathogen infestation conditions

Abstract: As the world’s population grows, crop yields must increase accordingly to meet nutritional needs and avoid hunger. However, crop yields are driven by climate, soil, and environmental parameters whose interactions are complex to understand. Precision agriculture therefore aims to enhance crop production with minimal environmental damage and efficient management of soil fertility. It is empowered by artificial intelligence, which offers many scientific and technological tools for managing cropping systems. Machine learning, in particular, is applied in agronomy to improve the understanding of crop systems, and many machine learning-based models have recently been built to predict crop yields. However, within yield prediction frameworks, modellers do not account for the effects of crop pathogens. Yet pathogens (e.g., bacteria, viruses) are disease-causing agents that contribute substantially to yield loss and should be considered in mathematical models to improve decision-making. Cassava Mosaic Begomoviruses (genus Begomovirus, family Geminiviridae) are pathogens transmitted by the whitefly (Bemisia tabaci) that cause Cassava Mosaic Disease (CMD) in cassava farms. To support better decision-making in agriculture, the general objective of this study is to combine a machine learning-based model with a spatiotemporal dynamic model of CMD for better prediction of cassava yield in Benin. By the end of this study, we expect to provide scientists with a method for accounting for pathogens when predicting crop yield. To this end, this study will develop: (i) a machine learning-based model for cassava yield prediction in Benin, (ii) a spatiotemporal dynamic model of CMD, used to estimate the cassava yield loss it causes in Benin, and (iii) a combination of the machine learning-based model and the spatiotemporal model for cassava yield prediction in Benin.

Keywords: Precision Agriculture, tubers and roots, artificial intelligence, spatiotemporal model, hybrid model, feature selection, feature extraction.

6. Odounfa Mireille (2023-2025). Implication of different deep learning methods in the prediction and management of stresses and pests affecting vegetable crops

Abstract: In recent years, deep learning, an artificial intelligence (AI) technique capable of learning from large amounts of data, has shown promising results in predicting and managing stress factors in agriculture. When plants are affected by diseases, they exhibit visual signs such as colored spots, varying in shape and size depending on the type of disease, as well as visible lines on stems and other parts of the plant. These signs evolve in color, shape, and size as the disease progresses. Image processing techniques can identify such colored objects and determine the severity of plant diseases and pest damage. Deep learning methods employ artificial neural networks to extract complex features and relationships from data, aiding in predicting future events and identifying long-term solutions to crop-related issues. This literature review explores the various deep learning methods used to predict and manage stresses and pests in vegetable crops. We examine recent studies in this field, focusing on the latest deep learning models used to predict stressors and the presence of pests, with the goal of enabling farmers to implement better and more effective management strategies. We also discuss the advantages and limitations of these approaches and their practical applicability in the context of sustainable agriculture. The review included a total of 69 articles selected following the PRISMA guidelines. Analysis of these articles revealed that tomato is the most extensively studied market garden crop. Notably, none of the 69 documents explicitly considered climate change when modeling stress phenomena. For classification tasks, CNN models were widely used, accounting for 19.2% of the models employed; classic architectures such as InceptionV3 (6.1%) and MobileNetV2 (3.5%) were also commonly used. For object detection tasks, Fast R-CNN occurred 6 times, followed by YoloV5 (3 occurrences) and YoloV3 (2 occurrences). For segmentation tasks, Mask R-CNN accounted for 28.67% of the models and DeepLabV3+ for 24.98%. The data used to develop these models came primarily from the field (53% of the data), followed by data collected online. Each plant disease has its own criterion for assessing severity, and the presence of multiple diseases and various crop types further complicates the estimation of disease severity in vegetable crops. A standardized method is therefore needed to address this complexity.

Keywords: Convolutional neural networks, plant diseases, crop pests, deep learning approaches, systematic literature review.

7. Paulette Guedezoumè (2023-2025). Developing machine learning-based approaches to predict soil fertility in Benin. (Co-supervision with Prof. Dagbenonbakin, INRAB, Benin).

Abstract: Soils are decisive for plant development. They allow the infiltration and storage of water, store nutrients for plants, and provide a healthy, aerated root environment. Soil is an essential part of successful agriculture and the source of the nutrients used to grow crops. There are different soil types, each with its own properties, and different crops can be grown depending on these properties. Soil fertility is usually evaluated by analyzing soil samples in the laboratory, but these techniques are time-consuming and require many chemical reagents. This study aims to adapt existing machine learning algorithms to predict the essential elements of soil fertility (organic matter, potassium, phosphorus, and nitrogen) for the different types of soil in Benin. The specific objectives are to: (i) conduct a critical review of the use of artificial intelligence algorithms to assess soil fertility (diversity, performance, and limits); (ii) evaluate the performance of machine learning algorithms for predicting soil properties; (iii) develop a machine learning technique to predict soil properties in Benin; and (iv) predict soil suitability for major crops in Benin. To this end, soil samples will be taken from the major soil types encountered in Benin in order to compare several machine learning algorithms. The dataset will be divided into two samples: a training sample consisting of 70% of the data and a validation sample comprising the remaining 30%. Validation criteria such as the root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²) will be considered.

Keywords: soil fertility; machine learning algorithms; predictive models; agriculture; Benin.
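
The 70/30 split and the three validation criteria named above can be sketched with the Python standard library alone. This is a minimal illustration, assuming a random (non-stratified) split with a fixed seed; the function names are hypothetical, and a real study would pass the measured soil properties (e.g. organic matter) and the predictions of each candidate algorithm.

```python
import math
import random


def split_70_30(rows, train_frac=0.7, seed=42):
    """Shuffle rows and split them into a training (70%) and validation (30%) sample."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(round(train_frac * len(rows)))
    return rows[:cut], rows[cut:]


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    y_true = [2.0, 3.0, 4.0, 5.0]   # hypothetical measured values
    y_pred = [2.5, 3.0, 3.5, 5.0]   # hypothetical model predictions
    # RMSE ≈ 0.354, MAE = 0.25, R² = 0.9
    print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

Computed on the validation sample only, these metrics give an honest comparison between algorithms; with small soil datasets, repeating the split over several seeds would show how sensitive the ranking is to the particular 70/30 partition.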

8. Romuald Beh Mba (2022-2024). Empirical performance of the Generalized linear mixed model for point-referenced spatial data with Application to epidemiological data: Disease Mapping.

Abstract: Disease mapping concerns the analysis of the spatial distribution of disease. It usually focuses on the statistical modeling of disease outcomes when inference about disease risk is required. The field can be considered to have four main areas of focus: relative risk estimation, disease clustering, ecological analysis, and surveillance. Relative risk estimation is the most developed of these, and a large range of models and associated software is now available. Variogram models, however, need to be further developed to account for non-stationarity, that is, a spatial dependence structure that varies across the study region. Understanding infectious disease patterns (i.e., space-time variations and/or changes) has always been challenging. Disease diffusion can vary significantly from place to place and from time to time for several reasons, including heterogeneity of hosts and pathogens, physical and social environments, and interactions across space and time. Moreover, uncertainties linked to population movement and to records of infected individuals can make the spatiotemporal spread of an infectious disease even harder to understand. A number of key studies have shown that the spread of infectious diseases depends significantly on the spatial features of a population, and major benefits of spatial disease modeling include the assessment of disease intervention and control strategies (e.g., border control and quarantine). The objectives are to (i) assess empirical models for non-stationary geostatistical epidemiological data; (ii) use an empirical geostatistical model for misaligned data to assess the impact of site-specific endemicity on human infection at different sets of locations; and (iii) empirically assess numerical algorithms that improve the computation of geostatistical models using MCMC.

Keywords: Infectious disease, statistical approach, mapping, diffusion models, interventions.
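
Variogram-based geostatistics starts from the empirical semivariogram. As a minimal sketch, the classical Matheron estimator, γ̂(h) = (1 / 2|N(h)|) Σ (z(s_i) − z(s_j))² summed over the pairs N(h) whose separation falls in a distance bin around h, can be computed as below. The estimator implicitly assumes (intrinsic) stationarity, which is precisely the assumption that non-stationary epidemiological data violate; the equal-width binning and the function name are illustrative choices, not the method of this project.

```python
import math
from itertools import combinations


def empirical_semivariogram(coords, values, bin_width, n_bins):
    """Classical (Matheron) semivariogram estimator on equal-width distance bins.

    coords: list of (x, y) locations; values: observation at each location.
    Returns a list of (bin_midpoint, gamma_hat or None if the bin is empty, n_pairs).
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, j in combinations(range(len(coords)), 2):
        h = math.dist(coords[i], coords[j])   # Euclidean separation of the pair
        k = int(h // bin_width)               # index of the distance bin
        if k < n_bins:                        # ignore pairs beyond the last bin
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    return [
        ((k + 0.5) * bin_width,
         sums[k] / (2 * counts[k]) if counts[k] else None,
         counts[k])
        for k in range(n_bins)
    ]
```

A parametric variogram model (spherical, exponential, Matérn) would then be fitted to these binned estimates; under non-stationarity, one common diagnostic is to compute the estimator separately in sub-regions and compare the curves.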

9. Yvette Montcho (2022-2024). Modeling the impacts of non-pharmaceutical interventions on COVID-19 dynamics using mixed regression and Generalized linear mixed models: case study of Africa. Université d’Abomey-Calavi.

Abstract: Over the past decade, there has been renewed public and official concern about infectious diseases as a major public health threat, a concern that has arisen against a background of some surprise (McMichael, 2004). Emerging and re-emerging infectious diseases (EIDs) include diseases that have recently increased in incidence or in geographic or host range (e.g., tuberculosis, cholera, malaria, dengue fever, Japanese encephalitis, West Nile fever, and yellow fever), diseases caused by new variants of known pathogens (e.g., HIV, new strains of influenza virus, SARS, drug-resistant strains of bacteria, Nipah virus, Ebola virus, hantavirus pulmonary syndrome, and avian influenza virus), and bacteria newly resistant to antibiotics, notably multi-resistant strains that render the armamentarium of antibiotics useless (Smolinski et al., 2003). COVID-19, caused by a novel coronavirus (SARS-CoV-2), is one of the few of these diseases to have reached pandemic level, with significant ravages globally. Although the incidence of the pandemic has been relatively low in Africa, this hides many disparities, with spatial and socio-ecological determinants that require proper investigation. This research aims to answer the following main questions: (a) What key factors explain spatial heterogeneity in the incidence of COVID-19 in Africa? (b) How do non-pharmaceutical interventions explain the dynamics of COVID-19 in African countries? (c) Will an imperfect vaccine curtail the COVID-19 pandemic in Africa? (d) Will COVID-19 become seasonal in African countries? (e) What models are suitable for understanding the spread of COVID-19 across African countries?

Keywords: Pandemic, deterministic model, spatial regression, interventions, vaccination.

10. Ariane Houetohossou (2021-2024). Architectural and parametric optimization of pre-trained Deep Convolutional Neural Network (DCNN): Stress detection on tomato plants under climate and infection-based simulated environments. Université d’Abomey-Calavi.

Abstract: Artificial Intelligence (AI) is the reproduction of intelligence in machines that are programmed to imitate intelligent actions. It is a vast field that brings together several subjects, including Natural Language Processing, robotics, and Machine Learning (ML). ML is the field of study that gives computers the ability to learn without being explicitly programmed (Samuel, 1959). Pressure on agricultural systems will increase, while the amount of farmland in the world is limited. Sustainable agricultural practices are therefore needed that use the least farmland and optimize crop yield. The most significant concern in agriculture is stress control. Plant stress can be biotic or abiotic. Biotic stress is caused by living organisms such as nematodes, fungi, bacteria, viruses, insects, arachnids, and weeds, whereas abiotic stress is caused by low or high temperature, too little or too much water, high salinity, and soil components. Yield loss in agriculture has been found to reach almost 40% because of a lack of field monitoring and unidentified diseases (Paul et al., 2020). This research project concerns the potential of artificial intelligence in agricultural sciences and seeks answers to the following questions: (1) What are the relevant deep learning techniques used in agriculture to detect stress on fruits and vegetables? (2) What are the best associations of climatic parameters for predicting high yield in tomato plants? (3) What are the best architecture and the best parameters of a successful DL solution for detecting disease-based stress in tomato plants?

Keywords: Deep Learning, prediction, fruits, vegetables, stress, Agricultural yield.

11. Peace Souand Tahi (2021-2024). Optimization of machine learning techniques’ performances in predicting the yield of maize cultures under several controlled weather and fertilization patterns. Université d’Abomey-Calavi.

Abstract: Machine learning (ML) is a decision support tool increasingly used in many fields for classification, regression, clustering, object detection, and pattern recognition problems. In agriculture, it is essential for early disease detection and yield prediction. This thesis project will use maize (Zea mays) as its application and carry out a bidirectional analysis: it will first focus on mining weather and fertilization patterns using growth and yield parameters, and then on predicting yield from weather and fertilization scenarios. The choice of this crop is justified by the fact that maize is an important cereal in diets worldwide, particularly in West Africa (Nkurunziza et al., 2019). This research project addresses the current lack of knowledge about the potential of machine learning in agricultural science and seeks answers to the following questions: 1- How can the best weather characteristics for growing maize be determined? 2- How do weather and fertilization scenarios affect the growth parameters and yield of maize crops? 3- How well do machine learning methods perform in predicting maize yield? 4- How can the overall machine learning parameters be adjusted to optimize their performance?

Keywords: Deep Learning, prediction, maize, stress, Agricultural yield.

12. Arsène Mushagalusa Ciza (2020-2024). Random forest regression for count response data and disease vector abundance prediction: application to tick (Rhipicephalus appendiculatus) abundance in grazed permanent pastures. Université d’Abomey-Calavi.

Abstract: Over the last twenty years, there has been growing interest in modelling count data, particularly in ecological studies of species abundance. Understanding species abundance distributions and their drivers is crucial for effective biodiversity conservation and ecological management. Traditional count data models, such as Poisson regression and its extensions, have limitations in capturing complex non-linear interactions and are not suitable for small-n, large-p problems (few observations, many predictors). New modelling approaches are therefore needed to improve models’ predictive performance. Machine learning (ML) methods, in particular Random Forests (RF), have gained popularity in various applied fields. RF has been shown to be more accurate than traditional linear methods and other ML techniques, such as neural networks and support vector machines, in modelling species distributions. RF’s robustness to noise, its ability to provide variable importance measures, its handling of higher-order interactions, and its ease of use make it an ideal choice for improving predictive performance and model interpretation. Despite these advantages, RF models are often used as black boxes, and the reasons for their effectiveness and the assumptions on which they rest remain unclear. In addition, most RF studies focus on classification rather than regression, and ML methods for regression are typically designed for continuous data, so their performance with count data is less well understood. Overdispersion is a significant challenge in count data analysis, particularly in ecology, due to spatial autocorrelation resulting from spatio-temporal processes, missing information, and sampling bias. Together with measurement error, this problem affects the reliability and interpretation of predictive models. This thesis aimed to evaluate the effectiveness and efficiency of RFs in modelling count data, with a particular focus on species abundance distributions.
To achieve this goal, we investigated the impact of data characteristics and overdispersion on the performance and optimization of RF parameters in count data modelling. We assessed the performance of different resampling strategies for evaluating RF models. Additionally, we explored the influence of spatial autocorrelation and species features on RF predictive performance and on spatial cross-validation methods. Finally, we evaluated the predictive performance of spatial RF variants in modelling species abundance distributions.

Keywords: Machine learning, Species distribution modelling, Resampling, Random forest, Overdispersion, discrete data, spatial-autocorrelation.
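
Overdispersion is usually diagnosed before a count model, RF or otherwise, is compared against a Poisson baseline. A minimal sketch of two common diagnostics follows: the variance-to-mean ratio of raw counts, and the Pearson chi-square statistic divided by the residual degrees of freedom of a fitted count model. Both are standard checks chosen here for illustration; the function names are hypothetical, and they are not claimed to be the procedure used in the thesis.

```python
def dispersion_index(counts):
    """Sample variance-to-mean ratio of raw counts (about 1 for Poisson data)."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean


def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square statistic divided by the residual degrees of freedom.

    y: observed counts; mu: fitted means from a count model (e.g. Poisson);
    n_params: number of estimated parameters.
    Values well above 1 suggest overdispersion relative to the fitted model.
    """
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)
```

The raw ratio ignores covariates, so heterogeneity that predictors would explain also inflates it; the Pearson version conditions on the fitted means and is therefore the more relevant check once a model has been fitted.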

13. Yannick Mugumaarhahama (2020-2024). Spatial point process model for analysis of presence-only data: accounting for species characteristics and uncertainties in data. Université d’Abomey-Calavi.

Abstract: Ecologists seek models that accurately describe and predict phenomena of interest to support decision-making processes in the context of biodiversity conservation. Species distribution modelling (SDM) is an essential component of many ecological applications, particularly in the context of climate change and increasing anthropogenic pressures on natural resources, notably forests. A plethora of algorithms have been proposed in SDM, with the inhomogeneous Poisson point process (IPP) model and its extensions representing the most recent developments. Achieving accuracy requires the modelling algorithms most appropriate to the type and quality of the data at hand. Data quality in SDM is particularly problematic because structured, high-quality data are scarce and require a significant investment of funds, time, and human resources. These resources are frequently lacking among ecologists, particularly in Africa, where national budgets for scientific research and conservation are extremely limited. The majority of data available for most species and geographic regions are presence-only (PO) data, which are collected opportunistically without adhering to rigorous protocols designed to minimize bias and uncertainty. Obtaining accurate SDM outputs from PO data is a significant challenge because of the inherent biases and uncertainties associated with this type of data. Presence-only data are prone to sampling bias, imperfect detection, and positional uncertainty. Moreover, species characteristics and the use of spatially truncated and/or spatially autocorrelated PO data could affect the performance of SDMs, particularly the IPP model and its extensions. This thesis assesses and demonstrates the effects of biases and uncertainties in PO data on the reliability of the results of the IPP model and its extensions when these are not accounted for.
Additionally, the influence of species characteristics on the quality of model results is assessed and demonstrated. Furthermore, it examines the effects of spatial autocorrelation on the reliability of these models if it is ignored, despite its presence in species occurrences.

Keywords: Species distribution modelling, Data integration, Maximum likelihood estimates, Data quality, Presence-only data, Structured data, Detection errors, Positional uncertainty, Niche truncation, Poisson point process, Species specialization, Spatial dependence, Data simulation, Biodiversity conservation, Red-bellied guenon.

14. Armel Bourobou Bourobou (2020-2024). Inhomogeneous Poisson Process and its extensions for species distribution analysis: accounting for sampling bias, non-linear effect and spatial dependence using classical and approximate Bayesian inference.

Abstract: Species distribution modeling (SDM) is an empirical method widely used to explain and predict species ranges and environmental niches, commonly constructed by inferring relationships between species occurrences and environmental features via statistical and machine-learning methods (Merow et al. 2014; Zimmermann et al. 2010). A broad range of SDM approaches is available to explore the correlation between response and predictor variables (Guisan and Zimmermann, 2000; Franklin, 2009). These approaches vary according to the data they are designed to fit and the questions they are trying to answer. Some methods require information about species presences and absences (PA or SO data), while others require only information about presences, or presences plus some knowledge of the covariates at background locations (PO, PB data; Koshkina 2019). It has also been shown that the quality and type of prediction resulting from an SDM depend largely on the quality of the data available for modelling (i.e., the area surveyed, the number of repeated surveys, the species characteristics, and the spatial scale; Pearson et al. 2006; Wisz et al. 2008). The aim of this research project is to explore and test various methodological approaches that can help assess and account for uncertainties in the analysis of species distribution data in order to improve predictive accuracy.
Specifically, we aim to (i) review the Inhomogeneous Poisson Process and its extensions for species distribution analysis; (ii) evaluate the empirical performance of the main spatial point process models with respect to confounding factors such as sampling bias, non-linear effects of covariates on the log-intensity, and spatial dependence, using classical Bayesian inference and Integrated Nested Laplace Approximations (INLA); (iii) evaluate the effects of sampling bias and model complexity on the predictive performance of Inhomogeneous Poisson Process models and their extensions, using classical Bayesian inference and INLA; and (iv) model and analyze the potential geographical distribution of some key plant species from Gabon using the Global Biodiversity Information Facility (GBIF) and AfricaClim databases.
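
The inhomogeneous Poisson process that underlies these models can be simulated by Lewis-Shedler thinning: generate a homogeneous process at a rate λ_max that bounds the intensity, then keep each candidate point s with probability λ(s)/λ_max. The sketch below does this on the unit square with a hypothetical log-linear intensity, log λ(x, y) = b0 + b1·x, standing in for a single environmental covariate. The intensity form, parameter values, and function names are assumptions for illustration; no sampling bias or spatial dependence is included.

```python
import math
import random


def intensity(x, y, b0=4.0, b1=2.0):
    """Hypothetical log-linear intensity: log lambda(x, y) = b0 + b1 * x."""
    return math.exp(b0 + b1 * x)


def poisson_count(rate, rng):
    """Poisson(rate) draw: number of events of a rate-`rate` process on [0, 1]."""
    n = 0
    t = rng.expovariate(rate)
    while t <= 1.0:
        n += 1
        t += rng.expovariate(rate)
    return n


def simulate_ipp(b0=4.0, b1=2.0, seed=None):
    """Simulate an inhomogeneous Poisson process on the unit square by thinning."""
    rng = random.Random(seed)
    lam_max = math.exp(b0 + max(b1, 0.0))   # upper bound of the intensity on [0, 1]^2
    n = poisson_count(lam_max, rng)         # homogeneous candidates (area = 1)
    points = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        # keep the candidate with probability lambda(x, y) / lam_max
        if rng.random() < intensity(x, y, b0, b1) / lam_max:
            points.append((x, y))
    return points
```

With b0 = 4 and b1 = 2, the expected number of points is the integral of λ over the square, e^4 (e^2 − 1) / 2 ≈ 174, concentrated towards large x. Simulated patterns of this kind are the usual starting point for the simulation studies described in objectives (ii) and (iii), for example by thinning them further with a biased observation process.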