Bayesian Spatiotemporal Small Area Estimation for Forest Disturbance Monitoring
PhD Opportunity
Forest inventory Survey sampling Small area estimation Bayesian / INLA Remote sensing
Partners & Funding
PEPR FORESTT is a national research initiative funded by France 2030. It brings together a consortium of leading French institutions, including INRAE, Université de Pau et des Pays de l'Adour, AgroParisTech, Ecole d'Ingénieur de Purpan, Aix-Marseille Université, ONF, CIRAD, CNRS, IRD, IGN, and CNPF. This interdisciplinary program aims to advance our understanding, monitoring, and management of forests in the face of climate change and increasing disturbance risks.
Career Prospects
Graduates of this PhD program are equipped to pursue careers as experts in advanced statistical modeling and spatial data analysis. They can contribute to both fundamental and applied research, as well as to decision-making and policy in sectors where advanced statistical modeling and spatial data analysis are critical. This includes:
- Research institutes and academia
- National statistical agencies
- Environmental monitoring agencies
- Private-sector geospatial technology companies and environmental consulting firms
- NGOs and intergovernmental bodies focused on sustainable development, climate change, or global health
Program Structure
As part of the Doctoral School Sirena at Université de Lorraine, this PhD is a 3-year, fully funded research position with employee status. The program combines a strong research focus with light coursework requirements (30 credits total):
- 10 credits from classes, including English (or French for non-French speakers)
- 10 credits from conference participation and scientific publications
- 5 credits from activities preparing for future career paths
For the full description of credit requirements, see the Doctoral School SIReNa’s website.
The primary focus is on producing high-quality scientific publications, which are compiled into a dissertation and defended in a public oral examination.
Who should apply?
A candidate with a strong Master’s in quantitative field (e.g. statistics, data science, applied mathematics, survey sampling, epidemiology, …). You should be comfortable with statistical modeling and programming in R or Python. Experience with survey sampling, spatial statistics, or Bayesian methods is clearly a plus — but motivation and the capacity to learn matter more than a checklist.
Required:
- Master’s degree (or equivalent 5-year diploma) by the start date
- Solid knowledge of statistical modeling
- Programming skills in R or Python
- Fluency in English (written and spoken)
Preferred:
- Experience with survey sampling or spatial statistics
- Experience with hierarchical Bayesian methods
- Interest in environmental or geospatial applications
- Interest in disease mapping (epidemiological analogy)
- French is a plus but not required
Presentation of topic
Why forests — and why now?
Forests cover about 30% of the Earth’s land surface and are central to the carbon cycle, biodiversity, and water regulation. Climate change is making them more fragile: droughts are longer, wildfires more intense, storms more frequent, and insect outbreaks — bark beetles, pine processionary moths — are expanding into new latitudes.
What makes this particularly challenging from a monitoring point of view is that these disturbances are localized and fast-moving. A windstorm might flatten 500 hectares in one valley while leaving the adjacent one untouched. An insect outbreak can sweep through a region in a single season. Traditional large-scale forest surveys are not designed for this kind of spatial and temporal granularity.
Fig. 1 — Forest disturbance from 2025 wildfires in Aude, France (left) and 2019 bark beetle damage in Grand Est, France (right), showing contrasting spatial footprints as detected by Sentinel-2 imagery. Normalized Burn Ratio (NBR) is an index designed to highlight burnt areas in large fire zones.
Left: based on Massey & Vega (2025). Right: generated using the fordead package (de Boissieu et al., 2024).
Images courtesy of the LIF and UMR TETIS.
Remote sensing (satellite imagery, LiDAR) can detect that something happened — a canopy has opened, reflectance has changed. But it cannot directly tell us how much timber volume was affected, which is the key quantity for carbon accounting and forest management decisions. We need direct, consistent measurements of forest attributes across all forested areas, which are costly and time-consuming to collect. Fortunately, most governments provide this type of data via national forest inventories.
CORE TENSION: National forest inventories give direct measurements of forest attributes — but are not designed to operate at fine spatial and temporal resolution. Remote sensing gives us wall-to-wall coverage — but of proxy signals, not volume. This PhD is about bridging that gap with statistics.
What is a forest inventory?
The French National Forest Inventory (NFI), run by IGN, relies on a carefully designed probability sample: each year, around 7,000 circular plots are selected across the territory. At each plot, trained field crews record tree diameters, heights, species composition, basal area, and many other attributes. From this relatively small sample, statisticians produce estimates of forest resources at regional and national scales.
The gold standard for this type of official data production is the design-based approach. In this framework, the forest itself is viewed as a fixed population: every tree has a fixed (though unknown) volume that is measurable without error, and randomness arises only from how we took the sample—not from the trees or measurements themselves. This contrasts with model-based approaches, where the data are assumed to be generated by an underlying stochastic process.
The sample units are points defined by geographic coordinates, not trees. At each sampled point, we measure the trees within a plot and convert their total (e.g., timber volume) into a local density (e.g., m³ per hectare). Conceptually, each point in the forest has an associated density value, whether or not it is observed. Taken together, these values define a continuous local density surface over the territory.
Fig. 2 — Monte Carlo interpretation of a forest inventory with tree height as the target variable. The local density surface \(Y(x)\) (red curve) is defined for every location \(x \in F\) and represents the quantity we wish to integrate. The pink segments represent the regions around each tree where a plot center would include that tree in the measurement.
*In practice, NFIs often use systematic grids, but for intuition we consider independent uniform sampling.
Fig. 2 illustrates this idea: trees induce a local density surface, and sampling plot locations corresponds to drawing random points on that surface. Estimating the total resource is therefore equivalent to integrating this surface over the domain. Because we only observe the density at a finite set of randomly selected locations, this can be interpreted as a form of Monte Carlo integration.
If the total surface area of the forest domain is known, the total resource can be expressed as the area multiplied by the true spatial mean \(\bar{Y}\). This leads naturally to the sample mean estimator \(\hat{\bar{Y}}\), which is unbiased under independent sampling designs and converges to the true value by the Law of Large Numbers. By the Central Limit Theorem, we can construct an asymptotically valid confidence interval using the variance \(\hat V(\hat{\bar{Y}})=S^2/n\).
When sampling is not uniform—as in the French NFI—the same idea applies, but the estimator becomes a weighted mean that accounts for unequal inclusion probabilities.
Auxiliary maps and model-assisted estimation
The sample mean estimator is adequate at large scales but limited when spatial resolution matters. To do better we can incorporate remote sensing from satellite imagery or LiDAR point clouds to build wall-to-wall maps of proxy variables — canopy height models (CHM), spectral indices — that correlate with timber volume. If we consider this auxiliary as predictor in a predictive model, we could generate a predictive density surface that is known everywhere and thus has a known integral. Now we only need to know the volume of the difference between the predictive surface and local density surface so we do Monte Carlo integration on the residual as shown in Fig. 3.
Fig. 3 — Model-assisted estimation using a canopy height model (CHM), denoted \(Z(x)\), to create a predictive density surface with a linear model.
This is called model-assisted estimation which is still design-based because the model was only used to transform the problem to the residuals. It does not even matter if the assisting model is correct — we get improved precision as long as \(V(Y(x))>V(R(x))\).
Model-assisted estimation can improve precision while preserving design-unbiasedness — even if the model is completely wrong.
Why this breaks down for monitoring
Model-assisted estimators rely on asymptotic validity, requiring a sufficient number of samples for their design-based properties to hold. In forest monitoring, this condition is typically met only for most widespread events; for rare or localized disturbances, the framework begins to fracture. Specifically, variance estimates become highly unstable when \(n < 6\), leading to confidence intervals with unreliable coverage rates that fail to represent true uncertainty.
Beyond sample size, the efficiency of these methods is strictly tied to the quality and spatiotemporal alignment of auxiliary data. Precision gains diminish rapidly unless the relationship between ground plots and remote sensing predictors is exceptionally strong (Fig. 4). Maintaining this high level of predictive power is a significant operational challenge, as misalignment or sensor noise can quickly render auxiliary information ineffective in data-sparse regions.
Fig. 4 — Relative efficiency of model-assisted estimators. The curves illustrate the sensitivity of precision gains to the correlation (\(R^2\)) between auxiliary predictors and ground observations. In practice, weak or misaligned auxiliary data leads to a rapid loss of efficiency, necessitating a transition toward small area estimation techniques.
This is the small area problem: a survey design that is robust at the national scale becomes underpowered at the scales where management decisions are needed. Because increasing the physical sample size is cost-prohibitive, the fundamental challenge lies in developing a framework that borrows strength across space and time without sacrificing inferential rigor.
Borrowing strength with Bayesian small area estimation
When only 2–3 plots fall within a critical area—a common occurrence in localized forest disturbances—even model-assisted estimators fail to produce stable estimates. This is the hallmark of the small area problem, requiring a strategy that “borrows strength” from other locations and points in time. The Fay–Herriot model provides this framework by linking noisy direct estimates to auxiliary data through a hierarchical structure. Areas with sparse data are informed by the global trend and neighboring units, while sample-rich areas remain driven by their own observations.
Fig. 5 — Simulation of the Fay–Herriot (FH) effect. At low sample sizes, the FH model reduces absolute error by shrinking noisy direct estimates toward the model mean including surrounding areas (i.e. borrowing strength). As local sample size increases, the estimator adaptively weights the local data, eventually converging with the direct design-based estimate.
The core novelty of this project is the extension of the Fay–Herriot model into a spatio-temporal Bayesian framework tailored to forest inventory survey data. By leveraging 20 years of French National Forest Inventory data, the model captures smooth temporal dynamics and spatial dependence. This enables the estimation of fine-resolution posterior distributions over complete temporal trajectories, while avoiding reliance on coarse 5-year rolling averages commonly used in forest inventories.
This PhD project will evaluate the most efficient ways to borrow strength from NFI data to build fine resolution spatiotemporal maps for forest monitoring.
Computation
To make this feasible at scale, inference is performed using Integrated Nested Laplace Approximation (INLA). INLA provides fast, deterministic approximations for latent Gaussian models, reducing computation time by several orders of magnitude compared to MCMC. This efficiency makes near-real-time updating of interpretable, high-resolution maps computationally accessible.
While the R-INLA framework and the associated SUMMER package are establishing themselves as standards in spatial epidemiology, their application to forest monitoring remains nascent. Adapting these tools to forestry presents a unique methodological challenge: unlike fixed administrative health districts, forest “small areas” are dynamically defined by remote sensing. By capturing residual spatial and temporal structure, this Bayesian strategy enables robust, uncertainty-aware inference even in data-sparse regions, delivering the stability required for operational forest management.
Three-year task overview
Further Reading
Design-based sampling for forest inventories
- Mandallaz, D. (2007). Sampling Techniques for Forest Inventories.
- Hill, A., Massey, A., Mandallaz, D. (2021). The R Package forestinventory: Design-Based Global and Small Area Estimations for Multiphase Forest Inventories.
- Massey, A. (2015). Multiphase estimation procedures for forest inventories under the design-based Monte Carlo approach.
- Schreuder, H. T., Gregoire, T. G., Wood, G. B. (1993). Sampling Methods for Multiresource Forest Inventory. (for broader finite population sampling concepts)
Small area estimation
- SUMMER: Small-Area-Estimation Unit/Area Models and Methods for Estimation in R – a vignette on generic SAE (a nice tutorial)
- Rao, J. N. K., Molina, I. (2015). Small Area Estimation (2nd ed.). (the most popular textbook)
- Breidenbach, J., et al. (2018). Unit-level and area-level small area estimation under heteroscedasticity using digital aerial photogrammetry data.
Spatial and temporal smoothing
- Li, Z. R., Martin, B. D., Dong, T. Q., Fuglstad, G.-A., Godwin, J., Paige, J., Riebler, A., Clark, S. J., Wakefield, J. (2025). Space-Time Smoothing of Survey Outcomes using the R Package SUMMER. The R Journal.
- Affleck, D. L. R., Gaines, W. L. (2025). Recognizing van Deusen’s mixed estimator for annual forest inventory as a linear mixed model.
Disturbance detection
How do I apply?
All applications must include:
- A curriculum vitae
- A motivation letter tailored to the subject
- A list of relevant coursework, or alternatively an academic transcript for the last years of study (unofficial versions are accepted)
- Contact details of two academic or professional references (name, affiliation, email)
IGN (a French governmental agency) requires all applications to be submitted through their official online system. Applications sent via email will not be considered. The official reference is not yet available but will be posted very soon.
- Visit the IGN job offers page.
- Locate the offer with the proper reference # (available soon).
- Click “Je postule” (French for “I apply”) to begin submission process.
Note: The IGN application portal is in French only. If you need assistance, consider using a browser translation tool or contacting us directly for guidance.
For any questions, applicants are highly encouraged to contact:
Alexander Massey
alexander.massey@ign.fr
Supervisers
This project is conducted at the Laboratory of Forest Inventory (LIF): a research unit of Géodata Paris, within the French National Institute of Geographic and Forest Information (IGN).
Dr. Alexander Massey
A statistician specializing in design-based inference and sampling theory within the context of National Forest Inventories. Alexander is a co-author of the forestinventory R-package and focuses on the intersection of Small Area Estimation and continuous population sampling under the Monte Carlo approach.
Prof. Olivier Bouriaud
An expert in forest ecology and dynamics with a focus on integrating LiDAR, photogrammetry, and optical imagery into forest monitoring frameworks. Formerly the head of the Romanian NFI's models and methods office, Olivier has been affiliated with the LIF research unit since 2022.
This PhD sits at the intersection of three classical statistical traditions — design-based survey inference, small area estimation, and Bayesian spatiotemporal modeling — unified by the Monte Carlo interpretation for forest inventory. The scientific challenge is real, the data infrastructure exists, and the urgency of the problem is only growing.
If you find yourself excited by the question of how to recover a disturbance signal from three field plots, this is the right place.

