Bayesian Spatiotemporal Small Area Estimation for Forest Disturbance Monitoring

PhD Opportunity

Forest inventory Survey sampling Small area estimation Bayesian / INLA Remote sensing

Location
Nancy, France
Duration
3 years
Salary (gross)
~€2 300/month
Start
Sep–Oct 2026
Social benefits
Public healthcare + education + full worker protections
Workplace perks
Subsidized meals + commute reimbursement
Research funding
International conference travel
Application deadline
Open until filled
Partners & Funding
Co-funder

PEPR FORESTT is a national research initiative funded by France 2030. It brings together a consortium of leading French institutions, including INRAE, Université de Pau et des Pays de l'Adour, AgroParisTech, Ecole d'Ingénieur de Purpan, Aix-Marseille Université, ONF, CIRAD, CNRS, IRD, IGN, and CNPF. This interdisciplinary program aims to advance our understanding, monitoring, and management of forests in the face of climate change and increasing disturbance risks.

Coordinator institution
The Institut national de recherche pour l'agriculture, l'alimentation et l'environnement (INRAE) coordinates the X-RISKS project within the PEPR FORESTT programme. The project aims to study and anticipate the multiple, extreme risks threatening forests under climate change.
Co-funder
The Institut national de l'information géographique et forestière (IGN) is responsible for the French National Forest Inventory. It produces high-resolution forest monitoring outputs by integrating field surveys with geospatial data.
Degree awarding institution
The candidate will be enrolled in the SIReNa doctoral school (Sciences et Ingénierie des Ressources Naturelles), bringing together laboratories from Université de Lorraine, INRAE, CNRS, and AgroParisTech.
Career Prospects

Graduates of this PhD program are equipped to pursue careers as experts in advanced statistical modeling and spatial data analysis. They can contribute to both fundamental and applied research, as well as to decision-making and policy in sectors where advanced statistical modeling and spatial data analysis are critical. This includes:

  • Research institutes and academia
  • National statistical agencies
  • Environmental monitoring agencies
  • Private-sector geospatial technology companies and environmental consulting firms
  • NGOs and intergovernmental bodies focused on sustainable development, climate change, or global health
Program Structure

As part of the Doctoral School Sirena at Université de Lorraine, this PhD is a 3-year, fully funded research position with employee status. The program combines a strong research focus with light coursework requirements (30 credits total):

  • 10 credits from classes, including English (or French for non-French speakers)
  • 10 credits from conference participation and scientific publications
  • 5 credits from activities preparing for future career paths

For the full description of credit requirements, see the Doctoral School SIReNa’s website.

The primary focus is on producing high-quality scientific publications, which are compiled into a dissertation and defended in a public oral examination.

Who should apply?

A candidate with a strong Master’s in quantitative field (e.g. statistics, data science, applied mathematics, survey sampling, epidemiology, …). You should be comfortable with statistical modeling and programming in R or Python. Experience with survey sampling, spatial statistics, or Bayesian methods is clearly a plus — but motivation and the capacity to learn matter more than a checklist.

Required:

  • Master’s degree (or equivalent 5-year diploma) by the start date
  • Solid knowledge of statistical modeling
  • Programming skills in R or Python
  • Fluency in English (written and spoken)

Preferred:

  • Experience with survey sampling or spatial statistics
  • Experience with hierarchical Bayesian methods
  • Interest in environmental or geospatial applications
  • Interest in disease mapping (epidemiological analogy)
  • French is a plus but not required
Presentation of topic

Why forests — and why now?

Forests cover about 30% of the Earth’s land surface and are central to the carbon cycle, biodiversity, and water regulation. Climate change is making them more fragile: droughts are longer, wildfires more intense, storms more frequent, and insect outbreaks — bark beetles, pine processionary moths — are expanding into new latitudes.

What makes this particularly challenging from a monitoring point of view is that these disturbances are localized and fast-moving. A windstorm might flatten 500 hectares in one valley while leaving the adjacent one untouched. An insect outbreak can sweep through a region in a single season. Traditional large-scale forest surveys are not designed for this kind of spatial and temporal granularity.

disturbances

Fig. 1 — Forest disturbance from 2025 wildfires in Aude, France (left) and 2019 bark beetle damage in Grand Est, France (right), showing contrasting spatial footprints as detected by Sentinel-2 imagery. Normalized Burn Ratio (NBR) is an index designed to highlight burnt areas in large fire zones.

Left: based on Massey & Vega (2025). Right: generated using the fordead package (de Boissieu et al., 2024).

Images courtesy of the LIF and UMR TETIS.

Remote sensing (satellite imagery, LiDAR) can detect that something happened — a canopy has opened, reflectance has changed. But it cannot directly tell us how much timber volume was affected, which is the key quantity for carbon accounting and forest management decisions. We need direct, consistent measurements of forest attributes across all forested areas, which are costly and time-consuming to collect. Fortunately, most governments provide this type of data via national forest inventories.

CORE TENSION: National forest inventories give direct measurements of forest attributes — but are not designed to operate at fine spatial and temporal resolution. Remote sensing gives us wall-to-wall coverage — but of proxy signals, not volume. This PhD is about bridging that gap with statistics.

What is a forest inventory?

The French National Forest Inventory (NFI), run by IGN, relies on a carefully designed probability sample: each year, around 7,000 circular plots are selected across the territory. At each plot, trained field crews record tree diameters, heights, species composition, basal area, and many other attributes. From this relatively small sample, statisticians produce estimates of forest resources at regional and national scales.

The gold standard for this type of official data production is the design-based approach. In this framework, the forest itself is viewed as a fixed population: every tree has a fixed (though unknown) volume that is measurable without error, and randomness arises only from how we took the sample—not from the trees or measurements themselves. This contrasts with model-based approaches, where the data are assumed to be generated by an underlying stochastic process.

The sample units are points defined by geographic coordinates, not trees. At each sampled point, we measure the trees within a plot and convert their total (e.g., timber volume) into a local density (e.g., m³ per hectare). Conceptually, each point in the forest has an associated density value, whether or not it is observed. Taken together, these values define a continuous local density surface over the territory.

Monte Carlo forest inventory schematic

Fig. 2 — Monte Carlo interpretation of a forest inventory with tree height as the target variable. The local density surface \(Y(x)\) (red curve) is defined for every location \(x \in F\) and represents the quantity we wish to integrate. The pink segments represent the regions around each tree where a plot center would include that tree in the measurement.

*In practice, NFIs often use systematic grids, but for intuition we consider independent uniform sampling.

Fig. 2 illustrates this idea: trees induce a local density surface, and sampling plot locations corresponds to drawing random points on that surface. Estimating the total resource is therefore equivalent to integrating this surface over the domain. Because we only observe the density at a finite set of randomly selected locations, this can be interpreted as a form of Monte Carlo integration.

If the total surface area of the forest domain is known, the total resource can be expressed as the area multiplied by the true spatial mean \(\bar{Y}\). This leads naturally to the sample mean estimator \(\hat{\bar{Y}}\), which is unbiased under independent sampling designs and converges to the true value by the Law of Large Numbers. By the Central Limit Theorem, we can construct an asymptotically valid confidence interval using the variance \(\hat V(\hat{\bar{Y}})=S^2/n\).

When sampling is not uniform—as in the French NFI—the same idea applies, but the estimator becomes a weighted mean that accounts for unequal inclusion probabilities.

Auxiliary maps and model-assisted estimation

The sample mean estimator is adequate at large scales but limited when spatial resolution matters. To do better we can incorporate remote sensing from satellite imagery or LiDAR point clouds to build wall-to-wall maps of proxy variables — canopy height models (CHM), spectral indices — that correlate with timber volume. If we consider this auxiliary as predictor in a predictive model, we could generate a predictive density surface that is known everywhere and thus has a known integral. Now we only need to know the volume of the difference between the predictive surface and local density surface so we do Monte Carlo integration on the residual as shown in Fig. 3.

Model assisted forest inventory schematic

Fig. 3 — Model-assisted estimation using a canopy height model (CHM), denoted \(Z(x)\), to create a predictive density surface with a linear model.

This is called model-assisted estimation which is still design-based because the model was only used to transform the problem to the residuals. It does not even matter if the assisting model is correct — we get improved precision as long as \(V(Y(x))>V(R(x))\).

Model-assisted estimation can improve precision while preserving design-unbiasedness — even if the model is completely wrong.

Why this breaks down for monitoring

Model-assisted estimators rely on asymptotic validity, requiring a sufficient number of samples for their design-based properties to hold. In forest monitoring, this condition is typically met only for most widespread events; for rare or localized disturbances, the framework begins to fracture. Specifically, variance estimates become highly unstable when \(n < 6\), leading to confidence intervals with unreliable coverage rates that fail to represent true uncertainty.

Beyond sample size, the efficiency of these methods is strictly tied to the quality and spatiotemporal alignment of auxiliary data. Precision gains diminish rapidly unless the relationship between ground plots and remote sensing predictors is exceptionally strong (Fig. 4). Maintaining this high level of predictive power is a significant operational challenge, as misalignment or sensor noise can quickly render auxiliary information ineffective in data-sparse regions.

Model assisted efficiency

Fig. 4 — Relative efficiency of model-assisted estimators. The curves illustrate the sensitivity of precision gains to the correlation (\(R^2\)) between auxiliary predictors and ground observations. In practice, weak or misaligned auxiliary data leads to a rapid loss of efficiency, necessitating a transition toward small area estimation techniques.

This is the small area problem: a survey design that is robust at the national scale becomes underpowered at the scales where management decisions are needed. Because increasing the physical sample size is cost-prohibitive, the fundamental challenge lies in developing a framework that borrows strength across space and time without sacrificing inferential rigor.

Borrowing strength with Bayesian small area estimation

When only 2–3 plots fall within a critical area—a common occurrence in localized forest disturbances—even model-assisted estimators fail to produce stable estimates. This is the hallmark of the small area problem, requiring a strategy that “borrows strength” from other locations and points in time. The Fay–Herriot model provides this framework by linking noisy direct estimates to auxiliary data through a hierarchical structure. Areas with sparse data are informed by the global trend and neighboring units, while sample-rich areas remain driven by their own observations.

Fay Herriot shrinkage

Fig. 5 — Simulation of the Fay–Herriot (FH) effect. At low sample sizes, the FH model reduces absolute error by shrinking noisy direct estimates toward the model mean including surrounding areas (i.e. borrowing strength). As local sample size increases, the estimator adaptively weights the local data, eventually converging with the direct design-based estimate.

The core novelty of this project is the extension of the Fay–Herriot model into a spatio-temporal Bayesian framework tailored to forest inventory survey data. By leveraging 20 years of French National Forest Inventory data, the model captures smooth temporal dynamics and spatial dependence. This enables the estimation of fine-resolution posterior distributions over complete temporal trajectories, while avoiding reliance on coarse 5-year rolling averages commonly used in forest inventories.

This PhD project will evaluate the most efficient ways to borrow strength from NFI data to build fine resolution spatiotemporal maps for forest monitoring.

Computation

To make this feasible at scale, inference is performed using Integrated Nested Laplace Approximation (INLA). INLA provides fast, deterministic approximations for latent Gaussian models, reducing computation time by several orders of magnitude compared to MCMC. This efficiency makes near-real-time updating of interpretable, high-resolution maps computationally accessible.

While the R-INLA framework and the associated SUMMER package are establishing themselves as standards in spatial epidemiology, their application to forest monitoring remains nascent. Adapting these tools to forestry presents a unique methodological challenge: unlike fixed administrative health districts, forest “small areas” are dynamically defined by remote sensing. By capturing residual spatial and temporal structure, this Bayesian strategy enables robust, uncertainty-aware inference even in data-sparse regions, delivering the stability required for operational forest management.

Three-year task overview

🌲
Digital forest twin
Use a simulated forest at LIF to generate realistic sampling scenarios. Evaluate estimator properties under controlled conditions.
🎯
Enhance direct estimation
Evaluating model-assisted estimators to refine direct estimates using LiDAR and satellite auxiliary data. This step optimizes precision prior to the Bayesian spatio-temporal smoothing phase.
🗺️
Variance smoothing
The weights that determine how much and when to borrow strength depend on the direct estimate's variance, which becomes unstable when n<6. Variance smoothing techniques need to be developed.
⏱️
Spatiotemporal extension
Develop Fay–Herriot-type models that borrow strength both across neighboring areas and full time series of historical NFI data.
📡
Real NFI + remote sensing
Apply methods to French NFI data and satellite imagery. Produce high-resolution maps of disturbance severity in terms of affected timber volume.
📦
Reproducible tools
Package methods as an R package and / or Shiny application for operational use by IGN and the broader forest monitoring community.

Further Reading

Design-based sampling for forest inventories

Small area estimation

Spatial and temporal smoothing

Disturbance detection

How do I apply?

All applications must include:

  1. A curriculum vitae
  2. A motivation letter tailored to the subject
  3. A list of relevant coursework, or alternatively an academic transcript for the last years of study (unofficial versions are accepted)
  4. Contact details of two academic or professional references (name, affiliation, email)

IGN (a French governmental agency) requires all applications to be submitted through their official online system. Applications sent via email will not be considered. The official reference is not yet available but will be posted very soon.

  1. Visit the IGN job offers page.
  2. Locate the offer with the proper reference # (available soon).
  3. Click “Je postule” (French for “I apply”) to begin submission process.

Note: The IGN application portal is in French only. If you need assistance, consider using a browser translation tool or contacting us directly for guidance.


For any questions, applicants are highly encouraged to contact:

Alexander Massey
alexander.massey@ign.fr

Supervisers

This project is conducted at the Laboratory of Forest Inventory (LIF): a research unit of Géodata Paris, within the French National Institute of Geographic and Forest Information (IGN).

Dr. Alexander Massey

Researcher, IGN (Géodata Paris)

A statistician specializing in design-based inference and sampling theory within the context of National Forest Inventories. Alexander is a co-author of the forestinventory R-package and focuses on the intersection of Small Area Estimation and continuous population sampling under the Monte Carlo approach.

Focus: Sampling Theory, SAE, R-Package Development

Prof. Olivier Bouriaud

Professor, University of Suceava

An expert in forest ecology and dynamics with a focus on integrating LiDAR, photogrammetry, and optical imagery into forest monitoring frameworks. Formerly the head of the Romanian NFI's models and methods office, Olivier has been affiliated with the LIF research unit since 2022.

Focus: Forest Dynamics, Digital Twins, Remote Sensing Integration


"Turning noisy survey estimates into structured spatial predictions of forest disturbance."

This PhD sits at the intersection of three classical statistical traditions — design-based survey inference, small area estimation, and Bayesian spatiotemporal modeling — unified by the Monte Carlo interpretation for forest inventory. The scientific challenge is real, the data infrastructure exists, and the urgency of the problem is only growing.

If you find yourself excited by the question of how to recover a disturbance signal from three field plots, this is the right place.