Bayesian Spatiotemporal Small Area Estimation for Forest Disturbance Monitoring

PhD Opportunity

Forest inventory Survey sampling Small area estimation Bayesian / INLA Remote sensing

Location

Nancy, France

Duration

3 years

Salary (gross)

~€2 300/month

Start

Sep–Oct 2026

Social benefits

Public healthcare + education + full worker protections

Workplace perks

Subsidized meals + commute reimbursement

Research funding

International conference travel

Application deadline

Open until filled

Partners & Funding

Funder

The Institut national de l'information géographique et forestière (IGN) is a French public establishment under the joint authority of the Ministries of Ecology and Forestry. As the national reference operator for geographic and forest information, IGN produces, qualifies, and disseminates the sovereign geospatial and forest data that underpin public policy, including responsibility for the French National Forest Inventory.

Host laboratory

The Laboratory of Forest Inventory (LIF) is a research unit of IGN. Based in Nancy on a historic forestry and engineering campus now part of AgroParisTech, LIF combines forest inventory field observations, remote sensing, and statistical modelling to develop innovative methods supporting sustainable forest management, environmental monitoring, and responses to ecological and climate transitions.

Degree awarding institution

The candidate will be enrolled at Université de Lorraine within the SIReNa doctoral school (Sciences et Ingénierie des Ressources Naturelles), which brings together laboratories from Université de Lorraine, INRAE, CNRS, and AgroParisTech.

Career Prospects

Graduates of this PhD program are equipped to pursue careers as experts in advanced statistical modeling and spatial data analysis. They can contribute to both fundamental and applied research, as well as to decision-making and policy in sectors where advanced statistical modeling and spatial data analysis are critical. This includes:

Research institutes and academia
National statistical agencies
Environmental monitoring agencies
Private-sector geospatial technology companies and environmental consulting firms
NGOs and intergovernmental bodies focused on sustainable development, climate change, or global health

Program Structure

As part of the Doctoral School Sirena at Université de Lorraine, this PhD is a 3-year, fully funded research position with employee status. The program combines a strong research focus with light coursework requirements (30 credits total):

10 credits from classes, including English (or French for non-French speakers)
10 credits from conference participation and scientific publications
5 credits from activities preparing for future career paths

For the full description of credit requirements, see the Doctoral School SIReNa’s website.

The primary focus is on producing high-quality scientific publications, which are compiled into a dissertation and defended in a public oral examination.

Who should apply?

A candidate with a strong Master’s in quantitative field (e.g. statistics, data science, applied mathematics, survey sampling, epidemiology, …). You should be comfortable with statistical modeling and programming in R or Python. Experience with survey sampling, spatial statistics, or Bayesian methods is clearly a plus — but motivation and the capacity to learn matter more than a checklist.

Required:

Master’s degree (or equivalent 5-year diploma) by the start date
Solid knowledge of statistical modeling
Programming skills in R or Python
Fluency in English (written and spoken)

Preferred:

Experience with survey sampling or spatial statistics
Experience with hierarchical Bayesian methods
Interest in environmental or geospatial applications
Interest in disease mapping (epidemiological analogy)
French is a plus but not required

Presentation of topic

Why forests — and why now?

Forests cover about 30% of the Earth’s land surface and are central to the carbon cycle, biodiversity, and water regulation. Climate change is making them more fragile: droughts are longer, wildfires more intense, storms more frequent, and insect outbreaks — bark beetles, pine processionary moths — are expanding into new latitudes.

What makes this particularly challenging from a monitoring point of view is that these disturbances are localized and fast-moving. A windstorm might flatten 500 hectares in one valley while leaving the adjacent one untouched. An insect outbreak can sweep through a region in a single season. Traditional large-scale forest surveys are not designed for this kind of spatial and temporal granularity.

Fig. 1 — Forest disturbance from 2025 wildfires in Aude, France (left) and 2019 bark beetle damage in Grand Est, France (right), showing contrasting spatial footprints as detected by Sentinel-2 imagery. Normalized Burn Ratio (NBR) is an index designed to highlight burnt areas in large fire zones.

Left: based on Massey & Vega (2025). Right: generated using the fordead package (de Boissieu et al., 2024).

Images courtesy of the LIF and UMR TETIS.

Remote sensing (satellite imagery, LiDAR) can detect that something happened — a canopy has opened, reflectance has changed. But it cannot directly tell us how much timber volume was affected, which is the key quantity for carbon accounting and forest management decisions. We need direct, consistent measurements of forest attributes across all forested areas, which are costly and time-consuming to collect. Fortunately, most governments provide this type of data via national forest inventories.

CORE TENSION: National forest inventories give direct measurements of forest attributes — but are not designed to operate at fine spatial and temporal resolution. Remote sensing gives us wall-to-wall coverage — but of proxy signals, not volume. This PhD is about bridging that gap with statistics.

What is a forest inventory?

The French National Forest Inventory (NFI), run by IGN, relies on a carefully designed probability sample: each year, around 7,000 circular plots are selected across the territory. At each plot, trained field crews record tree diameters, heights, species composition, basal area, and many other attributes. From this relatively small sample, statisticians produce estimates of forest resources at regional and national scales.

The gold standard for this type of official data production is the design-based approach. In this framework, the forest itself is viewed as a fixed population: every tree has a fixed (though unknown) volume that is measurable without error, and randomness arises only from how we took the sample—not from the trees or measurements themselves. This contrasts with model-based approaches, where the data are assumed to be generated by an underlying stochastic process.

The sample units are points defined by geographic coordinates, not trees. At each sampled point, we measure the trees within a plot and convert their total (e.g., timber volume) into a local density (e.g., m³ per hectare). Conceptually, each point in the forest has an associated density value, whether or not it is observed. Taken together, these values define a continuous local density surface over the territory.

Fig. 2 — Monte Carlo interpretation of a forest inventory with tree height as the target variable. The local density surface \(Y(x)\) (red curve) is defined for every location \(x \in F\) and represents the quantity we wish to integrate. The pink segments represent the regions around each tree where a plot center would include that tree in the measurement.

*In practice, NFIs often use systematic grids, but for intuition we consider independent uniform sampling.

Fig. 2 illustrates this idea: trees induce a local density surface, and sampling plot locations corresponds to drawing random points on that surface. Estimating the total resource is therefore equivalent to integrating this surface over the domain. Because we only observe the density at a finite set of randomly selected locations, this can be interpreted as a form of Monte Carlo integration.

If the total surface area of the forest domain is known, the total resource can be expressed as the area multiplied by the true spatial mean \(\bar{Y}\). This leads naturally to the sample mean estimator \(\hat{\bar{Y}}\), which is unbiased under independent sampling designs and converges to the true value by the Law of Large Numbers. By the Central Limit Theorem, we can construct an asymptotically valid confidence interval using the variance \(\hat V(\hat{\bar{Y}})=S^2/n\).

When sampling is not uniform—as in the French NFI—the same idea applies, but the estimator becomes a weighted mean that accounts for unequal inclusion probabilities.

Auxiliary maps and model-assisted estimation

The sample mean estimator is adequate at large scales but limited when spatial resolution matters. To do better we can incorporate remote sensing from satellite imagery or LiDAR point clouds to build wall-to-wall maps of proxy variables — canopy height models (CHM), spectral indices — that correlate with timber volume. If we consider this auxiliary as predictor in a predictive model, we could generate a predictive density surface that is known everywhere and thus has a known integral. Now we only need to know the volume of the difference between the predictive surface and local density surface so we do Monte Carlo integration on the residual as shown in Fig. 3.

Model assisted forest inventory schematic

Fig. 3 — Model-assisted estimation using a canopy height model (CHM), denoted \(Z(x)\), to create a predictive density surface with a linear model.

This is called model-assisted estimation which is still design-based because the model was only used to transform the problem to the residuals. It does not even matter if the assisting model is correct — we get improved precision as long as \(V(Y(x))>V(R(x))\).

Model-assisted estimation can improve precision while preserving design-unbiasedness — even if the model is completely wrong.

Why this breaks down for monitoring

Model-assisted estimators rely on asymptotic validity, requiring a sufficient number of samples for their design-based properties to hold. In forest monitoring, this condition is typically met only for most widespread events; for rare or localized disturbances, the framework begins to fracture. Specifically, variance estimates become highly unstable when \(n < 6\), leading to confidence intervals with unreliable coverage rates that fail to represent true uncertainty.

Beyond sample size, the efficiency of these methods is strictly tied to the quality and spatiotemporal alignment of auxiliary data. Precision gains diminish rapidly unless the relationship between ground plots and remote sensing predictors is exceptionally strong (Fig. 4). Maintaining this high level of predictive power is a significant operational challenge, as misalignment or sensor noise can quickly render auxiliary information ineffective in data-sparse regions.

Fig. 4 — Relative efficiency of model-assisted estimation compared to increasing the effective sample size. The curves illustrate the sensitivity of precision gains to the correlation (\(R^2\)) between auxiliary predictors and ground observations. Tripling the effective sample size is mathematically equivalent to employing a model-assisted technique with an R² of 0.67.

This is the small area problem: a survey design that is robust at the national scale becomes underpowered at the scales where management decisions are needed. Because increasing the physical sample size is cost-prohibitive, the fundamental challenge lies in developing a framework that borrows strength across space and time without sacrificing inferential rigor.

Borrowing strength with Bayesian small area estimation

When only a handful of plots fall within a critical area—a common occurrence in localized forest disturbances—even model-assisted estimators fail to produce stable estimates. This is the hallmark of the small area problem, requiring a strategy that “borrows strength” from other locations and points in time. The Fay–Herriot model provides this framework by linking noisy direct estimates to auxiliary data through a hierarchical structure. Areas with sparse data are informed by the global trend and neighboring units, while sample-rich areas remain driven by their own observations.

Fig. 5 — Simulation of the Fay–Herriot (FH) effect. At low sample sizes, the FH model reduces absolute error by shrinking noisy direct estimates toward the model mean including surrounding areas (i.e. borrowing strength). As local sample size increases, the estimator adaptively weights the local data, eventually converging with the direct design-based estimate.

The core novelty of this project is the extension of small-area estimation models, such as Fay–Herriot, into a spatio-temporal Bayesian framework tailored to forest inventory data. Whereas model-assisted estimation depends on the acquisition and processing of high-quality remote sensing auxiliaries to improve precision, spatio-temporal smoothing offers an alternative source of information by borrowing strength from neighbouring areas and historical observations. This provides a broadly applicable framework for achieving precision gains using the inventory data itself, while remaining compatible with auxiliary remote sensing products when available rather than relying on them as the primary source of efficiency gains.

Furthermore, by leveraging more than 20 years of French National Forest Inventory measurements, the proposed framework captures both spatial dependence and smooth temporal dynamics. This enables the estimation of fine-resolution posterior distributions and complete annual trajectories of forest attributes, while avoiding the coarse 5-year rolling averages commonly used in forest inventories.

This PhD project will evaluate the most efficient ways to borrow strength from NFI data to build fine resolution spatiotemporal maps for forest monitoring.

Computation

To make this feasible at scale, inference is performed using Integrated Nested Laplace Approximation (INLA). INLA provides fast, deterministic approximations for latent Gaussian models, reducing computation time by several orders of magnitude compared to MCMC. This efficiency makes near-real-time updating of interpretable, high-resolution maps computationally accessible.

While the R-INLA framework and the associated SUMMER package are establishing themselves as standards in spatial epidemiology, their application to forest monitoring remains nascent. Adapting these tools to forestry presents a unique methodological challenge: unlike fixed administrative health districts, forest “small areas” are dynamically defined by remote sensing. By capturing residual spatial and temporal structure, this Bayesian strategy enables robust, uncertainty-aware inference even in data-sparse regions, delivering the stability required for operational forest management.

Three-year task overview

🌲

Digital forest twin

Use a simulated forest at LIF to generate realistic sampling scenarios. Evaluate estimator properties under controlled conditions.

🎯

Enhance direct estimation

Evaluating model-assisted estimators to refine direct estimates using LiDAR and satellite auxiliary data. This step optimizes precision prior to the Bayesian spatio-temporal smoothing phase.

🗺️

Variance smoothing

The weights that determine how much and when to borrow strength depend on the direct estimate's variance, which becomes unstable when n<6. Variance smoothing techniques need to be developed.

⏱️

Spatiotemporal extension

Develop Fay–Herriot-type models that borrow strength both across neighboring areas and full time series of historical NFI data.

📡

Real NFI + remote sensing

Apply methods to French NFI data and satellite imagery. Produce high-resolution maps of disturbance severity in terms of affected timber volume.

📦

Reproducible tools

Package methods as an R package and / or Shiny application for operational use by IGN and the broader forest monitoring community.

Further Reading

Design-based sampling for forest inventories

Small area estimation

Spatial and temporal smoothing

Disturbance detection

How do I apply?

All applications must include:

A curriculum vitae,
A motivation letter tailored to the subject,
A list of relevant coursework, or alternatively an academic transcript for the last years of study (unofficial versions are accepted),
Contact details of two academic or professional references (name, affiliation, email).

Applications should be sent directly to:

Alexander Massey
alexander.massey@ign.fr

The French doctoral system can seem complex from an international perspective. All applicants are highly encouraged to contact LIF directly with any questions. We’re happy to help!

Supervisers

This project is conducted at the Laboratory of Forest Inventory (LIF): a research unit of Géodata Paris, within the French National Institute of Geographic and Forest Information (IGN).

Dr. Alexander Massey

Researcher, IGN (Géodata Paris)

A statistician specializing in design-based inference and sampling theory within the context of National Forest Inventories. Alexander is a co-author of the forestinventory R-package and focuses on the intersection of Small Area Estimation and continuous population sampling under the Monte Carlo approach.

Focus: Sampling Theory, SAE, R-Package Development

Prof. Olivier Bouriaud

Professor, University of Suceava

An expert in forest ecology and dynamics with a focus on integrating LiDAR, photogrammetry, and optical imagery into forest monitoring frameworks. Formerly the head of the Romanian NFI's models and methods office, Olivier has been affiliated with the LIF research unit since 2022.

Focus: Forest Dynamics, Digital Twins, Remote Sensing Integration

"Turning noisy survey estimates into structured spatial predictions of forest disturbance."

This PhD sits at the intersection of three classical statistical traditions — design-based survey inference, small area estimation, and Bayesian spatiotemporal modeling — unified by the Monte Carlo interpretation for forest inventory. The scientific challenge is real, the data infrastructure exists, and the urgency of the problem is only growing.

If you find yourself excited by the question of how to recover a disturbance signal from three field plots, this is the right place.