RSF Explorer – Subnest Destination Choice

RSF Explorer

Resource Selection Function · Subnest Destination Choice

Configure. Place subnests on the arena and set attractiveness α and decay β.

Observe. In Field mode, click to record origin–destination (O–D) observations. Or Simulate from the configured model.

Fit & compare. Three models fit automatically — Voronoi, RSF, and KDE — with LL/N, cross-validated LL/N, and AICc for each. See the Methods tab for details.

RSF Model

P(j | x) = exp(α_j − β·d_j(x)) / ∑_k exp(α_k − β·d_k(x))

j = subnest number
x = origin position in arena
α_j = attractiveness of subnest j
d_j(x) = distance from x to subnest j
α₁ ≡ 0 (reference); other α relative to N1

Distance decay β = 4.0

High β = sharp boundaries · Low β = diffuse, overlapping basins

Model assumption: the RSF assumes destination choices follow a proximity + attractiveness rule and estimates catchment boundaries under that assumption. If behavior doesn't follow this locality pattern, the RSF will fit poorly — the KDE (below) makes no such assumption and may then be more informative. See the Extensions tab for further options.

Configured Subnest Properties

Drag subnests on the arena to reposition. Max 6.

Observations (0)

Click any arena location to set the origin, then select which subnest the food was carried to — recording an origin–destination (O–D) observation.

Fit Models to Data

All three models (Voronoi, RSF, and KDE) fit automatically as observations are added. The Voronoi and RSF models are fit by maximum likelihood (MLE); the KDE uses kernel density estimation with bandwidth h.

KDE bandwidth h = 0.12

Gaussian kernel: K_h(x, x_i) = exp(−‖x − x_i‖² / (2h²)). Larger h = smoother territories.

CONFIGURED RSF — ID OF MAX PROB.

This tab documents the statistical models underlying the Explorer. The first four sections introduce the models in the spatial ecology framing that motivated their application to polydomous nest transport, beginning with the utility-function framework shared by the parametric models and working through the hierarchy from most to least constrained. The final section translates the same models into the language of generalized linear models and conditional logit, which may be more familiar to readers with a behavioral ecology or statistical background, and points toward how these models would be fit in R for formal inference.

Framework: Utility Functions and the Softmax

The Voronoi and RSF models share a common probabilistic framework. For each potential destination subnest j, a utility V_j(x) summarizes how attractive that subnest is as a destination for a food item dropped at origin x — higher utility means a stronger pull. Destination probability is then given by the softmax (multinomial logistic) over all utilities:

P(j | x) = exp(V_j(x)) / ∑_k exp(V_k(x))

The softmax converts utilities into probabilities that sum to one. A subnest with much higher utility than the others gets nearly all the probability; when utilities are equal, probability is split evenly. The two parametric models differ only in how V_j(x) is specified — the softmax and the estimation framework are the same throughout.
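As a minimal sketch of the softmax step, the following Python snippet converts a list of utilities into probabilities. The function name `softmax` and the example utilities are illustrative assumptions, not taken from the Explorer's code. Subtracting the largest utility before exponentiating is a common numerical-stability device and leaves the probabilities unchanged.

```python
import math

def softmax(utilities):
    """Convert a list of utilities V_k into probabilities that sum to one."""
    v_max = max(utilities)  # shift for numerical stability; cancels in the ratio
    weights = [math.exp(v - v_max) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Equal utilities: probability is split evenly.
print(softmax([0.0, 0.0, 0.0]))

# One utility far above the others: it takes nearly all the probability.
print(softmax([10.0, 0.0, 0.0]))
```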

The KDE classifier does not use this framework at all — it estimates destination probability directly from the data without any utility function — which is precisely why it can detect spatial structure the parametric models cannot.

Voronoi (k = 1) ⊂ RSF (k = K) ⊂ KDE (non-parametric)
Voronoi: V_j = −β·d_j · RSF: V_j = α_j − β·d_j · KDE: no V assumed

The Model Fit Comparison table reports in-sample LL/N, 5-fold cross-validated LL/N, and AICc for each model. AICc is only defined for the parametric models (Voronoi and RSF); the KDE comparison relies on cross-validated LL/N.

Voronoi Model (Null Hypothesis)

The Voronoi model is the most constrained specification of the utility function. It assumes destination choice depends only on distance — all subnests are equally attractive, so the utility is simply the negative distance cost:

V_j(x) = −β · d_j(x) (all α_j = 0)

With this V, the softmax assigns the highest probability to the nearest subnest at every origin point — producing the familiar Voronoi tessellation as the maximum-probability map. The probabilistic version (with fitted β) softens the boundaries: points near a catchment boundary have probability spread across adjacent subnests rather than assigned entirely to the closest one. There is one free parameter: the distance decay β.
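The effect of β on boundary softness can be sketched numerically. The snippet below assumes Euclidean distance and a hypothetical three-subnest layout in a unit arena; the function name `voronoi_probs` is illustrative only. At an origin slightly closer to the first subnest, that subnest always has the highest probability, but how much of the total it takes depends on β.

```python
import math

def voronoi_probs(origin, subnests, beta):
    """P(j | x) under V_j(x) = -beta * d_j(x), with Euclidean distance (assumed)."""
    dists = [math.dist(origin, s) for s in subnests]
    utilities = [-beta * d for d in dists]
    v_max = max(utilities)
    weights = [math.exp(v - v_max) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical layout: three subnests in a unit arena.
subnests = [(0.2, 0.2), (0.8, 0.2), (0.5, 0.8)]
origin = (0.45, 0.2)  # slightly closer to the first subnest than to the second

for beta in (1.0, 4.0, 40.0):
    probs = voronoi_probs(origin, subnests, beta)
    print(beta, [round(p, 3) for p in probs])
```

Low β spreads probability across neighbouring subnests; high β approaches the hard Voronoi assignment.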

As a null hypothesis

The Voronoi model is the natural null for testing whether differential attractiveness is needed. A likelihood ratio test between the Voronoi model (k=1) and the full RSF (k=K) has K−1 degrees of freedom:

LRT = 2·(LL_RSF − LL_Voronoi) ~ χ²_{K−1}

Rejection means equal attractiveness is not consistent with the data — at least one subnest has significantly higher or lower attractiveness than the reference. The AICc comparison gives a penalized version of the same test without requiring a fixed significance threshold.

In practice, comparing AICc values is a useful complement to the LRT: it penalizes the extra K−1 parameters directly, includes a small-sample correction, and does not rely on the asymptotic χ² approximation for the test statistic.
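The two comparisons can be sketched with a few lines of Python. The log-likelihood values, sample size, and function names below are hypothetical, chosen only to illustrate the arithmetic. The AICc expression used is the standard small-sample corrected AIC, −2·LL + 2k + 2k(k+1)/(N−k−1), which this page does not spell out. The p-value helper uses a closed form that holds only for even degrees of freedom, which covers the K = 3 case shown here.

```python
import math

def aicc(log_lik, k, n):
    """AIC with the small-sample correction: -2LL + 2k + 2k(k+1)/(n-k-1)."""
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form needs even df."""
    if df % 2 != 0:
        raise ValueError("closed form shown here needs even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

# Hypothetical fitted log-likelihoods for N = 60 observations and K = 3 subnests.
n_obs, n_subnests = 60, 3
ll_voronoi, ll_rsf = -52.0, -45.0

lrt = 2.0 * (ll_rsf - ll_voronoi)   # likelihood ratio statistic
df = n_subnests - 1                 # K - 1 extra attractiveness parameters
p_value = chi2_sf_even_df(lrt, df)

print("LRT =", lrt, "df =", df, "p =", p_value)
print("AICc Voronoi =", aicc(ll_voronoi, 1, n_obs))
print("AICc RSF     =", aicc(ll_rsf, n_subnests, n_obs))
```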

RSF Model

The RSF extends the Voronoi model by freeing the attractiveness parameters — each subnest gets its own α_j, estimated from the data rather than fixed at zero. The utility function becomes:

V_j(x) = α_j − β · d_j(x)

Plugging this into the softmax gives the destination probability P(j | x). The Voronoi model is the special case α_j = 0 for all j. Because only differences in α matter, α₁ ≡ 0 is fixed as the reference and K−1 free attractiveness parameters are estimated, giving k = K total parameters (K−1 free α plus β).

Parameters are estimated by maximum likelihood (MLE), using Adam gradient ascent on the conditional logit log-likelihood. Standard errors are the square roots of the diagonal of the inverse observed Fisher information matrix (the negative Hessian of the log-likelihood, evaluated at the MLE), and 95% confidence intervals are Wald-type: α̂_j ± 1.96·SE(α̂_j).
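The estimation step can be illustrated with a small, self-contained sketch. Note the substitutions: it uses plain fixed-step gradient ascent rather than Adam, runs for a fixed number of steps, and omits the standard-error calculation. The subnest layout, true parameters, sample size, and all function names are hypothetical. The gradient of the log-likelihood is (indicator of the chosen subnest minus P_j) for each α_j, and (−d_chosen + Σ_k P_k·d_k) for β.

```python
import math
import random

def probs(origin, subnests, alpha, beta):
    """Softmax destination probabilities and distances for one origin."""
    d = [math.dist(origin, s) for s in subnests]
    v = [a - beta * dj for a, dj in zip(alpha, d)]
    v_max = max(v)
    w = [math.exp(x - v_max) for x in v]
    total = sum(w)
    return [x / total for x in w], d

def mean_loglik_and_grad(data, subnests, alpha, beta):
    """Return LL/N and its gradient with respect to (alpha, beta)."""
    n, k = len(data), len(subnests)
    ll, g_alpha, g_beta = 0.0, [0.0] * k, 0.0
    for origin, chosen in data:
        p, d = probs(origin, subnests, alpha, beta)
        ll += math.log(p[chosen])
        for j in range(k):
            g_alpha[j] += (1.0 if j == chosen else 0.0) - p[j]
        g_beta += -d[chosen] + sum(pj * dj for pj, dj in zip(p, d))
    return ll / n, [g / n for g in g_alpha], g_beta / n

def fit_rsf(data, subnests, steps=1500, lr=0.5):
    """Plain gradient ascent; alpha[0] stays fixed at 0 as the reference."""
    alpha, beta = [0.0] * len(subnests), 0.0
    for _ in range(steps):
        _, g_alpha, g_beta = mean_loglik_and_grad(data, subnests, alpha, beta)
        for j in range(1, len(alpha)):
            alpha[j] += lr * g_alpha[j]
        beta += lr * g_beta
    return alpha, beta

# Simulate hypothetical O-D data from known parameters, then refit.
rng = random.Random(1)
subnests = [(0.2, 0.2), (0.8, 0.2), (0.5, 0.8)]
true_alpha, true_beta = [0.0, 0.8, -0.5], 4.0
data = []
for _ in range(150):
    x = (rng.random(), rng.random())
    p, _ = probs(x, subnests, true_alpha, true_beta)
    data.append((x, rng.choices(range(3), weights=p)[0]))

alpha_hat, beta_hat = fit_rsf(data, subnests)
ll_start, _, _ = mean_loglik_and_grad(data, subnests, [0.0] * 3, 0.0)
ll_fit, _, _ = mean_loglik_and_grad(data, subnests, alpha_hat, beta_hat)
print("alpha_hat:", [round(a, 2) for a in alpha_hat], "beta_hat:", round(beta_hat, 2))
print("LL/N at start:", round(ll_start, 3), "LL/N at fit:", round(ll_fit, 3))
```

Because the conditional logit log-likelihood is concave, a simple optimizer is enough for a demonstration; an adaptive method such as Adam mainly affects convergence speed.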

Connection to Resource Selection Functions in ecology

RSFs are widely used in habitat selection studies (Manly et al. 1993; Thurfjell et al. 2014) where an animal selects among habitat patches. The mathematical structure is identical to the conditional logit model from econometrics (McFadden 1974). Here the resource being selected is a subnest destination rather than a habitat patch, making this a direct application of an established ecological framework to cooperative transport.

KDE Classifier (Non-Parametric Baseline)

The KDE classifier makes no assumption about locality — it learns the territory shapes directly from the origin–destination data without reference to subnest positions. A separate Gaussian kernel density estimate is fit to the origin locations for each destination subnest, and destination probability at a new origin is proportional to the kernel density of each class:

P_KDE(j | x) = ∑_{i : dest_i = j} K_h(x, x_i) / ∑_{all i} K_h(x, x_i)

where K_h(x, x_i) = exp(−‖x − x_i‖² / (2h²)) is a Gaussian kernel. Bandwidth h controls smoothing: small h produces sharp, possibly disconnected territories; large h produces smooth territories approaching a proximity-based structure.
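A minimal sketch of the classifier follows, assuming two-dimensional origins and a handful of hypothetical observations. The function name `kde_probs` and the uniform fallback when every kernel underflows to zero are assumptions of this sketch, not documented behaviour of the Explorer.

```python
import math

def kde_probs(x, observations, n_subnests, h):
    """P_KDE(j | x): per-destination kernel sums over the sum for all observations."""
    sums = [0.0] * n_subnests
    for origin, dest in observations:
        sq_dist = (x[0] - origin[0]) ** 2 + (x[1] - origin[1]) ** 2
        sums[dest] += math.exp(-sq_dist / (2.0 * h ** 2))
    total = sum(sums)
    if total == 0.0:  # all kernels underflowed: fall back to a uniform guess (assumption)
        return [1.0 / n_subnests] * n_subnests
    return [s / total for s in sums]

# Hypothetical observations: (origin, destination index).
observations = [
    ((0.20, 0.30), 0), ((0.25, 0.25), 0), ((0.30, 0.35), 0),
    ((0.75, 0.20), 1), ((0.80, 0.30), 1),
    ((0.50, 0.80), 2),
]

for h in (0.05, 0.12, 0.5):
    print(h, [round(p, 3) for p in kde_probs((0.3, 0.3), observations, 3, h)])
```

At the query point, which sits among the origins carried to the first subnest, a small bandwidth gives that subnest nearly all the probability, while a large bandwidth lets distant observations contribute and flattens the estimate.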

Scientific interpretation

Comparing RSF and KDE cross-validated LL/N is a direct check of the RSF's locality assumption. If RSF CV LL/N ≈ KDE CV LL/N, proximity plus differential attractiveness is a sufficient explanation of the data. If KDE fits substantially better, there is spatial structure the RSF cannot capture (non-convex territories, discontinuous allegiance zones, or trail effects), and that discrepancy is itself a biological finding worth reporting.

In-sample LL/N almost always favors the KDE because it is more flexible and each observation contributes to its own density estimate. Cross-validated LL/N is the appropriate comparison because it penalizes overfitting implicitly through holdout scoring.
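The difference between in-sample and held-out scoring can be sketched as follows for the KDE. The data generator, the fold assignment by shuffled index, the probability floor that guards against log(0), and all function names are assumptions of this sketch; the Explorer's own fold scheme is not documented here.

```python
import math
import random

def kde_probs(x, train, n_subnests, h):
    """KDE class probabilities at x from (origin, destination) training pairs."""
    sums = [0.0] * n_subnests
    for origin, dest in train:
        sums[dest] += math.exp(-math.dist(x, origin) ** 2 / (2.0 * h ** 2))
    total = sum(sums)
    if total == 0.0:
        return [1.0 / n_subnests] * n_subnests
    return [s / total for s in sums]

def in_sample_ll(observations, n_subnests, h, floor=1e-9):
    """LL/N when every observation is scored by a model that has seen it."""
    total = sum(math.log(max(kde_probs(o, observations, n_subnests, h)[d], floor))
                for o, d in observations)
    return total / len(observations)

def cv_ll(observations, n_subnests, h, n_folds=5, floor=1e-9, seed=0):
    """LL/N when every observation is scored by a model fit without its fold."""
    order = list(range(len(observations)))
    random.Random(seed).shuffle(order)
    total = 0.0
    for fold in range(n_folds):
        held_out = set(order[fold::n_folds])
        train = [observations[i] for i in order if i not in held_out]
        for i in held_out:
            origin, dest = observations[i]
            p = kde_probs(origin, train, n_subnests, h)[dest]
            total += math.log(max(p, floor))  # floor avoids log(0); an assumption
    return total / len(observations)

# Hypothetical data: destinations assigned by nearest subnest, with some noise.
rng = random.Random(42)
subnests = [(0.2, 0.2), (0.8, 0.2), (0.5, 0.8)]
observations = []
for _ in range(60):
    x = (rng.random(), rng.random())
    nearest = min(range(3), key=lambda j: math.dist(x, subnests[j]))
    dest = nearest if rng.random() > 0.2 else rng.randrange(3)
    observations.append((x, dest))

print("in-sample LL/N:", round(in_sample_ll(observations, 3, 0.12), 3))
print("5-fold CV LL/N:", round(cv_ll(observations, 3, 0.12), 3))
```

The same held-out scoring loop applies to the parametric models: refit on the training folds, then sum the log of the predicted probability for each held-out destination.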

Connection to Generalized Linear Models

The RSF framework was developed in spatial ecology but is mathematically identical to a class of models that will be familiar from statistics and behavioral ecology: the conditional logit (McFadden 1974), fit in R via the mlogit package. Making this connection explicit is useful both for understanding the model and for fitting it to real data.

The softmax as a link function

In GLM notation, a model has a linear predictor η and a link function g such that g(μ) = η. In the RSF, the utility V_j(x) plays the role of the linear predictor — it is linear in the parameters α_j and β. The softmax is the inverse link function: it maps the K linear predictors to K probabilities that sum to 1. This is exactly the multinomial logistic link, the generalization of the logistic link to K > 2 categories.

Data structure: long format

Fitting the conditional logit requires data in long format: one row per (observation, alternative) pair. For N food drops and K subnests, this gives N×K rows. With K = 3, the three rows for a single observation look like:

obs_id=1, subnest=N1, distance=2.6, chosen=0
obs_id=1, subnest=N2, distance=1.6, chosen=1
obs_id=1, subnest=N3, distance=7.2, chosen=0

The obs_id groups rows into choice occasions — the K rows sharing an obs_id form the denominator of one softmax. The chosen column is 1 for the observed destination and 0 for all others. The single distance column carries the alternative-varying predictor; a single coefficient on this column is by construction shared across all subnests, which is what enforces the shared-β assumption of the RSF.
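Building the long format from raw origin–destination records is mechanical. The sketch below assumes Euclidean distance, hypothetical subnest coordinates, and a hypothetical helper named `to_long_format`; the column names follow the example rows above.

```python
import math

def to_long_format(observations, subnests):
    """One row per (observation, alternative) pair: N observations x K subnests."""
    rows = []
    for obs_id, (origin, chosen_name) in enumerate(observations, start=1):
        for name, position in subnests.items():
            rows.append({
                "obs_id": obs_id,
                "subnest": name,
                "distance": round(math.dist(origin, position), 3),
                "chosen": 1 if name == chosen_name else 0,
            })
    return rows

subnests = {"N1": (0.0, 0.0), "N2": (3.0, 0.0), "N3": (0.0, 8.0)}  # hypothetical
observations = [((2.0, 1.0), "N2"), ((0.5, 6.0), "N3")]            # hypothetical

for row in to_long_format(observations, subnests):
    print(row)
```

Each obs_id appears K times, and exactly one of those rows has chosen = 1.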

R formula equivalents

The two parametric models in the Explorer, plus an attractiveness-only variant, correspond directly to mlogit formulas:

# Full RSF: shared β, per-subnest α
mlogit(chosen ~ distance | 1, data=od_long, idx=c("obs_id","subnest"))

# Voronoi / distance-only: shared β, no α
mlogit(chosen ~ distance | 0, data=od_long, idx=c("obs_id","subnest"))

# Attractiveness-only: per-subnest α, no distance
mlogit(chosen ~ 0 | 1, data=od_long, idx=c("obs_id","subnest"))

In each formula, terms to the left of | are alternative-varying predictors that get a single shared coefficient; terms to the right are observation-level terms that receive per-alternative coefficients. A 1 on the right requests the per-alternative intercepts (the α_j, with the reference alternative fixed at 0), and a 0 removes them. The second element of idx defines what "alternative" means (here, the subnest column), and mlogit uses it to construct the softmax denominator and the per-alternative intercepts.

Why not glm or multinom?

A plain glm(..., family=binomial) on the long-format data would treat each row as an independent Bernoulli trial, ignoring the constraint that exactly one chosen=1 must occur per obs_id group. It would produce incorrect standard errors and biased estimates. The multinom function from nnet enforces the sum-to-1 constraint correctly, but requires wide-format data with separate distance columns per subnest (dist_N1, dist_N2, …), which forces a separate β per subnest — losing the shared distance-sensitivity assumption that is the RSF's key structural claim. The mlogit long format is the natural representation for the shared-β model and enforces the softmax likelihood correctly.

The RSF is sometimes described as distinct from multinomial logistic regression in the ecology literature, but this reflects the parallel development of the same model in different fields rather than any mathematical difference. McFadden's conditional logit (econometrics, 1974) and Manly et al.'s RSF (spatial ecology, 1993) are the same model.

The models implemented in the Explorer — Voronoi, RSF, and KDE — treat each observation as a single food item of unspecified weight, carried by an anonymous group. The sections below describe extensions that relax those assumptions. They are not implemented in the Explorer but are natural next steps for a full analysis, particularly when load weight and transport mode are experimentally varied or when worker identity can be established through behavioral assays.

Weight as a Covariate

In the base RSF each observation carries no weight attribute — all observations at a given origin are treated identically. But experimentally, load weight can be varied across trials to test whether heavier loads are more strongly pulled toward the nearest subnest. Fitting separate RSF models per weight class and comparing parameter estimates across classes is a natural first step before committing to a joint model.

Load weight w cannot enter the model as a standalone term because it is constant across all destination alternatives for any single observation — it would cancel out of the softmax. It enters meaningfully only through interactions with terms that vary across alternatives (i.e., distance d_j).
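The cancellation is easy to verify numerically. In the sketch below, the utilities, distances, coefficients, and load weight are all hypothetical. Adding the same θ·w to every alternative leaves the softmax unchanged, whereas multiplying w by an alternative-varying quantity (distance) changes it.

```python
import math

def softmax(utilities):
    v_max = max(utilities)
    weights = [math.exp(v - v_max) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical utilities and distances for three subnests at one origin.
base = [0.3, -1.2, 0.5]
distances = [0.25, 0.35, 0.60]

# Standalone weight term: the same theta * w is added to every alternative.
theta, w = 0.7, 2.5  # hypothetical coefficient and load weight
standalone = [v + theta * w for v in base]

# Interaction term: beta1 * w multiplies a quantity that differs by alternative.
beta1 = 1.0
interaction = [v - beta1 * w * d for v, d in zip(base, distances)]

print(softmax(base))
print(softmax(standalone))   # same as base, up to floating-point rounding
print(softmax(interaction))  # differs from base
```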

Weight-modulated distance decay

V_j(x, w) = α_j − (β₀ + β₁·w) · d_j(x)

β₁ tests whether heavier loads increase distance sensitivity — i.e., do heavy loads go more strongly toward the nearest subnest? A positive β₁ connects directly to central place foraging theory: transport cost increases with weight, raising the benefit of proximity.

Weight-modulated attractiveness

V_j(x, w) = (α_j + γ_j·w) − β · d_j(x)

γ_j tests whether a subnest's relative attractiveness changes with load weight — for instance, a larger subnest with more available helpers might become disproportionately preferred for heavy cooperative loads. This adds K−1 additional parameters.

Full model

V_j(x, w) = (α_j + γ_j·w) − (β₀ + β₁·w) · d_j(x)

Center w by subtracting its mean before fitting, so β₀ is interpretable as distance sensitivity at the average load weight rather than at w = 0.
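A sketch of the full utility with a centered weight is given below. The layout, parameter values, and load weights are hypothetical, Euclidean distance is assumed, and γ₁ is fixed at 0 alongside α₁ as the reference (an assumption consistent with the K−1 additional parameters mentioned above). At the mean load weight the centered w is 0, so the model reduces to the base RSF with β = β₀.

```python
import math

def full_model_probs(origin, w_centered, subnests, alpha, gamma, beta0, beta1):
    """Softmax over V_j = (alpha_j + gamma_j*w) - (beta0 + beta1*w) * d_j."""
    decay = beta0 + beta1 * w_centered
    v = [(a + g * w_centered) - decay * math.dist(origin, s)
         for a, g, s in zip(alpha, gamma, subnests)]
    v_max = max(v)
    weights = [math.exp(x - v_max) for x in v]
    total = sum(weights)
    return [x / total for x in weights]

# Hypothetical layout, parameters, and load weights (arbitrary units).
subnests = [(0.2, 0.2), (0.8, 0.2), (0.5, 0.8)]
alpha = [0.0, 0.8, -0.5]   # alpha_1 fixed at 0 (reference)
gamma = [0.0, 0.1, -0.1]   # gamma_1 fixed at 0 (reference, assumed)
beta0, beta1 = 4.0, 1.0
load_weights = [1.0, 2.0, 6.0]
mean_w = sum(load_weights) / len(load_weights)

origin = (0.45, 0.2)
for w in load_weights:
    p = full_model_probs(origin, w - mean_w, subnests, alpha, gamma, beta0, beta1)
    print(w, [round(x, 3) for x in p])
```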

Home-Subnest Identity (Mixed Logit)

The base RSF assumes destination choice depends only on the current spatial relationship between origin and subnest — the model has no memory of which subnest the transporting workers came from. In a polydomous colony, workers may carry food preferentially to a particular subnest regardless of proximity, creating a pattern the basic RSF cannot capture. The KDE vs RSF fit comparison (Methods tab) will detect this as a KDE advantage, but cannot identify the mechanism.

Latent-class extension

V_j(x, z_i) = α_j + λ_{z_i, j} − β · d_j(x)

Here z_i ∈ {1, …, K} is the home-subnest identity of the transporting group for observation i, and λ_{z,j} is the additional preference for subnest j among workers from subnest z. When λ_{z,z} ≫ 0 and λ_{z,j} ≈ 0 for j ≠ z, workers strongly prefer their home subnest regardless of proximity.
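The effect of a strong home preference can be sketched with a K×K matrix of λ values. The layout, parameter values, and the function name `latent_class_probs` are hypothetical, and the home identity is treated as known purely for illustration. With λ = 0 the nearest subnest is the most likely destination; with a large diagonal λ, workers from the third subnest carry food home even from an origin near the first.

```python
import math

def latent_class_probs(origin, home, subnests, alpha, lam, beta):
    """Softmax over V_j = alpha_j + lam[home][j] - beta * d_j(x)."""
    v = [alpha[j] + lam[home][j] - beta * math.dist(origin, subnests[j])
         for j in range(len(subnests))]
    v_max = max(v)
    weights = [math.exp(x - v_max) for x in v]
    total = sum(weights)
    return [x / total for x in weights]

# Hypothetical layout and parameters.
subnests = [(0.2, 0.2), (0.8, 0.2), (0.5, 0.8)]
alpha = [0.0, 0.0, 0.0]
beta = 4.0
origin = (0.45, 0.2)  # nearest to the first subnest
home = 2              # transporting group comes from the third subnest (index 2)

no_fidelity = [[0.0] * 3 for _ in range(3)]
strong_fidelity = [[5.0 if z == j else 0.0 for j in range(3)] for z in range(3)]

print([round(p, 3) for p in latent_class_probs(origin, home, subnests, alpha, no_fidelity, beta)])
print([round(p, 3) for p in latent_class_probs(origin, home, subnests, alpha, strong_fidelity, beta)])
```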

Estimation — if individual identity is known

The mixed logit does not require knowing which subnest an ant came from; it requires knowing that the same ant was observed on multiple transports. If individual identity is observed (via paint-marking, RFID, or laboratory colonies where ants can be tracked), each ant contributes several O–D observations, and per-ant random effects on subnest attractiveness (playing the role of λ_{z,j}, but indexed by individual rather than by home subnest) can be estimated via maximum simulated likelihood or the EM algorithm. These random effects capture systematic individual variation in destination preference regardless of its cause: home-subnest fidelity, body size, foraging specialization, or any other unmeasured individual attribute.

If individual identity is not observed — which is the typical field situation — z_i must be treated as a latent variable. The model then becomes a finite mixture: each observation is assigned a probabilistic weight over the K possible "types," and parameters are estimated by integrating over the mixing distribution. This is the mixed logit in the sense of Revelt and Train (1998). Identification requires enough repeat observations per condition to resolve the mixture components, which in practice means a controlled experimental design rather than opportunistic field sampling.

In field colonies of Novomessor, individual tracking at scale is not currently feasible, and aggression assays establish colony complex membership only — not individual subnest affiliation. Laboratory experiments with marked ants are the more tractable path to testing individual-level destination fidelity.