Y-Tree Trajectory Detrending & Spectral Analysis

Interactive companion for fitting splines to load trajectories, projecting samples, and reading windowed Fourier transforms

A load carried up a curved tree leaves a trajectory whose shape is dominated by the tree itself. To detect anything behavioral riding on top of that shape — pre-fork hesitation, side-to-side deliberation, drift — one first needs to subtract the trend. This widget illustrates two complementary ways to do it: (1) fit a smooth spline to the trajectory and study the perpendicular residuals d(s), or (2) compute a windowed Fourier transform of lateral position x(z) along the trunk and treat low spatial frequencies as the trend.

The trajectory is modeled as an autocorrelated random walk along the centerline, clipped to the trunk surface — every sample sits on the tree, mimicking samples from a single video of a load being cooperatively transported.

Trajectory, fitted spline, perpendicular deviations

Brown trunk and branches are the underlying tree; orange dots are samples from one video, confined to the tree surface. Here t is the normalized trajectory parameter: t = 0 at the base of the trunk, t = 1 at the tip of the right branch, and t = 0.5 at the fork. The lateral coordinate follows an AR(1) random walk, with an elevated wobble in a window around t = 0.5 (the “deliberation” signal). Green curve is the cubic B-spline fitted to the samples via least squares. Red lines are perpendicular deviations di from each sample to its closest point on the spline.

Controls

draws a fresh AR(1) noise realization

What is DF? Degrees of freedom = number of B-spline basis functions. A single cubic (degree 3) polynomial has 4 free parameters (the coefficients of 1, t, t², t³), so DF = 4 corresponds to zero interior knots — one unbroken polynomial spanning the whole trajectory. Interior knots = DF − 4  (i.e. DF − (degree + 1)), so DF = 5 has 1 knot; DF = 6 has 2; DF = 14 has 10. Toggle Spline knots to see them marked.

Perpendicular residuals d(s) along the fitted spline

Signed perpendicular distance di against arc length si of the foot on the spline. For a Y-tree, the vertical orange line marks the arc-length location of the tree's fork projected onto the fitted spline. The tree's fork is at a fixed geometric point, but its arc-length coordinate along the spline shifts slightly with each refit because the spline itself shifts. Dashed gray lines indicate ±1σ of the residual estimated away from the deliberation window.

What the residual is telling us

If the spline captures the tree's shape — the smooth path the trajectory would follow without behavior — the residuals d(s) contain everything else: autocorrelated noise, drift, hesitation. Pre-fork wobble shows up as elevated residual variance around the fork.

The smoothing is a Goldilocks choice: too few degrees of freedom and the spline cuts across the bend, dumping tree shape into the residual; too many and it threads through the deliberation, hiding the signal. See the Choosing the DF tab.

On the smoothing parameter. For an ordinary least-squares fit to a cubic B-spline basis (what you get from splines::bs(t, df = k, degree = 3) followed by lm()), DF is the only knob — it sets the number of basis functions. Smoothing splines (e.g. mgcv::gam(y ∼ s(t))) take a different route: a large fixed basis plus a wiggliness penalty λ chosen by GCV or REML. DF and λ are two parameterizations of the same trade-off.

Detrending in R
library(splines)

# `ants` columns: t (parameter), x, y (load position).
# Fit cubic B-splines to x and y independently as functions of t.
df <- 6
fit_x <- lm(x ~ bs(t, df = df, degree = 3), data = ants)
fit_y <- lm(y ~ bs(t, df = df, degree = 3), data = ants)

# Dense evaluation of the curve r(t) = (x_hat, y_hat)
t_grid <- seq(min(ants$t), max(ants$t), length.out = 600)
curve  <- data.frame(
  t = t_grid,
  x = predict(fit_x, newdata = data.frame(t = t_grid)),
  y = predict(fit_y, newdata = data.frame(t = t_grid))
)

# Cumulative arc length along the curve
ds        <- sqrt(diff(curve$x)^2 + diff(curve$y)^2)
curve$s   <- c(0, cumsum(ds))
library(splines)
library(dplyr)
library(tibble)

df <- 6
fit_x <- lm(x ~ bs(t, df = df, degree = 3), data = ants)
fit_y <- lm(y ~ bs(t, df = df, degree = 3), data = ants)

curve <- tibble(t = seq(min(ants$t), max(ants$t), length.out = 600)) |>
  mutate(
    x = predict(fit_x, newdata = pick(t)),
    y = predict(fit_y, newdata = pick(t))
  ) |>
  mutate(
    dx = x - lag(x, default = first(x)),
    dy = y - lag(y, default = first(y)),
    s  = cumsum(sqrt(dx^2 + dy^2))
  ) |>
  select(-dx, -dy)

Geometry of the projection

d > 0 d < 0 r(t) = (x̂(t), ŷ(t)) θi τ(ti) di + pi r(ti) sample pi foot r(ti) signed scalar di fitted spline r(t)

The three computations

Each sample pi = (xi, yi) is mapped to two scalars: a signed deviation di from the fitted curve and an arc length si giving its position along the curve.

1. Find the foot. The nearest point on the curve is

ti  =  argmint∈[0,1]   ‖ pi − r(t) ‖

where ‖ · ‖ is the Euclidean distance — the ordinary straight-line distance √[(x1−x2)² + (y1−y2)²].

In practice we evaluate r(t) at a few hundred equally spaced parameter values and pick the index where the squared distance is minimized.

2. Compute the signed scalar distance. Let τ(ti) be the unit tangent at the foot and θi the angle from τ(ti) to (pi − r(ti)), measured counterclockwise as shown in the figure. Define the sign factor sgn(θi) by embedding both vectors in 3D (zero z-components) and extracting the z-component of their cross product:

sgn(θi)  =  (0,0,1)  ·  τ(ti) × (pi − r(ti)) ‖τ(ti)‖ · ‖pi − r(ti)‖

Because r(ti) is the nearest point on the curve to pi, the displacement (pi − r(ti)) is perpendicular to τ(ti), so sgn(θi) = ±1 exactly. The signed scalar distance is then:

di  =  ‖pi − r(ti)‖ · sgn(θi)

Positive on the left of the oriented curve, negative on the right — though this sign convention is arbitrary. Reversing the cross-product order to (pi − r(ti)) × τ(ti) flips the sign and labels the right side positive instead.

3. Tag with arc length. The companion coordinate is

si  =  ∫0ti ‖ r′(u) ‖ du

computed numerically as a cumulative sum of chord lengths over the same dense sequence of parameter values. Plotting di against si gives the residual signal used in the first tab.

Projection in R
# Project samples (x, y) onto a dense evaluation of the fitted
# spline (curve$x, curve$y) and compute the SCALAR d_i and s_i.

K <- nrow(curve)

# Unit tangent tau at each curve point (central difference)
i_prev <- pmax(seq_len(K) - 1, 1)
i_next <- pmin(seq_len(K) + 1, K)
tx <- curve$x[i_next] - curve$x[i_prev]
ty <- curve$y[i_next] - curve$y[i_prev]
mag <- sqrt(tx^2 + ty^2)
tx <- tx / mag;  ty <- ty / mag

# For each sample, find nearest curve point (foot)
project_one <- function(px, py) which.min((curve$x - px)^2 + (curve$y - py)^2)
i_foot <- mapply(project_one, ants$x, ants$y)

# Signed scalar distance: 2D cross product of unit tangent with gap vector.
# Equals ||p_i - r(t_i*)|| * sin(theta_i) where theta_i is the signed angle
# from tau to the displacement.  Positive on the left of the oriented curve.
d_i <- tx[i_foot] * (ants$y - curve$y[i_foot]) -
       ty[i_foot] * (ants$x - curve$x[i_foot])
s_i <- curve$s[i_foot]

resid <- data.frame(s = s_i, d = d_i)
library(dplyr)
library(purrr)

# Add unit tangent columns to the curve tibble
curve <- curve |>
  mutate(
    tx = lead(x, default = last(x)) - lag(x, default = first(x)),
    ty = lead(y, default = last(y)) - lag(y, default = first(y)),
    mag = sqrt(tx^2 + ty^2),
    tx  = tx / mag,
    ty  = ty / mag
  )

# Project each sample; d_i = ||p_i - r(t_i*)|| * sin(theta_i)
# computed directly as the 2D cross product tau x delta.
resid <- ants |>
  mutate(
    i_foot = map2_int(x, y, ~ which.min((curve$x - .x)^2 + (curve$y - .y)^2)),
    d = curve$tx[i_foot] * (y - curve$y[i_foot]) -
        curve$ty[i_foot] * (x - curve$x[i_foot]),
    s = curve$s[i_foot]
  )

The model-selection problem

DF and smoothness λ — two sides of the same coin. Ordinary least-squares fitting (used here and in splines::bs() + lm()) specifies DF, the number of basis functions, which directly fixes the number of interior knots and the flexibility of the curve. Penalized splines (e.g. mgcv::gam) instead start from a large fixed basis and add a wiggliness penalty λ · ∫|f′′(t)|² dt. Large λ suppresses oscillations (low effective DF); small λ allows high flexibility (high effective DF). Increasing DF ⇔ decreasing λ. The rest of this tab discusses DF-based selection; identical logic applies to λ.

Define the residual sum of squares between two parametric curves sampled at N common parameter values as

RSS(A, B)  ≜  ∑k=1N ‖ Ak − Bk2

Both quantities in the plot below compare the fitted spline to a reference set of N points evaluated at the same parameter values:

• RSS(spline, data) — spline points vs. sample positions. Decreases monotonically with DF (more flexibility always reduces in-sample error, reaching zero when DF ≥ N). It has no minimum, so it cannot choose DF on its own. RSS(spline, data) is shown not because it picks DF, but as a foil: its relentless decrease illustrates why a monotone fit criterion is insufficient and a complexity penalty is needed.

• RSS(spline, tree) — spline points vs. true tree-centerline positions (oracle). For a Y-shaped tree it is U-shaped: at low DF the spline cannot follow the fork geometry (underfitting); at high DF it chases trajectory noise away from the true shape (overfitting), and the minimum df* identifies the ideal smoothing level. For a straight (I-shaped) tree the geometry is linear — fully captured at df = 4 — so there is no underfitting regime and RSS(spline, tree) rises monotonically: higher DF only lets the spline drift further from the straight backbone.

How to use both curves together. The two curves diverge as DF grows: RSS(spline, data) keeps falling while RSS(spline, tree) passes its minimum and starts to rise. The onset of this divergence marks the overfitting threshold — at that DF, the spline begins fitting behavioral noise rather than tree geometry. Practically, choose DF in the neighborhood of df* (or slightly to its left for a conservative smoothing). The widening gap between the two curves tells you how much of the fit improvement is noise-fitting rather than tree-shape recovery: a narrow gap means most of the fit gain is genuinely structural; a wide gap means the spline is mostly chasing noise.

Can you get the tree skeleton from real video? Yes — the trunk centerline can be extracted by image segmentation or frame-to-tree registration. When that is possible, RSS(spline, tree) is directly usable, not just an oracle. When it is not available, AIC, BIC, and GCV replace the oracle: instead of measuring how far the spline is from the true skeleton, they penalize the in-sample RSS by the number of parameters, discouraging the spline from chasing noise.

• AIC = N log(RSS/N) + 2 DF (Akaike — lighter penalty, slightly favors richer models).
• BIC = N log(RSS/N) + log(N) DF (heavier penalty, sparser; usually preferable for large N). Both AIC and BIC are minimized over a grid of DF values.

• GCV from mgcv::gam avoids that grid search entirely by working in the λ domain: it fits a penalized spline and automatically chooses the smoothness λ to minimize an approximation of leave-one-out prediction error. Because λ and DF are equivalent parameterizations of the same flexibility/smoothness trade-off, GCV is solving the same model-selection problem — just without the sweep.

All three criteria enforce complexity penalties for the same reason: a spline with enough DF can pass through every sample (RSS = 0), but that spline is useless — it captures noise, not tree shape. The penalty mimics the U-shape of RSS(spline, tree) that arises for non-straight trees, without needing the true skeleton.

Spline overlay at multiple df

Splines fitted at several values of df overlaid on the same trajectory. The current df (from Tab 1's slider) is drawn in blue; the fixed reference values are gray. The dark-brown centerline running through the tree is the true skeleton backbone used as ground truth for RSS(spline, tree).

RSS vs df

Both curves are normalized to their own maximum so they share the y-axis. The dashed orange vertical line marks the current df; the dotted green line marks df* = argmin RSS(spline, tree). Hit Average over 30 seeds to smooth the noise realizations and see the underlying U-shape clearly.
Practical DF selection in R
library(splines)  # provides bs()
library(mgcv)

# bs(t, df = k, degree = 3) builds a cubic B-spline design matrix: an n x k
# matrix whose columns are B-spline basis functions evaluated at each t value.
# Passing it to lm() fits by OLS over that k-dimensional function space —
# the same least-squares cubic B-spline with k free parameters used throughout
# this widget.  Interior knots are placed at equally spaced quantiles of t;
# df = k = (degree + 1) + number_of_interior_knots = 4 + interior knots.

# Option 1: compare RSS(spline, data) vs RSS(spline, tree) across a DF grid.
# `skel` holds the tree-centerline (x, y) at each sample's t value.
# RSS(spline, data) always decreases; RSS(spline, tree) is U-shaped for Y-trees.
rss_df <- function(df) {
  fit_x <- lm(x ~ bs(t, df = df, degree = 3), data = ants)
  fit_y <- lm(y ~ bs(t, df = df, degree = 3), data = ants)
  xh <- predict(fit_x, newdata = ants)
  yh <- predict(fit_y, newdata = ants)
  c(data = sum((ants$x - xh)^2 + (ants$y - yh)^2),
    tree = sum((skel$x  - xh)^2 + (skel$y  - yh)^2))
}
rss_mat <- sapply(4:20, rss_df)
colnames(rss_mat) <- 4:20
df_star <- as.integer(colnames(rss_mat)[which.min(rss_mat["tree", ])])

# Option 2a: sweep DF, pick by AIC or BIC
ic <- sapply(4:20, function(df) {
  fx <- lm(x ~ bs(t, df = df, degree = 3), data = ants)
  fy <- lm(y ~ bs(t, df = df, degree = 3), data = ants)
  c(aic = AIC(fx) + AIC(fy),
    bic = BIC(fx) + BIC(fy))
})
colnames(ic) <- 4:20
df_aic <- as.integer(colnames(ic)[which.min(ic["aic", ])])
df_bic <- as.integer(colnames(ic)[which.min(ic["bic", ])])

# Option 2b: penalized cubic regression spline, smoothness by GCV
gx <- gam(x ~ s(t, bs = "cr"), data = ants, method = "GCV.Cp")
gy <- gam(y ~ s(t, bs = "cr"), data = ants, method = "GCV.Cp")
edf <- gx$edf + gy$edf   # effective DF from GCV
library(splines)  # provides bs()
library(mgcv)
library(dplyr)
library(purrr)
library(tibble)

# bs(t, df = k, degree = 3) constructs a cubic B-spline design matrix: k basis
# functions evaluated at each t, used by lm() as predictors for OLS fitting.
# This produces the same least-squares cubic B-spline (k free parameters) as
# the widget's interactive spline.  Interior knots fall at t quantiles;
# df = k = 4 + number_of_interior_knots.

# Option 1: compare RSS(spline, data) vs RSS(spline, tree) across a DF grid.
rss_sweep <- tibble(df = 4:20) |>
  mutate(
    fit_x = map(df, ~ lm(x ~ bs(t, df = .x, degree = 3), data = ants)),
    fit_y = map(df, ~ lm(y ~ bs(t, df = .x, degree = 3), data = ants)),
    rss_data = map2_dbl(fit_x, fit_y, ~ {
      xh <- predict(.x, newdata = ants)
      yh <- predict(.y, newdata = ants)
      sum((ants$x - xh)^2 + (ants$y - yh)^2)
    }),
    rss_tree = map2_dbl(fit_x, fit_y, ~ {
      xh <- predict(.x, newdata = ants)
      yh <- predict(.y, newdata = ants)
      sum((skel$x  - xh)^2 + (skel$y  - yh)^2)
    })
  )
df_star <- rss_sweep |> slice_min(rss_tree) |> pull(df)

# Option 2a: sweep DF, pick by AIC / BIC
ic <- rss_sweep |>
  mutate(
    aic = map2_dbl(fit_x, fit_y, ~ AIC(.x) + AIC(.y)),
    bic = map2_dbl(fit_x, fit_y, ~ BIC(.x) + BIC(.y))
  )
df_aic <- ic |> slice_min(aic) |> pull(df)
df_bic <- ic |> slice_min(bic) |> pull(df)

# Option 2b: GCV-chosen smoothness via mgcv::gam
gx  <- gam(x ~ s(t, bs = "cr"), data = ants, method = "GCV.Cp")
gy  <- gam(y ~ s(t, bs = "cr"), data = ants, method = "GCV.Cp")
edf <- gx$edf + gy$edf

Spectral detrending as an alternative

Instead of fitting a spline and subtracting it, we can move directly into the spatial-frequency domain. Let d⊥(z) be the perpendicular deviation from the tree centerline as a function of arc-length height z, resampled onto a uniform grid. (Using the raw x coordinate instead would mix in the lateral trend of the branch, creating a large DC component at branch heights — the perpendicular deviation is zero-mean by construction.) The trunk’s gentle curvature and the trajectory’s slow drift contribute power at low spatial frequencies; pre-fork deliberation contributes power at higher frequencies localized near the fork. A windowed Fourier transform of d⊥(z) exposes both at once.

The slow trajectory shape has some characteristic length scale Lbend. A cutoff at spatial frequency fcut = 1 / Lbend plays the same role as a spline detrend: power below the cutoff is “trend” and is ignored; power above the cutoff is candidate behavioral structure. Drag the Lbend slider to move the cutoff.

Why vertical = z. The spectrogram is plotted with the same vertical axis as the tree on the left, so you can read frequencies at any height by sliding your eye up the column.

Tree with samples

The current trajectory drawn on the tree. The lateral position x(z) used by the spectrogram on the right is the horizontal coordinate of these samples.

Spectrogram of d⊥(z) — lateral deviation

Magnitude of the windowed Fourier transform of the perpendicular deviation d⊥(z) from the tree centerline, as a function of arc-length height z. Using the perpendicular deviation (rather than raw x) removes the systematic lateral trend from the angled branch, leaving only the behavioral wobble. Vertical axis is height z (aligned to the tree on the left); horizontal axis is spatial frequency. The horizontal orange line marks z at the fork. The dashed white vertical line is the detrending cutoff fcut = 1/Lbend; power to its left is treated as trend.

Controls

What the controls do

Window length W — number of samples in each STFT slice. Larger W gives better frequency resolution but coarser localization in z; a Heisenberg-style trade-off. With K = 256 uniform z samples and W = 32, each slice spans roughly 12% of the trunk.

Hop — shift between consecutive windows. Smaller hop means more overlap (smoother spectrogram, more compute). Default hop = 4 with W = 32 gives 87.5% overlap.

Window typeHann and Hamming taper smoothly to (near) zero at the edges, suppressing spectral leakage from window boundaries; their side-lobe behavior differs slightly. Rectangular applies no taper — sharp edges leak energy across frequencies. Gaussian tapers smoothly and is optimal in the joint time-frequency uncertainty sense (Gabor's bound).

Lbend — characteristic length scale of slow trajectory shape (the trunk's bend, an ant's drift toward one side, anything you want to call “trend”). The cutoff fcut = 1/Lbend partitions the spectrum: below is trend, above is candidate signal.

log magnitude — switch from linear to log10(1+9v) scaling of the spectrogram so weaker components become visible. Per-slice normalization — rescale each z-slice to its own max, useful when overall power varies along the trunk.

Windowed Fourier transform in R
library(signal)

# Resample x as a function of height z onto a uniform grid.
# (y from the trajectory plays the role of trunk-aligned height.)
z_grid <- seq(min(ants$y), max(ants$y), length.out = 256)
xz     <- approx(ants$y, ants$x, xout = z_grid)$y
dz     <- diff(z_grid)[1]

# Windowed Fourier transform: Hann window, hop = W - overlap
W   <- 32
hop <- 4
spec <- specgram(xz, n = W, Fs = 1/dz,
                 window = hanning(W),
                 overlap = W - hop)

# Mark the fork height and the spatial-scale cutoff
z_fork <- 1                    # height of the geometric fork
L_bend <- 0.15                 # estimated bend length scale
f_cut  <- 1 / L_bend           # detrending cutoff in 1/length

# Plot with VERTICAL = z (transpose so z runs up the y-axis):
image(x = spec$f, y = spec$t, z = abs(spec$S),
      col   = hcl.colors(64, "viridis"),
      xlab  = "spatial frequency  (1/length)",
      ylab  = "height z")
abline(h = z_fork, col = "darkorange", lwd = 2)
abline(v = f_cut,  col = "white", lty = 2, lwd = 1.5)
library(signal)
library(tibble)
library(dplyr)
library(tidyr)
library(ggplot2)

z_grid <- seq(min(ants$y), max(ants$y), length.out = 256)
xz     <- approx(ants$y, ants$x, xout = z_grid)$y
dz     <- diff(z_grid)[1]

spec <- specgram(xz, n = 32, Fs = 1/dz,
                 window = hanning(32), overlap = 28)

spec_df <- expand_grid(
  z = spec$t,
  f = spec$f
) |>
  mutate(magnitude = as.numeric(abs(spec$S)))

# Vertical axis = z, horizontal axis = spatial frequency
ggplot(spec_df, aes(x = f, y = z, fill = magnitude)) +
  geom_raster() +
  geom_hline(yintercept = 1,        color = "darkorange", linewidth = 1.2) +
  geom_vline(xintercept = 1 / 0.15, color = "white",      linetype  = "dashed") +
  scale_fill_viridis_c() +
  labs(x = "spatial freq  (1/length)", y = "height z") +
  theme_minimal()
© 2026 Theodore P. Pavlic · MIT License