Survival & Recurrent-Event Analysis Visualizer

An interactive companion for rates, censoring, repeated transitions, state-transition models, and hypothesis testing

Survival analysis is often introduced as “time until a patient fails.” The same mathematics estimates transition rates in behavioral, ecological, and epidemiological systems: time until an animal leaves a state, time until a nest switches activity, or time until an infected individual recovers.

This widget starts with exponential waiting times and right censoring, generalizes to recurrent events where the same observational unit (for example, one focal animal, nest, colony, pair, or arena) can contribute several transitions during one observation period, and ends by testing whether an experimental factor changes the rate when the observations are not independent (fixed and random effects together). The word “unit” simply means whatever entity gets its own stopwatch.

Waiting times as transition rates

For a constant transition rate λ, the event time is exponential: the chance of still waiting at time t is S(t)=exp(−λt). The hazard is flat because the next instant does not “remember” how long the individual has already waited.

Controls

True rate λ 0.35

Observation window T 6.0

Units N 35

Random seed 1

events seen

—

uncensored observations

censored

—

still waiting at T

MLE λ

—

95% CI —

naive λ

—

ignores unfinished waits

How survival analysis uses unfinished waiting times

In an ordinary regression or mixed model, every response is observed in full. Survival analysis exists for the case where the response is a waiting time and some of those times are seen only in part. Give each observational unit (e.g., a focal animal, nest, colony, pair, or arena) its own clock that starts when it enters the at-risk state, and let W be the (random) waiting time until the transition of interest, with wᵢ its value for unit i. The complication is that we watch only until a fixed observation window T (the slider above). If a unit's transition lands inside the window (wᵢ ≤ T), we observe that value and record it. If the window closes first (wᵢ > T), the unit is right-censored: wᵢ itself is never seen, and we learn only that it exceeds the stopping time cᵢ, which in this synchronized design is the same window T for every unit. A censored unit is not missing data; it is a known lower bound on wᵢ, and the method is built around using that bound rather than discarding it.

The object we model is the survival function S(t)=P(W>t), the probability that the waiting time W exceeds a given time t (equivalently, that a unit is still waiting at t). The capital W is the random waiting time, whose realized value wᵢ varies from unit to unit; the lowercase t is just a particular time at which we evaluate that function. The survival curve S(t) equals 1 at t=0 and decreases as units transition out of the at-risk state.

This tab models each unit's waiting time as an independent draw from an exponential distribution with a single rate λ. That is the natural choice when the only feature we are willing to fix is a finite average wait 1/λ. The exponential then carries the fewest extra assumptions: it has no other parameters to set, and it has the greatest entropy of any distribution on the non-negative reals with that mean. Equivalently, the exponential is the only waiting-time distribution for which the probability of transitioning in the next short interval does not depend on how long a unit has already waited (i.e., it is memoryless). This is why it can be described by a single, constant hazard rate λ: a unit still at risk transitions at a steady λ per unit time, so its chance of transitioning in a short interval of length δ is about λδ, no matter how long it has already waited. The survival curve S(t)=exp(−λt) is then the probability that the transition occurs only after time t (that the unit is still waiting at t).

To estimate λ we score each candidate rate by how plausible it makes the data, then take the rate that maximizes that score. That plausibility is the probability of everything we observed, read as a function of λ, and about each unit we learned exactly one fact. For a censored unit i, the fact is that no transition had occurred by cᵢ (here equal to T for every censored unit), the event {W > cᵢ}, whose probability is the survivor value S(cᵢ) = exp(−λcᵢ) = exp(−λT). For an observed unit i, the fact is its transition time, but a continuous time is never measured to infinite precision: what we really know is that the transition fell in some narrow interval [wᵢ, wᵢ+Δt). The probability of that is the survival lost across the interval, S(wᵢ) − S(wᵢ+Δt), which for small Δt is f(wᵢ)·Δt with f(wᵢ) = λ exp(−λwᵢ). The same width Δt multiplies every observed unit and carries no λ, so it cancels when we compare candidate rates, leaving the density f(wᵢ) as that unit's contribution.

The score we maximize, as a function of λ, is the joint likelihood of all the units. Because each unit is independent of the others, this is a product of their contributions, a density for each observed unit and a survival probability for each censored one:

L(λ) = ∏_obs f(wᵢ) · ∏_cens S(cᵢ) = ∏_obs [λ exp(−λwᵢ)] · ∏_cens [exp(−λcᵢ)]

Every censored unit shares the same window, so cᵢ = T. Writing d for the number of observed events and m for the number censored (the same counts reported as "events seen" and "censored" above), each censored unit contributes exp(−λT) and the products collapse:

L(λ) = λ^d · exp(−λ · (Σ_obs wᵢ + m·T))

The data now enter only through two numbers: the event count d and the sum E = Σ_obs wᵢ + m·T (jointly the sufficient statistics for λ). Maximizing L is equivalent to maximizing its logarithm (the logarithm is monotonic, so both peak at the same λ), and the log is more convenient to differentiate because it turns the product and exponential into a sum. In particular, the log-likelihood ln L = d·ln λ − λE has derivative d/λ − E, which vanishes at the estimate

λ = d / (Σ_obs wᵢ + m·T), 1/λ = (Σ_obs wᵢ + m·T) / d

It reads most clearly through its reciprocal. An exponential's rate is one over its mean, so 1/λ estimates the mean waiting time. Averaging only the completed waits would give (Σ_obs wᵢ)/d, which is biased low: the censored units are precisely the ones whose waits ran past T, the longest in the sample, so dropping them shortens the average. Crediting each censored unit with the time T we did watch it wait, (Σ_obs wᵢ + m·T)/d, repairs that bias; this is the familiar mean time to failure, the total at-risk time per failure observed, and λ is its reciprocal. The truncated units are therefore not ignored: each one lengthens the estimated mean wait by the time T it was known to survive, and so lowers the rate. The naive estimator is exactly the uncorrected version, λ_naive = d / (Σ_obs wᵢ), the reciprocal of the biased-low average, which is why it overstates the rate; shortening the window T raises the censored fraction and drives the naive value further above the MLE reported above.

How sure is the estimate? A confidence interval

A point estimate needs a margin of error. On the log scale the rate estimate is approximately normal with a standard error that depends only on the number of observed events d, which gives the interval reported beside the MLE above:

log(λ) ± 1.96 / √d

Exponentiating both endpoints returns a positive interval for λ. Only the event count d sets its width: more observed events tighten it, while adding censored exposure (time at risk with no new events) does not. That is why heavy censoring can leave the rate loosely pinned even when many units were watched for a long time.

Reading the empirical step curve

The orange step curve in the plot is the Kaplan–Meier estimate. It recovers S(t) directly from the data without assuming any rate at all: starting at 1, it steps down at each observed transition in proportion to the share of the still-at-risk units that transition there, while a censored unit produces no step but leaves the risk set (so later steps are larger). Because it assumes no parametric form, it serves as a check on the exponential fit. Here the data are generated from a constant rate, so the step curve and the fitted exponential should track each other closely; on real data a persistent gap between them would warn that a single constant rate is too restrictive, a sign to relax the exponential assumption: with a richer parametric hazard if the rate itself is the goal, or with the Cox model below (which leaves the baseline shape unspecified) if the goal is the effect of covariates.

A different question: covariate effects with Cox proportional hazards

The exponential model answers what is the rate? Cox proportional-hazards regression answers a different question, and it helps to keep the two apart. Cox is built to estimate and test the relative effect of covariates on the hazard (does treatment raise or lower it, and by how much) without committing to any shape for the hazard over time. It keeps the survival machinery, including the treatment of censoring above, but it is an inference tool for effects rather than a rate estimator. It writes the hazard for a unit with covariate vector x as

h(t | x) = h₀(t) · exp(β·x)

Here x is a vector of covariates (e.g., age, treatment, body size, colony) and β a matching vector of coefficients, one per covariate. The factor exp(β·x) is just a positive number that scales the baseline hazard up or down; despite the notation it has nothing to do with the exponential waiting-time distribution above. The exponential link is only a convenient way to keep that multiplier positive and to let several effects combine multiplicatively, so each β_j reads as a log hazard ratio. The model is called proportional because any two covariate profiles keep a constant hazard ratio over time, exp(β·(x₁ − x₂)), the shared baseline having cancelled.

That cancellation is the point. Cox fits β from a partial likelihood that uses only the order in which events occur: at each transition it compares the unit that transitioned against those still at risk, and the unknown h₀(t) divides out. Censored units sit in the risk sets while they are watched and drop out once censored, exactly as above. What comes back are hazard ratios and tests such as H₀: β_j = 0 (does this factor move the hazard at all), which is usually the scientific question, e.g., treatment versus control.

Because the baseline is conditioned out rather than estimated, Cox says nothing about the absolute hazard or the transition rate, even when the waiting time really is exponential: there the constant rate λ is folded into the unspecified h₀(t) and never enters the partial likelihood. If the rate itself is what you want, fit the parametric exponential model (the MLE above); if you want to know which factors shift the hazard, use Cox. (A baseline hazard can be recovered after the fit by a separate estimator, but that step restores exactly the baseline-shape commitment Cox was designed to avoid.) The Kaplan–Meier step curve in the plot is the no-covariate end of the same idea: an assumption-free baseline, again with no rate attached.

Single-event Cox in R R

# observed time and status: 1 = event, 0 = right-censored
fit <- coxph(Surv(time, status) ~ treatment + size, data = subjects)

Survival analysis as a one-way transition

Examples

Time until first alarm call. Time until a forager leaves a patch. Time until an infected individual recovers (the I→R step of an SIR epidemic). Time until a colony relocates.

This is the form usually covered in introductory survival analysis: one event per subject, or one first event per subject. The SIR model is built entirely from transitions of this kind: S→I and I→R each run one way, with R absorbing (no return to S), so each is its own survival problem with its own rate. The SIS model in the next panel keeps the same machinery but lets a transition be returnable instead of absorbing, the single change that turns a one-way model into a recurrent one.

Recurrent events as returnable transitions: the SIS model

Why the SIS model? A template for recurrence

The diagram above is an SIS model: a unit cycles between susceptible (S, at risk of the event) and infected (I, not at risk), leaving S at rate β and returning from I at rate γ. It is the smallest model in which the focal event can happen more than once, which is what makes it the natural starting point here. Any process where the same unit can re-enter the at-risk state and trigger the event again has this two-state skeleton, whatever the states are actually called, so the SIS picture doubles as a high-level model of recurrence in general.

Richer models extend the same skeleton. A SIRS model inserts a temporary refractory or immune state R between I and the return to S (S→I→R→S), so re-exposure is delayed rather than immediate. Behavioral analogs of the recurrent S→I event include repeated bouts of activity, repeated vigilance or alarm responses, repeated nest maintenance, and repeated switches between task states; in each, "susceptible" is just the state currently eligible for the next transition.

Conceptual map

Model	Survival interpretation	Recurrent-event interpretation	Correct exposure denominator
S→I	Time to first transition	Not recurrent if I is absorbing	Time in S until event/censoring
SIS	First infection only discards later events	Repeated S→I transitions after I→S recovery	Total time in S across all episodes
SIRS	First infection ignores immune/refractory cycling	Events recur after S→I→R→S cycles	Total time in S; recovery/refractory rates use time in I/R
Behavioral state model	Time until first departure from a state	Repeated departures/returns during observation	Time eligible for the focal transition

Repeated transitions from the same observational unit

A continuous-time two-state process. Green segments are time in state S (at risk of an S→I event); orange segments are time in state I. The vertical fence is the observation window T (drag it): segments past it are faint and not counted, so sliding T changes how much of each process you observe without changing the process itself.

Controls

S→I rate β 0.35

I→S rate γ 0.70

Observation window T 14.0

Units N 24

Individual heterogeneity 0.00

Random seed 12

S→I events

—

all recurrent events

β recurrent

—

model-based 95% CI—

robust 95% CI—

β calendar

—

events / calendar time

β drop-censored

—

time in S, censored dropped

Why recurrent-event analysis matters

In compartmental models where loops are possible, as in the SIS epidemiological model, estimating the rate of a recurrent event such as the S→I transition requires special care. In the SIS case, the correct denominator for S→I is not total calendar time; it is total time spent at risk in state S. When a unit is already in state I, it cannot make another S→I transition until it returns to S.

β = # S→I transitions / total time in S

Censoring enters just as on the first tab. A unit still in S when the window closes has its current spell in S cut short at T; that truncated spell is genuine exposure (time at risk with no transition), so it stays in the denominator exactly as a censored wait does there, while adding nothing to the event count. The censoring is now measured against time in S rather than calendar time, and because the denominator already sums every S spell (completed and truncated), one unit can fold in several at once. The drop-censored monitor does the opposite: it discards every unit still in S at the close, throwing away the longest in-progress waits, which biases β upward, just as ignoring censoring does on the first tab. The interval estimate carries over unchanged, with d the number of S→I events: log(β) ± 1.96/√d, exponentiated, but only while the spells are conditionally independent given the rate, so that d events count as d independent observations even though they share units. That holds when units are exchangeable (Individual heterogeneity at zero); raising it makes a unit’s repeated events correlated and over-dispersed, the effective information then drops below d, and the true interval is wider than the one shown. The honest correction is the robust (cluster) variance of the Andersen–Gill fit below, or a frailty model. The robust 95% CI in the monitor is this cluster correction in action: it reads the dispersion off the between-unit spread of the per-unit contributions instead of assuming each event is one independent observation. The robust interval here carries the standard small-sample factor N/(N−1) (N = number of units) that de-biases the sandwich for few clusters; without it, the events that pin down β would leave the residual spread biased low by roughly (N−1)/N (their score contributions are forced to sum to zero), making the interval too tight at zero heterogeneity. With the correction the robust and model-based intervals coincide in expectation when units are exchangeable (up to seed-to-seed scatter), and the behavior worth reading is the robust interval widening past the model-based one once heterogeneity genuinely over-disperses the counts.

Comparing models: SIR, SIS, SIRS

Whether events should recur at all is itself a modeling choice, and competing choices can be compared by fit. The Akaike information criterion scores a fitted model by its maximized likelihood L (the likelihood evaluated at the fitted parameters), penalized for the number of free parameters k (here, just the number of transition rates listed below). Lower is better, so a richer model has to earn its extra parameters with enough improvement in L to offset the heavier penalty:

AIC = 2k − 2 ln L

Model	Recurrence	What it adds	Rate parameters (k)	What AIC weighs
S→I (absorbing)	One event per unit, then done	A single rate λ	1 (λ)	The single-event baseline from the first tab
SIS	S→I→S, events repeat	A return rate γ	2 (β, γ)	Do repeated transitions fit better than scoring only the first?
SIRS	S→I→R→S, with a refractory R	A refractory rate before re-exposure	3 (β, I→R, R→S)	Does a delay before returning to risk improve fit enough to justify the parameter?

Comparing recurrent-event rates across groups: counting-process Cox

The explorer's estimator (S→I events divided by total time in S) gives the rate within one homogeneous sample. To ask instead whether that rate differs across groups (e.g., treatment versus control), use the recurrent-event analog of the Cox model from the first tab. Laying the data out as one row per at-risk interval per unit on a common clock, with a single shared baseline intensity, and fitting it with coxph is the Andersen–Gill model: not a separate estimator, but the Cox partial likelihood applied to the recurrent-event counting process (each at-risk interval here is one spell in S). The cluster(id) term adds a robust variance, since Andersen–Gill otherwise assumes that a unit's repeated events carry no dependence beyond the covariates. As on the first tab, the fit returns hazard ratios (relative effects between groups), not the rate itself:

Recurrent-event Cox in R R

# one row per at-risk interval per unit
# columns: id, start, stop, event_SI  (1 = S->I at stop, else 0)
risk_time_S <- with(episodes, sum((stop - start)[state == "S"]))
beta_hat    <- sum(episodes$event_SI) / risk_time_S

# Cox version with covariates and within-unit clustering:
fit <- coxph(Surv(start, stop, event_SI) ~ treatment + cluster(id), data = episodes)

When that independence is unrealistic (the hazard drifts with event number, or units differ persistently), the usual alternatives keep the Cox machinery but rebuild the risk sets or baseline: the conditional Prentice–Williams–Peterson model (stratified by event order), the marginal Wei–Lin–Weissfeld model, or a shared-frailty random effect.

Hypothesis testing with mixed effects

The first two tabs were about estimation: given right-censored, and then recurrent, waiting times, recover the underlying transition rate without letting unfinished or repeated spells bias it. Comparing that rate across groups appeared only in passing, in the Cox proportional hazards box at the foot of each tab. This tab makes that comparison the point. The question shifts from “what is the rate?” to “does an experimental factor change it?”, which is a hypothesis test: a predicted effect on survival, an estimated effect size with an interval, and a verdict on whether the data distinguish that effect from no effect.

Real experiments rarely supply independent observations to test against. There is a fixed effect (the manipulated factor whose influence on the rate is being tested) and there is structure that was not manipulated but cannot be ignored: subjects sharing a cage, a clutch, a colony, or a measurement occasion, whose outcomes are correlated in ways no covariate captures. A fixed-effects-only fit reads that shared variation as independent evidence and reports more certainty than the design earns. The Cox proportional hazards model extends to carry both kinds of term, the same way the linear model extends to the linear mixed model: keep the fixed effect being tested and add a random effect (a frailty) for the clusters, so within-cluster correlation is modeled rather than ignored.

Consider a brood-survival experiment testing whether diet changes mortality, with two diets (high-protein and high-carbohydrate). Diet is assigned at the level of a nest box: several replicate boxes per diet, and within each box a cohort of brood is followed until death or until the observation window closes. Diet is the fixed effect under test; nest box is a random effect, because brood sharing a box share a microenvironment, a queen, and a rearing history, so their death times are not independent. The model the students would fit is coxme(Surv(time, status) ~ diet + (1 | nestID)), with diet as the fixed effect and a random intercept per box.

This is the clustering problem from the recurrent-event tab in a different guise. There the repeated events came from one individual, and treating them as independent overstated the information in the sample; the correction was a cluster-robust variance or a frailty term. Here the events do not recur (each brood item dies once), but the brood within a box are coupled in the analogous way, so the analogous correction applies. The replication the test can actually support is at the box level, not the brood level: pooling all brood as independent (the naive fit) is pseudoreplication, and the hypothesis test it reports comes out far more certain than the design earns.

Brood survival by diet, grouped within nest boxes

The vertical axis counts brood still alive (not a probability), so one nest box starts at its cohort size M and steps down by one at each death. Each diet's average survival is shown as mean brood alive per box (M times the pooled Kaplan–Meier estimate) and can carry one of two confidence bands: a pooled (Greenwood) band that treats brood as independent, or a nest-aware band that treats each box as the unit of replication. The vertical fence is the observation window T (drag it); deaths past it are censored and drawn faint. Toggle the layers at right.

True HR λ_C/λ_P

—

conditional, within nest

Mixed-effects HR

—

95% CI —

Mixed-effects p

—

σ = —

Pooled p

—

ignores nest (pseudoreplication)

Controls

High-protein hazard λ_P 0.30

High-carb hazard λ_C 0.60

Brood per box M 16

Boxes per diet R 8

Nest-to-nest SD σ 0.60

Observation window T 8.0

Random seed 19

Individual boxes Exponential fit Diet avg: pooled band Diet avg: nest-aware band

What to watch

Compare the two p-values in the monitors under the plot; the mixed-effects result is echoed in the plot title, which turns green with a star when it clears p < 0.05. Set the nest-to-nest SD to zero and the boxes are exchangeable: the mixed-effects and pooled fits agree, the estimated σ sits near zero, and the two bands become comparable (both then reflect only within-box sampling). Raise the SD and the pooled p shrinks toward significance while the mixed-effects p stays honest; turn both bands on to see the pooled band (narrow) nested inside the nest-aware band (wider). Set the two hazards equal (true HR = 1) at a moderate SD to watch the pooled fit declare a diet effect that is not there. Whether the survival bands overlap is not itself the test (pointwise bands can overlap even when the hazard-ratio test is significant); the verdict is the mixed-effects p.

The mixed-effects Cox fit in R

The fixed-effects models on the earlier tabs used coxph. When there is a single random effect, as with the shared nest boxes at the top of this page, coxph can be augmented with a frailty grouping to account for the pseudoreplication. For more complex random-effect structures, coxme from the package of the same name (both written by Therneau, who also wrote survival) adds a random intercept per cluster. The coxme random intercept is Gaussian on the log-hazard scale, which is a log-normal shared frailty acting on the baseline hazard of each box. Examples of both approaches are shown below, along with the random-effect-ignoring pooled fit that introduces pseudoreplication (for comparison only):

Mixed-effects Cox in R R

library(coxme)
# diet   : fixed effect (the hazard ratio of interest)
# nestID : random intercept for the box the brood were reared in

# single grouping factor: coxph's frailty() term fits essentially the
# same Gaussian shared-frailty model as coxme below and often
# converges more reliably:
frail <- coxph(Surv(time, status) ~ diet +
               frailty(nestID, distribution = "gaussian"),
               data = brood)

# alternative: use coxme instead of frailty groups
# allows for more complex random effects but less reliable convergence
model <- coxme(Surv(time, status) ~ diet + (1 | nestID), data = brood)

# for comparison: the pooled fixed-effects fit, shown for
# contrast, ignores the box (and introduces pseudoreplication):
pooled <- coxph(Surv(time, status) ~ diet, data = brood)   # pseudoreplication

Frailty term vs. coxme. For a single grouping factor, as here, the frailty() term and coxme's (1 | nestID) fit essentially the same Gaussian shared-frailty model and usually return a very similar hazard ratio and variance. The frailty() route is a penalized fit specialized to one variance component, so it is often more numerically robust and more likely to converge on awkward data; coxme uses a general integrated likelihood that also handles multiple, crossed, or nested random effects, at the cost of a harder optimization.

Conditional vs. marginal hazard ratio. The hazard ratio from coxme (or frailty()) is conditional on the nest: it compares two brood in boxes of the same random quality. The pooled coxph returns a marginal hazard ratio, averaged over boxes. The two differ even in principle, not only in their standard errors, because the hazard ratio is non-collapsible (the same reason a conditional and a marginal odds ratio differ); raising the nest-to-nest SD widens that gap along with the interval.

Read the interval, not the point estimate. With only a handful of boxes per diet, both the random-effect variance and the diet coefficient are estimated imprecisely, so the point estimates here are noisy. The lesson lives in the interval and the p-value (does modeling the nest change whether the diet effect is distinguishable from zero?), not in the exact hazard-ratio value.

The two confidence bands. The diet-average curves are pooled Kaplan–Meier estimates, which marginalize over the boxes within a diet. The pooled (Greenwood) band treats brood as independent, so, like the pooled coxph p-value, it is narrower than the nesting warrants. The nest-aware band uses the between-box spread of the per-box survival fractions (mean ± 1.96 × SD / √R); it widens with the nest-to-nest SD and with fewer boxes, approaches the pooled band when the boxes are exchangeable, and is the band to report (the band analog of a cluster-robust standard error). Either way the formal verdict is the mixed-effects p, not whether the bands overlap.

The exponential overlay. The optional exponential fit assumes a hazard constant in time (the assumption of the first two tabs). It is correct for this simulation, where waiting times are exponential, but most field survival data are fit nonparametrically (Kaplan–Meier) or with a Cox baseline because that assumption is rarely safe.