Behavioral scientists and engineers arrived at the same learning rule from different directions. In Rescorla–Wagner, associative strength updates in proportion to prediction error: the gap between the reward received and the reward predicted. In Q-learning, an agent maintains Q(s, a), the expected reward for choosing action a in situation s, and updates it by Q(s,a) ← Q(s,a) + α·δ, where δ = r + γ·max_a′ Q(s′, a′) − Q(s,a). With the discount factor γ set to 0, the future-value term drops out, leaving δ = r − Q(s,a), and the update is identical to Rescorla–Wagner. The formal connection is exact: Rescorla–Wagner is Q-learning with γ = 0, that is, with no bootstrapped estimate of future reward.
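A minimal Python sketch of this equivalence, assuming a single scalar reward per trial and a single (state, action) entry. The function names are illustrative and not taken from the widget's code. With `gamma=0` the bootstrapped term drops out, so both rules produce the same sequence of values.

```python
def rescorla_wagner(v, lam, beta):
    """One Rescorla-Wagner step: move associative strength v toward lambda."""
    return v + beta * (lam - v)


def q_learning(q, r, alpha, gamma=0.0, max_next_q=0.0):
    """One Q-learning step toward the target r + gamma * max_a' Q(s', a')."""
    delta = r + gamma * max_next_q - q
    return q + alpha * delta


v = q = 0.0
for reward in [1.0, 1.0, 0.0, 1.0]:
    v = rescorla_wagner(v, lam=reward, beta=0.3)  # lambda plays the role of r
    q = q_learning(q, r=reward, alpha=0.3)        # gamma defaults to 0
    print(f"R-W: {v:.4f}  Q: {q:.4f}")
```

The two columns match on every trial. Q-learning departs from Rescorla–Wagner only when `gamma` is above 0 and the next state's best value `max_next_q` is nonzero.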
What Q-learning makes explicit is the state s: everything the agent observes before choosing. Standard associative learning models populate s with environmental cues alone. The Conspecific Cue Model (CCM; Gildea et al. 2025) proposes that the co-presence of conspecifics at a food site is just another cue in s. This widget explores the same idea using tabular Q-learning: the state s_a of option a is the observed number of conspecifics at a, capped at the agent's counting limit. Each (option, perceived-count) pair gets its own Q-value, updated by the same rule, Q(s,a) ← Q(s,a) + α·δ. When limit=0, all options share state 0 and the agent is a pure individual learner with one Q-value per option: no weights, no gradient, no λ.
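The state construction can be sketched as follows, assuming the focal agent sees an integer count of conspecifics at each option and rewards are scalars. `TabularAgent` and `perceived_count` are hypothetical names, and `math.inf` stands in for limit=∞.

```python
import math
from collections import defaultdict


def perceived_count(count, limit):
    """Cap the observed number of conspecifics at the counting limit."""
    return min(count, limit)


class TabularAgent:
    """Hypothetical sketch: one Q-value per (option, perceived-count) pair."""

    def __init__(self, limit, alpha=0.1):
        self.limit = limit
        self.alpha = alpha
        self.q = defaultdict(float)  # unvisited pairs start at 0.0

    def state(self, option, count):
        return (option, perceived_count(count, self.limit))

    def value(self, option, count):
        return self.q[self.state(option, count)]

    def update(self, option, count, reward):
        key = self.state(option, count)
        delta = reward - self.q[key]  # gamma = 0: the target is the reward
        self.q[key] += self.alpha * delta


individual = TabularAgent(limit=0)  # limit=0: pure individual learner
individual.update("A", count=3, reward=1.0)
print(individual.value("A", 0), individual.value("A", 5))  # 0.1 0.1

social = TabularAgent(limit=math.inf)  # no counting ceiling
social.update("A", count=3, reward=1.0)
print(social.value("A", 3), social.value("A", 5))  # 0.1 0.0
```

With limit=0, a reward received alongside 3 conspecifics also changes the value the agent assigns when it sees 5, because both counts map to the same state. With no limit, those are separate table entries.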
Each agent maintains its own Q-table and receives its own reward — no communication, shared reward, or centralized training. Social information enters only through the state bucket.
Two control conditions ask whether any group-training advantage requires specifically social information, or arises from having more cues of any kind.
- Fixed dummies: k non-learning agents permanently assigned to fixed positions. By default they are split roughly equally between options, adding consistent but uninformative social cues. If fixed dummies produce learning curves similar to those of social agents, the group advantage is not unique to social information; any additional consistent state signal would do.
- Random dummies: k non-learning agents that independently pick a random option on each trial, uncorrelated with reward. Their shuffling corrupts the perceived count s_a and dilutes its reliability. If performance degrades, that indicates the value of social cues depends on their consistency, not merely their presence.
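The two controls differ only in how dummies are placed. The following sketch assumes that dummies are counted exactly like learning conspecifics and that the focal agent does not count itself. The round-robin split and the uniform random choice are assumptions about the widget's defaults, and all names are hypothetical.

```python
import random


def dummy_choices(k, n_options, mode, rng):
    """Options occupied by k non-learning dummies on one trial (hypothetical helper).

    "fixed": dummies are spread as evenly as possible and never move.
    "random": each dummy picks an option uniformly at random, ignoring reward.
    """
    if mode == "fixed":
        return [i % n_options for i in range(k)]
    if mode == "random":
        return [rng.randrange(n_options) for _ in range(k)]
    raise ValueError(f"unknown dummy mode: {mode!r}")


def observed_counts(other_learners, dummies, n_options):
    """Conspecifics at each option as seen by the focal agent (itself excluded)."""
    counts = [0] * n_options
    for option in [*other_learners, *dummies]:
        counts[option] += 1
    return counts


rng = random.Random(0)
others = [0, 0, 1]  # hypothetical choices of the other learning agents
print(observed_counts(others, dummy_choices(4, 2, "fixed", rng), 2))   # [4, 3]
print(observed_counts(others, dummy_choices(4, 2, "random", rng), 2))  # varies by trial
```

Fixed dummies add the same constant offset to each option's count on every trial. Random dummies add trial-to-trial noise that is unrelated to reward.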
| Q-learning (this widget) | CCM | Notes |
|---|---|---|
| Q(s_a, a) | ΣV at option a | Expected reward / total associative strength |
| Q(0, a) (s = 0 bucket) | V(N_i) | Value at zero social context ≈ env-only strength |
| Q(s, a) for s > 0 | V(N_i) + V(S_i) | Value at non-zero social context; interaction effects captured automatically |
| count_a | Count of conspecifics | CCM notionally assigns a separate weight per individual; in practice all are interchangeable |
| limit | β_soc/β_non (approx.) | Both govern social influence; here a perceptual ceiling, not a learning-rate ratio |
| α·δ (tabular update) | β[λ − ΣV] (R–W) | Identical prediction-error logic at γ = 0; no separate weights or gradient |
| Softmax (τ) | Horse-race timing | Both convert learned values to probabilistic choice |
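The softmax row can be made concrete with a short sketch. It assumes a list of Q-values for the available options and a positive temperature τ (`tau`). It does not reproduce CCM's horse-race timing rule, and the function names are illustrative.

```python
import math
import random


def softmax_probs(values, tau):
    """P(choose i) proportional to exp(Q_i / tau); smaller tau is greedier."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - m) / tau) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_choice(values, tau, rng):
    """Sample one option index according to the softmax probabilities."""
    probs = softmax_probs(values, tau)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]


q = [0.8, 0.2]
print(softmax_probs(q, tau=0.1))   # strongly favours option 0 (about 0.998)
print(softmax_probs(q, tau=10.0))  # close to 50/50
print(softmax_choice(q, tau=0.1, rng=random.Random(0)))
```

Subtracting the maximum leaves the probabilities unchanged but keeps `math.exp` from overflowing when values are large relative to τ.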
The Conspecific Cue Model (CCM; Gildea et al. 2025) was developed within the associative learning tradition, building on Rescorla–Wagner. But its mathematical structure is identical to a well-known class of reinforcement learning model, Q-learning with linear function approximation (Linear FA). This places it in direct dialogue with the tabular Q-learning model simulated in the other tabs of this widget.
Both approaches are variants of Q-learning. What distinguishes them is how the Q-function is represented — the prediction-error update rule and Bellman target are the same throughout. This gives a spectrum:
The tabular model explored in this widget encodes social context directly in the state, giving each (option, perceived-count) pair its own independent Q-value. CCM instead represents value as a learned weighted sum of environmental and social cues, which is exactly the Linear FA structure. Summing the two associative strengths entails an additivity assumption: V(N) and V(S) contribute independently. Recognizing CCM as Linear FA makes that assumption explicit and testable, and connects CCM to a large literature on when linear approximations succeed or fail.
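One way to see the spectrum is that tabular Q-learning is the special case of Linear FA whose features are one-hot: one feature per (option, perceived-count) pair, exactly one of which is active. The following sketch assumes a finite integer counting limit, and the helper names are hypothetical.

```python
def one_hot_features(option, count, n_options, limit):
    """Feature vector with a single 1 at the (option, perceived-count) slot."""
    n_states = limit + 1  # perceived counts 0..limit
    x = [0.0] * (n_options * n_states)
    x[option * n_states + min(count, limit)] = 1.0
    return x


def linear_q(w, x):
    """Linear FA value estimate: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def linear_update(w, x, reward, alpha):
    """Semi-gradient step w += alpha * delta * x (the gradient of w . x is x)."""
    delta = reward - linear_q(w, x)  # gamma = 0 target
    return [wi + alpha * delta * xi for wi, xi in zip(w, x)]


n_options, limit = 2, 2
w = [0.0] * (n_options * (limit + 1))
x = one_hot_features(option=1, count=5, n_options=n_options, limit=limit)
w = linear_update(w, x, reward=1.0, alpha=0.1)
print(w)  # only the slot for (option 1, perceived count 2) has moved
```

Because only one feature is active, each update touches a single weight, which is the tabular rule. CCM's feature vector instead reuses one social feature, weighted by a single shared w_soc, for every option. That overlap is the source of both its generalization and its additivity assumption.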
In CCM, each option i is described by two observable quantities: N_i, the environmental cue (shape, light, etc.), and S_i, the social co-presence signal (how many conspecifics are there). The animal learns a separate associative strength for each, V(N) and V(S), and the total predicted value of option i is their sum:

ΣV_i = V(N_i) + V(S_i) = w_env·N_i + w_soc·S_i
Both weights start at zero. On every trial, the prediction error for the chosen option, δ = r − ΣV, updates the weights of the cues present there: w_env changes by β_non·δ·N_i and w_soc by β_soc·δ·S_i. This is standard Linear FA Q-learning. The weights are the animal's learned beliefs about how predictive each type of cue is, and those beliefs are revised whenever reality deviates from expectation.
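Here is a minimal sketch of CCM written as Linear FA. It makes three assumptions: each option carries its own environmental cue with its own weight (so options can be told apart even though N_i = 1 everywhere), all options share one social weight w_soc, and the reward magnitude r plays the role of λ. The class and parameter names are hypothetical and are not taken from Gildea et al. (2025).

```python
class LinearFAAgent:
    """Hypothetical CCM / Linear FA agent with per-cue learning rates."""

    def __init__(self, n_options, beta_non=0.1, beta_soc=0.1):
        self.w_env = [0.0] * n_options  # one env-cue weight per option, starts at 0
        self.w_soc = 0.0                # single shared social weight, starts at 0
        self.beta_non = beta_non
        self.beta_soc = beta_soc

    def value(self, option, social_count, env_cue=1.0):
        """Sum of associative strengths: w_env[i] * N_i + w_soc * S_i."""
        return self.w_env[option] * env_cue + self.w_soc * social_count

    def update(self, chosen, social_count, reward, env_cue=1.0):
        """Prediction-error update for the cues present at the chosen option."""
        delta = reward - self.value(chosen, social_count, env_cue)  # lambda - sum V
        self.w_env[chosen] += self.beta_non * delta * env_cue
        self.w_soc += self.beta_soc * delta * social_count
        return delta


agent = LinearFAAgent(n_options=2, beta_non=0.1, beta_soc=0.05)
agent.update(chosen=0, social_count=2, reward=1.0)
# Option 1 was never chosen, yet its value already depends on how many
# conspecifics are there, because w_soc is shared across options.
print(agent.value(0, 2), agent.value(1, 2), agent.value(1, 0))
```

Setting `beta_soc` different from `beta_non` reproduces CCM's asymmetric rates without changing anything else in the update.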
| CCM (Gildea et al. 2025) | Linear FA / RL | Notes |
|---|---|---|
| N_i | Env cue at option i | Binary (present or absent); always 1 in this task, in both models |
| S_i | Social co-presence at option i | Count of conspecifics present; same quantity, same role |
| V(N_i) = w_env·N_i | w_env·N_i | Associative strength of the env cue; identical expressions |
| V(S_i) = w_soc·S_i | w_soc·S_i | Associative strength of social co-presence; identical expressions |
| ΣV = V(N) + V(S) | Q(i) = w_env·N_i + w_soc·S_i | Total predicted value; identical |
| λ − ΣV | δ = r − Q(chosen) | Prediction error; identical, with λ in R–W playing the role of reward magnitude r in RL |
| β_non·δ·N_i | β_non·δ·N_i | Env weight update; identical. Since N_i = 1, it reduces to β_non·δ |
| β_soc·δ·S_i | β_soc·δ·S_i | Social weight update; identical. Separate per-cue rates are a common RL design choice |
| Horse-race timing rule | Softmax action selection | Both convert learned values into probabilistic choice |
The correspondence is exact. CCM's β_soc/β_non asymmetry is not a structural departure from RL: separate learning rates per cue type (per-feature step sizes) are a common implementation choice in Linear FA, often used when some cues are believed to be more volatile or salient than others.
Recognizing CCM as Linear FA immediately connects it to a large literature on when this class of model works well and when it doesn't — and clarifies how it differs from the tabular Q model simulated in the other tabs:
1. The additivity assumption and interaction effects. The CCM-analogous Linear FA model represents value as a weighted sum of cues, so environmental and social cues contribute independently and interactions between them cannot be represented. A single shared w_soc summarizes how reward-predictive social co-presence has been on average; it cannot learn that the same social cue means different things in different circumstances. The tabular Q model simulated in the other tabs does not have this constraint: each (option, social-count) pair gets its own independent Q-value, so interaction effects are representable in principle.
Consider the reversal task. During acquisition, many conspecifics at option A reliably signal reward. During reversal, the crowd is still at A for the first several trials, now as a misleading signal. A CCM/Linear FA agent must reverse the sign of w_soc to accommodate the flip, which is slow because the same weight is fighting against a well-learned history.
The tabular Q model in this widget handles the same situation differently. As the crowd shifts during reversal, the focal agent increasingly encounters social counts it rarely saw during acquisition. The Q-values of these states are still near their initial values, so they carry no entrenched, now-incorrect estimate to unlearn. The redistribution of conspecifics effectively moves the agent into a novel region of experience, allowing the new contingency to be learned quickly. This is the tabular model's key structural advantage: each (option, social-count) pair is a genuinely independent context, not a single shared weight.
True interaction effects, where the value of a social cue depends on which environmental cue is present, are representable in the tabular Q model because it imposes no additivity constraint. CCM/Linear FA cannot represent such interactions because w_soc is a single number that applies regardless of context. In the current task design this distinction does not arise: the environmental cue is always present at each option, so there is no env-cue variation to interact with. But in a richer task where environmental cues varied across trials, the tabular model could learn the conjunction "social cues matter when the env cue is also present, but not otherwise," which CCM cannot express. Whether animals represent such conjunctions is an empirical question that the reversal paradigm could probe with appropriate modifications.
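The additivity limit can be checked directly on the hypothetical richer task described above, where an environmental cue e and a social cue s each vary and reward arrives only when both are present. This sketch uses a least-squares fit as a stand-in for where an LMS / Rescorla–Wagner learner's weights settle when all four trial types occur equally often (the R–W rule is an instance of the LMS delta rule). It assumes no bias term, matching the CCM sum, and uses NumPy.

```python
import numpy as np

# Trial types: columns are (env cue e, social cue s); reward only for e = s = 1.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
r = np.array([0.0, 0.0, 0.0, 1.0])

# Additive (CCM / Linear FA) model: Q = w_env * e + w_soc * s.
w, *_ = np.linalg.lstsq(X, r, rcond=None)
linear_sse = float(np.sum((X @ w - r) ** 2))

# Tabular model: an independent entry per (e, s) pair can store each target exactly.
table = {(int(e), int(s)): float(t) for (e, s), t in zip(X, r)}
tabular_sse = sum((table[(int(e), int(s))] - t) ** 2 for (e, s), t in zip(X, r))

print(w, linear_sse, tabular_sse)  # w ≈ [1/3, 1/3], linear SSE ≈ 1/3, tabular SSE = 0
```

The additive fit must split credit between the two cues. It therefore over-predicts reward when only one cue is present and under-predicts it when both are, and no choice of w_env and w_soc removes that residual error.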
2. What "salience" means mechanically. In CCM, β_soc controls both how fast social associations are learned and how strongly they influence behavior, so the two are conflated in a single parameter. In the Linear FA framing they separate naturally: the learning rate governs update speed, while the weight magnitude governs influence on Q(i). The tabular Q model sidesteps this question entirely: there is no salience parameter, only a counting limit that determines what the animal can perceive. What gets learned follows from what gets observed.
3. The state augmentation question. Linear FA and CCM ask: given that social cues are present, how do their learning dynamics compare to environmental cues? The tabular Q model in this widget asks a more fundamental prior question: does having access to social context at all — at whatever granularity — change what the agent can learn? The counting limit manipulation probes this directly, asking how coarsely social context can be perceived before its benefits disappear. These are complementary questions, not competing ones.
For the core CCM prediction — a reversal advantage for collectively-trained animals — the two models agree qualitatively but for different reasons.
In CCM / Linear FA, the reversal advantage emerges from w_soc acting as an early-warning signal: socially informed animals begin updating when they observe their conspecifics shifting toward the new correct option, before environmental prediction errors alone would drive learning. A larger β_soc relative to β_non amplifies this effect.
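One way to read the early-warning mechanism is that a positive w_soc raises an option's value as soon as conspecifics arrive there, before that option's own environmental weight has changed. The following sketch uses hypothetical post-acquisition weights (option A = index 0 was rewarded) and assumes N_i = 1 at every option. The numbers are illustrative only.

```python
def ccm_values(w_env, w_soc, social_counts):
    """Linear FA / CCM values w_env[i] + w_soc * S_i, with N_i = 1 everywhere."""
    return [we + w_soc * s for we, s in zip(w_env, social_counts)]


# Illustrative weights after acquisition: A's env cue is valued, B's is not.
w_env = [0.4, 0.0]
w_soc = 0.1

before = ccm_values(w_env, w_soc, social_counts=[5, 0])  # crowd still at A
after = ccm_values(w_env, w_soc, social_counts=[0, 5])   # crowd has moved to B
print(before, after)
```

No weight is updated between the two calls, yet B now has the higher value, so a socially informed agent becomes more likely to sample B before B's environmental prediction errors could have driven learning. An individual learner (w_soc = 0) would still rank A first.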
In the tabular Q model, the reversal advantage depends on state-space structure. Social context provides discriminating states that shift as conspecifics redistribute after reversal. This moves agents into less familiar states whose Q-values carry little entrenched history and so can track the new contingency quickly. Coarser counting limits reduce this effect, predicting that animals with limited social numerosity show weaker reversal advantages, a prediction CCM does not make.
Both models predict that individually-trained animals (no social context) show slower reversal. But the tabular model additionally predicts a non-monotonic effect of social perception granularity: very fine-grained social perception can hurt reversal by over-specifying social states and building up rich history that must be overwritten (visible in the Learning Curves tab at limit=∞). CCM has no analog to this prediction.