Brain-Score Vision — the Metrics

Every scoring metric in brainscore_vision: its inputs, what it statistically does, its output, and what a high score says about brain-likeness.

A Brain-Score metric is just a function metric(model_output, brain_data) → Score in [0,1]. Each benchmark pairs a stimulus set + a brain/behavioral recording with one metric — so the metric is the operational definition of "brain-like" for that benchmark.

Reading a score: the reported value is the raw metric output ÷ the ceiling (the same metric run on the data against itself). A score of 1.0 therefore means "as good as the data's own reliability allows," not "perfect." Almost everything is cross-validated: fit on 90% of stimuli, score the held-out 10%, repeat over several splits, take the median across splits.
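The recipe above can be sketched in a few lines of NumPy. This is an illustration only, not the brainscore_vision implementation: the function names (pearson_predictivity, cross_validated, normalize_by_ceiling), the number of splits, and the toy data are all assumptions made for the example.

```python
import numpy as np

def pearson_predictivity(model_train, brain_train, model_test, brain_test):
    """Hypothetical metric: least-squares readout, then per-unit Pearson r."""
    weights, *_ = np.linalg.lstsq(model_train, brain_train, rcond=None)
    predicted = model_test @ weights
    rs = [np.corrcoef(predicted[:, j], brain_test[:, j])[0, 1]
          for j in range(brain_test.shape[1])]
    return float(np.median(rs))

def cross_validated(metric, model, brain, n_splits=10, test_fraction=0.1, seed=0):
    """Fit on ~90% of stimuli, score the held-out ~10%, repeat, take the median."""
    rng = np.random.default_rng(seed)
    n_stimuli = model.shape[0]
    n_test = max(1, int(round(test_fraction * n_stimuli)))
    scores = []
    for _ in range(n_splits):
        order = rng.permutation(n_stimuli)
        test, train = order[:n_test], order[n_test:]
        scores.append(metric(model[train], brain[train], model[test], brain[test]))
    return float(np.median(scores))

def normalize_by_ceiling(raw, ceiling):
    """Reported score = raw metric output divided by the data's own ceiling."""
    return raw / ceiling

# Toy, noise-free data (an assumption): the "brain" is an exact linear
# function of the model features, so the raw score is essentially 1.
rng = np.random.default_rng(0)
model_features = rng.normal(size=(100, 5))            # stimuli x model units
brain_responses = model_features @ rng.normal(size=(5, 3))  # stimuli x neurons
raw_score = cross_validated(pearson_predictivity, model_features, brain_responses)
```

With real recordings the ceiling would be estimated from the data (for example, from how well one half of the repetitions predicts the other), and a raw score of 0.4 against a ceiling of 0.8 would be reported as 0.5.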

First-order vs second-order

Predictivity asks "can a linear readout recover the responses?" (needs a fit, rewards extractable information). RDM/CKA ask "is the relational geometry the same?" (no fit, invariant to rotation, and so gives no credit for information that only a fitted readout could recover).
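The contrast can be made concrete with a toy comparison. The sketch below is an assumption-laden illustration, not Brain-Score code: it uses a Euclidean-distance RDM compared by Pearson correlation, and a plain least-squares readout with a single train/test split. A rotated copy of the features keeps the same geometry, while an anisotropically stretched copy stays perfectly linearly predictable but changes the geometry.

```python
import numpy as np

def euclidean_rdm(responses):
    """Pairwise Euclidean distances between stimuli (rows)."""
    diffs = responses[:, None, :] - responses[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))

def rdm_similarity(a, b):
    """Second-order comparison: correlate the upper triangles of two RDMs."""
    upper = np.triu_indices(a.shape[0], k=1)
    return float(np.corrcoef(euclidean_rdm(a)[upper], euclidean_rdm(b)[upper])[0, 1])

def linear_predictivity(source, target, n_train):
    """First-order comparison: fit a readout on the first n_train stimuli,
    then correlate predictions with held-out responses, unit by unit."""
    weights, *_ = np.linalg.lstsq(source[:n_train], target[:n_train], rcond=None)
    predicted = source[n_train:] @ weights
    held_out = target[n_train:]
    rs = [np.corrcoef(predicted[:, j], held_out[:, j])[0, 1]
          for j in range(held_out.shape[1])]
    return float(np.median(rs))

rng = np.random.default_rng(0)
model = rng.normal(size=(60, 5))                      # stimuli x units

# Orthogonal rotation: same geometry, different axes.
rotation, _ = np.linalg.qr(rng.normal(size=(5, 5)))
rotated = model @ rotation

# Anisotropic stretch: same linear information, different geometry.
stretched = model @ np.diag([10.0, 1.0, 1.0, 1.0, 0.1])

rdm_rotated = rdm_similarity(model, rotated)          # geometry preserved
rdm_stretched = rdm_similarity(model, stretched)      # geometry distorted
fit_stretched = linear_predictivity(model, stretched, n_train=40)
```

Here the RDM comparison treats the rotated copy as identical and penalizes the stretched copy, whereas the fitted readout recovers the stretched responses essentially perfectly.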

Performance vs strategy

Behavioral metrics form a ladder: matching accuracy < matching accuracy per condition < getting the same images right/wrong < matching the full confusion pattern. Higher rungs match the strategy, not just the score.
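A small hand-made example shows why the higher rungs are stricter. The labels, conditions, and function names below are invented for illustration and are not taken from any benchmark: the two observers have the same overall accuracy and the same accuracy in each condition, yet they err on different images.

```python
import numpy as np

def accuracy(predicted, truth):
    """Rung 1: overall fraction correct."""
    return float(np.mean(predicted == truth))

def accuracy_per_condition(predicted, truth, condition):
    """Rung 2: fraction correct within each condition."""
    return {int(c): float(np.mean(predicted[condition == c] == truth[condition == c]))
            for c in np.unique(condition)}

def same_images_right(predicted_a, predicted_b, truth):
    """Rung 3: fraction of images on which both observers are right or both wrong."""
    return float(np.mean((predicted_a == truth) == (predicted_b == truth)))

# Six images, three classes, two conditions (all values are made up).
truth     = np.array([0, 0, 1, 1, 2, 2])
condition = np.array([0, 0, 0, 1, 1, 1])
human     = np.array([0, 1, 1, 1, 2, 0])   # wrong on images 1 and 5
model     = np.array([1, 0, 1, 1, 0, 2])   # wrong on images 0 and 4

human_accuracy = accuracy(human, truth)
model_accuracy = accuracy(model, truth)
human_by_condition = accuracy_per_condition(human, truth, condition)
model_by_condition = accuracy_per_condition(model, truth, condition)
image_agreement = same_images_right(human, model, truth)
```

Both observers get 4 of 6 images right overall and 2 of 3 in each condition, but they agree on right/wrong for only 2 of the 6 images, so a metric on the third rung separates them where the first two cannot.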
