Every scoring metric in brainscore_vision: its inputs, what it statistically does, its output, and what a high score says about brain-likeness.
A Brain-Score metric is a function metric(model_output, brain_data) → Score in [0, 1]. Each benchmark pairs a stimulus set and a brain or behavioral recording with one metric, so the metric is the operational definition of "brain-like" for that benchmark.
Reading a score: the reported value is raw output ÷ ceiling, where the ceiling is the same metric run on the data against itself. A score of 1.0 therefore means "as good as the data's own reliability allows," not "perfect." Almost every metric is cross-validated: fit on 90% of the stimuli, score the held-out 10%, repeat across splits, and take the median.
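The cross-validation and ceiling steps described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the brainscore_vision implementation: the splits are random 90/10 partitions, the readout is plain least squares, the data are synthetic, and the ceiling value is a made-up number standing in for a measured reliability. The names cross_validated_score, pearson, and ceiled are hypothetical.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

def cross_validated_score(features, responses, n_splits=10, test_fraction=0.1, seed=0):
    """Fit a least-squares readout on the training stimuli, correlate its
    predictions with the held-out responses, and return the median over splits."""
    rng = np.random.default_rng(seed)
    n_stimuli = features.shape[0]
    n_test = int(round(n_stimuli * test_fraction))
    split_scores = []
    for _ in range(n_splits):
        order = rng.permutation(n_stimuli)
        test, train = order[:n_test], order[n_test:]
        # Fit only on the training stimuli; the held-out 10% never touch the fit.
        weights, *_ = np.linalg.lstsq(features[train], responses[train], rcond=None)
        split_scores.append(pearson(features[test] @ weights, responses[test]))
    return float(np.median(split_scores))

# Synthetic stand-ins: 200 stimuli, 5 model features, one noisy recording site.
rng = np.random.default_rng(1)
features = rng.normal(size=(200, 5))
true_weights = np.array([1.0, -0.5, 0.25, 2.0, 0.0])
responses = features @ true_weights + rng.normal(size=200)

raw = cross_validated_score(features, responses)
ceiling = 0.95  # hypothetical reliability estimate, not a measured value
ceiled = raw / ceiling
print(f"raw={raw:.3f} ceiling={ceiling:.2f} ceiled={ceiled:.3f}")
```

Dividing by the ceiling only rescales the raw value, so a model cannot be credited for variance that the recording itself does not reproduce.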
Predictivity asks "can a linear readout recover the responses?" It needs a fit and rewards any extractable information. RDM and CKA ask "is the relational geometry the same?" They need no fit and are invariant to rotation, but they are blind to information that only a fitted readout would recover.
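The contrast above can be checked numerically on synthetic data. The sketch below assumes a Euclidean-distance RDM (so that rotation invariance holds exactly; the library's own RDM metric may use a different dissimilarity) and the standard linear CKA formula. It compares a representation against a rotated copy and against an anisotropically rescaled copy that a linear readout can undo perfectly. The function names are hypothetical.

```python
import numpy as np

def rdm(x):
    """Pairwise Euclidean distances between stimulus rows (stimuli x features)."""
    diffs = x[:, None, :] - x[None, :, :]
    return np.linalg.norm(diffs, axis=-1)

def rdm_similarity(x, y):
    """Pearson correlation between the upper triangles of the two RDMs."""
    upper = np.triu_indices(x.shape[0], k=1)
    return float(np.corrcoef(rdm(x)[upper], rdm(y)[upper])[0, 1])

def linear_cka(x, y):
    """Linear centered kernel alignment between two stimuli x features matrices."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    return float(cross / (np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")))

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 8))

# Rotated copy: same geometry, different axes.
rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
x_rotated = x @ rotation

# Rescaled copy: same information, distorted geometry.
scales = np.array([20.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.05])
x_scaled = x * scales

# A fitted linear readout undoes the rescaling exactly.
readout, *_ = np.linalg.lstsq(x_scaled, x, rcond=None)
readout_recovers = bool(np.allclose(x_scaled @ readout, x, atol=1e-8))

print("rotation:  RDM", rdm_similarity(x, x_rotated), "CKA", linear_cka(x, x_rotated))
print("rescaling: RDM", rdm_similarity(x, x_scaled), "CKA", linear_cka(x, x_scaled))
print("linear readout recovers original:", readout_recovers)
```

The rotated copy scores essentially 1 on both geometry measures, while the rescaled copy scores well below 1 even though a fitted readout recovers the original exactly.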
Behavioral metrics form a ladder, from weakest to strictest: matching accuracy < matching accuracy per condition < getting the same images right/wrong < matching the full confusion pattern. Agreement on the higher rungs means matching the strategy, not just the score.
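To see why the rungs differ, consider two hypothetical models with the same overall accuracy as a human observer, only one of which fails on the same images. The sketch below measures image-level agreement with Cohen's kappa, a common formulation of error consistency. Treating it as equivalent to any particular brainscore_vision metric is an assumption, and the trial data are invented for illustration.

```python
import numpy as np

def accuracy(correct):
    """Fraction of trials answered correctly (input is a 0/1 array)."""
    return float(np.mean(correct))

def error_consistency(correct_a, correct_b):
    """Cohen's kappa on trial-by-trial correctness: observed agreement
    corrected for the agreement expected from the two accuracies alone."""
    correct_a = np.asarray(correct_a, dtype=bool)
    correct_b = np.asarray(correct_b, dtype=bool)
    observed = np.mean(correct_a == correct_b)
    p_a, p_b = correct_a.mean(), correct_b.mean()
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return float((observed - expected) / (1 - expected))

# Invented per-image correctness for ten images (1 = correct, 0 = wrong).
human   = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
model_a = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])  # fails on the same images
model_b = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])  # same accuracy, different failures

for name, model in [("model_a", model_a), ("model_b", model_b)]:
    print(name,
          "accuracy gap:", abs(accuracy(model) - accuracy(human)),
          "error consistency:", round(error_consistency(human, model), 3))
```

Both models are indistinguishable on the lowest rung (accuracy gap of zero), but only model_a agrees with the human image by image; model_b's kappa is negative because it errs mostly where the human succeeds.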