mario¶

Key facts¶


Subjects	5 — `sub-01` ✅ · `sub-02` ✅ · `sub-03` ✅ · `sub-04` ❌ · `sub-05` ✅ · `sub-06` ✅
Tasks	🎮 Super Mario Bros — in-scanner gameplay — 22 levels (excluding water and boss levels); discovery phase (all levels in order, unlimited attempts) then practice phase (random level per run); 3374 total level attempts across 5 participants; ~16.8 h fMRI per participant
Data	🧠 fMRI — 18.4 h/subject
	🎬 Naturalistic video — 18.4 h/subject
	🔊 Audio — 18.4 h/subject
	🎮 Gameplay — 18.4 h/subject
	🫀 ECG — 18.4 h/subject
	🫁 Respiration — 18.4 h/subject
	🫀 Pulse — 18.4 h/subject
	😓 Skin conductance (EDA) — 18.4 h/subject
	🕹️ Game logs (annotated events (.tsv) ✅, game replay (.bk2) ✅, video replay (.mp4) ✅, replay summary (.json) ✅, mapped RAM variables (.json) ✅, low-level features (.npy) ✅, scene annotations ✅)
	👁️ Eyetracking (gaze 🔒, pupillometry 🔒)
Assets	📁 BIDS · 🧠 fMRIPrep · 🫀 PhysPrep · 🗺️ Mario scenes

How to cite

Paugam, François, Pinsard, Basile, St-Laurent, Marie, Lajoie, Guillaume, Bellec, Lune (2025). Training neural networks from scratch in a videogame leads to brittle brain encoding. doi: 10.1101/2025.11.28.691119

Contributors

François Paugam 🔣📓 · RainyFields 🔣 · Basile 🎨📆🔣 · emilie-dessureault 🔣📆 · Yann Harel 🎨🔣📓 · Julie A. Boyle 🎨📆 · maelleF 🔣 · Marie-Eve Picard 🔣 · Motahareh 🔣 · Lune Bellec 🎨

🔣 data · 🎨 design · 📆 project management · 📓 user testing

Overview¶

The mario dataset contains in-scanner gameplay of Super Mario Bros. (Nintendo, 1985) for five CNeuroMod participants (sub-01, sub-02, sub-03, sub-05, sub-06). Participants played 22 of the original game’s levels across two phases — a structured discovery phase followed by a longer practice phase of randomly selected levels.

Prior gameplay experience varied across participants: sub-01 and sub-06 had previously played SMB; sub-01 and sub-02 were regular videogame players; sub-03 reported no prior videogame experience.

Game environment¶

Participants used the CNeuroMod fiber-optic MRI controller described in Harel et al. (2023). The game ran on a console emulator via OpenAI’s gym-retro, recorded at 60 Hz. Because the game is fully deterministic, only player inputs were stored; the .bk2 replay files allow exact reconstruction of every play.

Run design¶

We use run for a single fMRI acquisition and repetition for a single play of a level — from start to either completion or losing all three lives. Each repetition corresponds to exactly one .bk2 replay file. Each repetition began with no power-up and three lives; after death, the player resumed from the level start or from a checkpoint when one was available in the original level design.

The experiment was structured in two phases:

Discovery — every level was played in order, with unlimited attempts per level until at least one successful completion before moving to the next.
Practice — the remaining sessions used randomly selected levels for each repetition.

Levels¶

22 of the 32 original SMB levels were used. Water levels and boss levels were excluded because their mechanics differ substantially from the rest of the game.

Post-run questionnaire¶

At the end of each run, participants completed a short questionnaire including the items of the Flow Short Scale 2 (FSS-2), plus two additional items aimed at evaluating player fatigue and frustration. These two extra items were introduced after data collection had begun and are therefore absent from the earliest runs.

Per-subject summary¶

Subject	Repetitions (Discovery)	Repetitions (Practice)	Duration (Discovery)	Duration (Practice)	Success rate (Discovery)	Success rate (Practice)	Repetitions (Total)	Success rate (Total)	Duration (Total)
sub-01	230	567	03:54:27	09:47:11	0.578	0.781	797	0.723	13:41:38
sub-02	227	487	04:57:35	12:30:24	0.401	0.671	714	0.585	17:27:59
sub-03	176	451	04:49:38	11:57:19	0.432	0.698	627	0.624	16:46:57
sub-05	177	457	05:30:04	12:27:44	0.367	0.582	634	0.522	17:57:48
sub-06	134	468	04:25:41	13:37:40	0.627	0.857	602	0.806	18:03:22
Total	944	2430	23:37:27	60:20:19	0.481	0.718	3374	0.652	83:57:47

Event files and annotations¶

For each run, a _events.tsv file lists the timing of each repetition. A richer _desc-annotated_events.tsv file provides three categories of events:

button presses — every controller input;
in-game events — game-state annotations derived from RAM (kills, deaths, power-ups, etc.);
replay events — one entry per repetition with trial_type gym-retro_game, indicating which .bk2 replay was played at which onset.

Companion .bk2 replays, .mp4 videos, .json summaries, mapped RAM variables, and low-level visual features are provided alongside the events files.

In addition, the 22 levels are split into 313 short scenes annotated with 29 design patterns (23 from Dahlskog & Togelius, 2012, plus 6 contextual ones). See SCENES.md and the mario.scenes submodule for details and tooling to generate clip-level metadata, video, and memory dumps for each scene attempt.

Tutorials¶

The mario.tutorials repository provides a set of Colab-ready Jupyter notebooks that illustrate end-to-end use of the dataset on a single participant / single session, suitable for running on a laptop:

Dataset overview — exploration of the BIDS layout and behavioral annotations.
Event-based analysis — session-level GLM with hand-crafted action and game-event regressors and interpretable contrasts.
Reinforcement learning — training a CNN-based RL agent on the same gameplay and extracting layer activations.
Brain encoding — ridge-regression encoding models that map RL-agent activations onto BOLD signals, comparing layers.

The notebooks adapt and combine methodology from the shinobi_fmri and mario_generalization repositories.

Reference¶

A detailed description of the dataset and an associated modelling study are available in Paugam et al., bioRxiv 2025.