# mario ## Overview The `mario` dataset contains in-scanner gameplay of *Super Mario Bros.* (Nintendo, 1985) for five CNeuroMod participants (`sub-01`, `sub-02`, `sub-03`, `sub-05`, `sub-06`). Participants played 22 of the original game's levels across two phases — a structured **discovery** phase followed by a longer **practice** phase of randomly selected levels. Prior gameplay experience varied across participants: `sub-01` and `sub-06` had previously played SMB; `sub-01` and `sub-02` were regular videogame players; `sub-03` reported no prior videogame experience. ## Game environment Participants used the CNeuroMod fiber-optic MRI controller described in [Harel et al. (2023)](https://doi.org/10.1371/journal.pone.0290158). The game ran on a console emulator via OpenAI's [gym-retro](https://github.com/openai/retro), recorded at 60 Hz. Because the game is fully deterministic, only player inputs were stored; the `.bk2` replay files allow exact reconstruction of every play. ## Run design We use **run** for a single fMRI acquisition and **repetition** for a single play of a level — from start to either completion or losing all three lives. Each repetition corresponds to exactly one `.bk2` replay file. Each repetition began with no power-up and three lives; after death, the player resumed from the level start or from a checkpoint when one was available in the original level design. The experiment was structured in two phases: - **Discovery** — every level was played in order, with unlimited attempts per level until at least one successful completion before moving to the next. - **Practice** — the remaining sessions used randomly selected levels for each repetition. ## Levels 22 of the 32 original SMB levels were used. Water levels and boss levels were excluded because their mechanics differ substantially from the rest of the game. ## Post-run questionnaire At the end of each run, participants completed a short questionnaire including the items of the **Flow Short Scale 2 (FSS-2)**, plus two additional items aimed at evaluating player fatigue and frustration. These two extra items were introduced after data collection had begun and are therefore absent from the earliest runs. ## Per-subject summary | Subject | Repetitions (Discovery) | Repetitions (Practice) | Duration (Discovery) | Duration (Practice) | Success rate (Discovery) | Success rate (Practice) | Repetitions (Total) | Success rate (Total) | Duration (Total) | |-----------|------------------------:|-----------------------:|---------------------:|--------------------:|-------------------------:|------------------------:|--------------------:|---------------------:|-----------------:| | sub-01 | 230 | 567 | 03:54:27 | 09:47:11 | 0.578 | 0.781 | 797 | 0.723 | 13:41:38 | | sub-02 | 227 | 487 | 04:57:35 | 12:30:24 | 0.401 | 0.671 | 714 | 0.585 | 17:27:59 | | sub-03 | 176 | 451 | 04:49:38 | 11:57:19 | 0.432 | 0.698 | 627 | 0.624 | 16:46:57 | | sub-05 | 177 | 457 | 05:30:04 | 12:27:44 | 0.367 | 0.582 | 634 | 0.522 | 17:57:48 | | sub-06 | 134 | 468 | 04:25:41 | 13:37:40 | 0.627 | 0.857 | 602 | 0.806 | 18:03:22 | | **Total** | **944** | **2430** | **23:37:27** | **60:20:19** | **0.481** | **0.718** | **3374** | **0.652** | **83:57:47** | ## Event files and annotations For each run, a `_events.tsv` file lists the timing of each repetition. A richer `_desc-annotated_events.tsv` file provides three categories of events: - **button presses** — every controller input; - **in-game events** — game-state annotations derived from RAM (kills, deaths, power-ups, etc.); - **replay events** — one entry per repetition with `trial_type` `gym-retro_game`, indicating which `.bk2` replay was played at which onset. Companion `.bk2` replays, `.mp4` videos, `.json` summaries, mapped RAM variables, and low-level visual features are provided alongside the events files. In addition, the 22 levels are split into 313 short **scenes** annotated with 29 design patterns (23 from Dahlskog & Togelius, 2012, plus 6 contextual ones). See {doc}`SCENES.md ` and the [`mario.scenes`](https://github.com/courtois-neuromod/mario.scenes) submodule for details and tooling to generate clip-level metadata, video, and memory dumps for each scene attempt. ## Tutorials The [`mario.tutorials`](https://github.com/courtois-neuromod/mario.tutorials) repository provides a set of Colab-ready Jupyter notebooks that illustrate end-to-end use of the dataset on a single participant / single session, suitable for running on a laptop: 1. **Dataset overview** — exploration of the BIDS layout and behavioral annotations. 2. **Event-based analysis** — session-level GLM with hand-crafted action and game-event regressors and interpretable contrasts. 3. **Reinforcement learning** — training a CNN-based RL agent on the same gameplay and extracting layer activations. 4. **Brain encoding** — ridge-regression encoding models that map RL-agent activations onto BOLD signals, comparing layers. The notebooks adapt and combine methodology from the [`shinobi_fmri`](https://github.com/courtois-neuromod/shinobi_fmri) and [`mario_generalization`](https://github.com/courtois-neuromod/mario_generalization) repositories. ## Reference A detailed description of the dataset and an associated modelling study are available in [Paugam et al., bioRxiv 2025](https://doi.org/10.1101/2025.11.28.691119).