The n-back test is the working-memory task that cognitive neuroscientists ended up settling on. You watch a sequence — letters, positions on a grid, sounds — and respond every time the current one matches the one shown N steps back. With N=2, that means "is this the same as two ago?" With N=4, "is this the same as four ago?" The further back N gets, the more you have to hold in mind at once.
This page is a plain-English explainer. It covers where the test came from, how the modern computerized version works, what a "good" N level looks like, and the long, still-unsettled argument about whether training on it actually makes you any sharper outside the task.
What the n-back test measures
The n-back test measures working memory — specifically, the ability to maintain and continuously update a small set of items in active memory while doing something with them. Working memory is the cognitive workspace you use when you're following a conversation while holding the previous point in mind, doing mental arithmetic, or following a recipe.
Two operations matter for n-back performance:
- Maintenance — keeping the last N items live and accessible.
- Updating — dropping the oldest item when a new one arrives, and shifting everything else by one position in the queue.
The test loads both at the same time. As N grows, there are more items to maintain and every update has to shift a longer queue, so the two costs compound. That is why performance drops sharply between N=3 and N=4 for most people.
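Maintenance and updating map naturally onto a fixed-length queue. The sketch below is purely illustrative (the function name `nback_matches` is an assumption, not part of any standard test software): a `deque` with `maxlen=n` holds the last N items, and appending a new one drops the oldest automatically.

```python
from collections import deque

def nback_matches(stimuli, n):
    """Return one boolean per stimulus: True when it matches the item n steps back."""
    if n < 1:
        raise ValueError("n must be at least 1")
    recent = deque(maxlen=n)  # maintenance: the last n items, oldest on the left
    matches = []
    for item in stimuli:
        # Once the buffer is full, recent[0] is the item shown n steps ago.
        matches.append(len(recent) == n and recent[0] == item)
        recent.append(item)   # updating: the oldest item falls off the left
    return matches

print(nback_matches("ABAB", 2))  # [False, False, True, True]
```

A computer does this for free. The point of the test is that a person has to do the same bookkeeping in their head on every trial.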
Where the paradigm comes from
The n-back paradigm was introduced by Wayne Kirchner in a 1958 Journal of Experimental Psychology paper on age differences in short-term retention. Kirchner's original setup used a row of lights and a manual response panel; participants had to indicate the light position shown N steps earlier. The paper was small and largely overlooked at the time.
The paradigm got pulled into prominence forty years later, when neuroscientists looking for a working-memory task that worked cleanly inside an fMRI scanner adopted it. The n-back task was easy to instruct, ran in short blocks, parameterized neatly by N (so you could compare 1-back to 3-back within the same participant), and produced reliable activation in the prefrontal and parietal regions associated with working memory. Owen and colleagues' 2005 meta-analysis of n-back neuroimaging studies pulled together 24 fMRI experiments and confirmed the task's consistency as a neural probe for the working-memory network.
The other source you'll see cited a lot is Jaeggi et al.'s 2010 Memory paper on the n-back as a working-memory measure, which examined how well n-back performance correlates with other validated working-memory tasks. The short answer: n-back is a decent working-memory measure, but it correlates less strongly with classic complex-span tasks than people often assume. Treat the score as one window onto working memory, not as the working memory score.
How the modern test works
A typical computerized n-back session:
- You're shown a sequence of stimuli: letters in the visual version, spatial positions on a 3×3 grid in the spatial version, and two streams at once in dual n-back (typically grid positions on screen plus spoken letters).
- Each stimulus appears for roughly 500–700 ms, with a 2–3 second inter-stimulus interval.
- Your task: hit a key when the current stimulus matches the one shown N steps earlier.
- The test runs in blocks of fixed N. A session usually covers N=1 through N=3, sometimes higher for advanced versions.
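Scoring a block comes down to comparing each key press with whether the trial really was a match. A minimal sketch, assuming key presses are recorded as one boolean per trial and that the first N trials go unscored because nothing sits N steps behind them (`score_block` and its arguments are illustrative names, not any particular package's API):

```python
def score_block(stimuli, presses, n):
    """Count signal-detection outcomes for one block of fixed n.

    stimuli: stimuli in presentation order
    presses: one boolean per trial, True where the key was pressed
    """
    if len(stimuli) != len(presses):
        raise ValueError("stimuli and presses must have the same length")
    counts = {"hits": 0, "misses": 0, "false_alarms": 0, "correct_rejections": 0}
    # Assumption: the first n trials have no n-back reference, so skip them.
    for i in range(n, len(stimuli)):
        is_target = stimuli[i] == stimuli[i - n]
        if is_target and presses[i]:
            counts["hits"] += 1
        elif is_target:
            counts["misses"] += 1
        elif presses[i]:
            counts["false_alarms"] += 1
        else:
            counts["correct_rejections"] += 1
    return counts

presses = [False, False, True, False, True, True, False]
print(score_block("ABABCBD", presses, 2))
# {'hits': 2, 'misses': 1, 'false_alarms': 1, 'correct_rejections': 1}
```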
Scoring uses one of two main approaches:
- d' (d-prime) — a signal-detection statistic combining hit rate (correct matches) and false-alarm rate (key presses on non-matches). Most research-grade scoring uses d'.
- Highest sustained N — the highest N level you can maintain at a chosen accuracy threshold (typically 80% or 90%). This is the score that most consumer-facing n-back apps report, because it's intuitive.
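Both scores are short computations. The sketch below is one way to write them, under two labeled assumptions: `d_prime` applies a log-linear correction (add 0.5 to each count, 1 to each trial total) so that perfect or zero rates do not produce infinite values, which is a common choice but not the only one; and `highest_sustained_n` treats "sustained" as meaning every level up to that N also met the threshold. The function names are illustrative.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), with a log-linear correction."""
    # Assumption: the +0.5 / +1 correction keeps both rates strictly inside (0, 1).
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

def highest_sustained_n(accuracy_by_n, threshold=0.80):
    """Highest N reached with every level up to it at or above the threshold."""
    best = 0
    for n in sorted(accuracy_by_n):
        if accuracy_by_n[n] < threshold:
            break  # stop at the first level that falls short
        best = n
    return best

print(round(d_prime(18, 2, 2, 18), 2))  # clearly positive: good discrimination
print(d_prime(10, 10, 10, 10))          # 0.0: pressing at chance
print(highest_sustained_n({1: 0.98, 2: 0.91, 3: 0.82, 4: 0.64}))  # 3
```

The design difference shows up in the inputs: d' penalizes indiscriminate key-pressing through the false-alarm rate, while highest sustained N only sees an accuracy figure per level.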
A research-grade n-back run usually takes 15 to 25 minutes. Consumer brain-training apps that include n-back typically use a shorter version with adaptive difficulty.
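One simple way to implement adaptive difficulty is a staircase rule: raise N after a strong block, lower it after a weak one, and hold otherwise. The thresholds and limits below are hypothetical placeholders chosen for illustration, not values taken from any specific app or study.

```python
def next_level(current_n, block_accuracy, raise_at=0.90, lower_at=0.70,
               min_n=1, max_n=9):
    """Pick N for the next block from the accuracy on the block just finished."""
    if block_accuracy >= raise_at:
        return min(current_n + 1, max_n)  # strong block: step up, capped at max_n
    if block_accuracy < lower_at:
        return max(current_n - 1, min_n)  # weak block: step down, floored at min_n
    return current_n                      # in between: stay at the same level

# Walk through four blocks, starting at N=2.
level = 2
history = []
for accuracy in [0.95, 0.92, 0.65, 0.80]:
    level = next_level(level, accuracy)
    history.append(level)
print(history)  # [3, 4, 3, 3]
```

A rule like this keeps the player hovering near the level where accuracy sits between the two thresholds, which is what staying around the edge of your ability means in practice.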
What "good" looks like
For untrained adults on a standard single n-back:
- N=1 is trivial — most people score near ceiling.
- N=2 is comfortable for most adults, with accuracy typically 85–95%.
- N=3 is where individual differences start showing. Accuracy is commonly 70–90% but more variable.
- N=4 is hard. Many adults plateau between N=3 and N=4 without practice.
- N=5 and above is rare without specific training and a particular knack for the task.
For dual n-back — visual and auditory streams running at the same time, with separate match keys for each — the same N feels meaningfully harder. Most adults plateau a level or two lower on dual n-back than on single n-back.
These ranges are rough. The literature is messy. Norms vary by age, by stimulus type (letters vs. positions vs. shapes), by the specific software used, and by how much practice the participant has had. Take any specific cut-off with skepticism.
The transfer debate — does n-back training generalize?
This is the part where people get the most excited and where the science gets the most contentious.
The original "yes, it transfers" claim came from a 2008 Jaeggi et al. study showing that dual n-back training improved fluid intelligence scores (Gf, measured with Raven's-style matrix reasoning tests) in healthy young adults. The result kicked off a decade of brain-training enthusiasm and a wave of consumer products built around n-back-style tasks.
The reality has been more complicated. The 2016 Psychological Science in the Public Interest review by Simons and colleagues, a monograph-length evaluation of brain-training claims commissioned by the journal, concluded that:
- Training on a cognitive task reliably improves performance on that task. This is uncontroversial.
- Training on a task often improves performance on closely related tasks (near transfer). This is also reasonably well-supported.
- The claim that training on n-back (or any other task) improves general cognitive ability (far transfer) is not well supported by the existing evidence. The studies that find it tend to be small, methodologically variable, and inconsistently replicated.
The honest reading of the 2026 literature is the same. N-back training will make you better at n-back. It will probably make you a little better at closely related working-memory tasks. Whether it transfers to "real-world thinking" is unresolved, and any such effect is probably weaker than the original 2008 enthusiasm suggested.
Senwitt is not built on the broad-transfer claim. We built it on the much more defensible claim that skills you actively use stay easier to use than skills you stop using. For the long form, see our research page on skill atrophy and our explainer on whether brain training works.
How to interpret your own result
Three honest framings:
Your N is what it is on the day, with that software, in that mood. Single-session n-back results are particularly noisy compared to other working-memory tasks, partly because the task is so sensitive to attention lapses. One bad night of sleep can drop you a level.
Higher N is not the same as higher intelligence. Correlations between n-back performance and general cognitive ability exist, but they're modest. Plenty of high-IQ adults plateau at N=3. Plenty of average-IQ adults reach N=4 with practice.
Practice on n-back will improve your n-back score. This is so consistent it's almost an axiom of the task. Whether that practice carries over to non-test situations is, as above, contested.
Related Senwitt content
- The digit span test is the older, simpler working-memory task and the one used in the WAIS clinical battery.
- The working memory test is the broader pillar page that links these together.
- The Corsi block test is the spatial-working-memory cousin.
- For the broader argument about training and transfer, see the page on whether brain training works.
If you want a daily practice habit that exercises working memory alongside the other thinking skills — without the over-claims older brain-training apps were criticized for — that's what Senwitt is for: seven minutes a day, mixed across writing, math, code, memory, reading, and reasoning.
Dual n-back vs. single n-back — which to use?
Most consumer apps that brand themselves around n-back use the dual variant, partly because it's harder and feels more impressive, and partly because the famous 2008 Jaeggi paper used dual n-back specifically. In research, both versions are used; single n-back is more common in clinical and neuroimaging contexts because it isolates one stream of working-memory load at a time.
If you're using n-back to feel out what working-memory load is like, single is easier to measure and makes improvement easier to notice. If you're using it as part of a brain-training routine where the goal is high effort, dual is more demanding per minute of practice.
The transfer evidence does not strongly distinguish between them. Both train n-back. Neither has been shown to robustly train general cognition.
A note on online n-back tests
Most browser-based n-back implementations are honest about what they are: a working-memory exercise that's interesting to do, with some adaptive difficulty so you stay around the edge of your performance level. They're fine for that.
What they are not — and what the better-designed ones don't claim to be — is a calibrated cognitive measurement instrument. Timing precision varies by browser, the specific stimuli affect difficulty, and there's no normative comparison group to score against. Use them to feel what working-memory load is like. Don't use them to make decisions about your cognitive trajectory.
