
N-Back Test — what it measures and how to read your score

The working-memory task cognitive psychologists actually use. Where it came from, how scoring works, and the long debate about whether training on it transfers to general cognitive ability.

Paradigm · Working memory — Kirchner, 1958

Updated Reviewed by Senwitt Editorial Team

What is the n-back test?

The n-back test asks you to monitor a sequence of stimuli and respond whenever the current item matches the one shown N steps earlier. With N=1 it's easy; with N=3 or N=4 it gets hard fast. The test was introduced by Wayne Kirchner in 1958 as a measure of working memory load, and it has become the most-used working-memory task in cognitive neuroscience research. Single n-back uses one stimulus stream (usually letters or positions). Dual n-back uses two simultaneous streams. Your score is the level of N you can sustain accurately; typical adults plateau between N=2 and N=4.

The n-back test is the working-memory task that cognitive neuroscientists ended up settling on. You watch a sequence — letters, positions on a grid, sounds — and respond every time the current one matches the one shown N steps back. With N=2, that means "is this the same as two ago?" With N=4, "is this the same as four ago?" The further back N gets, the more you have to hold in mind at once.

This page is a plain-English explainer: where the test came from, how the modern computerized version works, what a "good" N level looks like, and the long, still-unsettled argument about whether training on it actually makes you any sharper outside the task.

What the n-back test measures

The n-back test measures working memory, specifically the ability to maintain and continuously update a small set of items in active memory while doing something with them. Working memory is the cognitive workspace you use when you're following a conversation while holding the previous point in mind, doing mental arithmetic, or keeping track of where you are in a recipe.

Two operations matter for n-back performance:

  • Maintenance — keeping the last N items live and accessible.
  • Updating — dropping the oldest item when a new one arrives, and shifting everything else by one position in the queue.

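The test loads both at the same time. Every new stimulus triggers an update whatever the level, but as N grows there are more items to keep active and more of them to re-order on each update. The two demands stack on top of each other, which is why performance drops sharply between N=3 and N=4 for most people.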
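The maintain-and-update cycle can be pictured as a fixed-length queue. The Python sketch below is a toy model of an error-free participant, not a claim about how the brain stores items; the function name `ideal_responses` is hypothetical. A `deque` with `maxlen=n` holds the last N items (maintenance), and appending a new item automatically drops the oldest one (updating).

```python
from collections import deque


def ideal_responses(stimuli, n):
    """Return True/False for each trial: does it match the item n steps back?

    Toy model only. The deque plays the role of working memory:
    - maintenance: it holds the last n items
    - updating: appending a new item silently drops the oldest one
    """
    window = deque(maxlen=n)
    responses = []
    for item in stimuli:
        # Once the window is full, window[0] is the item shown exactly n steps ago.
        is_match = len(window) == n and window[0] == item
        responses.append(is_match)
        window.append(item)  # updating: the oldest item falls out once len == n
    return responses
```

For example, `ideal_responses(list("TLTKT"), 2)` returns `[False, False, True, False, True]`: the third and fifth letters repeat the letter shown two positions earlier.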

Where the paradigm comes from

The n-back paradigm was introduced by Wayne Kirchner in a 1958 Journal of Experimental Psychology paper on age differences in short-term retention. Kirchner's original setup used a row of lights and a manual response panel; participants had to indicate the light position shown N steps earlier. The paper was small and largely overlooked at the time.

The paradigm rose to prominence roughly forty years later, when neuroscientists looking for a working-memory task that worked cleanly inside an fMRI scanner adopted it. The n-back task was easy to instruct, ran in short blocks, parameterized neatly by N (so you could compare 1-back to 3-back within the same participant), and produced reliable activation in the prefrontal and parietal regions associated with working memory. Owen and colleagues' 2005 meta-analysis pulled together 24 normative functional neuroimaging studies of the n-back and confirmed the task's consistency as a probe of the working-memory network.

The other source you'll see cited a lot is Jaeggi et al.'s 2010 Memory paper on the n-back as a working-memory measure, which examined how well n-back performance correlates with other validated working-memory tasks. The short answer: n-back is a decent working-memory measure, but it correlates less strongly with classic complex-span tasks than people often assume. Treat the score as one window onto working memory, not as the working memory score.

How the modern test works

A typical computerized n-back session:

  1. You're shown a sequence of stimuli: letters in the verbal version, spatial positions on a 3×3 grid in the spatial version, and both at once in dual n-back (typically grid positions shown on screen paired with letters played as audio).
  2. Each stimulus appears for roughly 500–700 ms, with a 2–3 second inter-stimulus interval.
  3. Your task: hit a key when the current stimulus matches the one shown N steps earlier.
  4. The test runs in blocks of fixed N. A session usually covers N=1 through N=3, sometimes higher for advanced versions.
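The sequence in step 1 has to be built so that matches actually occur at a controlled rate. Below is a minimal Python sketch, assuming a letter stimulus set and a target rate of about 30% (both illustrative choices, not values from this page); `make_sequence` is a hypothetical helper. Real task software usually also controls lures (items that match at N-1 or N+1), which this sketch does not.

```python
import random


def make_sequence(length, n, alphabet="BCDFGHJK", match_rate=0.3, seed=None):
    """Build an n-back stimulus list with roughly `match_rate` target trials.

    Returns (stimuli, is_target). The first n trials can never be targets.
    Non-target trials are drawn so they do NOT repeat the item n steps back.
    Assumes the alphabet has at least two symbols.
    """
    rng = random.Random(seed)
    stimuli, is_target = [], []
    for i in range(length):
        if i >= n and rng.random() < match_rate:
            stimuli.append(stimuli[i - n])  # target: repeat the item n steps back
            is_target.append(True)
        else:
            # For i < n every symbol is allowed; afterwards, exclude the n-back item.
            choices = [c for c in alphabet if i < n or c != stimuli[i - n]]
            stimuli.append(rng.choice(choices))
            is_target.append(False)
    return stimuli, is_target
```

Excluding the N-back item from non-target draws keeps `is_target` exact: a trial is a target if and only if it repeats the item N steps back, so the answer key never disagrees with the stimuli.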

Scoring uses one of two main approaches:

  • d' (d-prime) — a signal-detection statistic combining hit rate (correct matches) and false-alarm rate (key presses on non-matches). Most research-grade scoring uses d'.
  • Highest sustained N — the highest N level you can maintain at a chosen accuracy threshold (typically 80% or 90%). This is the score that most consumer-facing n-back apps report, because it's intuitive.
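Both scoring approaches are short computations. The sketch below assumes per-block counts and accuracies are already available, and the function names are hypothetical. d' is the z-transformed hit rate minus the z-transformed false-alarm rate; the log-linear correction used here (add 0.5 to each count and 1 to each total) is one common way to avoid infinite z-scores when a rate is exactly 0 or 1, not the only one. `highest_sustained_n` assumes blocks at consecutive N levels and stops at the first level that misses the threshold; that stopping rule and the 80% default are illustrative assumptions.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection sensitivity: d' = z(hit rate) - z(false-alarm rate).

    Log-linear correction: add 0.5 to each count and 1 to each total, so
    perfect or zero rates don't produce infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


def highest_sustained_n(accuracy_by_n, threshold=0.8):
    """Highest N whose block accuracy meets the threshold, walking up from N=1.

    Stops at the first level that falls short; returns 0 if even the
    lowest level fails.
    """
    best = 0
    for n in sorted(accuracy_by_n):
        if accuracy_by_n[n] < threshold:
            break
        best = n
    return best
```

With block accuracies of `{1: 0.98, 2: 0.90, 3: 0.74, 4: 0.81}`, `highest_sustained_n` returns 2 at the default threshold: the 81% at N=4 does not count because N=3 already fell below 80%.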

A research-grade n-back run usually takes 15 to 25 minutes. Consumer brain-training apps that include n-back typically use a shorter version with adaptive difficulty.
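Adaptive difficulty usually means changing N between blocks based on how many errors were made. Here is a minimal sketch of one such staircase; the thresholds (move up after fewer than 3 errors, down after more than 5) and the helper names are illustrative assumptions, not taken from any particular app.

```python
def next_level(n, errors, up_below=3, down_above=5, min_n=1):
    """One step of a simple adaptive rule, applied after each block.

    Fewer than `up_below` errors: move up one level.
    More than `down_above` errors: move down one level (never below min_n).
    Otherwise: stay at the current level.
    """
    if errors < up_below:
        return n + 1
    if errors > down_above:
        return max(min_n, n - 1)
    return n


def run_session(start_n, errors_per_block):
    """Replay a session and return the N used for each block, in order."""
    levels, n = [], start_n
    for errors in errors_per_block:
        levels.append(n)
        n = next_level(n, errors)
    return levels
```

`run_session(2, [1, 2, 4, 7, 3])` returns `[2, 3, 4, 4, 3]`: two good blocks push N to 4, a middling block holds it there, and a bad block drops it back to 3. Keeping the "stay" band between the two thresholds wide is what stops N from bouncing up and down every block.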

What "good" looks like

For untrained adults on a standard single n-back:

  • N=1 is trivial — most people score near ceiling.
  • N=2 is comfortable for most adults, with accuracy typically 85–95%.
  • N=3 is where individual differences start showing. Accuracy is commonly 70–90% but more variable.
  • N=4 is hard. Many adults plateau between N=3 and N=4 without practice.
  • N=5 and above is rare without specific training and a particular knack for the task.

For dual n-back — visual and auditory streams running at the same time, with separate match keys for each — the same N feels meaningfully harder. Most adults plateau a level or two lower on dual n-back than on single n-back.
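Separate match keys mean each stream in dual n-back is scored on its own: a trial can be a target in the position stream, the audio stream, both, or neither. Below is a minimal sketch, assuming responses are recorded as one True/False press per trial per stream; the stream names and helper functions are hypothetical. The resulting counts are the inputs a d' calculation needs.

```python
def tally(is_target, pressed):
    """Count signal-detection outcomes for one stream of an n-back block."""
    if len(is_target) != len(pressed):
        raise ValueError("need exactly one response slot per trial")
    counts = {"hits": 0, "misses": 0, "false_alarms": 0, "correct_rejections": 0}
    for target, press in zip(is_target, pressed):
        if target and press:
            counts["hits"] += 1  # pressed on a real match
        elif target:
            counts["misses"] += 1  # a match went unreported
        elif press:
            counts["false_alarms"] += 1  # pressed on a non-match
        else:
            counts["correct_rejections"] += 1  # correctly withheld a press
    return counts


def tally_dual(position_targets, position_presses, audio_targets, audio_presses):
    """Score the two streams of a dual n-back block independently."""
    return {
        "position": tally(position_targets, position_presses),
        "audio": tally(audio_targets, audio_presses),
    }
```

Because the streams are tallied separately, a trial where both streams match but only the position key was pressed counts as a hit for position and a miss for audio.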

These ranges are rough. The literature is messy. Norms vary by age, by stimulus type (letters vs. positions vs. shapes), by the specific software used, and by how much practice the participant has had. Take any specific cut-off with skepticism.

The transfer debate — does n-back training generalize?

This is the part where people get the most excited and where the science gets the most contentious.

The original "yes, it transfers" claim came from a 2008 Jaeggi et al. study reporting that dual n-back training improved fluid intelligence (Gf, measured with matrix-reasoning tests) in healthy young adults. The result kicked off a decade of brain-training enthusiasm and a wave of consumer products built around n-back-style tasks.

The reality has been more complicated. The 2016 Psychological Science in the Public Interest review by Simons and colleagues — a 200-page evaluation of brain-training claims commissioned by the journal — concluded that:

  • Training on a cognitive task reliably improves performance on that task. This is uncontroversial.
  • Training on a task often improves performance on closely related tasks (near transfer). This is also reasonably well-supported.
  • Whether training on n-back (or any other task) improves general cognitive ability (far transfer) is not well supported by the existing evidence. The studies that find it tend to be small, methodologically variable, and inconsistently replicated.

The honest reading of the 2026 literature is the same. N-back training will make you better at n-back. It will probably make you a little better at closely related working-memory tasks. Whether it transfers to "real-world thinking" is unresolved and probably not as strong as the original 2008 enthusiasm suggested.

Senwitt is not built on the broad-transfer claim. We built it on the much more defensible claim that skills you actively use stay easier to use than skills you stop using. For the long form, see our research page on skill atrophy and our explainer on whether brain training works.

How to interpret your own result

Three honest framings:

Your N is what it is on the day, with that software, in that mood. Single-session n-back results are particularly noisy compared to other working-memory tasks, partly because the task is so sensitive to attention lapses. One bad night of sleep can drop you a level.

Higher N is not the same as higher intelligence. Correlations between n-back performance and general cognitive ability exist, but they're modest. Plenty of high-IQ adults plateau at N=3. Plenty of average-IQ adults reach N=4 with practice.

Practice on n-back will improve your n-back score. This is so consistent it's almost an axiom of the task. Whether that practice carries over to non-test situations is, as above, contested.

If you want a daily practice habit that exercises working memory alongside the other thinking skills — without the over-claims older brain-training apps were criticized for — that's what Senwitt is for: seven minutes a day, mixed across writing, math, code, memory, reading, and reasoning.

Dual n-back vs. single n-back — which to use?

Most consumer apps that brand themselves around n-back use the dual variant, partly because it's harder and feels more impressive, and partly because the famous 2008 Jaeggi paper used dual n-back specifically. In research, both versions are used; single n-back is more common in clinical and neuroimaging contexts because it isolates one stream of working-memory load at a time.

If you're using n-back to get a feel for working-memory load, single n-back is easier to set up and makes improvement easier to notice. If you're using it as part of a brain-training routine where the goal is high effort, dual is more demanding per minute of practice.

The transfer evidence does not strongly distinguish between them. Both train n-back. Neither has been shown to robustly train general cognition.

A note on online n-back tests

Most browser-based n-back implementations are honest about what they are: a working-memory exercise that's interesting to do, with some adaptive difficulty so you stay around the edge of your performance level. They're fine for that.

What they are not — and what the better-designed ones don't claim to be — is a calibrated cognitive measurement instrument. Timing precision varies by browser, the specific stimuli affect difficulty, and there's no normative comparison group to score against. Use them to feel what working-memory load is like. Don't use them to make decisions about your cognitive trajectory.

Frequently asked questions

What is a normal n-back score?

Most untrained adults can sustain N=2 comfortably and N=3 with effort. Reaching N=4 takes either practice or a particular knack for the task. Plateauing at N=3 is completely normal and tells you nothing about your intelligence.

From Senwitt · advertisement

The text above is editorial. What follows is a promotional message from Senwitt, the maker of this site. Senwitt is a brain-exercise app and is not a medical product. Read the full disclaimer in the footer.

Sources

  1. Kirchner, W. K. Age differences in short-term retention of rapidly changing information. Journal of Experimental Psychology 55(4):352–358, 1958. DOI 10.1037/h0043688.
  2. Owen, A. M., et al. N-back working memory paradigm: A meta-analysis of normative functional neuroimaging studies. Human Brain Mapping 25(1):46–59, 2005. DOI 10.1002/hbm.20131.
  3. Jaeggi, S. M., et al. The concurrent validity of the N-back task as a working memory measure. Memory 18(4):394–412, 2010. DOI 10.1080/09658211003702171.
  4. Simons, D. J., et al. Do 'Brain-Training' Programs Work? Psychological Science in the Public Interest 17(3):103–186, 2016. DOI 10.1177/1529100616661983.
Get the app

Take the daily practice the test is calibrated against.


Free download. Super Senwitt available in-app.
