The Stroop test is one of the most replicated tasks in cognitive psychology. You name the ink color of a word as fast as you can. When the word itself names a different color — the word RED printed in blue ink — you slow down. That slowdown is the Stroop effect, and it has been measured in thousands of studies since John Ridley Stroop's 1935 paper in the Journal of Experimental Psychology.
This page is a plain-English explainer. It covers what the Stroop test measures, where the paradigm came from, what good scores actually look like, and the caveats cognitive psychologists usually add before anyone interprets a result.
What the Stroop test actually measures
The Stroop test measures selective attention and what cognitive psychologists call interference control — the ability to focus on one feature of a stimulus (the ink color) while suppressing a competing, more automatic response (reading the word). Reading is so over-learned in literate adults that the brain produces the word's meaning faster than it produces the ink's color. The two signals collide. The cost of resolving that collision is what the test reads out.
Three trial types matter:
- Congruent — the word RED in red ink. Reading and naming agree. Fastest.
- Neutral — a non-color word (or a color patch with no word) shown in colored ink. No conflict. Baseline.
- Incongruent — the word RED in blue ink. Reading and naming disagree. Slowest.
The Stroop interference score is the difference (usually in milliseconds) between incongruent and neutral trial response times. A larger interference score means the conflict cost more time to resolve on that day, for that person, under those conditions.
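As a minimal sketch of that subtraction, assuming each trial is stored as a (trial type, response time) pair, the score can be computed like this. The data layout, the function name `interference_ms`, and the response times are invented for illustration.

```python
from statistics import mean

def interference_ms(trials):
    """Mean incongruent RT minus mean neutral RT, in milliseconds.

    `trials` is a list of (trial_type, rt_ms) pairs. This layout is an
    assumption made for the example, not a standard file format.
    """
    incongruent = [rt for kind, rt in trials if kind == "incongruent"]
    neutral = [rt for kind, rt in trials if kind == "neutral"]
    if not incongruent or not neutral:
        raise ValueError("need at least one incongruent and one neutral trial")
    return mean(incongruent) - mean(neutral)

# Made-up response times, for illustration only.
example_trials = [
    ("neutral", 610), ("incongruent", 760),
    ("neutral", 590), ("incongruent", 740),
    ("congruent", 560),  # ignored by this particular score
]
print(f"{interference_ms(example_trials):.0f} ms")  # prints "150 ms"
```

Congruent trials are left out here because the score described above compares incongruent trials against the neutral baseline only.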
Where the paradigm comes from
Stroop's three original experiments, reported in his 1935 paper, used printed cards. Participants named the ink color of color words; the slowdown when word and ink disagreed was the main effect. The paradigm takes its name from the experimenter, and the 1935 paper remains one of the most-cited papers in the history of attention research.
The 1991 review by Colin MacLeod, published in Psychological Bulletin, surveyed the first half-century of Stroop research. MacLeod identified the basic finding as one of the most robust in cognitive psychology — replicated across languages, scripts, ages, and presentation formats — and laid out the theoretical accounts that explain why the effect persists even when participants are trying hard to ignore the word. The short version: word-reading in literate adults is automatic in a way that color-naming is not, and automaticity is genuinely hard to switch off.
A 2017 Frontiers in Psychology review by Scarpina and Tagini covers the modern computerized version that most labs now use, including standard scoring procedures, the Golden version (paper-and-pencil), and the test's role in clinical research on conditions where executive function is at issue.
How the modern computerized test works
A typical computerized Stroop test runs around 100–200 trials in blocks. Each trial:
- A color word appears on screen, printed in one of four to six possible ink colors.
- You respond — by key press or by speaking aloud — with the ink color, not the word.
- The software records response time and accuracy.
Trials are mixed across the three types (congruent, neutral, incongruent). The test usually takes 5 to 15 minutes including instructions and practice trials.
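To make the trial structure concrete, here is a rough sketch of how a mixed block might be generated. It assumes four ink colors, a placeholder string for neutral trials, and 40 trials per type; the names `make_trial` and `make_block` are hypothetical, and real experiment software will differ in the details.

```python
import random

COLORS = ["red", "blue", "green", "yellow"]  # four ink colors (an assumption)

def make_trial(kind, rng):
    """Return one trial: the word shown and the ink color it is shown in."""
    ink = rng.choice(COLORS)
    if kind == "congruent":
        word = ink                       # word and ink agree
    elif kind == "incongruent":
        word = rng.choice([c for c in COLORS if c != ink])  # word names another color
    elif kind == "neutral":
        word = "xxxx"                    # placeholder non-color string
    else:
        raise ValueError(f"unknown trial type: {kind}")
    return {"kind": kind, "word": word.upper(), "ink": ink}

def make_block(n_per_type=40, seed=None):
    """Build an equal number of each trial type, then shuffle them together."""
    rng = random.Random(seed)
    trials = [make_trial(kind, rng)
              for kind in ("congruent", "neutral", "incongruent")
              for _ in range(n_per_type)]
    rng.shuffle(trials)
    return trials

block = make_block(n_per_type=40, seed=1)
print(len(block))  # 120
```

In every trial the correct response is the value stored under "ink", never the word.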
Two scores come out of it:
- Reaction time interference — the millisecond gap between incongruent and neutral trials.
- Accuracy interference — the error rate on incongruent trials, usually read against the error rate on neutral trials.
The reaction time score is the one that gets reported most often. The accuracy score is treated as a secondary check; if you're sacrificing accuracy for speed, the RT number reads cleaner than it should.
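The sketch below shows one way the two scores could be computed side by side. It assumes each trial is a dict with the hypothetical keys "kind", "rt_ms" and "correct", and it averages response times over correct trials only, which is a common convention rather than a universal rule. The sample data is invented.

```python
from statistics import mean

def stroop_scores(trials):
    """Return (rt_interference_ms, incongruent_error_rate, neutral_error_rate).

    RT means use correct trials only, so fast wrong answers cannot
    shrink the interference score.
    """
    def correct_rts(kind):
        return [t["rt_ms"] for t in trials if t["kind"] == kind and t["correct"]]

    def error_rate(kind):
        outcomes = [t["correct"] for t in trials if t["kind"] == kind]
        return 1 - sum(outcomes) / len(outcomes)

    rt_interference = mean(correct_rts("incongruent")) - mean(correct_rts("neutral"))
    return rt_interference, error_rate("incongruent"), error_rate("neutral")

# Invented data: the last incongruent trial is a fast wrong answer.
sample = [
    {"kind": "neutral", "rt_ms": 600, "correct": True},
    {"kind": "neutral", "rt_ms": 620, "correct": True},
    {"kind": "neutral", "rt_ms": 580, "correct": True},
    {"kind": "neutral", "rt_ms": 600, "correct": True},
    {"kind": "incongruent", "rt_ms": 700, "correct": True},
    {"kind": "incongruent", "rt_ms": 720, "correct": True},
    {"kind": "incongruent", "rt_ms": 740, "correct": True},
    {"kind": "incongruent", "rt_ms": 400, "correct": False},
]
rt_gap, err_incongruent, err_neutral = stroop_scores(sample)
print(f"{rt_gap:.0f} ms, {err_incongruent:.0%} vs {err_neutral:.0%} errors")
# prints "120 ms, 25% vs 0% errors"
```

If the fast wrong answer were averaged in with the correct ones, the incongruent mean would drop from 720 ms to 640 ms and the gap would shrink to 40 ms. That is the speed-for-accuracy trade the secondary score is there to catch.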
What "good" looks like — and why that question is harder than it sounds
The honest answer is: there is no single "good" Stroop score. Norms vary by:
- Age. Response times generally lengthen across adulthood, and interference scores often (not always) widen. Part of that widening reflects general slowing rather than worse attention: when every response takes longer, the absolute RT gap looks larger even when the proportional cost is similar.
- Language. The Stroop effect is bigger in languages where color words are short and frequent. Norms from English research don't transfer directly to other languages.
- Test version. Paper-and-pencil Stroop (Golden) and computerized Stroop yield different absolute numbers. They correlate, but a 200 ms interference on one isn't the same as 200 ms on the other.
- State factors. Fatigue, caffeine, time of day, anxiety, and how recently you practiced all move the score. A single Stroop run is a snapshot, not a stable trait.
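The point about absolute versus proportional cost in the age item is easier to see with numbers. The following is a small worked example with invented mean response times for two hypothetical people; `proportional_cost` is an illustrative helper, not a standard published score.

```python
def proportional_cost(neutral_ms, incongruent_ms):
    """Interference expressed as a fraction of the neutral baseline."""
    return (incongruent_ms - neutral_ms) / neutral_ms

# Invented mean RTs for two hypothetical people.
faster = {"neutral": 600, "incongruent": 720}
slower = {"neutral": 800, "incongruent": 960}

for label, person in (("faster", faster), ("slower", slower)):
    gap = person["incongruent"] - person["neutral"]
    cost = proportional_cost(person["neutral"], person["incongruent"])
    print(f"{label}: gap {gap} ms, proportional cost {cost:.0%}")
# faster: gap 120 ms, proportional cost 20%
# slower: gap 160 ms, proportional cost 20%
```

The slower responder shows a 40 ms larger gap, yet both pay the same 20% cost relative to their own baseline.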
Across the literature, typical adult interference scores on a computerized 4-color Stroop fall roughly in the 100–250 ms range, with substantial individual variation. A result outside that range in a single sitting could reflect a real effect, or it could mean that you were tired, distracted, or doing the task on a phone screen too small for the words. Treat the absolute number with humility.
How to interpret your own result, honestly
Three things to keep in mind before you read too much into one Stroop run.
One run is a measurement, not a verdict. Cognitive psychologists who use the Stroop test in research average across hundreds of trials per person and often across multiple sessions, because single-session noise is large. Doing a five-minute Stroop test in your browser and concluding anything about "my attention" is like running for two minutes and judging your overall fitness from it.
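A back-of-the-envelope sketch shows why trial counts matter. It uses the textbook standard error of a difference between two means and assumes independent trials with the same spread in both conditions. The 150 ms trial-to-trial standard deviation is an illustrative assumption, not a figure from any dataset.

```python
import math

def se_of_interference(trial_sd_ms, n_per_condition):
    """Standard error of (incongruent mean - neutral mean), assuming
    independent trials and equal SD and trial count in both conditions."""
    return trial_sd_ms * math.sqrt(2 / n_per_condition)

ASSUMED_TRIAL_SD_MS = 150  # illustrative value only

for n in (20, 80, 320):
    se = se_of_interference(ASSUMED_TRIAL_SD_MS, n)
    print(f"{n} trials per condition: roughly +/- {se:.0f} ms")
```

Under these assumptions the uncertainty is about 47 ms with 20 trials per condition, about 24 ms with 80, and about 12 ms with 320. Quadrupling the trial count only halves the noise, which is why short runs are so imprecise.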
The test does not measure intelligence. Stroop interference correlates loosely with measures of executive function and working memory, but the correlation is modest. People with very different IQ scores can have similar interference scores. People with similar interference scores can have very different cognitive profiles overall.
The test is not a diagnostic tool on its own. The Stroop test is used in clinical research on conditions where executive function is studied — for example, ADHD or traumatic brain injury research — but it is never used as a stand-alone diagnostic. Clinicians use it as one input among many, alongside structured interviews, history, and other neuropsychological tests. Your at-home Stroop result is not a clinical signal about anything.
Related Senwitt content
The Stroop test is a measurement instrument. Senwitt is a daily practice habit. They live on different sides of the same conversation.
- For the broader topic of attention and selective focus, see our reading skill page and the research note on cognitive offloading.
- The Flanker test is the Stroop's closest cousin. It is also a selective-attention task, introduced by Eriksen and Eriksen in 1974, and the two are often used together.
- For the working-memory side of executive function rather than the attention side, see the N-back test and the working memory test pages.
If you want a daily practice habit that keeps the underlying skills — attention, working memory, reading focus — in regular use without trying to measure them, that's what Senwitt is for. The MIT cognitive-debt research and the workplace data on AI brain fry both point at the same risk: skills you stop using get harder to use. Senwitt is the small, deliberate, daily counter-pressure.
The history of "the Stroop effect" as a phrase
The phrase "Stroop effect" did not exist in Stroop's 1935 paper. Stroop himself simply called the phenomenon "interference" and treated it as a window into the automaticity of reading. The eponym appeared in the literature over the next two decades as the paradigm spread, and was firmly cemented by the time MacLeod surveyed the field in 1991.
It is one of the rare cases in cognitive psychology where the experimenter's name has become genuinely synonymous with the paradigm. Most researchers refer simply to "the Stroop task" or "the Stroop effect" without explanation, expecting any reader with undergraduate-level training to understand what is meant.
Variants you may encounter
The basic color-word Stroop is the canonical version, but the same logic — over-learned response interfering with a less-automatic one — has been ported into many variants:
- Emotional Stroop — neutral and emotionally loaded words; participants name ink color. Used in research on anxiety and PTSD. Slowing on threat-related words is the effect of interest.
- Numerical Stroop — numerical value vs. physical size. A 7 printed larger than a 9 produces interference when participants are asked to judge which digit is physically larger.
- Spatial Stroop — the word LEFT presented on the right side of the screen.
- Animal Stroop (used with children) — an animal picture with the wrong animal sound or name. Lets the paradigm work before reading is fully automatic.
All of these variants share the core idea: two competing response tendencies, one more automatic than the other, with the experimenter measuring the cost of resolving the conflict.
A note on online Stroop tests
A lot of "take the Stroop test online" pages exist. Most of them are fine for getting a feel for the paradigm. They are not fine for drawing conclusions about your cognitive abilities. The reasons are mundane: the timing on browser-rendered tests can be off by tens of milliseconds depending on your monitor's refresh rate, your input device's polling rate, and whatever else your browser is doing. None of those things matter when you're just trying to experience the Stroop effect. They matter a lot when you're trying to compare your number against a published norm.
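For a sense of scale, here is a rough calculation of how much timing slack a display and an input device can add by themselves. The 60 Hz refresh rate and 125 Hz polling rate are example values, and the bound ignores browser scheduling and any latency inside the screen, so treat it as a simplification.

```python
def frame_ms(refresh_hz):
    """Duration of one display frame in milliseconds."""
    return 1000 / refresh_hz

def poll_interval_ms(polling_hz):
    """Time between input-device polls in milliseconds."""
    return 1000 / polling_hz

def rough_timing_slack_ms(refresh_hz, polling_hz):
    """Up to one frame before the word is actually drawn, plus up to one
    polling interval before the key press is noticed. A simplified bound."""
    return frame_ms(refresh_hz) + poll_interval_ms(polling_hz)

# Example values: a 60 Hz display and an input device polled at 125 Hz.
print(f"{rough_timing_slack_ms(60, 125):.1f} ms")  # prints "24.7 ms"
```

About 25 ms of slack is a sizeable fraction of an interference score that may itself be only 100 to 250 ms.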
If you want to feel the effect — and feeling it is genuinely worthwhile, because it makes the automaticity-of-reading point visceral in a way a research paper can't — any of the online versions will do. If you want a number you can hang anything on, you'd need a properly calibrated test under controlled conditions.
