Why a ten-second delay kills your child's learning

My son Remi sounded out the word "frog." Then he waited. Fourteen seconds later, the app finally said "Great job!" By then Remi was upside down on the couch asking if frogs have teeth. The praise landed on an empty seat.

That fourteen-second gap is not a user-experience nit. It is a learning-science problem. For a four-year-old, fourteen seconds uses up most or all of the window in which their brain can hold the sound they just made, the letter that produced it, and the feedback that tells them whether it was right. By the time the app speaks, the loop has broken. The lesson is not slower. It is gone.

This piece is about that gap, what the research says about it, and why latency — not content, not curriculum, not even gamification — is the variable most legacy reading apps got wrong.

What working memory looks like at four

Working memory is the mental scratchpad where a child holds information just long enough to do something with it: sound out a word, follow a two-step instruction, compare two shapes. It is not the same as long-term memory and it is not the same as attention. It is the bottleneck that sits between them.

Nelson Cowan's research on the focus of attention in young children puts the realistic capacity of a 4–7-year-old's working memory at roughly three to four "chunks" of information, held for somewhere between 15 and 30 seconds without active rehearsal (Cowan, 2010, Current Directions in Psychological Science). Susan Gathercole and Tracy Alloway's work on working-memory development shows the same age band hovering near the low end of adult capacity, with steep individual variation tied to language exposure and reading readiness (Gathercole and Alloway, Working Memory and Learning, 2008).

Translate that into a phonics moment. A child hears the prompt, holds the target sound, tries to produce it, and waits for feedback. Every second of that wait is unrehearsed time, and unrehearsed time decays the chunk. By ten seconds the child is rehearsing nothing because their brain has moved on to the dog walking past the window. By fifteen seconds, when the praise finally arrives, it is no longer attached to the action that earned it.

Adults run into the same limit without noticing it. We can hold a phone number for the time it takes to dial it, but if someone interrupts us mid-dial we lose it. Now imagine that window is half as wide and there is a glowing tablet trying to fill it with a spinner.

Cognitive load is not abstract — it is felt

Cognitive load theory, originating with John Sweller in the late 1980s and refined since, splits the load a learner carries into three buckets: the load intrinsic to the task, the extraneous load added by how the task is presented, and the germane load that actually builds skill. A spinner is pure extraneous load. The child is doing work — staying focused, holding the sound, suppressing the impulse to flip over — but none of that work is going toward learning. It is going toward waiting.

This is why "just add a fun animation while it loads" does not solve the problem. The animation does not reduce extraneous load. It adds to it. The child is now tracking the animation and trying to remember what they were doing before the animation started.

What the response-time data actually shows

I pulled out a stopwatch and tested the apps Remi and his friends were using. Each one got the same simple phonics prompt, and I timed from the moment the child finished speaking or tapping to the moment the app produced its substantive response. I ran each test five times on a stable home network and took the median. These are my measurements, not the vendors' published figures, and they will vary with device and connection.

Platform	Interaction type	Median response time	What the child is doing while they wait
Lexia Core5 (web)	Tap, then audio feedback	8–14 seconds	Looking around, sometimes tapping again, occasionally giving up
IXL (web, phonics)	Tap multiple-choice	3–6 seconds	Holding the target, often re-reading the prompt
Khan Academy Kids	Tap or trace	2–4 seconds	Engaged but losing momentum on multi-step prompts
ABCmouse	Tap, then animation	4–8 seconds	Watching animation, losing the original task thread
YouTube Kids (passive)	Watch	N/A — no expected response	Receiving content, not producing it
Lumikids beta	Speak, hear voice response	Under 1 second to first word	Still in the loop

Two things to notice. First, response time is not a single number; it is the gap between the child's effort and the app's reaction to that effort. Tap-based apps can be quick at the tap layer and still slow at the feedback layer, because audio has to load, animations have to play out, and progress has to sync. Second, a 10-second response is not just five times worse than a 2-second one. It is categorically different. One stays inside the working-memory window. The other does not.
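For anyone who wants to repeat the stopwatch test at home, here is a minimal Python sketch of the arithmetic: take the median of five timings per app and check it against a cutoff. The app names and readings are made up for illustration and are not the measurements in the table. The three-second cutoff is borrowed from the rule of thumb later in this piece, and the function names are hypothetical.

```python
from statistics import median

# Assumed cutoff in seconds, borrowed from the article's three-second rule
# of thumb. It is an illustration, not a research-backed constant.
MAX_WAIT_SECONDS = 3.0


def summarize_trials(trial_seconds):
    """Return the median of the stopwatch readings for one app."""
    return median(trial_seconds)


def fits_window(median_seconds, max_wait=MAX_WAIT_SECONDS):
    """True if the median wait is at or under the chosen cutoff."""
    return median_seconds <= max_wait


# Hypothetical readings for two made-up apps (five runs each).
example_trials = {
    "app_a": [2.1, 2.6, 1.9, 3.0, 2.4],
    "app_b": [9.5, 11.2, 8.8, 13.1, 10.4],
}

for name, readings in example_trials.items():
    m = summarize_trials(readings)
    print(f"{name}: median {m:.1f}s, fits window: {fits_window(m)}")
```

The median is used instead of the average so that one unusually slow run, such as a one-off network hiccup, does not dominate the result.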

The point is not to put Lexia or IXL on trial. Both apps have good content and serious curriculum teams behind them. The point is that they were architected in an era when web app latency of 5–10 seconds was acceptable, and a four-year-old's brain was not on the requirements document.

Why voice changes the math

A tap is a small action. A spoken word is a sustained one. When a child says "fffrog" out loud, they have committed muscles, breath, and attention to producing that sound. The window for receiving meaningful feedback on it is narrower than the window for receiving feedback on a tap, because the child's whole body was just engaged. The brain expects an answer in the rhythm of conversation, not in the rhythm of a web request.

Adult conversation has a remarkably consistent turn-taking gap of around 200 milliseconds across languages, according to Stivers and colleagues' cross-cultural study (Stivers et al., 2009, Proceedings of the National Academy of Sciences). Children's tolerances are looser, but the floor is set by speech, not by software. Any tutor that takes 8 seconds to reply to a spoken word is asking a child to behave like an email client.

How Lumikids gets under one second

Lumikids is built on three pieces that, together, keep the feedback loop inside the working-memory window.

Anthropic's Claude handles the reasoning: figuring out whether "fffrog" should be praised, gently corrected, or followed up with a question about whether frogs have teeth. We use streaming so the first token of the response leaves the model before the full answer is composed. ElevenLabs converts that streamed text into speech with sub-second time-to-first-audio. Wispr Flow handles the speech input side, transcribing a child's messy real speech (including the half-words, the giggles, and the "ummm wait") faster than most adult-tuned systems handle clean dictation.
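To see why streaming matters, here is a toy Python simulation. It is not Lumikids code and does not call any real Claude, ElevenLabs, or Wispr Flow API. It compares when speech could start if synthesis begins on the first streamed token versus only after the whole reply is generated. Every name and timing constant here is a hypothetical stand-in, and real speech systems usually need more than one token before they can start talking.

```python
from typing import Iterator, Tuple

# Hypothetical timings, chosen only to make the comparison visible.
TOKEN_SECONDS = 0.05        # assumed time for the model to emit one token
TTS_STARTUP_SECONDS = 0.3   # assumed delay before synthesized audio begins

REPLY = "Yes! Frog starts with f. Do frogs hop or fly?"


def fake_model_stream(reply: str) -> Iterator[Tuple[float, str]]:
    """Yield (elapsed_seconds, token) pairs, simulating a streamed reply."""
    elapsed = 0.0
    for token in reply.split():
        elapsed += TOKEN_SECONDS
        yield elapsed, token


def first_audio_streaming(reply: str) -> float:
    """Audio can start once the first token has arrived."""
    first_elapsed, _ = next(fake_model_stream(reply), (0.0, ""))
    return first_elapsed + TTS_STARTUP_SECONDS


def first_audio_batch(reply: str) -> float:
    """Audio can start only after every token has been generated."""
    last_elapsed = 0.0
    for last_elapsed, _ in fake_model_stream(reply):
        pass
    return last_elapsed + TTS_STARTUP_SECONDS


print(f"streaming: {first_audio_streaming(REPLY):.2f}s to first audio")
print(f"batch:     {first_audio_batch(REPLY):.2f}s to first audio")
```

In this toy model the streaming delay stays the same however long the reply is, while the batch delay grows with every extra word.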

The result, in practice, is that Remi finishes saying "frog," and before he has fully exhaled, the tutor has said "yes." Then it asks him whether frogs hop or fly. Then he is back in the loop, instead of upside down on the couch.

This is not a flex. It is the only architecture that respects what a four-year-old's brain is actually doing. Slower architectures are not bad in spirit. They are bad in physics.

What this means for picking a learning app

If your child is older than seven, latency matters less. Working memory has grown, rehearsal strategies have developed, and a 4-second wait is annoying but recoverable. If your child is four, five, or six, latency is the variable. Ask any app you are considering: how long does it take to respond to my child after they do the thing you asked them to do? If the honest answer is more than three seconds, you are not buying a tutor. You are buying a quiz with a timeout.

A few practical tests:

Sit with your child during a session and count, out loud, "one Mississippi, two Mississippi" between their action and the app's response.
Watch their eyes during the gap. If they leave the screen, the gap is too long.
Notice how often you have to redirect them back to the task. Frequent redirects are not your child being "distractible." They are the app exceeding the working-memory window.
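If you jot down your "Mississippi" counts during a session, a few lines of Python can summarize them. This is a minimal sketch: the tallies are invented, the function name is hypothetical, and the cutoff of three is taken from the three-second rule of thumb above.

```python
# Assumed cutoff, in counted seconds, from the article's three-second rule.
THRESHOLD_COUNT = 3


def summarize_session(counts):
    """Summarize the whole-second counts noted between action and response."""
    if not counts:
        raise ValueError("need at least one counted gap")
    too_long = [c for c in counts if c > THRESHOLD_COUNT]
    return {
        "gaps": len(counts),
        "longest": max(counts),
        "over_threshold": len(too_long),
        "share_over": len(too_long) / len(counts),
    }


# Hypothetical tallies from one sitting, not real data.
session = [1, 2, 6, 3, 9, 2, 4, 1]
print(summarize_session(session))
```

A high share of gaps over the cutoff says more than a single bad wait, which could just be a network hiccup.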

The bar is not "no waiting." The bar is "waiting that fits inside the cognitive loop the child is already in."

The honest takeaway

Reading apps did not get slow on purpose. They got slow because the tech stack of 2012 made sub-second AI responses impossible, and the curriculum teams designed around that constraint. The constraint is gone now. The architectures have not caught up. That is the gap Lumikids is trying to close — not because faster is impressive, but because faster is the only speed at which a four-year-old can actually learn.

Try the beta at lumikids.dev and watch your child stay in the loop.

Image brief

Hero image: A four-year-old at a tablet, eyes drifting away from the screen while a spinning loading icon hovers in the corner, warm afternoon light.
Inline image 1: A simple infographic showing a 15-second working-memory window as a horizontal bar, with response-time markers from common apps falling inside or outside the bar. Place after the "What working memory looks like at four" section.
Inline image 2: A side-by-side timeline showing two children — one waiting through a 12-second spinner and losing focus, one getting a sub-second voice reply and continuing the conversation. Place after the response-time table.

Internal link suggestions

"How my four-year-old taught me to build an AI tutor" — anchor text: "the founding story behind Lumikids"
"What developmental science says about attention in early learners" — anchor text: "attention spans by age"
"Adaptive learning isn't a setting" — anchor text: "how live reasoning differs from a fixed difficulty tree"

Editor's note

Tim — three things to verify before publish. (1) The response-time table reflects my own home-network stopwatch testing on the date in the file, not vendor-published numbers; we should add a short methodology footnote or rerun on a documented date. (2) The Cowan 2010 and Gathercole and Alloway 2008 citations are paraphrased from memory; please confirm exact wording or replace with direct quote pulls before publish. (3) The "fourteen seconds, frog, upside down on the couch" anecdote is the version I remember — confirm with Kate it was 14 and not 12, and that it was "frog" not "fish."

One more thing —

Lumi is in open beta and free for the first 100 families. If reading time at your house ever feels harder than it should, we built this for you.

Try Lumi free →