Remi is four. He sounds out "frog" and then waits. The tablet thinks. He looks at me. He looks back at the screen. He says "frog" again, louder, the way you'd say it to an older relative who hadn't heard. He waits. Somewhere around second eleven, his shoulders drop. By second fourteen, he's picked up a toy dinosaur and walked away. The app finally chimes its little success sound to an empty rug.
That was the moment I stopped being a parent who pays for learning apps and became a parent who builds one.
The ten-second cliff
I'm not anti-app. My wife Kate and I both work, Remi is curious about everything, and we wanted something he could do on his own that wasn't a cartoon. We tried Lexia. We tried a couple of the big ones whose names you already know. They all share the same architecture: tap a thing, wait, get feedback, tap the next thing.
The waiting was the problem.
I started timing it. From the moment Remi finished speaking or tapping to the moment the app responded was somewhere between ten and twenty seconds, depending on the screen and the day. That number doesn't sound dramatic when you read it. Sit next to a four-year-old and count it out loud. Ten Mississippi. Twenty Mississippi. A small child's attention is not a fixed quantity you spend down; it's a fire you have to keep feeding. Every second of dead air is a draft on the flame.
I went looking for research to confirm what I was watching, and the developmental psychology lined up with the rug. Working memory in a four- to seven-year-old is short and fragile. By the time the app got around to confirming "frog," Remi had often forgotten he said "frog" in the first place. The praise wasn't praise anymore. It was a notification for someone who'd left the room. I get into the cognitive science in more depth in why a ten-second delay kills your child's learning, but the short version is this: every second of latency is a tax on a brain that doesn't have a lot of working memory to spend.
The rule we'd already made
Kate and I have one rule about Remi's learning that we've held to since he was old enough to point. Follow his question. When he asks why the moon is sometimes there in the daytime, you stop loading the dishwasher and you go look at the moon with him. When he asks how a zipper works, you take a zipper apart together. The point isn't that we always have the right answer. The point is that the question gets met while it's still alive.
Watching him with these apps, I realized we'd been outsourcing the most important part of learning — the responsiveness — to software that couldn't do it. The curriculum was fine. The art was fine. The voices were fine. But the gap between Remi saying a word and the app reacting was longer than his interest in the word. We weren't teaching him to read. We were teaching him to wait.
I'm not a former teacher. I'm a builder. So I did what builders do when they get angry about something: I opened a terminal.
The stack, and why it matters
I knew from the first prototype I wasn't going to write a clever frontend on top of the same slow backend. The whole experience had to be fast end-to-end, and it had to feel like talking to a patient adult, not pressing buttons on a kiosk. Three pieces did the heavy lifting.
Reasoning: Claude
For the brain of the tutor, I built on Anthropic's Claude. I'd been working with Claude in my day job at Digital Boutique AI for a year, and two things made it the obvious choice for a kids' product. First, the safety training is unusually serious — Claude is harder to jailbreak into saying things you don't want it saying to a small child than most alternatives I tested. Second, it can hold a conversation about a four-year-old's halting attempt at "frog" without flattening it into a quiz. It reasons. It doesn't just classify.
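If you're curious what a single tutor turn looks like in code, here's a minimal sketch against Anthropic's Messages API in Python. The system prompt and model id are illustrative stand-ins, not Lumikids' production values.

```python
# Sketch of one tutor turn against the Anthropic Messages API.
# The system prompt and model id are illustrative, not our production values.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TUTOR_SYSTEM = (
    "You are a patient reading tutor for a four-year-old. "
    "Reply in one or two short sentences. If the child's attempt at a word "
    "is close, encourage them and hint at the missing sound. "
    "Never quiz; have a conversation."
)

def tutor_turn(history: list[dict], child_utterance: str) -> str:
    """One conversational turn: the child's transcribed speech in, the tutor's reply out."""
    history.append({"role": "user", "content": child_utterance})
    reply = client.messages.create(
        model="claude-sonnet-4-20250514",  # pin whichever current model you use
        max_tokens=200,                    # tutor replies stay short on purpose
        system=TUTOR_SYSTEM,
        messages=history,
    )
    text = reply.content[0].text
    history.append({"role": "assistant", "content": text})
    return text
```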
Voice out: ElevenLabs
For the voice your child hears, I use ElevenLabs. The synthesis is fast and warm enough that Remi started talking back to it without prompting. There's a real difference between a voice that sounds like a robot reading a script and a voice that sounds like an adult who's actually listening. Remi can hear the difference. So can I. So can you.
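The voice-out half is essentially one HTTP call. Here's a minimal sketch against ElevenLabs' text-to-speech REST endpoint; the voice id is a placeholder for whichever voice you pick from your library.

```python
# Sketch: synthesize the tutor's reply via ElevenLabs' text-to-speech REST endpoint.
# VOICE_ID is a placeholder; choose a warm voice from your own voice library.
import os
import requests

VOICE_ID = "YOUR_VOICE_ID"

def speak(text: str) -> bytes:
    """Return audio bytes for the tutor's reply, ready for an audio player."""
    resp = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
        json={
            "text": text,
            "model_id": "eleven_turbo_v2_5",  # one of their low-latency model tiers
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.content
```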
Voice in: Wispr Flow
For listening to a four-year-old, I use Wispr Flow. Kids do not speak in clean, well-formed sentences. They start words and abandon them. They sing in the middle of answering. They mumble into a sleeve. Wispr handles that messiness better than the speech recognition built into most apps, which was tuned on adult voices reading audiobook scripts. The whole loop — Remi speaks, the system understands, Claude reasons, ElevenLabs answers — runs in roughly a second on a good connection. Not ten. Not fourteen. One.
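Here's the shape of that loop with per-stage timing, so you can see where the latency lives. The `transcribe` function is a hypothetical stand-in for the speech-to-text step (I'm not reproducing our Wispr integration here); `tutor_turn` and `speak` are the sketches above.

```python
# Sketch of the full voice loop with per-stage timing.
# transcribe() is a hypothetical stand-in for the speech-to-text step;
# tutor_turn() and speak() are the sketches shown earlier.
import time

def transcribe(audio: bytes) -> str:
    """Hypothetical speech-to-text hook; swap in your real integration."""
    raise NotImplementedError

def one_loop(audio: bytes, history: list[dict]) -> bytes:
    t0 = time.perf_counter()
    heard = transcribe(audio)           # voice in
    t1 = time.perf_counter()
    reply = tutor_turn(history, heard)  # reasoning
    t2 = time.perf_counter()
    speech = speak(reply)               # voice out
    t3 = time.perf_counter()
    print(f"stt={t1 - t0:.2f}s  llm={t2 - t1:.2f}s  tts={t3 - t2:.2f}s  total={t3 - t0:.2f}s")
    return speech
```

In production the stages stream and overlap instead of running serially; that overlap is how the total stays near one second. The serial version is just the easiest way to see the three costs.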
I bolt Sentry to the whole thing for error monitoring and PostHog for learning analytics, because if a four-year-old hits a bug, I want to know within minutes, not when the parent finally emails support.
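Both take a few lines to wire up. A sketch with placeholder credentials; the event shape is illustrative, and we key analytics to a session id, never to anything that identifies the child.

```python
# Sketch: error monitoring plus learning analytics, with placeholder credentials.
import sentry_sdk
from posthog import Posthog

sentry_sdk.init(dsn="https://examplePublicKey@o0.ingest.sentry.io/0")

posthog = Posthog(project_api_key="phc_placeholder", host="https://us.i.posthog.com")

def record_turn(session_id: str, word: str, correct: bool, latency_s: float) -> None:
    """One analytics event per tutor turn: what was tried, how it went, how fast."""
    posthog.capture(
        distinct_id=session_id,  # a session id, never the child's identity
        event="tutor_turn",
        properties={"word": word, "correct": correct, "latency_s": latency_s},
    )
```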
The first beta, with the first kid
The first version of Lumikids was a tab in my browser on a Tuesday night. Remi sat next to me on the couch in pajamas. I said, "Bud, this one's new. Tell it what you want to read about."
He said "dinosaurs."
The voice came back, almost instantly, and asked him which one. He said "the one with the big neck." Brachiosaurus. They had a conversation. A real one. He sounded out words. When he got stuck, the tutor waited the right amount of time — not the timer-driven, fake-patient pause built into legacy apps, but an actual conversational beat — and then offered a hint shaped to what he'd just tried. When he got it right, the praise came while he was still proud of himself, not fourteen seconds later.
About twenty minutes in, Remi turned to me, completely matter of fact, and said, "Daddy, can it also tell me about volcanoes?"
That was the moment I knew I had something. Not because he liked it. Lots of kids like apps. Because the question came faster than the answer to the last one. The fire was burning hot enough that he was already feeding it the next log.
What I had to unlearn
I want to be honest about the things I got wrong, because most founder stories conveniently leave these out.
I overbuilt the curriculum first. I had a whole adaptive reading-level system mapped out before Remi had used the thing for an hour. He didn't need it. What he needed was a tutor that could actually respond, in any direction, to whatever just came out of his mouth. I get into how that kind of live, conversational adaptation works — and why it's different from the "adaptive" claims most apps make — in a later piece, but the short version is: the tree of pre-written branches is not the product. The conversation is.
I also assumed parents would want a dense dashboard packed with metrics. They don't. They want to know two things: is my kid learning, and is my kid safe. Everything else is a distraction. We're still tuning that view, and I'd rather show less and have it mean more than show more and have it mean nothing.
What's next
The reading beta is live for a small group of families. We're adding kids slowly, on purpose. I am not in a hurry to be big; I am in a hurry to be right. After reading we'll add math — not flashcards, but conversational problem-solving — then science questions a kid actually asks, then a second language using the voice stack we've already built. There's a piece coming on the full Lumikids roadmap from reading to math, science, and languages that lays out the order and the honest timeline. I'd rather underpromise on that timeline than ship a fake demo.
I also know there are things AI tutors should not try to do for children, and I'd rather draw those lines in public than discover them in the App Store reviews. We don't gamify the way the rest of the industry does. There are no streaks. There are no notifications begging your kid to come back. Sessions end when the learning is done, not when the engagement metric is hit. If you'd like the longer argument for why, I'll be making it in upcoming pieces on safety and on screen quality.
For now, the thing exists. It's faster than what we tried before. It listens. It waits the right amount of time, and only the right amount of time. It was built by a dad on the couch next to a four-year-old in dinosaur pajamas, and that is the only test it had to pass.
This first beta is for Remi. The next one is for your kid.
Image brief
- Hero image: A four-year-old boy with curly hair sitting cross-legged on a living room rug, leaning toward a tablet in soft morning light, his face caught between hopeful and about-to-give-up.
- Inline image 1: A simple horizontal bar chart showing response times — "Legacy apps: 10–20 seconds" vs. "Lumikids: ~1 second" — on a clean off-white background. Placement: after the "ten-second cliff" section.
- Inline image 2: A loose hand-drawn diagram of the voice loop — child icon → Wispr Flow → Claude → ElevenLabs → child icon — drawn on graph paper. Placement: inside "The stack, and why it matters," after the three sub-sections.
Internal link suggestions
- "why a ten-second delay kills your child's learning" → Why a ten-second delay kills your child's learning
- "the full Lumikids roadmap from reading to math, science, and languages" → From reading to math, science, and languages: the Lumikids roadmap
- "voice-first learning" (optional, can be added in a sentence about Wispr/ElevenLabs) → Voice-first learning: why we built around speech, not taps
Editor's note
A couple of things to verify before publishing. The "ten to twenty second" range for legacy app response times is my lived observation with Remi on Lexia and similar apps, not a published number — Tim, confirm you're comfortable framing it as personal observation rather than industry data. [VERIFY] The Remi "Brachiosaurus / volcanoes" anecdote is composited from a couple of real sessions; tighten or change the specific animals if you want it to match a single session exactly. The "~1 second" end-to-end latency claim for the current Lumikids stack should be confirmed against your most recent production traces before we put it in print.
Lumikids is in beta and free for the first 100 families. If reading time at your house ever feels harder than it should, we built this for you.