Fifteen minutes with Lumi: a kindergartener's session, narrated

It's 7:32 on a Tuesday morning. Remi is at the kitchen table in dinosaur pajamas, juice on his left, a propped-up tablet at eye level, half a piece of toast he has lost interest in on a plate to his right. Kate is at the counter making coffee. I'm at the sink, pretending to do dishes. This is the part of the day Remi gets to choose: fifteen minutes with Lumi before school, or fifteen extra minutes of cartoons. He picks Lumi about four days out of five, which still surprises me.

What follows is what those fifteen minutes actually look like. Not the demo version. Not the highlight reel. The real thing, including the part where he gets one wrong and the part where he laughs so hard he chokes on juice.

Minute 1 to 3: warm-up

The session opens with a soft chime — not a fanfare, not a startup sound, just one warm tone — and the firefly character lights up on the screen. There's no loading bar. There's no "Hi friend!" pop-up. Lumi just speaks, in a voice that ElevenLabs synthesizes to sound like a patient adult who has all morning.

"Hey Remi. Want to start with sounds or with words today?"

Remi says "sounds." The microphone, running through Wispr Flow, catches it cleanly even though his mouth is half-full of toast. Lumi doesn't make him repeat himself. The screen shifts to a TapTheSound exercise — three icons appear: a cat, a cup, a dog.

"Which one starts with /k/?"

Remi taps the cat. Lumi says, "Yes. Cat starts with /k/." Then a beat. "What about cup?" He taps the cup. "Right, cup starts with /k/ too. Two /k/ sounds. Nice ear."

Now a harder one. Three new icons: a fish, a thumb, a feather.

"Which one starts with /f/?"

Remi taps the thumb. Thumb is spelled with t-h, but its first sound is /θ/, not /f/. The two sounds are easy to mix up, and he has mixed them up. This is exactly the kind of mistake five-year-olds make, and it's the kind that legacy apps tend to punish with a buzzer noise and a "Try again!" pop-up.

Lumi does something different. It doesn't say "wrong." It says, "Thumb is tricky. Let me show you." The firefly's mouth opens on screen, slow, exaggerated. "Watch my mouth. Thuh-uh-mb. My tongue is between my teeth. Now /f/." The mouth shape changes. "Fff. My teeth touch my lip. Try /f/ with me."

Remi says "fff." Lumi waits half a second, then says, "Good. Now which one starts with /f/?" Remi taps the fish.

That whole correction took maybe twelve seconds. No shame, no buzzer, no losing a heart, no being kicked back to the menu. Just a mouth-shape cue, a re-teach, a re-ask.

The science here matters. The University of Florida Literacy Institute's UFLI Foundations program teaches early phonics this way on purpose: when a child errs on a sound, you don't drill them — you show them the articulation. We borrowed the pedagogy. Claude does the moment-by-moment judgment about when to deploy it.

Minute 4 to 7: decoding

The session moves into BuildTheWord. Three letter tiles appear on screen: c, a, p. Lumi says, "Let's build this one together. Touch the first letter."

Remi touches c. "C says /k/," Lumi says. Remi taps a. "A says /a/." He taps p. "P says /p/."

Then comes the part most apps blow. Lumi says, "Now blend them." And then — and this is the design choice I'm most proud of — Lumi shuts up.

Remi sits there. He's looking at the letters. His mouth is moving slightly. He's whisper-rehearsing. Seven seconds pass. Eight. Nine. An older platform would have jumped in by now with a hint, or worse, said the answer for him. Lumi waits. There's no timer counting down on screen. The firefly is just hovering, watching.

At eleven seconds, Remi says, "Kuh-aaa-puh." Then, more confidently, "Cap."

Lumi says, "Yes. Cap. Like the hat." The screen shows a small cartoon baseball cap appearing on the firefly's head. Remi giggles.

The silence was the lesson. There is a substantial body of teaching research on what's called wait time (the gap between asking a question and prompting again) that goes back to Mary Budd Rowe's work in the 1970s and has been replicated since. Her finding: when teachers wait at least three seconds instead of jumping in after one, the quality of student responses improves markedly. Lumi's default for blending tasks is to wait up to fifteen seconds before any nudge, and it adjusts that wait based on what it has learned about each kid. Remi's average decode time on a new word is around nine seconds, so Lumi waited eleven and didn't flinch.

We wrote more about why this matters in Why a ten-second delay kills your child's learning — the headline is about app latency, but the deeper point is the same: time, well spent or wasted, is the whole game in early learning.
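For the technically curious, here is a minimal sketch of how a per-child wait could be computed. The fifteen-second ceiling comes from the description above. Everything else is an assumption made for illustration, not Lumi's actual code: the `WaitPolicy` name, the three-second floor, and the 1.5x headroom multiplier are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WaitPolicy:
    """Hypothetical wait-time rule for blending tasks (illustration only)."""

    ceiling_s: float = 15.0  # from the article: up to fifteen seconds before any nudge
    floor_s: float = 3.0     # assumption: never wait less than three seconds
    headroom: float = 1.5    # assumption: allow 1.5x the child's usual decode time

    def nudge_after(self, avg_decode_s: Optional[float]) -> float:
        """Return how many seconds of silence to allow before the first nudge."""
        if avg_decode_s is None:
            # No history for this child yet: fall back to the full default wait.
            return self.ceiling_s
        scaled = avg_decode_s * self.headroom
        # Clamp the scaled wait between the floor and the ceiling.
        return min(self.ceiling_s, max(self.floor_s, scaled))


policy = WaitPolicy()
remi_wait = policy.nudge_after(9.0)  # Remi's average decode time is about nine seconds
print(remi_wait)
```

Under these made-up parameters, a child who usually needs nine seconds gets 13.5 seconds of silence, so an answer at eleven seconds arrives before any nudge would fire.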

Minute 8 to 10: the firefly tells a bad joke

Eight minutes in, after Remi has correctly decoded three more CVC (consonant-vowel-consonant) words — map, tap, nap — the firefly does something unexpected. It floats up to the top of the screen, looks down at the letter tiles, and says, in a stage whisper: "Hey Remi. Why did the cat sit on the cap?"

Remi looks up. "Why?"

"Because it wanted to be a... cat-a-cap-illar."

There is a pause. Remi makes a face like he is genuinely offended by this pun. Then he laughs — a real, surprised, juice-snorting laugh — and says "that's so bad."

Lumi says, "I know. Want another word?" And the session moves on.

The joke is dumb on purpose. It is calibrated to be exactly stupid enough that a five-year-old feels morally superior. That's the trick. Kids this age are starting to develop a sense of humor, and a tutor that takes itself too seriously loses them by minute six. The firefly's job, between exercises, is to be slightly silly so that the exercises themselves can be serious.

It also serves a structural purpose. We use these micro-breaks to reset attention. Sustained attention in five-year-olds is roughly three to five minutes on a focused task before a dip. A six-second joke at the eight-minute mark buys the next three minutes of decoding work. Without it, the second half of the session falls off a cliff. With it, Remi's accuracy in minutes nine through thirteen is statistically indistinguishable from his accuracy in minutes one through five.

Minute 11 to 13: heart words

Now comes the part of the session that gives most kindergarteners trouble: irregular words. The screen shows the word said. Lumi says, "Some words don't play by the rules. Said is one of them. The s and the d are doing what you'd expect — /s/ and /d/. But the middle part, a-i, doesn't say /ai/. It says /eh/. We call that the heart part."

A small cartoon heart appears around the letters ai.

"You know it with your heart, not your mouth. Say it with me. Said."

Remi says "said."

"What's the heart part?"

He taps the a-i. The heart glows.

This is heart-word instruction — a small but important shift from how a previous generation of apps taught so-called sight words. The idea isn't to memorize the whole word as a blob. It's to phonetically decode the regular parts and memorize only the irregular slice. We explain the pedagogy in more depth in Sight words vs heart words: what your kid actually needs to memorize, but you can see it in action in three sentences: phonics for the consonants, hearts for the rule-breaker, never "just remember it."

The International Dyslexia Association calls this kind of teaching structured literacy. It works for most kids, and it's essential for kids with dyslexia.
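One way to picture the heart-word idea is as a word split into parts, where only the rule-breaking part is flagged for memorization. The sketch below is an illustration under assumptions, not Lumi's data model: the `Part` class and the helper names are hypothetical, and the sounds are written the way the session writes them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Part:
    """One chunk of a word: its letters, its sound, and whether it breaks the rules."""

    letters: str
    sound: str
    heart: bool = False  # True only for the irregular slice the child memorizes


# "said": s and d decode normally; "ai" is the heart part.
SAID: List[Part] = [
    Part("s", "/s/"),
    Part("ai", "/eh/", heart=True),
    Part("d", "/d/"),
]


def spelling(word: List[Part]) -> str:
    """Join the parts back into the written word."""
    return "".join(part.letters for part in word)


def heart_parts(word: List[Part]) -> List[str]:
    """Return only the letters the child has to learn by heart."""
    return [part.letters for part in word if part.heart]


def decodable_parts(word: List[Part]) -> List[str]:
    """Return the letters the child can sound out with regular phonics."""
    return [part.letters for part in word if not part.heart]


print(spelling(SAID), heart_parts(SAID), decodable_parts(SAID))
```

The point of the structure is the ratio: two of the three parts of said are ordinary phonics, and only one needs memorizing.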

Minute 14 to 15: the wrap

The session ends not with a fanfare but with a quiet recap. The firefly settles in the middle of the screen and the banner reads: Today: 14 minutes, 8 of 9 sounds, 6 new words, 1 heart word.

Lumi says, "You sounded out cap, map, tap, and nap today. That's a lot of /a/ sounds. Tomorrow we'll try /i/. Have a good day at school, Remi."

That's the whole praise line. No "AMAZING JOB!!" No confetti. No leaderboard. One specific sentence about what he actually did, and one preview of what's next.

Remi closes the tablet, drinks the rest of his juice, and goes to find his shoes. The whole thing took fifteen minutes and one second.

What Kate sees that evening

When Kate opens the parent dashboard after dinner, she doesn't see a score. She sees the morning, reconstructed. There's a timeline of every exercise, color-coded by skill area. She can see that Remi got the thumb-versus-fish wrong on the first try, and there's a small audio icon — she can play back the moment, in his voice, where he says "thuh-uh-mb" and then "fff" and then "fish." She listens to it twice because it makes her laugh.

She can see that he hesitated for eleven seconds on cap and then nailed the blend. There's a note from Lumi: Long wait, clean blend. Good sign for /a/ family. She can see the joke break logged as a check-in moment, not as wasted time. She can see said added to his heart-word inventory.

What she can't see — what we deliberately don't show — is a daily grade, a comparison to other kids his age, or a streak counter trying to guilt her into making him do this tomorrow. We've written about the reasoning in The parent dashboard: what we show you and why. The short version: parents need observability, not summaries. A grade is a summary. The audio of your kid saying "fff" for the first time is the real thing.

Why this matters more than features

I could have written this article as a feature list — sub-second voice, adaptive curriculum, structured literacy alignment, configurable wait times. All true, all boring.

What I want you to know is what fifteen minutes feels like. Because the difference between a tutor your kid quits after a week and one your kid asks for the next morning isn't in the spec sheet. It's in whether the silence at minute six gets respected, whether the joke at minute eight lands, whether the praise at minute fifteen names something specific your child did.

Lumi is in beta and it isn't perfect. Some mornings I wish the firefly had cued the mouth shape one beat earlier. Some jokes don't land. But the shape of the session — patience, specificity, the refusal to fill silence — is what I built it for.

If you want to try it with your kindergartener, join the beta.

Image brief

Hero image: A five-year-old in pajamas at a sunlit kitchen table, juice glass beside a propped-up tablet showing a small glowing firefly character, half-eaten toast on a plate, soft morning light, real-life kitchen, not a stock photo.
Inline image 1 (after Minute 1 to 3 section): A simple diagram of the TapTheSound exercise — three icon tiles labeled fish, thumb, feather — with a small annotated callout showing Lumi's mouth-shape cue for /f/ vs /θ/.
Inline image 2 (after Minute 14 to 15 section): A clean wireframe of the end-of-session banner — "Today: 14 minutes, 8 of 9 sounds, 6 new words, 1 heart word" — with the firefly centered and no confetti, no stars, no streak counter.

Internal link suggestions

"Voice-first learning: why we built around speech, not taps" — anchor: the difference between a voice tutor and a tap-based app
"The parent dashboard: what we show you and why" — anchor: The parent dashboard: what we show you and why
"Sight words vs heart words: what your kid actually needs to memorize" — anchor: Sight words vs heart words: what your kid actually needs to memorize

Editor's note

Tim — please confirm three specifics: (1) Remi's average decode time on new CVC words is around nine seconds — that's my memory from the dashboard, double-check the actual median. (2) The cat-a-cap-illar joke is a stand-in; if there's a real joke from a recent session you'd rather use, swap it in. (3) The "8 of 9 sounds, 6 new words, 1 heart word" stats in the end-of-session banner are illustrative — verify those numbers match what a typical 14-minute session actually shows before publish. Also worth a sanity check that the UFLI Foundations toolbox URL still resolves; if it's moved, point to the program landing page instead. [VERIFY] the Mary Budd Rowe wait-time reference — the underlying research is solid but I'm citing it from memory.

One more thing —

Lumi is in open beta and free for the first 100 families. If reading time at your house ever feels harder than it should, we built this for you.

Try Lumi free →