
A parent's framework for evaluating any AI tutor

Five questions that separate a real Artificial Intelligence tutor from a chatbot in a cartoon wrapper — score Lumikids, score everyone else.

Tim de Vallée · 9 min read · TBD

A friend asked me last week which AI tutor she should put on her seven-year-old's tablet. She'd been to three websites and read four "best of" lists and was more confused than when she started. Every product claimed it was personalized. Every product claimed it was safe. Every product had a smiling kid on the homepage and a quote from a teacher.

That's not enough information to make a decision about your child.

So I wrote her the list of questions I would ask if I weren't the person building one of these things. Then I scored Lumikids against the same list, honestly, including where we fall short. The framework below is the result. Use it on us. Use it on the competition. If a company can't answer all five, that itself is your answer.

Why a framework, and why these five questions

The Artificial Intelligence (AI) tutor market is having its 1999 moment. There are dozens of products, the marketing copy is interchangeable, and the actual engineering underneath ranges from genuinely thoughtful to a thin wrapper around a general-purpose chatbot dressed up in a children's-book illustration.

The five questions I'll walk through are the ones that, in my experience building Lumikids, actually separate the two. They are: what model is it running, what data does it collect, how does it adapt, what's the latency, and how do you see what your child did. Each one is something a serious company can answer in a paragraph. Each one is something a thin wrapper can't answer at all.

If you want the deeper version of the safety conversation, I wrote one already in what 'safe AI for kids' actually means. This piece is the shorter, comparative version — designed so you can sit on the couch with a laptop and rank three or four products in under an hour.

Question 1: What model is it running, and who trained it?

Why it matters. "AI" is now a label, not a description. A product running on a small open model fine-tuned by a five-person startup behaves very differently from one running on a frontier model with a published safety methodology. Children's responses to a tutor are unpredictable; the underlying model has to be good enough to handle a four-year-old who suddenly wants to talk about volcanoes for nine minutes.

What you want to hear: a specific model family, a specific provider, and a willingness to discuss why they chose it. What should worry you: vague answers, "proprietary AI," or no answer at all.

Lumikids score: strong, with one caveat. Lumikids is built on Anthropic's Claude — specifically Claude Opus 4.7 for the reasoning layer that talks to your child. Anthropic publishes its safety research, its model cards, and its Responsible Scaling Policy openly. The caveat: like any frontier model, Claude can still produce occasional unexpected outputs, which is why we layer content filtering and a parent-visible session log on top. We don't claim the model alone makes us safe. The system around it does.

Question 2: What data do you collect on my child, and where does it live?

Why it matters. Children's data is regulated by the Children's Online Privacy Protection Act (COPPA) in the United States and by the General Data Protection Regulation (GDPR) in Europe, but compliance is a floor, not a ceiling. The real question isn't "is it legal" — it's "what would I be comfortable with someone knowing about my kid in ten years." The Federal Trade Commission (FTC) has made the COPPA rule and its enforcement actions publicly searchable; that's a useful baseline to compare any company's stated policies against.

What you want to hear: a specific list of what's collected, who hosts it, how long it's retained, whether it's used for model training, and whether it's ever shared with third parties. What should worry you: a privacy policy that uses the phrase "industry-standard practices" and stops there.

Lumikids score: strong on data handling, partial on retention transparency. We store session transcripts, audio of the child's responses, and learning-progress signals. Audio and transcripts live in encrypted storage on infrastructure inside the United States. We do not sell data, do not use it to train third-party models, and process it under a COPPA-compliant flow with verifiable parental consent. Where we're still improving: we haven't yet shipped a self-serve "delete everything" button. Today it's a support request, which works but isn't the standard I want. That ships next quarter.

Question 3: How does it adapt — fixed tree, or live reasoning?

Why it matters. Almost every product calls itself "adaptive." Most of them mean something narrow: if your kid gets three questions right, the difficulty goes up one notch on a pre-built decision tree. That's better than nothing, but it's not what most parents picture when they hear the word "adaptive." A child who confuses "b" and "d" because of a visual processing pattern needs a different response than a child who confuses them because they rushed. A fixed tree can't tell the difference. A reasoning model can.
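For readers who want the difference made concrete, here is a minimal sketch. Everything in it is illustrative, not Lumikids code: a fixed tree can only see a score, while a live-reasoning system can see the context of the mistake.

```python
# Hypothetical sketch: "adaptive" as a fixed difficulty ladder versus
# "adaptive" as reasoning over context. All names are illustrative.

def fixed_tree_adapt(level: int, last_three_correct: int) -> int:
    """A typical pre-built tree: only right/wrong counts move the level."""
    if last_three_correct == 3:
        return level + 1          # three right -> one notch harder
    if last_three_correct == 0:
        return max(1, level - 1)  # three wrong -> one notch easier
    return level                  # every other child looks the same

def live_reasoning_adapt(context: dict) -> str:
    """Live reasoning sees *why* the child struggled, not just the score.
    In a real product this context would go to a language model; a rule
    stands in here just to show what the model gets to look at."""
    if context["mistake"] == "b/d reversal" and context["pace"] == "careful":
        return "practice letter orientation with a visual cue"
    if context["mistake"] == "b/d reversal" and context["pace"] == "rushed":
        return "slow down and re-ask the same word"
    return "continue at current difficulty"
```

The tree collapses both b/d children into the same score and gives them the same next step; the second function can respond to each differently, which is the whole argument of this section.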

What you want to hear: a description of how the product responds to what the child actually said or did, not just whether the answer was right or wrong. What should worry you: a product demo that looks the same on the third try as it did on the first.

Lumikids score: strong by design, still maturing in practice. Our adaptation runs through Claude reading the full context of the moment — the child's words, pauses, prior attempts, and emotional cues from voice — and writing the next response live. That's the architecture. The honest caveat is that "live reasoning" is only as good as the prompts and context windows we feed the model, and we are still tuning both. I wrote about the architecture in more detail in adaptive learning isn't a setting. The short version: we got the foundation right, and we're refining the execution every week.

Question 4: What's the latency, end-to-end, in real conditions?

Why it matters. Working memory in young children is short. If a tutor takes ten seconds to respond, the child has already forgotten what they said, what they were trying to do, and sometimes that they were doing anything at all. This is the single most underweighted variable in the entire kids' edtech market — most reviews don't even measure it. The National Institute of Standards and Technology's AI Risk Management Framework explicitly flags responsiveness and usability as components of trustworthy AI, not afterthoughts.

What you want to hear: a number, in seconds, measured under real home Wi-Fi conditions, not in a demo. What should worry you: any answer that includes the phrase "near-instant."
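"A number, measured in real conditions" has a concrete shape, and any serious vendor should be able to show you something like it. This sketch (illustrative, not any vendor's actual tooling) times each exchange from end of input to start of response and reports the median, which resists being flattered by a few fast demo runs the way an average can be.

```python
# Hypothetical sketch of how a median end-to-end latency figure is built:
# time many round trips under real conditions, then take the median.
import statistics
import time

def timed_exchange(respond) -> float:
    """Seconds for one round trip through whatever `respond` represents
    (speech recognition + model + voice synthesis, in a real stack)."""
    start = time.perf_counter()
    respond()
    return time.perf_counter() - start

def median_latency(respond, samples: int = 20) -> float:
    """Median over repeated exchanges, not a single cherry-picked run."""
    return statistics.median(timed_exchange(respond) for _ in range(samples))
```

A vendor quoting "under one second, median" should be able to point at samples collected this way from production traffic on home connections, not a lab benchmark.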

Lumikids score: strong, and the reason the company exists. Our median end-to-end response time — from the child finishing a sentence to Lumi starting to speak back — is under one second on a normal home connection. We use ElevenLabs for sub-second voice synthesis specifically because every other piece of the stack can be fast, and a slow voice layer wastes all of it. When latency creeps up, our monitoring catches it and we treat it as a bug, not a vibe. The full argument for why this matters is in why a ten-second delay kills your child's learning.

Question 5: How do I see what my child actually did?

Why it matters. A weekly email that says "Maya practiced reading for 45 minutes and improved!" is not observability. It's a marketing summary. You need to know what your child was asked, what they said, where they got stuck, and what the tutor said back — at the level of detail you'd expect from a human teacher's notes. Without that, you can't tell whether the product is working or just running. The American Academy of Pediatrics' media use guidance emphasizes parents staying engaged with what their child is consuming; you can't do that on summaries alone.

What you want to hear: a parent view that shows session-by-session activity, sample exchanges, and what the child struggled with. What should worry you: a dashboard that only shows badges, streaks, and time-on-task.

Lumikids score: strong on depth, still building on usability. The parent dashboard shows every session, where Lumi paused or re-explained, audio playback of key moments, and weekly skill snapshots. I walked through it in detail in the parent dashboard. What we don't do well yet: the mobile layout is rough, and we don't have a way to compare a child's progress to age-band benchmarks. Both are on the roadmap. Neither is hidden.

How to use this with any product, including ours

Take this list to whichever AI tutor you're considering — Lumikids, a competitor, the one your kid's friend uses, the one your school district is piloting. Email the company. Ask the five questions in plain language. Pay attention not just to the answers, but to how fast they come back, how specific they are, and whether the person answering can do it without checking with legal.

A serious product will answer all five in a single reply. A thin wrapper will dodge at least two. That distinction tells you almost everything you need to know before your child opens the app.

If you want to score Lumikids yourself, the beta is open and the dashboard is live at lumikids.dev.

Image brief

  • Hero image: A parent at a kitchen table with a tablet, a notebook, and a coffee mug, comparing two children's app interfaces side by side, late afternoon light, the parent's expression focused and skeptical.
  • Inline image 1: A clean five-row scorecard graphic with the five question labels on the left and an empty column for the reader to fill in, placed after the "Why a framework" section.
  • Inline image 2: A simple comparison diagram showing "fixed decision tree" on one side (boxes and arrows on a grid) versus "live reasoning" on the other (a flowing conversation bubble), placed inside the Question 3 section.

Internal link suggestions

  • "What 'safe AI for kids' actually means" — anchor text: "what 'safe AI for kids' actually means"
  • "The parent dashboard: what we show you and why" — anchor text: "the parent dashboard"
  • "Adaptive learning isn't a setting — it's the whole product" — anchor text: "adaptive learning isn't a setting"

Editor's note

Two things for Tim to confirm before publishing: (1) the under-one-second median latency claim — please verify against the latest production numbers, since this gets quoted back to us; and (2) the "self-serve delete button ships next quarter" commitment — only keep this if it's actually on the roadmap with an owner, otherwise soften to "is on our roadmap." Also worth a sanity check that we're comfortable naming Claude Opus 4.7 specifically in a parent-facing post, vs. a more general "Anthropic's Claude" phrasing.

One more thing —

Lumi is in open beta and free for the first 100 families. If reading time at your house ever feels harder than it should, we built this for you.