AI Safety

What we don't track, and why

The actual data model behind Lumikids, written by the engineer who built it.

Tim de Vallée · 9 min · 2026-12-30

At a parent meetup in Hudson last month, a dad cornered me by the coffee table and asked the only question that matters: "What are you doing with my kid's voice?" Not a hostile question. He had read enough about voice cloning and ad targeting to know that "we take privacy seriously" is the corporate version of "trust me." He wanted the actual answer. So I drew the data model on a napkin.

This article is that napkin, expanded. I am going to walk through the three things Lumikids does collect, the five things it refuses to collect, what happens to a child's audio after the speech-to-text step, and how a parent can wipe every byte of their kid's data in about ninety seconds. If you have read the parent dashboard piece, this is the other side of that coin: what is in the database, and what is not.

The three things we do collect

Every session your child has with Lumi produces records in exactly three buckets. That is the whole data model. There is not a fourth one hidden in an analytics service.

Correctness signals

When Remi reads "cat" and the speech recognizer transcribes "cat," the system stores a tiny row that says: prompt was "cat," response was "cat," result was correct. When he says "kuh-aaa-tuh" and gives up, the row says: prompt was "cat," response was "kuh-aaa-tuh," result was attempted-not-resolved. These rows are what the parent dashboard uses to tell you that your kid is consistent on short-a words and still working on blends.

That is it. No phoneme-level acoustic features. No "voice maturity score." No proprietary metric that pretends to measure how confident your child sounded. Just: what was asked, what was said, did it match.
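To make the shape of that row concrete, here is a minimal Python sketch. Only the three fields and the two result labels come from the description above; the names `CorrectnessRow` and `score_turn`, and the case-insensitive comparison, are illustrative assumptions rather than the production schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrectnessRow:
    prompt: str    # what was asked
    response: str  # what the speech recognizer transcribed
    result: str    # "correct" or "attempted-not-resolved"


def score_turn(prompt: str, transcript: str) -> CorrectnessRow:
    """Build one row from a prompt and the transcribed response."""
    # Assumption: matching ignores case and surrounding whitespace.
    matched = prompt.strip().casefold() == transcript.strip().casefold()
    result = "correct" if matched else "attempted-not-resolved"
    return CorrectnessRow(prompt=prompt, response=transcript, result=result)
```

Note what the row has no room for: there is no audio field and no acoustic feature column, so nothing beyond text can be attached to a turn.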

Timing

We store two timestamps per turn: when Lumi finished speaking, and when the child's response was transcribed. The difference is the response latency. That number drives the pacing logic — if your kid is consistently answering in two seconds, the level is too easy; if they are taking twelve, we slow Lumi down and offer a hint. It is also how we show you, the parent, where your child got stuck without making you read every transcript.

We do not store mouse movements, keystroke dynamics, microphone background levels, or any of the other behavioral signals that get bundled into "engagement analytics" in adult products.
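The pacing rule can be sketched in a few lines. The two-second and twelve-second thresholds are from the text above; everything else is an assumption made for illustration, including the names `TurnTiming` and `pacing_decision` and the choice of the median over recent turns as the meaning of "consistently."

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass(frozen=True)
class TurnTiming:
    lumi_finished_at: datetime         # when Lumi finished speaking
    response_transcribed_at: datetime  # when the child's response was transcribed

    @property
    def latency_seconds(self) -> float:
        delta = self.response_transcribed_at - self.lumi_finished_at
        return delta.total_seconds()


def pacing_decision(recent_turns: list[TurnTiming],
                    fast_seconds: float = 2.0,
                    slow_seconds: float = 12.0) -> str:
    """Return "raise_level", "slow_down_and_hint", or "keep"."""
    if not recent_turns:
        return "keep"
    # Assumption: "consistently" is approximated by the median latency.
    typical = median(t.latency_seconds for t in recent_turns)
    if typical <= fast_seconds:
        return "raise_level"
    if typical >= slow_seconds:
        return "slow_down_and_hint"
    return "keep"
```

The only inputs are the two timestamps per turn, which is why no other behavioral signal is needed to drive pacing.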

Session structure

The third bucket is the skeleton of the session: which level was attempted, which words were in the deck, how many turns happened, when the session ended and why (kid said "I'm done," parent closed the tab, timer hit fifteen minutes). This is the data that makes the weekly snapshot possible.

Three buckets. Correctness, timing, structure. If you ask me what Lumikids "knows" about your child, that is the answer.
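For the third bucket, a session skeleton might look like the sketch below. The fields mirror the list above (level, deck, turn count, end time, end reason); the type names, the enum values, and the field names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class EndReason(Enum):
    CHILD_SAID_DONE = "child_said_done"
    PARENT_CLOSED_TAB = "parent_closed_tab"
    TIMER_FIFTEEN_MINUTES = "timer_fifteen_minutes"


@dataclass(frozen=True)
class SessionStructure:
    level: str                   # which level was attempted
    deck_words: tuple[str, ...]  # which words were in the deck
    turn_count: int              # how many turns happened
    ended_at: datetime           # when the session ended
    end_reason: EndReason        # why it ended


example = SessionStructure(
    level="short-a",
    deck_words=("cat", "map", "bag"),
    turn_count=14,
    ended_at=datetime(2026, 12, 1, 17, 45),
    end_reason=EndReason.CHILD_SAID_DONE,
)
```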

The five things we don't collect

This is the half of the conversation that matters more, because absence is harder to verify than presence. Here is what is not in the database, and what would have to be true for any of it to end up there.

Voice prints and biometric identifiers

Lumi does not generate a voiceprint. The audio goes to the speech-to-text service, comes back as text, and the audio is dropped. There is no acoustic embedding stored against your child's account. If someone subpoenaed our database tomorrow looking for a way to identify your kid by voice, there would be nothing to hand over.

Advertising cookies and third-party trackers

There are no ad-targeting pixels on the Lumikids site. No Meta pixel, no TikTok pixel, no Google Ads remarketing tag. The only network services a session talks to are the ones that make the session work: the speech-to-text endpoint, the voice synthesis endpoint, and our own error monitor. The kids' product is not on the open advertising web, and we are never going to put it there.

Precise device location

We see the country a request came from, because every Hypertext Transfer Protocol (HTTP) request arrives from an Internet Protocol (IP) address, and that address roughly geolocates to a country. We use that to know whether to serve the European Union (EU) data-handling page or the United States one. We do not request Global Positioning System (GPS) permissions. We do not store street-level geolocation. We do not know which town you live in, and we have no business knowing.
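The country check is small enough to sketch. This assumes an upstream lookup has already reduced the IP address to a two-letter ISO 3166-1 country code. The function name `data_handling_page`, the two paths, and the choice to send every non-EU country to the United States page are hypothetical; the set of codes is the 27 EU member states.

```python
# ISO 3166-1 alpha-2 codes of the 27 EU member states.
EU_COUNTRY_CODES = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI",
    "FR", "DE", "GR", "HU", "IE", "IT", "LV", "LT", "LU",
    "MT", "NL", "PL", "PT", "RO", "SK", "SI", "ES", "SE",
})


def data_handling_page(country_code: str) -> str:
    """Pick a data-handling page from a country code alone."""
    code = country_code.strip().upper()
    # Hypothetical paths; anything outside the EU falls back to the US page.
    return "/data-handling/eu" if code in EU_COUNTRY_CODES else "/data-handling/us"
```

The function takes a country code and nothing finer, so a city, postal code, or coordinate has no way in.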

The parent's browsing data

When you log into the parent dashboard, we set a session cookie so you stay logged in. We do not set any cookie that follows you to other sites. We do not buy data about you from a broker. The parent account exists to authenticate you to your child's data. It is not a marketing surface.

Third-party social identifiers

You sign in with email or with Google Sign-In, and that is the entire identity stack. We do not link your account to Facebook, X, TikTok, or any social graph. We do not enrich your profile with data from a customer data platform. If you delete your account, there is no shadow profile sitting in a vendor's system waiting to be matched the next time you sign up.

The voice question, answered specifically

Back to the dad at the meetup. The audio question deserves its own paragraph because it is the one parents care about most, and it is the one with the most confusing answers in the industry.

When your child talks to Lumi, the audio is captured by the browser and sent over an encrypted connection to a speech-to-text provider. We currently use ElevenLabs for synthesis and a separate speech recognition service for transcription; both have data processing agreements that prohibit using children's audio for model training. Once the audio has been transcribed, the audio is deleted. We do not keep a copy on our side. We do not fine-tune any model on your kid's voice. The transcript (that is, the text) is what flows into the three buckets above.

There is one exception, and you should know about it. If you, the parent, explicitly turn on "save audio for review" in the dashboard so you can play back a session, the audio is retained for that session only, encrypted at rest, and auto-deletes after thirty days. The default is off. We chose off because most parents do not need to hear the audio; they need to see the transcript and the timeline.
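Put together, the audio path and its one exception could be sketched like this. The thirty-day window and the off-by-default toggle are from the text above. The rest is assumed for illustration: the function and type names, and the `transcribe` and `encrypt` callables, which stand in for the real speech-to-text and encryption services.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional

AUDIO_REVIEW_RETENTION = timedelta(days=30)


@dataclass(frozen=True)
class RetainedAudio:
    session_id: str
    ciphertext: bytes       # encrypted at rest
    delete_after: datetime  # auto-delete deadline


def handle_utterance(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    session_id: str,
    now: datetime,
    save_audio_for_review: bool = False,  # the default is off
    encrypt: Optional[Callable[[bytes], bytes]] = None,
) -> tuple[str, Optional[RetainedAudio]]:
    """Return the transcript, plus retained audio only if the parent opted in."""
    transcript = transcribe(audio)
    if not save_audio_for_review:
        # Default path: only the text leaves this function; the audio is not kept.
        return transcript, None
    if encrypt is None:
        raise ValueError("retaining audio requires an encrypt function")
    kept = RetainedAudio(session_id, encrypt(audio), now + AUDIO_REVIEW_RETENTION)
    return transcript, kept
```

In this sketch the default branch returns text and nothing else, so retaining audio takes an explicit flag and an explicit encryption step.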

Why we don't collect the rest

The honest answer is not "because the Children's Online Privacy Protection Act (COPPA) requires it." COPPA, enforced by the Federal Trade Commission, sets a floor — verifiable parental consent, limits on what you can collect from kids under thirteen, deletion on request. We comply with all of it. But COPPA does not stop a company from collecting voice prints with consent, or from running ad pixels on the parent's side, or from quietly enriching the parent profile with broker data. Plenty of "kid-safe" apps do exactly that and stay legal.

We do not collect those things because we do not want the temptation. If the data is not in the database, no future product manager — including a future me on a bad day — can decide to mine it. There is no growth experiment that starts with "what if we used the voice prints for…" because there are no voice prints. The architecture is the policy. That is the only kind of privacy promise I believe in anymore.

The reasoning is similar for our model choice. Lumikids is built on Anthropic's Claude, and Anthropic's commercial terms commit to not training their models on customer Application Programming Interface (API) inputs by default. That is one of the reasons Claude is the model under the hood and not a model that treats every conversation as training fuel.

Retention, export, and the delete button

Session data lives for eighteen months from the date of the session, then it is either anonymized — stripped of any link to your child's account — or fully deleted, depending on a flag you set in the dashboard. Eighteen months is long enough to show year-over-year progress and short enough that we are not sitting on a multi-year dossier of your kid's reading.
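As a sketch of that rule: the eighteen-month window and the anonymize-or-delete flag are from the paragraph above. The helper names are hypothetical, as is the decision to clamp month-end dates (for example, August 31 plus eighteen months lands on the last day of February).

```python
import calendar
from datetime import date

RETENTION_MONTHS = 18


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def retention_action(session_date: date, today: date,
                     anonymize_instead_of_delete: bool) -> str:
    """Return "keep", "anonymize", or "delete" for one session."""
    if today < add_months(session_date, RETENTION_MONTHS):
        return "keep"
    return "anonymize" if anonymize_instead_of_delete else "delete"
```

For a session on 2025-01-15, the window closes on 2026-07-15, and from that date on the parent's flag decides which of the two outcomes applies.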

Two buttons in the parent dashboard matter here. Export dumps every record we have on your child as a single JavaScript Object Notation (JSON) file: every session, every turn, every correctness flag, every timing record. You can read it in any text editor. Delete removes the account, the child profile, every session, and every transcript. It is not a "request" that goes to a support queue. It runs against the database the moment you confirm.
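An export along those lines can be sketched with the standard library alone. The function name `export_child_data` and the key names in the sample record are hypothetical; the real file layout may differ.

```python
import json


def export_child_data(sessions: list[dict]) -> str:
    """Serialize every session record into one human-readable JSON document."""
    # indent=2 keeps the file legible in a plain text editor;
    # ensure_ascii=False leaves accented names readable.
    return json.dumps({"sessions": sessions}, indent=2, ensure_ascii=False)


sample_sessions = [
    {
        "level": "short-a",
        "end_reason": "child_said_done",
        "turns": [
            {"prompt": "cat", "response": "cat",
             "result": "correct", "latency_seconds": 3.1},
            {"prompt": "map", "response": "muh-aaa",
             "result": "attempted-not-resolved", "latency_seconds": 11.4},
        ],
    }
]

exported = export_child_data(sample_sessions)
```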

Here is the ninety-second walkthrough I gave the dad in Hudson:

  1. Open the dashboard, click your child's name.
  2. Click Settings in the top right.
  3. Scroll to Data.
  4. Click Export if you want a copy first. Wait about ten seconds for the file.
  5. Click Delete child profile. Type the child's name to confirm.
  6. The next screen says "Deleted." That is the entire experience.

The deletion is irreversible, which is the point. If we kept a recoverable copy "just in case," that would be another database to defend, and the whole argument falls apart.
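One way to get a delete that runs immediately and leaves nothing behind is to let the database cascade it. The sketch below uses an in-memory SQLite database purely for illustration; the table names, columns, and the `delete_child_profile` helper are assumptions, not the production schema, and it covers only the child profile and its sessions, not the parent account.

```python
import sqlite3

SCHEMA = """
CREATE TABLE child_profile (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE session (
    id       INTEGER PRIMARY KEY,
    child_id INTEGER NOT NULL REFERENCES child_profile(id) ON DELETE CASCADE,
    level    TEXT NOT NULL
);
CREATE TABLE turn (
    id         INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES session(id) ON DELETE CASCADE,
    prompt     TEXT,
    response   TEXT,
    result     TEXT
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys (and cascades) only when this is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def delete_child_profile(conn: sqlite3.Connection, child_id: int,
                         typed_name: str) -> bool:
    """Delete the profile and, by cascade, every session and turn under it."""
    row = conn.execute(
        "SELECT name FROM child_profile WHERE id = ?", (child_id,)
    ).fetchone()
    # Mirrors the "type the child's name to confirm" step.
    if row is None or row[0] != typed_name:
        return False
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM child_profile WHERE id = ?", (child_id,))
    return True
```

There is no soft-delete column and no archive table in this sketch, so once the transaction commits there is nothing left to restore.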

You can read the longer version of all of this on our privacy page. The short version is on the napkin.

The honest part

There are two things in this article that I want to flag as still in motion.

First, the eighteen-month retention is a deliberate choice, not a law. Some parents have asked for shorter — say, six months. We are considering a parent-set retention slider, anywhere from three to twenty-four months, with the default still at eighteen. If you have a strong opinion, the beta feedback form is the place to tell us.

Second, we are a small team. The architecture above is real and shipping. The audit of it is not yet independent. I would rather tell you that than pretend we have a Service Organization Control (SOC) 2 report we do not have. We will have one. We do not have one today.

If the napkin makes sense, try the beta.

One more thing —

Lumi is in open beta and free for the first 100 families. If reading time at your house ever feels harder than it should, we built this for you.