MIR-02: Privacy-Constrained Adaptive Dialogue Systems for Early Childhood AI Companions—Architecture, Evaluation, and Compliance Frameworks

Brief description of the organization

Mirie is an early-stage B2B platform that enables children's media IP holders to deploy voice-interactive AI companion experiences in any language, in 8 weeks, with a privacy architecture that routes no child voice data to third-party inference APIs. The platform's defining technical constraint is also its research contribution: building adaptive, emotionally-responsive dialogue systems for children ages 2–6 while operating under COPPA/PIPEDA compliance requirements that prohibit the data collection pipelines most modern ML systems depend on. Our proof-of-concept deployment is DHFH (Double Heart Full Hall), a bilingual Mandarin-English companion app for Chinese-Canadian children.


Problem area

The deployment of conversational AI in child-facing applications sits at the intersection of three under-researched problems in ML systems design.

  • Adaptive dialogue under strict data constraints. Modern adaptive dialogue systems, including RLHF-tuned LLMs and preference-learning models, assume the ability to log interaction data, collect user feedback signals, and fine-tune on deployment. COPPA (and its April 2026 amendments) prohibits collecting personal information from children under 13 without verifiable parental consent, with per-violation fines now reaching $51,744 USD. This eliminates the standard feedback loop. The open research question: can an adaptive conversational system improve response quality and maintain child engagement over a session without retaining user-level data between sessions? This requires rethinking how adaptation operates: from user-level personalization to population-level policy optimization, on-device inference, or privacy-preserving aggregation techniques such as federated learning with differential privacy guarantees.
  • Child-directed speech (CDS) as an out-of-distribution NLP problem. Children ages 2–6 produce speech and text that differs systematically from adult corpora on which most NLP models are trained: shorter utterances, higher disfluency rates, non-standard phonology, code-switching (particularly in heritage language households), and pragmatic patterns that diverge from adult conversation norms. Existing evaluation benchmarks (GLUE, SuperGLUE, BabyLM) do not reflect the distributional properties of spontaneous child speech in interactive, character-mediated contexts. A robust evaluation framework for CDS-capable dialogue systems does not yet exist.
  • Measurement of learning-adjacent outcomes in interactive media. Neuroscience research (Romeo et al., 2018, Psychological Science) establishes that conversational turn count, not exposure duration, predicts white-matter connectivity in Broca's area and Wernicke's area, independent of socioeconomic background. Randomized controlled trials (Roseberry et al., 2011, JECP; N=77, ages 4–6) demonstrate that contingent character interaction produces significantly better comprehension, vocalization, and character trust than passive exposure to identical content. Yet no standardized computational proxy exists for "contingent quality" of a dialogue turn in child-facing AI systems; only human-annotated measures are currently used. Building an automated metric grounded in these RCT findings is an open problem at the intersection of NLP, developmental psychology, and learning analytics.
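To make the first constraint concrete, the kind of adaptation the architecture permits can be sketched as a session-scoped bandit: all learned state lives in one in-memory object that is discarded when the session ends, so nothing user-level survives between sessions. The arm names and reward values below are illustrative placeholders, not part of the Mirie system.

```python
import random
from collections import defaultdict

class SessionBandit:
    """Epsilon-greedy bandit over response styles. All state lives in this
    object and is thrown away at session end, so no user-level data is
    retained between sessions (the privacy constraint described above)."""

    def __init__(self, arms, epsilon=0.2, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # pulls per arm, this session only
        self.values = defaultdict(float)  # running mean reward per arm

    def select(self):
        # explore with probability epsilon, else exploit the best arm so far
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[a])

    def update(self, arm, reward):
        # incremental mean update; no per-turn log is kept
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# One simulated session with a hypothetical engagement reward signal.
bandit = SessionBandit(["short_question", "long_story"], epsilon=0.1, seed=0)
for _ in range(200):
    arm = bandit.select()
    reward = 1.0 if arm == "short_question" else 0.2
    bandit.update(arm, reward)
best = max(bandit.arms, key=lambda a: bandit.values[a])
# session ends: the bandit object is discarded, nothing is persisted
```

Within a session the policy converges toward the higher-engagement style; across sessions the system restarts from population-level priors, which is what keeps it outside COPPA's data-retention scope.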

Main objectives

  • Design and evaluate a privacy-preserving adaptation architecture for a child-facing dialogue system that improves response quality and engagement over a session without logging child voice, retaining user-level data, or requiring cloud-side personalization. Candidate approaches include on-device contextual bandits, federated learning with (ε, δ)-differential privacy guarantees, and session-scoped state machines with population-level priors.

  • Construct a child-directed speech evaluation benchmark for short-turn, character-mediated dialogue in bilingual (Mandarin-English) contexts, using existing CDS corpora (CHILDES, CASANA) as a reference distribution and synthetic augmentation for low-resource Mandarin child speech.

  • Develop a computational proxy metric for "contingent turn quality," operationalizing the Roseberry et al. contingency construct as a measurable property of a system-generated dialogue turn, and validate it against human annotation of existing DHFH interaction logs.

  • Produce a compliance-mapped technical architecture document that demonstrates how each data flow in the system satisfies COPPA §312.2 (definitions, including personal information), §312.5 (parental consent), and §312.10 (data retention and deletion), structured as a reusable framework for any children's AI product deployment.
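For the federated candidate in the first objective, the (ε, δ) guarantee comes from a standard Gaussian mechanism over clipped per-device updates (Dwork & Roth, 2014). The sketch below shows one aggregation round in pure Python; the noise calibration is the classic single-round bound, and any composition accounting across rounds is out of scope here.

```python
import math
import random

def dp_federated_mean(updates, clip_norm=1.0, epsilon=1.0, delta=1e-5, seed=0):
    """One round of differentially private aggregation: clip each
    per-device update to L2 norm `clip_norm`, average, then add Gaussian
    noise with sigma = clip_norm * sqrt(2 * ln(1.25/delta)) / (n * epsilon),
    the classic calibration for a single (epsilon, delta)-DP release."""
    rng = random.Random(seed)
    n, dim = len(updates), len(updates[0])
    clipped = []
    for u in updates:
        norm = math.sqrt(sum(x * x for x in u)) or 1e-12
        scale = min(1.0, clip_norm / norm)  # bound each device's influence
        clipped.append([x * scale for x in u])
    mean = [sum(c[i] for c in clipped) / n for i in range(dim)]
    sigma = clip_norm * math.sqrt(2 * math.log(1.25 / delta)) / (n * epsilon)
    return [m + rng.gauss(0.0, sigma) for m in mean]

# 100 hypothetical devices submitting the same unit update: the noisy
# mean stays close to [1.0, 0.0] because sigma shrinks with n.
noisy = dp_federated_mean([[1.0, 0.0]] * 100)
```

The key design property for the compliance mapping is that only the clipped, noised aggregate ever leaves the device population; no individual child's update is reconstructible from the release.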


Scope of work

  • Literature review across four domains: (a) privacy-preserving ML, including federated learning, differential privacy, and on-device inference (McMahan et al., 2017; Dwork & Roth, 2014); (b) child-directed speech NLP, including CDS corpora, BabyLM Challenge findings, and phonological adaptation in ASR; (c) adaptive dialogue systems, including contextual bandits, RLHF, and persona-conditioned generation; (d) developmental psychology of parasocial learning and contingent interaction.
  • Threat model and data flow audit — map every data touchpoint in the current Mirie architecture against COPPA/PIPEDA requirements; identify where standard ML pipelines create compliance exposure and formally specify the privacy constraints the system must satisfy.
  • Architecture design and prototype — implement at least two candidate adaptation mechanisms (e.g., on-device session-scoped RL vs. federated population-level update) integrated with the existing React Native / Expo / Supabase stack; measure response latency, adaptation quality, and privacy budget consumption.
  • CDS benchmark construction — sample and annotate 500–1,000 dialogue turns from CHILDES and CASANA; define evaluation criteria for bilingual CDS; fine-tune or prompt-adapt a base language model and evaluate against the benchmark.
  • Contingency metric development — operationalize "contingent turn quality" from the Roseberry et al. construct; train a classifier or regression model on human-annotated DHFH interaction logs; report inter-annotator agreement and correlation with engagement proxy signals.
  • Ablation study and evaluation — compare contingency metric scores, engagement proxies (turn count, session length, return rate), and adaptation mechanism performance across conditions; produce a results report structured for potential conference submission (CHI, ACL, FAccT, or IDC).
  • Compliance framework documentation — produce a reusable technical spec mapping system architecture to regulatory requirements, formatted for brand partner legal due diligence.
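As a strawman for the contingency metric work item, a first-pass feature-based scorer might look like the following. Every feature and weight here is hypothetical, chosen only to illustrate what "operationalizing the Roseberry et al. construct" could mean computationally; the actual metric would be trained on the human-annotated DHFH logs.

```python
def contingency_score(child_utterance: str, system_turn: str,
                      latency_s: float) -> float:
    """Toy proxy for 'contingent turn quality'. Hypothetical features:
    (a) lexical overlap with the child's utterance (topical contingency),
    (b) whether a child question receives a non-question reply
        (pragmatic contingency),
    (c) a response-latency penalty (temporal contingency).
    Returns a score in [0, 1]; weights are illustrative placeholders."""
    def tokens(s):
        return {w.strip("?!.,") for w in s.lower().split()}
    child, system = tokens(child_utterance), tokens(system_turn)
    overlap = len(child & system) / max(len(child), 1)
    answers_question = (child_utterance.rstrip().endswith("?")
                        and not system_turn.rstrip().endswith("?"))
    timeliness = max(0.0, 1.0 - latency_s / 3.0)  # no credit past ~3 s
    return round(0.5 * overlap + 0.2 * answers_question + 0.3 * timeliness, 3)

# A topically contingent, timely answer should outscore an unrelated one.
good = contingency_score("where is the cat?", "the cat is under the bed", 1.0)
bad = contingency_score("where is the cat?", "let's sing a song", 3.0)
```

Validation would then be straightforward: report the correlation between this score and the human annotations, alongside inter-annotator agreement, as the scope item specifies.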

Deliverables

  • Report
  • Presentation
  • New protocols/processes
  • Other

Team meeting frequency

Bi-weekly.


Skills and training required

  • Core: machine learning (NLP or dialogue systems preferred); Python (PyTorch or JAX); familiarity with privacy-preserving ML concepts (differential privacy, federated learning) or strong interest in developing it.
  • Valuable: experience with on-device ML (TensorFlow Lite, Core ML, ONNX); React Native or mobile systems; ASR pipelines (Whisper-family models); child language acquisition or HCI research background. Presentation and technical writing skills are essential: the project produces research artifacts, not just code.

Resources required 

  • Access to Mirie's codebase via private GitHub — provided
  • Anthropic API access for LLM layer — provided
  • CHILDES corpus access (publicly available via TalkBank)
  • CASANA corpus (accessible via academic licensing)
  • GPU compute for model training — Mirie will provide cloud compute budget (AWS/GCP) or the team may apply for Waterloo's Compute Canada allocation
  • iOS/Android test devices for on-device inference benchmarking — provided

NDA or a commercialization agreement for this project?

Yes