2026/03/07
AI Agent Vision of Think-AI

11. AI Agent Vision & Delivery Plan (2026-03-07)

Vision

Build a production-grade AI assistant that acts as a real agent for this product:

  • understands live frontend context (home, editor, article, viewer)
  • answers questions from trusted project knowledge (Ghost posts/pages/files) with citations
  • can execute approved tools and MCP tools safely
  • supports controlled content workflows (draft assist first, write actions with confirmation)

Target Architecture

  1. Keep /api/ai/chat/ui as the single chat entrypoint.
  2. Add agent runtime modules:
    • runtime (orchestration)
    • tools (local app tools)
    • mcp (MCP connectors + allowlist)
    • rag (indexing/retrieval/rerank)
    • policy (auth, role, scope, safety)
  3. Keep existing usage logging and extend with tool/retrieval counts.
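
As a rough illustration of how the mcp and policy modules could meet the usage log, the sketch below gates an MCP call on an allowlist and counts the outcome. Every name in it (McpAllowlist, UsageCounts, gateMcpCall) is hypothetical and not taken from the existing codebase.

```typescript
// Hypothetical names throughout; nothing here is taken from the existing codebase.
type McpAllowlist = { [server: string]: string[] };

interface UsageCounts {
  toolCalls: number;  // local tool invocations
  mcpCalls: number;   // MCP tool invocations that passed the allowlist
  retrievals: number; // RAG retrieval calls
  denied: number;     // calls rejected by policy
}

function emptyUsage(): UsageCounts {
  return { toolCalls: 0, mcpCalls: 0, retrievals: 0, denied: 0 };
}

// A tool is callable only if its server is registered and the tool is listed for it.
function isMcpToolAllowed(allowlist: McpAllowlist, server: string, tool: string): boolean {
  if (!Object.prototype.hasOwnProperty.call(allowlist, server)) return false;
  return allowlist[server].indexOf(tool) !== -1;
}

// Policy gate: record the attempt and report "ok" only for allowlisted tools.
function gateMcpCall(
  allowlist: McpAllowlist,
  usage: UsageCounts,
  server: string,
  tool: string
): { ok: boolean; reason?: string } {
  if (!isMcpToolAllowed(allowlist, server, tool)) {
    usage.denied += 1;
    return { ok: false, reason: "tool " + server + "/" + tool + " is not allowlisted" };
  }
  usage.mcpCalls += 1;
  return { ok: true };
}
```

Counting denied calls next to successful ones is one way to make policy rejections visible in the same usage record, rather than only in error logs.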

Execution Plan (Milestones)

  1. M1: Frontend Context-Aware Chat

    • Pass page_context from frontend on each request.
    • Include route type, entity IDs, group scope, locale, optional editor draft snapshot.
    • The agent answers from the current page context, even before RAG is available.
  2. M2: Local Tool Calling (Read-only first)

    • Implement typed local tools with zod.
    • First tools: get_current_page_context, search_posts, get_post_by_id, get_page_by_id.
    • Enforce auth and group scope in tool layer.
  3. M3: RAG for Ghost Content

    • Build ingestion pipeline for posts/pages/files.
    • Chunk + embed + store vectors with metadata (group_id, visibility, updated_at, locale).
    • Add retriever + reranker and citation output.
  4. M4: MCP Integration

    • Add MCP server registry + tool allowlist policy.
    • Add timeouts, retries, and a circuit breaker around MCP calls.
    • Add audit logs for each MCP call.
  5. M5: Safe Action Tools

    • Add draft-assist tools first.
    • Add explicit user confirmation for mutations.
    • Keep publish/delete behind stricter role policy.
  6. M6: UX + Operations

    • Add agent mode selector in SiteAssistantPanel.
    • Add tool trace + citations panel.
    • Add admin controls for provider/model/tool toggles.
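
The page_context envelope from M1 could be validated on the backend roughly as follows. The field names (route_type, entity_ids, group_id, locale, editor_draft) and the draft size cap are assumptions derived from the M1 bullets, not an existing schema. The validation is hand-written here to keep the sketch dependency-free; a zod schema, as planned for the tools in M2, would express the same checks.

```typescript
// Field names are assumptions derived from the M1 bullets, not an existing schema.
type RouteType = "home" | "editor" | "article" | "viewer";

interface PageContext {
  route_type: RouteType;
  entity_ids: string[];     // e.g. post or page IDs shown on the route
  group_id: string | null;  // group scope; null when the page is not group-scoped
  locale: string;
  editor_draft?: string;    // optional snapshot of the draft being edited
}

const ROUTE_TYPES: RouteType[] = ["home", "editor", "article", "viewer"];
const MAX_DRAFT_CHARS = 20000; // assumed cap so a draft cannot bloat the prompt

type ParseResult =
  | { ok: true; value: PageContext }
  | { ok: false; error: string };

function parsePageContext(input: unknown): ParseResult {
  if (typeof input !== "object" || input === null) {
    return { ok: false, error: "page_context must be an object" };
  }
  const raw = input as { [key: string]: unknown };

  const routeType = raw["route_type"];
  if (typeof routeType !== "string" || ROUTE_TYPES.indexOf(routeType as RouteType) === -1) {
    return { ok: false, error: "unknown route_type" };
  }
  const entityIds = raw["entity_ids"];
  if (!Array.isArray(entityIds) || !entityIds.every((id) => typeof id === "string")) {
    return { ok: false, error: "entity_ids must be an array of strings" };
  }
  const groupId = raw["group_id"];
  if (groupId !== null && typeof groupId !== "string") {
    return { ok: false, error: "group_id must be a string or null" };
  }
  const locale = raw["locale"];
  if (typeof locale !== "string" || locale.length === 0) {
    return { ok: false, error: "locale must be a non-empty string" };
  }
  const draft = raw["editor_draft"];
  if (draft !== undefined && typeof draft !== "string") {
    return { ok: false, error: "editor_draft must be a string when present" };
  }

  const value: PageContext = {
    route_type: routeType as RouteType,
    entity_ids: entityIds as string[],
    group_id: groupId,
    locale: locale,
  };
  // Truncate rather than reject, so a long draft never blocks the chat request.
  if (typeof draft === "string") value.editor_draft = draft.slice(0, MAX_DRAFT_CHARS);
  return { ok: true, value };
}
```

Note that group_id arriving from the frontend is only a hint about the current page. The tool layer still has to check it against the authenticated user's memberships.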

Step Tasks (Run One by One)

  1. Task A: Context Envelope

    • Frontend: include page_context in chat request body.
    • Backend: parse and validate page_context.
    • Done when assistant can answer "what page am I on now?"
  2. Task B: Tool Registry

    • Add tool registry and run tool-calling loop.
    • Implement first read-only tools.
    • Done when assistant can fetch current post/page details via tool calls.
  3. Task C: RAG MVP

    • Add indexing job + retrieval endpoint/tool.
    • Return citations in AI response.
    • Done when assistant answers content questions with source links/snippets.
  4. Task D: MCP MVP

    • Add one MCP server with allowlisted tools.
    • Add logs and failure handling.
    • Done when one MCP tool can be called from agent safely.
  5. Task E: Safe Write Actions

    • Add draft-editing tools behind confirmation.
    • Done when assistant can propose/apply draft changes without direct publish.
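
The tool registry and tool-calling loop in Task B might look something like the sketch below. It is deliberately simplified: the model is replaced by a plain nextStep function, tools run synchronously (real tools that hit Ghost would be async), and all type and class names are hypothetical.

```typescript
// Hypothetical sketch: the model is replaced by a plain function so the loop is runnable.
type ToolArgs = { [key: string]: unknown };

interface ToolDef {
  name: string;
  readOnly: boolean;
  run: (args: ToolArgs) => unknown;
}

type ModelStep =
  | { kind: "tool_call"; name: string; args: ToolArgs }
  | { kind: "final"; text: string };

interface ToolTraceEntry {
  name: string;
  ok: boolean;
  result: unknown;
}

class ToolRegistry {
  private tools: { [name: string]: ToolDef } = {};

  register(tool: ToolDef): void {
    if (Object.prototype.hasOwnProperty.call(this.tools, tool.name)) {
      throw new Error("duplicate tool: " + tool.name);
    }
    this.tools[tool.name] = tool;
  }

  get(name: string): ToolDef | undefined {
    return Object.prototype.hasOwnProperty.call(this.tools, name) ? this.tools[name] : undefined;
  }
}

// Ask the model for a step, run the requested tool, feed the trace back, repeat.
// maxSteps bounds the loop so a confused model cannot call tools forever.
function runAgentLoop(
  registry: ToolRegistry,
  nextStep: (trace: ToolTraceEntry[]) => ModelStep,
  maxSteps: number
): { text: string; trace: ToolTraceEntry[] } {
  const trace: ToolTraceEntry[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = nextStep(trace);
    if (step.kind === "final") return { text: step.text, trace };

    const tool = registry.get(step.name);
    if (!tool) {
      trace.push({ name: step.name, ok: false, result: "unknown tool" });
      continue;
    }
    if (!tool.readOnly) {
      // Read-only phase: write tools are refused until a confirmation flow exists.
      trace.push({ name: step.name, ok: false, result: "write tools are disabled" });
      continue;
    }
    try {
      trace.push({ name: step.name, ok: true, result: tool.run(step.args) });
    } catch (err) {
      trace.push({ name: step.name, ok: false, result: String(err) });
    }
  }
  return { text: "Stopped: step limit reached.", trace };
}
```

Failed calls are written to the trace instead of being thrown, so the model can see the error on its next step and the same trace can later feed the tool trace panel planned in M6.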

Suggested Repo File Layout

  • Frontend:
    • apps/host/src/components/ai/SiteAssistantPanel.tsx (agent mode UI, traces)
    • apps/host/src/app/ClientLayout.tsx (page context provider)
    • apps/host/src/types/ai.ts (agent request/response types)
  • Backend API:
    • apps/host/src/app/api/ai/chat/ui/route.ts (entrypoint)
    • apps/host/src/app/api/ai/agent/runtime.ts
    • apps/host/src/app/api/ai/agent/tools/*
    • apps/host/src/app/api/ai/agent/rag/*
    • apps/host/src/app/api/ai/agent/mcp/*
  • Ghost backend (if needed for persistent indexing/audit):
    • add dedicated endpoints/models for RAG docs/chunks and tool logs.

Definition of Done

  1. Assistant is context-aware per page and group.
  2. Assistant answers project content questions with citations.
  3. Tool + MCP calls are permissioned and audited.
  4. No cross-group leakage.
  5. Usage and quota reporting shows the source plus tool-call and retrieval counts.
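
To make the "no cross-group leakage" criterion concrete, here is a minimal sketch of scoping retrieved chunks to the caller before they reach the prompt or the citation list. The metadata fields follow the M3 bullet (group_id, visibility). The Chunk and Caller shapes, the two visibility values, and the 120-character snippet length are assumptions.

```typescript
// Chunk and Caller shapes are assumptions; only group_id and visibility come from the plan.
interface Chunk {
  id: string;
  text: string;
  source_url: string;
  group_id: string | null;         // null = not tied to a group
  visibility: "public" | "group";
  score: number;                   // retriever similarity score, higher is better
}

interface Caller {
  user_id: string;
  group_ids: string[];             // groups the authenticated user belongs to
}

function canRead(caller: Caller, chunk: Chunk): boolean {
  if (chunk.visibility === "public") return true;
  return chunk.group_id !== null && caller.group_ids.indexOf(chunk.group_id) !== -1;
}

// Filter first, then rank, so an unreadable chunk can never occupy a top-k slot.
function scopedTopK(caller: Caller, candidates: Chunk[], k: number): Chunk[] {
  return candidates
    .filter((c) => canRead(caller, c))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Citations are built only from chunks that survived the scope filter.
function toCitations(chunks: Chunk[]): { id: string; url: string; snippet: string }[] {
  return chunks.map((c) => ({ id: c.id, url: c.source_url, snippet: c.text.slice(0, 120) }));
}
```

In practice the same group filter would ideally also be pushed into the vector store query, with this post-filter kept as a backstop.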

12. Voice Agent Vision (Senior Care Helper) (2026-03-07)

Vision Extension

Add voice interaction so that users, especially seniors, can talk naturally with the assistant for daily support:

  • easy spoken interaction (hands-free, large-button UX)
  • medication and routine reminders
  • simple wellbeing check-ins
  • practical life guidance with calm, short responses
  • safe escalation guidance for urgent situations

Scope and Safety Boundaries

  1. This is a support assistant, not a medical diagnosis system.
  2. For emergency symptoms (e.g., chest pain, breathing difficulty, stroke signs), always advise immediate local emergency contact.
  3. Health suggestions must be conservative, explain uncertainty, and recommend consulting licensed professionals.
  4. Never claim to replace doctors or prescribe treatment plans autonomously.
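
Boundary 2 could be enforced with a deterministic check that runs before any model output is returned, so escalation does not depend on the model following its prompt. The sketch below uses plain phrase matching. The phrase list is illustrative only, seeded from the examples above, and would need clinical review. The function names and the escalation wording are assumptions.

```typescript
// Illustrative phrase list only; a real list would need clinical review and localization.
const EMERGENCY_PHRASES: string[] = [
  "chest pain",
  "can't breathe",
  "cannot breathe",
  "trouble breathing",
  "difficulty breathing",
  "face drooping",
  "slurred speech",
];

const ESCALATION_MESSAGE =
  "This may be an emergency. Please call your local emergency number now. " +
  "I am not able to diagnose medical conditions.";

// Lowercase, unify apostrophes, and collapse whitespace so STT variations still match.
function normalize(text: string): string {
  return text.toLowerCase().replace(/[’`]/g, "'").replace(/\s+/g, " ").trim();
}

function checkEmergency(userText: string): { escalate: boolean; matched: string[] } {
  const t = normalize(userText);
  const matched = EMERGENCY_PHRASES.filter((p) => t.indexOf(p) !== -1);
  return { escalate: matched.length > 0, matched };
}

// The escalation message replaces the model's draft reply whenever a phrase matches.
function safeReply(userText: string, draftReply: string): string {
  return checkEmergency(userText).escalate ? ESCALATION_MESSAGE : draftReply;
}
```

Phrase matching is a crude baseline that errs toward false positives: "I had chest pain last year" also escalates. That is arguably the safer failure mode for this audience, but it would not catch paraphrases that are not on the list.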

Voice Architecture

  1. Input: microphone -> STT endpoint (/api/ai/stt).
  2. Agent runtime: same orchestration path as text (/api/ai/chat/ui with agent mode).
  3. Output: assistant text -> TTS endpoint (/api/ai/tts) -> playback.
  4. Optional: duplex/realtime mode later via /api/ai/realtime/session.
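
Because voice reuses the text orchestration path, the only difference at the chat entrypoint could be a mode flag on the request. The sketch below shows one possible request shape. The endpoint paths and the interaction_mode field come from this plan, while the other field names, the helper functions, and the style instruction text are assumptions.

```typescript
// Hypothetical request shape; only the endpoint paths and interaction_mode come from the plan.
type InteractionMode = "text" | "voice";

interface ChatRequest {
  message: string;               // typed text, or the transcript returned by STT
  interaction_mode: InteractionMode;
  agent_mode: boolean;
}

const CHAT_ENDPOINT = "/api/ai/chat/ui";
const TTS_ENDPOINT = "/api/ai/tts";

// Text and voice turns hit the same entrypoint; only the mode flag differs.
function buildChatRequest(message: string, mode: InteractionMode): { url: string; body: ChatRequest } {
  return {
    url: CHAT_ENDPOINT,
    body: { message: message.trim(), interaction_mode: mode, agent_mode: true },
  };
}

// Extra system instruction for spoken replies; empty for text turns.
function styleInstruction(mode: InteractionMode): string {
  return mode === "voice"
    ? "Answer in one or two short sentences. Avoid lists, links, and markup."
    : "";
}

// The reply is forwarded to TTS only when the turn started in voice mode.
function ttsRequestFor(mode: InteractionMode, replyText: string): { url: string; text: string } | null {
  return mode === "voice" ? { url: TTS_ENDPOINT, text: replyText } : null;
}
```

Keeping the transcript in the same message field as typed text means tools, RAG, and policy checks need no voice-specific branches.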

Senior-Friendly UX Requirements

  1. Large touch targets for Speak, Stop, Repeat, Help.
  2. Slow/clear TTS options:
    • speech speed presets (slow, normal)
    • male/female voice options
  3. Confirmation flow for important actions:
    • "Do you want me to set this reminder now?"
  4. One-tap "Call family/help contact" shortcut (if enabled by user settings).
  5. Conversation summaries in simple language.

Initial Voice Use Cases (Phase 1)

  1. Daily reminders:
    • medication time
    • hydration
    • sleep routine
  2. General support:
    • explain article/page content by voice
    • answer "what did I write/publish?"
  3. Gentle wellbeing prompts:
    • mood check-in
    • activity reminder

Implementation Tasks (Voice Track)

  1. V1: Voice UI in Assistant Panel

    • add mic button and recording state in SiteAssistantPanel.tsx
    • send audio to /api/ai/stt
    • inject transcript into message input
  2. V2: Speak Back Responses

    • add "play response" button
    • call /api/ai/tts for assistant text
    • add stop/replay controls
  3. V3: Voice Agent Mode

    • add interaction_mode: "text" | "voice" in request body/types
    • tune prompts for short spoken responses
  4. V4: Safety Prompt Pack for Senior Care

    • add safety system instructions for health-related prompts
    • add emergency trigger phrases and safe fallback responses
  5. V5: Reminder Tools

    • add local tools: create_reminder, list_reminders, cancel_reminder
    • require explicit confirmation before save/delete
  6. V6: Observability

    • log STT/TTS latency and failures
    • include voice usage in quotas/usage dashboard
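
The confirmation requirement in V5 can be modeled as a two-step flow: a tool call only stages the change and returns a prompt, and a separate confirm step applies it. The sketch below keeps everything in memory. The ReminderStore class, the token format, and the audit entry shape are all hypothetical, and real tokens would need to be unguessable and bound to the user.

```typescript
// Hypothetical in-memory sketch; persistence, auth, and scheduling are out of scope here.
interface Reminder { id: string; text: string; time: string }
interface PendingAction { token: string; kind: "create" | "cancel"; reminder: Reminder }
interface AuditEntry { action: string; reminder_id: string; confirmed: boolean }

class ReminderStore {
  private reminders: Reminder[] = [];
  private pending: PendingAction[] = [];
  private nextId = 1;
  readonly audit: AuditEntry[] = [];

  // Step 1 (create_reminder): stage the change and return a prompt; nothing is saved yet.
  proposeCreate(text: string, time: string): { token: string; prompt: string } {
    const reminder: Reminder = { id: "r" + this.nextId++, text, time };
    const token = "confirm-create-" + reminder.id;
    this.pending.push({ token, kind: "create", reminder });
    this.audit.push({ action: "propose_create", reminder_id: reminder.id, confirmed: false });
    return { token, prompt: "Do you want me to set this reminder now? " + text + " at " + time + "." };
  }

  // Step 1 (cancel_reminder): stage the deletion of an existing reminder.
  proposeCancel(id: string): { token: string; prompt: string } | null {
    const found = this.reminders.filter((r) => r.id === id)[0];
    if (!found) return null;
    const token = "confirm-cancel-" + found.id;
    this.pending.push({ token, kind: "cancel", reminder: found });
    this.audit.push({ action: "propose_cancel", reminder_id: found.id, confirmed: false });
    return { token, prompt: "Do you want me to cancel this reminder? " + found.text + "." };
  }

  // Step 2: apply a staged action. Unknown or already-used tokens change nothing.
  confirm(token: string): boolean {
    const index = this.pending.map((p) => p.token).indexOf(token);
    if (index === -1) return false;
    const action = this.pending.splice(index, 1)[0];
    if (action.kind === "create") {
      this.reminders.push(action.reminder);
    } else {
      this.reminders = this.reminders.filter((r) => r.id !== action.reminder.id);
    }
    this.audit.push({ action: "confirm_" + action.kind, reminder_id: action.reminder.id, confirmed: true });
    return true;
  }

  // list_reminders: returns a copy so callers cannot mutate the store.
  list(): Reminder[] {
    return this.reminders.slice();
  }
}
```

Logging both the proposal and the confirmation gives the audit trail a record of actions the user declined or ignored, not only the ones that were applied.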

Additional Definition of Done (Voice)

  1. User can complete a full voice roundtrip (speak -> answer spoken back).
  2. Response style is concise and clear for spoken comprehension.
  3. Emergency health queries always return safe escalation guidance.
  4. Reminder actions require confirmation and are auditable.