2025/12/30

Think-AI : AI-Voice-Suite: (Development Proposal) v1.0.1

Version: 1.0.1 (Production Ready)
Architecture: Serverless Monolith (Dockerized Lambda)
Frontend: Next.js (Ghost Admin Customization)


1. Requirements Overview

  • Overview of Serverless Monolith Pattern
  • Key Workflows (Interactive, Webhook, File STT)
  • Architecture Diagram Reference

2. Project Structure

  • Directory Tree Layout
  • Key File Descriptions

3. Source Code (Backend)

  • 3.1 Dependencies (package.json)
  • 3.2 Dockerfile (The Monolith Build Definition)
  • 3.3 Lambda Handlers (src/*.js)
    • A. LLM Agent (src/llm.js)
    • B. Translation (src/translate.js)
    • C. Text-to-Speech (src/tts.js)
    • D. Real-time STT Auth (src/stt-auth.js)
    • E. STT File Trigger (src/stt-file-api.js)
    • F. Ghost Webhook (src/ghost-webhook.js)

4. Deployment Steps (Backend)

  • Step 1: Create Makefile (Automation Script)
  • Step 2: Create Infrastructure (First Run Setup)
    • ECR Repository & Image Push
    • Lambda Creation & Configuration
    • API Gateway Routing

5. AWS Configuration Checklist

  • IAM Role Permissions (Polly, Translate, Transcribe, S3)
  • Environment Variables (Keys & Buckets)
  • Function URLs & CORS (Testing Configuration)

6. Frontend Integration (Next.js)

  • 6.1 API Configuration Strategy (Custom Domain vs. Lambda URL)
  • 6.2 Environment Setup (.env.local & Base URL)
  • 6.3 React Hook Implementation (useAiTools.ts - Unified Gateway Version)
  • 6.4 Custom Domain Setup
    • ACM Certificate Request
    • API Gateway Domain Mapping
    • DNS CNAME Configuration
  • 6.5 React Hook Reference (Alternative/Legacy Implementation)
  • 6.6 Test Component (AiTestConsole.tsx - Developer UI)

1. Requirements Overview

This project enhances a custom Next.js Admin Panel for Ghost CMS with a suite of AI capabilities, including real-time tools and automated background processing.

Interactive Features (Frontend Triggered)

  1. AI Assistant (LLM): Generate, expand, or summarize content via OpenAI (proxied).
  2. Text-to-Speech (TTS): Convert article text to audio (MP3) via Amazon Polly.
  3. Translation: Translate selected text instantly via Amazon Translate.
  4. Real-time Dictation (STT): Stream microphone audio directly to Amazon Transcribe via WebSocket for live transcription in the editor.
  5. File Transcription (STT - Two Modes):
    • Mode A (Upload): User uploads a new audio file to S3; an S3 event trigger starts transcription automatically.
    • Mode B (Existing File): User selects an audio file already present in S3; an API call starts transcription without re-uploading.

Automated Features (Background Triggered)

  1. Post Auto-Processing (Webhook): When a post is "Published" in Ghost, a webhook triggers AWS Lambda to automatically generate a TTS audio version of the entire post via Amazon Polly.

2. Project Structure

Create this exact directory structure.

ai-voice-suite/
├── Makefile                   # Automation scripts
├── Dockerfile                 # The "Monolith" Image definition
├── package.json               # Backend dependencies
├── .env                       # Local secrets (AWS keys, OpenAI key)
└── src/
    ├── llm.js                 # OpenAI Text Gen
    ├── translate.js           # AWS Translate
    ├── tts.js                 # AWS Polly (Interactive)
    ├── stt-auth.js            # Real-time Mic Auth
    ├── stt-file-api.js        # API Trigger for File Transcription
    └── ghost-webhook.js       # Auto-TTS for Ghost Posts

3. Source Code (Backend)

3.1 Dependencies (package.json)

JSON

{
  "name": "ai-voice-suite",
  "version": "1.0.0",
  "main": "src/llm.js",
  "type": "commonjs",
  "dependencies": {
    "@aws-sdk/client-polly": "^3.400.0",
    "@aws-sdk/client-s3": "^3.400.0",
    "@aws-sdk/client-transcribe": "^3.400.0",
    "@aws-sdk/client-sts": "^3.400.0",
    "@aws-sdk/client-translate": "^3.400.0",
    "openai": "^4.0.0"
  }
}

3.2 Dockerfile (The Monolith)

Dockerfile

# Use AWS Lambda Node.js 20 Base Image
FROM public.ecr.aws/lambda/nodejs:20

# 1. Install Dependencies
COPY package.json ${LAMBDA_TASK_ROOT}
RUN npm install --omit=dev

# 2. Copy Source Code
COPY src/ ${LAMBDA_TASK_ROOT}/src/

# 3. Default Command (Will be overridden by AWS Lambda configuration)
CMD [ "src/llm.handler" ]
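
You can smoke-test the image locally before pushing: the AWS Lambda Node.js base image ships with the Runtime Interface Emulator, which exposes the handler on a local HTTP endpoint. A minimal sketch (the translate handler is shown; AWS credentials are assumed to be exported in your shell):

Bash

docker build -t ai-voice-suite .

# Run detached; the trailing argument overrides the Dockerfile CMD
docker run --rm -d -p 9000:8080 \
  -e AWS_REGION=us-east-1 -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY \
  ai-voice-suite src/translate.handler

# Invoke via the Runtime Interface Emulator; "body" mimics an API Gateway event
curl -X POST "http://localhost:9000/2015-03-31/functions/function/invocations" \
  -d '{"body":"{\"text\":\"Hello\",\"targetLang\":\"es\"}"}'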

3.3 Lambda Handlers (src/*.js)

A. LLM Agent (src/llm.js)

JavaScript

const OpenAI = require("openai");
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

exports.handler = async (event) => {
    try {
        // Parse body from API Gateway (event.body is a string)
        const body = JSON.parse(event.body || "{}");
        const { prompt, task, context } = body;

        let systemMsg = "You are a helpful assistant for a blog editor.";
        if (task === 'expand') systemMsg = "Expand these notes into a professional paragraph.";
        if (task === 'headline') systemMsg = "Generate 5 SEO-friendly titles.";

        const completion = await openai.chat.completions.create({
            model: "gpt-4-turbo",
            messages: [
                { role: "system", content: systemMsg },
                { role: "user", content: `Context: ${context || ''}\n\nTask: ${prompt}` }
            ]
        });

        return {
            statusCode: 200,
            body: JSON.stringify({ result: completion.choices[0].message.content })
        };
    } catch (e) {
        return { statusCode: 500, body: JSON.stringify({ error: e.message }) };
    }
};
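
Once deployed behind the route from section 4, the handler can be exercised with curl; the other JSON endpoints (translate, tts) follow the same request pattern with their own fields. The base URL below is a placeholder for your gateway or Function URL:

Bash

curl -X POST "https://YOUR_GATEWAY/ai/llm" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Summarize serverless audio pipelines","task":"expand","context":""}'

# => {"result":"..."}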

B. Translation (src/translate.js)

JavaScript

const { TranslateClient, TranslateTextCommand } = require("@aws-sdk/client-translate");
const client = new TranslateClient({ region: process.env.AWS_REGION });

exports.handler = async (event) => {
    try {
        const { text, targetLang } = JSON.parse(event.body || "{}");
        
        const command = new TranslateTextCommand({
            Text: text,
            SourceLanguageCode: "auto",
            TargetLanguageCode: targetLang || "es"
        });
        
        const response = await client.send(command);
        return { 
            statusCode: 200, 
            body: JSON.stringify({ translated: response.TranslatedText }) 
        };
    } catch (e) {
        return { statusCode: 500, body: JSON.stringify({ error: e.message }) };
    }
};

C. Text-to-Speech (src/tts.js)

JavaScript

const { PollyClient, SynthesizeSpeechCommand } = require("@aws-sdk/client-polly");
const { S3Client, PutObjectCommand } = require("@aws-sdk/client-s3");

const polly = new PollyClient({ region: process.env.AWS_REGION });
const s3 = new S3Client({ region: process.env.AWS_REGION });

exports.handler = async (event) => {
    try {
        const { text, voiceId } = JSON.parse(event.body || "{}");
        const bucket = process.env.AUDIO_BUCKET;
        const key = `tts-interactive/${Date.now()}.mp3`;

        const { AudioStream } = await polly.send(new SynthesizeSpeechCommand({
            Text: text,
            OutputFormat: "mp3",
            VoiceId: voiceId || "Joanna",
            Engine: "neural"
        }));

        // Convert stream to buffer
        const chunks = [];
        for await (const chunk of AudioStream) chunks.push(chunk);
        const buffer = Buffer.concat(chunks);

        await s3.send(new PutObjectCommand({ 
            Bucket: bucket, 
            Key: key, 
            Body: buffer, 
            ContentType: "audio/mpeg" 
        }));

        return { 
            statusCode: 200, 
            body: JSON.stringify({ url: `https://${bucket}.s3.amazonaws.com/${key}` }) 
        };
    } catch (e) {
        return { statusCode: 500, body: JSON.stringify({ error: e.message }) };
    }
};
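
Note that the handler returns a plain S3 URL, which is only playable if objects under tts-interactive/ are publicly readable (alternatively, switch to presigned URLs). A minimal sketch of a public-read bucket policy scoped to that prefix; the bucket name is a placeholder, and you may also need to relax the bucket's Block Public Access settings:

Bash

aws s3api put-bucket-policy --bucket YOUR_AUDIO_BUCKET --policy '{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicReadTtsAudio",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::YOUR_AUDIO_BUCKET/tts-interactive/*"
  }]
}'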

D. Real-time STT Auth (src/stt-auth.js)

JavaScript

const { STSClient, AssumeRoleCommand } = require("@aws-sdk/client-sts");
const client = new STSClient({ region: process.env.AWS_REGION });

exports.handler = async () => {
    try {
        const command = new AssumeRoleCommand({
            RoleArn: process.env.STREAMING_ROLE_ARN,
            RoleSessionName: "GhostEditorUser",
            DurationSeconds: 900
        });
        const data = await client.send(command);
        
        return {
            statusCode: 200,
            body: JSON.stringify({
                accessKeyId: data.Credentials.AccessKeyId,
                secretAccessKey: data.Credentials.SecretAccessKey,
                sessionToken: data.Credentials.SessionToken,
                region: process.env.AWS_REGION
            })
        };
    } catch (e) {
        return { statusCode: 500, body: JSON.stringify({ error: e.message }) };
    }
};
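
AssumeRole only succeeds if the role behind STREAMING_ROLE_ARN trusts the Lambda execution role. A sketch of creating that role (role names and the account ID are placeholders); the inline policy grants only the streaming-transcription action the browser needs:

Bash

# Trust policy: let the Lambda execution role assume this role
aws iam create-role --role-name transcribe-streaming-role \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:role/YOUR_LAMBDA_EXEC_ROLE" },
      "Action": "sts:AssumeRole"
    }]
  }'

# Permissions: real-time transcription over WebSocket only
aws iam put-role-policy --role-name transcribe-streaming-role \
  --policy-name transcribe-streaming \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": "transcribe:StartStreamTranscriptionWebSocket",
      "Resource": "*"
    }]
  }'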

E. STT File Trigger (src/stt-file-api.js)

JavaScript

const { TranscribeClient, StartTranscriptionJobCommand } = require("@aws-sdk/client-transcribe");
const transcribe = new TranscribeClient({ region: process.env.AWS_REGION });

// Converts a virtual-hosted-style S3 URL (https://bucket.s3.region.amazonaws.com/key) to s3://bucket/key
function convertHttpToS3Uri(httpUrl) {
    try {
        const url = new URL(httpUrl);
        if (url.hostname.includes(".s3.")) {
             const bucket = url.hostname.split(".s3")[0];
             const key = url.pathname.substring(1);
             return `s3://${bucket}/${key}`;
        }
        return null;
    } catch (e) { return null; }
}

exports.handler = async (event) => {
    try {
        const { fileUrl, jobId } = JSON.parse(event.body || "{}");
        const s3Uri = convertHttpToS3Uri(fileUrl);
        
        if (!s3Uri) return { statusCode: 400, body: "Invalid S3 URL" };

        const jobName = jobId || `stt-manual-${Date.now()}`;
        
        await transcribe.send(new StartTranscriptionJobCommand({
            TranscriptionJobName: jobName,
            LanguageCode: "en-US",
            Media: { MediaFileUri: s3Uri },
            OutputBucketName: process.env.TRANSCRIPTS_BUCKET,
            OutputKey: `transcripts/${jobName}.json`
        }));

        return { statusCode: 200, body: JSON.stringify({ message: "Job Started", jobName }) };
    } catch (e) {
        return { statusCode: 500, body: JSON.stringify({ error: e.message }) };
    }
};
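
A sample invocation for Mode B (the /ai/stt-file route is an assumption; use whatever you configured in API Gateway). Note that fileUrl must be a virtual-hosted-style S3 URL (the hostname contains ".s3."), or the handler rejects it:

Bash

curl -X POST "https://YOUR_GATEWAY/ai/stt-file" \
  -H "Content-Type: application/json" \
  -d '{"fileUrl":"https://YOUR_AUDIO_BUCKET.s3.us-east-1.amazonaws.com/uploads/interview.mp3"}'

# => {"message":"Job Started","jobName":"stt-manual-1735500000000"}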

F. Ghost Webhook (src/ghost-webhook.js)

JavaScript

const { PollyClient, SynthesizeSpeechCommand } = require("@aws-sdk/client-polly");
const { S3Client, PutObjectCommand } = require("@aws-sdk/client-s3");
const polly = new PollyClient({ region: process.env.AWS_REGION });
const s3 = new S3Client({ region: process.env.AWS_REGION });

const stripHtml = (html) => html.replace(/<[^>]*>?/gm, '');

exports.handler = async (event) => {
    try {
        console.log("Ghost Webhook Triggered");
        const body = JSON.parse(event.body || "{}");
        const post = body.post?.current;

        if (!post || !post.html) return { statusCode: 200, body: "No content" };

        // Polly SynthesizeSpeech accepts at most 3,000 billed characters per request
        const text = stripHtml(post.html).substring(0, 2999);
        const key = `posts-audio/${post.slug}.mp3`;

        const { AudioStream } = await polly.send(new SynthesizeSpeechCommand({
            Text: text, OutputFormat: "mp3", VoiceId: "Matthew", Engine: "neural"
        }));

        const chunks = [];
        for await (const chunk of AudioStream) chunks.push(chunk);
        
        await s3.send(new PutObjectCommand({ 
            Bucket: process.env.AUDIO_BUCKET, Key: key, Body: Buffer.concat(chunks), ContentType: "audio/mpeg" 
        }));

        return { statusCode: 200, body: "Audio Generated" };
    } catch (e) {
        console.error(e);
        // Return 200 even on failure so Ghost does not treat the delivery as an error
        return { statusCode: 200, body: JSON.stringify({ error: e.message }) };
    }
};
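
You can simulate a Ghost "post published" delivery without publishing a real post; the payload below matches the fields the handler reads (body.post.current.html and slug). The route name is an assumption:

Bash

curl -X POST "https://YOUR_GATEWAY/webhooks/ghost-tts" \
  -H "Content-Type: application/json" \
  -d '{"post":{"current":{"slug":"hello-world","html":"<p>Hello from Ghost.</p>"}}}'

# Expected: "Audio Generated", and posts-audio/hello-world.mp3 appears in AUDIO_BUCKET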

4. Deployment Steps (Backend)

Step 1: Create Makefile

Copy this content into your Makefile, then update AWS_ACCOUNT_ID and AWS_REGION.

Makefile

AWS_REGION = us-east-1
AWS_ACCOUNT_ID = 123456789012
ECR_URI = $(AWS_ACCOUNT_ID).dkr.ecr.$(AWS_REGION).amazonaws.com
IMAGE_NAME = ai-voice-suite

# 1. Login to ECR
login:
	aws ecr get-login-password --region $(AWS_REGION) | docker login --username AWS --password-stdin $(ECR_URI)

# 2. Build & Push Image
deploy-image:
	docker build -t $(IMAGE_NAME) .
	docker tag $(IMAGE_NAME):latest $(ECR_URI)/$(IMAGE_NAME):latest
	docker push $(ECR_URI)/$(IMAGE_NAME):latest

# 3. Update Lambda code (points all functions at the new image)
update-lambdas:
	aws lambda update-function-code --function-name ai-llm --image-uri $(ECR_URI)/$(IMAGE_NAME):latest
	aws lambda update-function-code --function-name ai-translate --image-uri $(ECR_URI)/$(IMAGE_NAME):latest
	aws lambda update-function-code --function-name ai-tts --image-uri $(ECR_URI)/$(IMAGE_NAME):latest
	aws lambda update-function-code --function-name ai-stt-auth --image-uri $(ECR_URI)/$(IMAGE_NAME):latest
	aws lambda update-function-code --function-name ai-stt-file --image-uri $(ECR_URI)/$(IMAGE_NAME):latest
	aws lambda update-function-code --function-name ai-ghost-webhook --image-uri $(ECR_URI)/$(IMAGE_NAME):latest

deploy: login deploy-image update-lambdas

Step 2: Create Infrastructure (First Run Only)

  1. Create the ECR Repository:

Bash

aws ecr create-repository --repository-name ai-voice-suite

  2. Push the Initial Image:

Bash

make login deploy-image

  3. Create Lambda Functions (AWS Console):
    • Create 6 functions (ai-llm, ai-translate, etc.).
    • Select Container Image -> Browse ECR -> Select ai-voice-suite.
    • Crucial: For each function, go to Image Configuration > Edit > CMD Override:
      • ai-llm -> src/llm.handler
      • ai-translate -> src/translate.handler
      • ai-ghost-webhook -> src/ghost-webhook.handler
      • (Repeat for the others, matching the filename.)
  4. Configure API Gateway:
    • Create an HTTP API.
    • Create routes (e.g., POST /ai/llm) pointing to the respective Lambda functions (a CLI sketch follows below).
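
If you prefer the CLI over the console, the sketch below creates the HTTP API, an auto-deploying $default stage, and one example route; repeat the integration/route pair for each function. The API name GhostAI_Gateway matches section 6.4; the ARN and account ID are placeholders:

Bash

# Create the HTTP API and an auto-deploying $default stage
API_ID=$(aws apigatewayv2 create-api --name GhostAI_Gateway \
  --protocol-type HTTP --query ApiId --output text)
aws apigatewayv2 create-stage --api-id $API_ID --stage-name '$default' --auto-deploy

# Integrate one Lambda and attach a route (repeat per function)
INTEG_ID=$(aws apigatewayv2 create-integration --api-id $API_ID \
  --integration-type AWS_PROXY --payload-format-version 2.0 \
  --integration-uri arn:aws:lambda:us-east-1:123456789012:function:ai-llm \
  --query IntegrationId --output text)
aws apigatewayv2 create-route --api-id $API_ID \
  --route-key "POST /ai/llm" --target "integrations/$INTEG_ID"

# Allow API Gateway to invoke the function
aws lambda add-permission --function-name ai-llm \
  --statement-id apigw-invoke --action lambda:InvokeFunction \
  --principal apigateway.amazonaws.com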


5. AWS Configuration Checklist

Before testing, ensure these are set in AWS Console:

  1. IAM Role: The Lambda execution role must have these permissions: polly:*, translate:*, transcribe:*, s3:PutObject, plus sts:AssumeRole on the streaming role (required by src/stt-auth.js).
  2. Environment Variables (Lambda):
    • OPENAI_API_KEY: (Your sk-...)
    • AUDIO_BUCKET: (Your S3 bucket for generated audio)
    • TRANSCRIPTS_BUCKET: (Your S3 bucket for transcription output, used by src/stt-file-api.js)
    • STREAMING_ROLE_ARN: (ARN of the role assumed for real-time STT WebSocket auth)
  3. Function URLs: Enable "Auth: NONE" and "CORS: *" for the easiest testing from localhost; tighten both before production use.
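
If you prefer the CLI, environment variables can be set per function as sketched below (all values are placeholders; AWS_REGION is injected by Lambda automatically and must not be set manually):

Bash

aws lambda update-function-configuration --function-name ai-tts \
  --environment "Variables={OPENAI_API_KEY=sk-...,AUDIO_BUCKET=your-audio-bucket,TRANSCRIPTS_BUCKET=your-transcripts-bucket,STREAMING_ROLE_ARN=arn:aws:iam::123456789012:role/transcribe-streaming-role}"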

6. Frontend Integration (Next.js)

6.1 API Configuration Strategy (Custom Domain vs. Lambda URL)

You can point the frontend either at individual Lambda Function URLs stored in your Next.js .env.local, or at a single custom domain routed through API Gateway. To switch from individual Lambda URLs to your custom domain 60-think.com, you typically create a subdomain like api.60-think.com or ai.60-think.com (see 6.4 for the AWS setup).

The updated configuration and code follow.

6.2 Updated .env.local

Instead of four separate URLs, you now have a single base URL.

Bash

# .env.local

# The Base URL for your Custom Domain on API Gateway
NEXT_PUBLIC_AI_GATEWAY_URL="https://api.60-think.com"

# Optional: If you haven't set up the subdomain yet, use the raw AWS URL:
# NEXT_PUBLIC_AI_GATEWAY_URL="https://xyz123.execute-api.us-east-1.amazonaws.com"

6.3 Updated React Hook (useAiTools.ts - Unified Gateway Version)

Update your hook to append the specific routes (/ai/llm, /ai/tts, etc.) to that single Base URL.

TypeScript

import { useState } from 'react';

export function useAiTools() {
  const [loading, setLoading] = useState(false);
  
  // 1. Get the Base URL
  const BASE_URL = process.env.NEXT_PUBLIC_AI_GATEWAY_URL;

  const callApi = async (endpoint: string, payload: object, method = 'POST') => {
    setLoading(true);
    try {
      // 2. Construct full URL: https://api.60-think.com/ai/llm
      const url = `${BASE_URL}${endpoint}`;
      
      const res = await fetch(url, {
        method,
        headers: { 'Content-Type': 'application/json' },
        body: method === 'POST' ? JSON.stringify(payload) : undefined
      });
      
      if (!res.ok) throw new Error(await res.text());
      return await res.json();
    } catch (err) {
      console.error(err);
      throw err;
    } finally {
      setLoading(false);
    }
  };

  // 3. Define methods with specific routes
  const generateText = (prompt: string, task: string, context?: string) => 
    callApi('/ai/llm', { prompt, task, context });

  const translateText = (text: string, targetLang: string) => 
    callApi('/ai/translate', { text, targetLang });

  const generateSpeech = (text: string) => 
    callApi('/ai/tts', { text });

  const getMicAuth = () => 
    callApi('/auth/stt', {}, 'GET');

  return { loading, generateText, translateText, generateSpeech, getMicAuth };
}

6.4 Critical Setup: Connecting the Domain in AWS

Writing api.60-think.com in your code won't work until you configure AWS: you must map the domain to the API Gateway.

  1. Request a Certificate:
    • Go to AWS Certificate Manager (ACM) -> Request a certificate for api.60-think.com.
    • Validate it (DNS validation via Route53 or your DNS provider).
  2. Create Custom Domain Name:
    • Go to API Gateway Console -> Custom domain names -> Create.
    • Domain name: api.60-think.com.
    • ACM Certificate: Select the one you just created.
  3. Map to API:
    • Click the "API mappings" tab.
    • API: Select GhostAI_Gateway.
    • Stage: $default.
  4. Update DNS:
    • AWS will give you an API Gateway target domain name (e.g., d-xyz.execute-api.us-east-1.amazonaws.com).
    • Go to your DNS provider (where 60-think.com is managed).
    • Create a CNAME record: api pointing to d-xyz.execute-api....
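
Once the CNAME propagates, verify resolution and smoke-test a route through the custom domain:

Bash

# Confirm the CNAME points at the API Gateway target domain
dig +short CNAME api.60-think.com

# Exercise a route end-to-end through the custom domain
curl -X POST "https://api.60-think.com/ai/translate" \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello","targetLang":"ja"}'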

6.5 React Hook Reference (Alternative/Legacy Implementation)

This earlier version of the hook abstracts all backend communication by calling each Lambda Function URL directly via per-endpoint environment variables (NEXT_PUBLIC_API_LLM, etc.) instead of a single gateway base URL. Use it only if you are not using the custom domain setup above.

TypeScript

import { useState } from 'react';

export function useAiTools() {
  const [loading, setLoading] = useState(false);

  // Helper to fetch generic JSON endpoints
  const callApi = async (url: string, payload: object) => {
    setLoading(true);
    try {
      const res = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(payload)
      });
      if (!res.ok) throw new Error(await res.text());
      return await res.json();
    } catch (err) {
      console.error(err);
      throw err;
    } finally {
      setLoading(false);
    }
  };

  const generateText = (prompt: string, task: 'expand' | 'headline', context?: string) => 
    callApi(process.env.NEXT_PUBLIC_API_LLM!, { prompt, task, context });

  const translateText = (text: string, targetLang: string) => 
    callApi(process.env.NEXT_PUBLIC_API_TRANSLATE!, { text, targetLang });

  const generateSpeech = (text: string) => 
    callApi(process.env.NEXT_PUBLIC_API_TTS!, { text });

  // For Real-time STT, you just fetch credentials
  const getMicAuth = () => callApi(process.env.NEXT_PUBLIC_API_AUTH!, {});

  return { loading, generateText, translateText, generateSpeech, getMicAuth };
}

6.6 Test Component: AiTestConsole.tsx

TypeScript

import { useState } from 'react';
import { useAiTools } from '../hooks/useAiTools';

export default function AiTestConsole() {
  const { loading, generateText, translateText } = useAiTools();
  const [input, setInput] = useState("");
  const [result, setResult] = useState("");

  const handleExpand = async () => {
    const res = await generateText(input, "expand");
    setResult(res.result);
  };

  const handleTranslate = async () => {
    const res = await translateText(input, "fr"); // Translate to French
    setResult(res.translated);
  };

  return (
    <div className="p-4 border rounded bg-gray-50">
      <h3 className="font-bold">AI Developer Console</h3>
      <textarea 
        className="w-full p-2 border mt-2" 
        value={input} 
        onChange={e => setInput(e.target.value)}
        placeholder="Type content here..."
      />
      
      <div className="flex gap-2 mt-2">
        <button onClick={handleExpand} disabled={loading} className="bg-blue-600 text-white px-4 py-2 rounded">
          {loading ? "Thinking..." : "Expand Text (LLM)"}
        </button>
        <button onClick={handleTranslate} disabled={loading} className="bg-green-600 text-white px-4 py-2 rounded">
          Translate to FR
        </button>
      </div>

      {result && (
        <div className="mt-4 p-2 bg-white border">
          <strong>Result:</strong>
          <p>{result}</p>
        </div>
      )}
    </div>
  );
}