2025/12/30

Think-AI : AI-Voice-Suite: (Development Proposal) v1.0.1

Version: 1.0.1 (Production Ready)
Architecture: Serverless Monolith (Dockerized Lambda)
Frontend: Next.js (Ghost Admin Customization)


1. Requirements Overview

  • Overview of Serverless Monolith Pattern
  • Key Workflows (Interactive, Webhook, File STT)
  • Architecture Diagram Reference

2. Project Structure

  • Directory Tree Layout
  • Key File Descriptions

3. Source Code (Backend)

  • 3.1 Dependencies (package.json)
  • 3.2 Dockerfile (The Monolith Build Definition)
  • 3.3 Lambda Handlers (src/*.js)
    • A. LLM Agent (src/llm.js)
    • B. Translation (src/translate.js)
    • C. Text-to-Speech (src/tts.js)
    • D. Real-time STT Auth (src/stt-auth.js)
    • E. STT File Trigger (src/stt-file-api.js)
    • F. Ghost Webhook (src/ghost-webhook.js)

4. Deployment Steps (Backend)

  • Step 1: Create Makefile (Automation Script)
  • Step 2: Create Infrastructure (First Run Setup)
    • ECR Repository & Image Push
    • Lambda Creation & Configuration
    • API Gateway Routing

5. AWS Configuration Checklist

  • IAM Role Permissions (Polly, Translate, Transcribe, S3)
  • Environment Variables (Keys & Buckets)
  • Function URLs & CORS (Testing Configuration)

6. Frontend Integration (Next.js)

  • 6.1 API Configuration Strategy (Custom Domain vs. Lambda URL)
  • 6.2 Environment Setup (.env.local & Base URL)
  • 6.3 React Hook Implementation (useAiTools.ts - Unified Gateway Version)
  • 6.4 Custom Domain Setup
    • ACM Certificate Request
    • API Gateway Domain Mapping
    • DNS CNAME Configuration
  • 6.5 React Hook Reference (Alternative/Legacy Implementation)
  • 6.6 Test Component (AiTestConsole.tsx - Developer UI)

1. Requirements Overview

This project enhances a custom Next.js Admin Panel for Ghost CMS with a suite of AI capabilities, including real-time tools and automated background processing.

Interactive Features (Frontend Triggered)

  1. AI Assistant (LLM): Generate, expand, or summarize content via OpenAI (proxied).
  2. Text-to-Speech (TTS): Convert article text to audio (MP3) via Amazon Polly.
  3. Translation: Translate selected text instantly via Amazon Translate.
  4. Real-time Dictation (STT): Stream microphone audio directly to Amazon Transcribe via WebSocket for live transcription in the editor.
  5. File Transcription (STT - Two Modes):
    • Mode A (Upload): User uploads a new audio file to S3; an S3 event trigger starts transcription automatically.
    • Mode B (Existing File): User selects an audio file already present in S3; an API call starts transcription without re-uploading.

Automated Features (Background Triggered)

  1. Post Auto-Processing (Webhook): When a post is "Published" in Ghost, a webhook triggers AWS Lambda to automatically generate a TTS audio version of the entire post via Amazon Polly.

2. Project Structure

Create this exact directory structure.

ai-voice-suite/
├── Makefile                   # Automation scripts
├── Dockerfile                 # The "Monolith" Image definition
├── package.json               # Backend dependencies
├── .env                       # Local secrets (AWS keys, OpenAI key)
└── src/
    ├── llm.js                 # OpenAI Text Gen
    ├── translate.js           # AWS Translate
    ├── tts.js                 # AWS Polly (Interactive)
    ├── stt-auth.js            # Real-time Mic Auth
    ├── stt-file-api.js        # API Trigger for File Transcription
    └── ghost-webhook.js       # Auto-TTS for Ghost Posts

3. Source Code (Backend)

3.1 Dependencies (package.json)

JSON

{
  "name": "ai-voice-suite",
  "version": "1.0.0",
  "main": "src/llm.js",
  "type": "commonjs",
  "dependencies": {
    "@aws-sdk/client-polly": "^3.400.0",
    "@aws-sdk/client-s3": "^3.400.0",
    "@aws-sdk/client-transcribe": "^3.400.0",
    "@aws-sdk/client-sts": "^3.400.0",
    "@aws-sdk/client-translate": "^3.400.0",
    "openai": "^4.0.0"
  }
}

3.2 Dockerfile (The Monolith)

Dockerfile

# Use AWS Lambda Node.js 20 Base Image
FROM public.ecr.aws/lambda/nodejs:20

# 1. Install Dependencies
COPY package.json ${LAMBDA_TASK_ROOT}
RUN npm install --omit=dev

# 2. Copy Source Code
COPY src/ ${LAMBDA_TASK_ROOT}/src/

# 3. Default Command (Will be overridden by AWS Lambda configuration)
CMD [ "src/llm.handler" ]
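
You can smoke-test the image locally before pushing: the AWS Lambda Node.js base image ships with the Runtime Interface Emulator, which exposes the handler on a local HTTP endpoint. A minimal sketch (the translate handler is shown; AWS credentials are assumed to be exported in your shell):

Bash

docker build -t ai-voice-suite .

# Run detached; the trailing argument overrides the Dockerfile CMD
docker run --rm -d -p 9000:8080 \
  -e AWS_REGION=us-east-1 -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY \
  ai-voice-suite src/translate.handler

# Invoke via the Runtime Interface Emulator; "body" mimics an API Gateway event
curl -X POST "http://localhost:9000/2015-03-31/functions/function/invocations" \
  -d '{"body":"{\"text\":\"Hello\",\"targetLang\":\"es\"}"}'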

3.3 Lambda Handlers (src/*.js)

A. LLM Agent (src/llm.js)

JavaScript

const OpenAI = require("openai");
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

exports.handler = async (event) => {
    try {
        // Parse body from API Gateway (event.body is a string)
        const body = JSON.parse(event.body || "{}");
        const { prompt, task, context } = body;

        let systemMsg = "You are a helpful assistant for a blog editor.";
        if (task === 'expand') systemMsg = "Expand these notes into a professional paragraph.";
        if (task === 'headline') systemMsg = "Generate 5 SEO-friendly titles.";

        const completion = await openai.chat.completions.create({
            model: "gpt-4-turbo",
            messages: [
                { role: "system", content: systemMsg },
                { role: "user", content: `Context: ${context || ''}\n\nTask: ${prompt}` }
            ]
        });

        return {
            statusCode: 200,
            body: JSON.stringify({ result: completion.choices[0].message.content })
        };
    } catch (e) {
        return { statusCode: 500, body: JSON.stringify({ error: e.message }) };
    }
};
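
Once deployed behind the route from section 4, the handler can be exercised with curl; the other JSON endpoints (translate, tts) follow the same request pattern with their own fields. The base URL below is a placeholder for your gateway or Function URL:

Bash

curl -X POST "https://YOUR_GATEWAY/ai/llm" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Summarize serverless audio pipelines","task":"expand","context":""}'

# => {"result":"..."}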

B. Translation (src/translate.js)

JavaScript

const { TranslateClient, TranslateTextCommand } = require("@aws-sdk/client-translate");
const client = new TranslateClient({ region: process.env.AWS_REGION });

exports.handler = async (event) => {
    try {
        const { text, targetLang } = JSON.parse(event.body || "{}");
        
        const command = new TranslateTextCommand({
            Text: text,
            SourceLanguageCode: "auto",
            TargetLanguageCode: targetLang || "es"
        });
        
        const response = await client.send(command);
        return { 
            statusCode: 200, 
            body: JSON.stringify({ translated: response.TranslatedText }) 
        };
    } catch (e) {
        return { statusCode: 500, body: JSON.stringify({ error: e.message }) };
    }
};

C. Text-to-Speech (src/tts.js)

JavaScript

const { PollyClient, SynthesizeSpeechCommand } = require("@aws-sdk/client-polly");
const { S3Client, PutObjectCommand } = require("@aws-sdk/client-s3");

const polly = new PollyClient({ region: process.env.AWS_REGION });
const s3 = new S3Client({ region: process.env.AWS_REGION });

exports.handler = async (event) => {
    try {
        const { text, voiceId } = JSON.parse(event.body || "{}");
        const bucket = process.env.AUDIO_BUCKET;
        const key = `tts-interactive/${Date.now()}.mp3`;

        const { AudioStream } = await polly.send(new SynthesizeSpeechCommand({
            Text: text,
            OutputFormat: "mp3",
            VoiceId: voiceId || "Joanna",
            Engine: "neural"
        }));

        // Convert stream to buffer
        const chunks = [];
        for await (const chunk of AudioStream) chunks.push(chunk);
        const buffer = Buffer.concat(chunks);

        await s3.send(new PutObjectCommand({ 
            Bucket: bucket, 
            Key: key, 
            Body: buffer, 
            ContentType: "audio/mpeg" 
        }));

        return { 
            statusCode: 200, 
            body: JSON.stringify({ url: `https://${bucket}.s3.amazonaws.com/${key}` }) 
        };
    } catch (e) {
        return { statusCode: 500, body: JSON.stringify({ error: e.message }) };
    }
};
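
Note that the handler returns a plain S3 URL, which is only playable if objects under tts-interactive/ are publicly readable (alternatively, switch to presigned URLs). A minimal sketch of a public-read bucket policy scoped to that prefix; the bucket name is a placeholder, and you may also need to relax the bucket's Block Public Access settings:

Bash

aws s3api put-bucket-policy --bucket YOUR_AUDIO_BUCKET --policy '{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicReadTtsAudio",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::YOUR_AUDIO_BUCKET/tts-interactive/*"
  }]
}'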

D. Real-time STT Auth (src/stt-auth.js)

JavaScript

const { STSClient, AssumeRoleCommand } = require("@aws-sdk/client-sts");
const client = new STSClient({ region: process.env.AWS_REGION });

exports.handler = async () => {
    try {
        const command = new AssumeRoleCommand({
            RoleArn: process.env.STREAMING_ROLE_ARN,
            RoleSessionName: "GhostEditorUser",
            DurationSeconds: 900
        });
        const data = await client.send(command);
        
        return {
            statusCode: 200,
            body: JSON.stringify({
                accessKeyId: data.Credentials.AccessKeyId,
                secretAccessKey: data.Credentials.SecretAccessKey,
                sessionToken: data.Credentials.SessionToken,
                region: process.env.AWS_REGION
            })
        };
    } catch (e) {
        return { statusCode: 500, body: JSON.stringify({ error: e.message }) };
    }
};
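
AssumeRole only succeeds if the role behind STREAMING_ROLE_ARN trusts the Lambda execution role. A sketch of creating that role (role names and the account ID are placeholders); the inline policy grants only the streaming-transcription action the browser needs:

Bash

# Trust policy: let the Lambda execution role assume this role
aws iam create-role --role-name transcribe-streaming-role \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:role/YOUR_LAMBDA_EXEC_ROLE" },
      "Action": "sts:AssumeRole"
    }]
  }'

# Permissions: real-time transcription over WebSocket only
aws iam put-role-policy --role-name transcribe-streaming-role \
  --policy-name transcribe-streaming \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": "transcribe:StartStreamTranscriptionWebSocket",
      "Resource": "*"
    }]
  }'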

E. STT File Trigger (src/stt-file-api.js)

JavaScript

const { TranscribeClient, StartTranscriptionJobCommand } = require("@aws-sdk/client-transcribe");
const transcribe = new TranscribeClient({ region: process.env.AWS_REGION });

// Converts a virtual-hosted-style S3 URL (https://bucket.s3.region.amazonaws.com/key) to s3://bucket/key
function convertHttpToS3Uri(httpUrl) {
    try {
        const url = new URL(httpUrl);
        if (url.hostname.includes(".s3.")) {
             const bucket = url.hostname.split(".s3")[0];
             const key = url.pathname.substring(1);
             return `s3://${bucket}/${key}`;
        }
        return null;
    } catch (e) { return null; }
}

exports.handler = async (event) => {
    try {
        const { fileUrl, jobId } = JSON.parse(event.body || "{}");
        const s3Uri = convertHttpToS3Uri(fileUrl);
        
        if (!s3Uri) return { statusCode: 400, body: "Invalid S3 URL" };

        const jobName = jobId || `stt-manual-${Date.now()}`;
        
        await transcribe.send(new StartTranscriptionJobCommand({
            TranscriptionJobName: jobName,
            LanguageCode: "en-US",
            Media: { MediaFileUri: s3Uri },
            OutputBucketName: process.env.TRANSCRIPTS_BUCKET,
            OutputKey: `transcripts/${jobName}.json`
        }));

        return { statusCode: 200, body: JSON.stringify({ message: "Job Started", jobName }) };
    } catch (e) {
        return { statusCode: 500, body: JSON.stringify({ error: e.message }) };
    }
};
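
A sample invocation for Mode B (the /ai/stt-file route is an assumption; use whatever you configured in API Gateway). Note that fileUrl must be a virtual-hosted-style S3 URL (the hostname contains ".s3."), or the handler rejects it:

Bash

curl -X POST "https://YOUR_GATEWAY/ai/stt-file" \
  -H "Content-Type: application/json" \
  -d '{"fileUrl":"https://YOUR_AUDIO_BUCKET.s3.us-east-1.amazonaws.com/uploads/interview.mp3"}'

# => {"message":"Job Started","jobName":"stt-manual-1735500000000"}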

F. Ghost Webhook (src/ghost-webhook.js)

JavaScript

const { PollyClient, SynthesizeSpeechCommand } = require("@aws-sdk/client-polly");
const { S3Client, PutObjectCommand } = require("@aws-sdk/client-s3");
const polly = new PollyClient({ region: process.env.AWS_REGION });
const s3 = new S3Client({ region: process.env.AWS_REGION });

const stripHtml = (html) => html.replace(/<[^>]*>?/gm, '');

exports.handler = async (event) => {
    try {
        console.log("Ghost Webhook Triggered");
        const body = JSON.parse(event.body || "{}");
        const post = body.post?.current;

        if (!post || !post.html) return { statusCode: 200, body: "No content" };

        // Polly SynthesizeSpeech accepts at most 3,000 billed characters per request
        const text = stripHtml(post.html).substring(0, 2999);
        const key = `posts-audio/${post.slug}.mp3`;

        const { AudioStream } = await polly.send(new SynthesizeSpeechCommand({
            Text: text, OutputFormat: "mp3", VoiceId: "Matthew", Engine: "neural"
        }));

        const chunks = [];
        for await (const chunk of AudioStream) chunks.push(chunk);
        
        await s3.send(new PutObjectCommand({ 
            Bucket: process.env.AUDIO_BUCKET, Key: key, Body: Buffer.concat(chunks), ContentType: "audio/mpeg" 
        }));

        return { statusCode: 200, body: "Audio Generated" };
    } catch (e) {
        console.error(e);
        // Return 200 even on failure so Ghost does not treat the delivery as an error
        return { statusCode: 200, body: JSON.stringify({ error: e.message }) };
    }
};
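
You can simulate a Ghost "post published" delivery without publishing a real post; the payload below matches the fields the handler reads (body.post.current.html and slug). The route name is an assumption:

Bash

curl -X POST "https://YOUR_GATEWAY/webhooks/ghost-tts" \
  -H "Content-Type: application/json" \
  -d '{"post":{"current":{"slug":"hello-world","html":"<p>Hello from Ghost.</p>"}}}'

# Expected: "Audio Generated", and posts-audio/hello-world.mp3 appears in AUDIO_BUCKET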

4. Deployment Steps (Backend)

Step 1: Create Makefile

Copy this content into your Makefile, then update AWS_ACCOUNT_ID and AWS_REGION.

Makefile

AWS_REGION = us-east-1
AWS_ACCOUNT_ID = 123456789012
ECR_URI = $(AWS_ACCOUNT_ID).dkr.ecr.$(AWS_REGION).amazonaws.com
IMAGE_NAME = ai-voice-suite

# 1. Login to ECR
login:
	aws ecr get-login-password --region $(AWS_REGION) | docker login --username AWS --password-stdin $(ECR_URI)

# 2. Build & Push Image
deploy-image:
	docker build -t $(IMAGE_NAME) .
	docker tag $(IMAGE_NAME):latest $(ECR_URI)/$(IMAGE_NAME):latest
	docker push $(ECR_URI)/$(IMAGE_NAME):latest

# 3. Update Lambda code (points all functions at the new image)
update-lambdas:
	aws lambda update-function-code --function-name ai-llm --image-uri $(ECR_URI)/$(IMAGE_NAME):latest
	aws lambda update-function-code --function-name ai-translate --image-uri $(ECR_URI)/$(IMAGE_NAME):latest
	aws lambda update-function-code --function-name ai-tts --image-uri $(ECR_URI)/$(IMAGE_NAME):latest
	aws lambda update-function-code --function-name ai-stt-auth --image-uri $(ECR_URI)/$(IMAGE_NAME):latest
	aws lambda update-function-code --function-name ai-stt-file --image-uri $(ECR_URI)/$(IMAGE_NAME):latest
	aws lambda update-function-code --function-name ai-ghost-webhook --image-uri $(ECR_URI)/$(IMAGE_NAME):latest

deploy: login deploy-image update-lambdas

Step 2: Create Infrastructure (First Run Only)

  1. Create the ECR Repository:

Bash

aws ecr create-repository --repository-name ai-voice-suite

  2. Push the Initial Image:

Bash

make login deploy-image

  3. Create Lambda Functions (AWS Console):
    • Create 6 functions (ai-llm, ai-translate, etc.).
    • Select Container Image -> Browse ECR -> Select ai-voice-suite.
    • Crucial: For each function, go to Image Configuration > Edit > CMD Override:
      • ai-llm -> src/llm.handler
      • ai-translate -> src/translate.handler
      • ai-ghost-webhook -> src/ghost-webhook.handler
      • (Repeat for the others, matching the filename.)
  4. Configure API Gateway:
    • Create an HTTP API.
    • Create routes (e.g., POST /ai/llm) pointing to the respective Lambda functions (a CLI sketch follows below).
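
If you prefer the CLI over the console, the sketch below creates the HTTP API, an auto-deploying $default stage, and one example route; repeat the integration/route pair for each function. The API name GhostAI_Gateway matches section 6.4; the ARN and account ID are placeholders:

Bash

# Create the HTTP API and an auto-deploying $default stage
API_ID=$(aws apigatewayv2 create-api --name GhostAI_Gateway \
  --protocol-type HTTP --query ApiId --output text)
aws apigatewayv2 create-stage --api-id $API_ID --stage-name '$default' --auto-deploy

# Integrate one Lambda and attach a route (repeat per function)
INTEG_ID=$(aws apigatewayv2 create-integration --api-id $API_ID \
  --integration-type AWS_PROXY --payload-format-version 2.0 \
  --integration-uri arn:aws:lambda:us-east-1:123456789012:function:ai-llm \
  --query IntegrationId --output text)
aws apigatewayv2 create-route --api-id $API_ID \
  --route-key "POST /ai/llm" --target "integrations/$INTEG_ID"

# Allow API Gateway to invoke the function
aws lambda add-permission --function-name ai-llm \
  --statement-id apigw-invoke --action lambda:InvokeFunction \
  --principal apigateway.amazonaws.com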


5. AWS Configuration Checklist

Before testing, ensure these are set in AWS Console:

  1. IAM Role: The Lambda execution role must have these permissions: polly:*, translate:*, transcribe:*, s3:PutObject, plus sts:AssumeRole on the streaming role (required by src/stt-auth.js).
  2. Environment Variables (Lambda):
    • OPENAI_API_KEY: (Your sk-...)
    • AUDIO_BUCKET: (Your S3 bucket for generated audio)
    • TRANSCRIPTS_BUCKET: (Your S3 bucket for transcription output, used by src/stt-file-api.js)
    • STREAMING_ROLE_ARN: (ARN of the role assumed for real-time STT WebSocket auth)
  3. Function URLs: Enable "Auth: NONE" and "CORS: *" for the easiest testing from localhost; tighten both before production use.
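
If you prefer the CLI, environment variables can be set per function as sketched below (all values are placeholders; AWS_REGION is injected by Lambda automatically and must not be set manually):

Bash

aws lambda update-function-configuration --function-name ai-tts \
  --environment "Variables={OPENAI_API_KEY=sk-...,AUDIO_BUCKET=your-audio-bucket,TRANSCRIPTS_BUCKET=your-transcripts-bucket,STREAMING_ROLE_ARN=arn:aws:iam::123456789012:role/transcribe-streaming-role}"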

6. Frontend Integration (Next.js)

6.1 API Configuration Strategy (Custom Domain vs. Lambda URL)

You can point the frontend either at individual Lambda Function URLs stored in your Next.js .env.local, or at a single custom domain routed through API Gateway. To switch from individual Lambda URLs to your custom domain 60-think.com, you typically create a subdomain like api.60-think.com or ai.60-think.com (see 6.4 for the AWS setup).

The updated configuration and code follow.

6.2 Updated .env.local

Instead of four separate URLs, you now have a single base URL.

Bash

# .env.local

# The Base URL for your Custom Domain on API Gateway
NEXT_PUBLIC_AI_GATEWAY_URL="https://api.60-think.com"

# Optional: If you haven't set up the subdomain yet, use the raw AWS URL:
# NEXT_PUBLIC_AI_GATEWAY_URL="https://xyz123.execute-api.us-east-1.amazonaws.com"

6.3 Updated React Hook (useAiTools.ts - Unified Gateway Version)

Update your hook to append the specific routes (/ai/llm, /ai/tts, etc.) to that single Base URL.

TypeScript

import { useState } from 'react';

export function useAiTools() {
  const [loading, setLoading] = useState(false);
  
  // 1. Get the Base URL
  const BASE_URL = process.env.NEXT_PUBLIC_AI_GATEWAY_URL;

  const callApi = async (endpoint: string, payload: object, method = 'POST') => {
    setLoading(true);
    try {
      // 2. Construct full URL: https://api.60-think.com/ai/llm
      const url = `${BASE_URL}${endpoint}`;
      
      const res = await fetch(url, {
        method,
        headers: { 'Content-Type': 'application/json' },
        body: method === 'POST' ? JSON.stringify(payload) : undefined
      });
      
      if (!res.ok) throw new Error(await res.text());
      return await res.json();
    } catch (err) {
      console.error(err);
      throw err;
    } finally {
      setLoading(false);
    }
  };

  // 3. Define methods with specific routes
  const generateText = (prompt: string, task: string, context?: string) => 
    callApi('/ai/llm', { prompt, task, context });

  const translateText = (text: string, targetLang: string) => 
    callApi('/ai/translate', { text, targetLang });

  const generateSpeech = (text: string) => 
    callApi('/ai/tts', { text });

  const getMicAuth = () => 
    callApi('/auth/stt', {}, 'GET');

  return { loading, generateText, translateText, generateSpeech, getMicAuth };
}

6.4 Critical Setup: Connecting the Domain in AWS

Writing api.60-think.com in your code won't work until you configure AWS: you must map the domain to the API Gateway.

  1. Request a Certificate:
    • Go to AWS Certificate Manager (ACM) -> Request a certificate for api.60-think.com.
    • Validate it (DNS validation via Route53 or your DNS provider).
  2. Create Custom Domain Name:
    • Go to API Gateway Console -> Custom domain names -> Create.
    • Domain name: api.60-think.com.
    • ACM Certificate: Select the one you just created.
  3. Map to API:
    • Click the "API mappings" tab.
    • API: Select GhostAI_Gateway.
    • Stage: $default.
  4. Update DNS:
    • AWS will give you an API Gateway target domain name (e.g., d-xyz.execute-api.us-east-1.amazonaws.com).
    • Go to your DNS provider (where 60-think.com is managed).
    • Create a CNAME record: api pointing to d-xyz.execute-api....
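
Once the CNAME propagates, verify resolution and smoke-test a route through the custom domain:

Bash

# Confirm the CNAME points at the API Gateway target domain
dig +short CNAME api.60-think.com

# Exercise a route end-to-end through the custom domain
curl -X POST "https://api.60-think.com/ai/translate" \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello","targetLang":"ja"}'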

6.5 React Hook Reference (Alternative/Legacy Implementation)

This earlier version of the hook abstracts all backend communication by calling each Lambda Function URL directly via per-endpoint environment variables (NEXT_PUBLIC_API_LLM, etc.) instead of a single gateway base URL. Use it only if you are not using the custom domain setup above.

TypeScript

import { useState } from 'react';

export function useAiTools() {
  const [loading, setLoading] = useState(false);

  // Helper to fetch generic JSON endpoints
  const callApi = async (url: string, payload: object) => {
    setLoading(true);
    try {
      const res = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(payload)
      });
      if (!res.ok) throw new Error(await res.text());
      return await res.json();
    } catch (err) {
      console.error(err);
      throw err;
    } finally {
      setLoading(false);
    }
  };

  const generateText = (prompt: string, task: 'expand' | 'headline', context?: string) => 
    callApi(process.env.NEXT_PUBLIC_API_LLM!, { prompt, task, context });

  const translateText = (text: string, targetLang: string) => 
    callApi(process.env.NEXT_PUBLIC_API_TRANSLATE!, { text, targetLang });

  const generateSpeech = (text: string) => 
    callApi(process.env.NEXT_PUBLIC_API_TTS!, { text });

  // For Real-time STT, you just fetch credentials
  const getMicAuth = () => callApi(process.env.NEXT_PUBLIC_API_AUTH!, {});

  return { loading, generateText, translateText, generateSpeech, getMicAuth };
}

6.6 Test Component: AiTestConsole.tsx

TypeScript

import { useState } from 'react';
import { useAiTools } from '../hooks/useAiTools';

export default function AiTestConsole() {
  const { loading, generateText, translateText } = useAiTools();
  const [input, setInput] = useState("");
  const [result, setResult] = useState("");

  const handleExpand = async () => {
    const res = await generateText(input, "expand");
    setResult(res.result);
  };

  const handleTranslate = async () => {
    const res = await translateText(input, "fr"); // Translate to French
    setResult(res.translated);
  };

  return (
    <div className="p-4 border rounded bg-gray-50">
      <h3 className="font-bold">AI Developer Console</h3>
      <textarea 
        className="w-full p-2 border mt-2" 
        value={input} 
        onChange={e => setInput(e.target.value)}
        placeholder="Type content here..."
      />
      
      <div className="flex gap-2 mt-2">
        <button onClick={handleExpand} disabled={loading} className="bg-blue-600 text-white px-4 py-2 rounded">
          {loading ? "Thinking..." : "Expand Text (LLM)"}
        </button>
        <button onClick={handleTranslate} disabled={loading} className="bg-green-600 text-white px-4 py-2 rounded">
          Translate to FR
        </button>
      </div>

      {result && (
        <div className="mt-4 p-2 bg-white border">
          <strong>Result:</strong>
          <p>{result}</p>
        </div>
      )}
    </div>
  );
}