Agent Workflow Development Experience

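- Efficiency: cache frequently reused resources (e.g., character images, background templates) locally to reduce the number of API calls.
- Quality: fine-tune the prompt templates for each book type (fiction / popular science / history) to improve style fit.
- Fault tolerance: add a node retry mechanism to the workflow (e.g., retry audio generation 3 times on failure) to improve stability.
- Multilingual support: extend the speech synthesis plugin to multilingual voices for foreign-language books.
- Interactivity: use Coze's "user input node" to insert user Q&A segments during video generation.
- Automated publishing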

羞儿 · Published 2025-11-20 13:16:27

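Core goal

Build a closed-loop workflow of "book content structuring → multimodal resource generation → Jianying (剪映) integration and compositing" that automates production from "book information in" to "vivid AI video out", so that a viewer can "grasp a book's core emotions and the links between its key points in 10 minutes". Core logic: a progressive flow of "multi-granularity summary decomposition → emotion and style anchoring → structured mind map generation → emotion-driven multimodal generation". The Coze workflow engine integrates the LLMs, speech synthesis, video editing and other tools, targeting roughly 80% visual configuration plus 20% supporting code.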

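Preparation

Tool and platform configuration

Category	Tool / platform	Core purpose	Key configuration	Notes
Workflow engine	Coze (扣子)	Integrate plugins and LLMs; orchestrate the workflow	Complete real-name verification and create a "personal space"	Enable plugin-call permissions in advance (the free tier covers basic needs)
Video editing	Jianying (剪映) desktop (≥ v2.5.0)	Secondary editing and export of the video	Must support importing JSON drafts; log in to an account	The Pro edition unlocks advanced features such as keyframes
LLM support	Doubao 1.5 Pro / DeepSeek-V3	Dialogue generation, keyword extraction, sentiment analysis	Built into Coze; no separate API key needed	Long-text processing needs a context window ≥ 128K
Multimodal tools	Speech synthesis (official Coze plugin), canvas component	Generate audio, background images and other multimodal resources	Preset character voices for speech synthesis; set the canvas to 1920×1080 px	Add them to your Coze personal space in advance so they are not missed while building
Advanced tools (optional)	ElevenLabs (TTS), MidJourney (visuals)	Improve voice / visual quality	Requires an API key or a Discord account	For scenarios that demand higher video quality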

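Data and resource preparation. No complex vector store is needed: organize each book's core information as a JSON file in the structure below (easy for the LLM to consume), store it under Coze "Resource Management → File Storage", and configure each book separately: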

{
  "book_info": {
    "title": "富爸爸与穷爸爸",
    "author": "罗伯特·清崎和莎朗·L·莱希特",
    "core_themes": ["资产与负债", "财商思维", "被动收入"],
    "pain_points": ["混淆资产与负债", "为钱工作而非让钱为我工作", "月光族理财困境"],
    "real_cases": ["自住房是否为资产", "指数基金入门投资"]
  }
}

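Storage path: Coze workflow "Resource Management" → "File Storage" (files there can be referenced directly). Configure each book separately; to add a book, copy the template and edit it.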
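
Because every book gets its own copy of this template, a quick structural check before upload can catch a missing field early. The following is a minimal sketch, assuming the five fields shown in the sample are all required; the function name `load_book_info` and the `REQUIRED_KEYS` list are illustrative and not part of Coze.

```python
import json

# Assumption: the five fields from the sample template are all mandatory.
REQUIRED_KEYS = ("title", "author", "core_themes", "pain_points", "real_cases")
LIST_KEYS = ("core_themes", "pain_points", "real_cases")


def load_book_info(raw_json: str) -> dict:
    """Parse a book_info document and verify that the expected fields exist."""
    data = json.loads(raw_json)
    info = data.get("book_info")
    if not isinstance(info, dict):
        raise ValueError("missing top-level 'book_info' object")
    missing = [key for key in REQUIRED_KEYS if key not in info]
    if missing:
        raise ValueError(f"book_info is missing fields: {missing}")
    for key in LIST_KEYS:
        if not isinstance(info[key], list) or not info[key]:
            raise ValueError(f"'{key}' must be a non-empty list")
    return info


sample = """
{
  "book_info": {
    "title": "富爸爸与穷爸爸",
    "author": "罗伯特·清崎和莎朗·L·莱希特",
    "core_themes": ["资产与负债", "财商思维", "被动收入"],
    "pain_points": ["混淆资产与负债"],
    "real_cases": ["自住房是否为资产"]
  }
}
"""
book = load_book_info(sample)
print(book["title"], len(book["core_themes"]))
```

Running the check locally before uploading keeps malformed files out of the file store, where a failure would only show up later as a weak retrieval result.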

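Standardized asset pack (fixed and reusable), uploaded to Coze file storage:
- Character images: host avatar (300×300 px) and anthropomorphized book image (300×300 px), stored as URLs or local files.
- Background templates: 1920×1080 px solid-color or minimalist backgrounds (so subtitles are not obscured).
- Sound effects: an opening sound (under 5 seconds) and a subtitle-entrance sound (1-2 seconds).

Book type → style mapping library (./config/style_mapping.json): presets the visual / audio style per book type to solve the "style fit" problem: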

{
  "小说": {
    "视觉风格": {
      "背景类型": "场景插画",
      "色调": "暖色调（文学小说）/冷色调（科幻小说）",
      "角色插画风格": "写实风（历史小说）/动漫风（青春小说）"
    },
    "听觉风格": {
      "主持人音色": "沉稳男声（历史）/清亮女声（青春）",
      "背景音乐类型": "纯音乐（钢琴+小提琴）"
    }
  },
  "科普": {
    "视觉风格": {
      "背景类型": "科技感网格",
      "色调": "蓝白冷色调",
      "动效": "知识点出现时用“缩放+渐显”"
    },
    "听觉风格": {
      "主持人音色": "清晰中性声",
      "情感音效": "知识点强调时用“叮”提示音"
    }
  }
  // 其他类型（历史等）省略
}

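Emotion → resource mapping library (./config/emotion_resource.csv): the core fix for the "emotional delivery" problem, linking emotion label → voice → background music → visual elements.

Emotion label	TTS emotion parameters (iFlytek)	Background music (local path)	Background palette	Subtitle animation	Character expression (illustration)
激昂 (rousing)	emotion=0,rate=110	./bgm/激昂_英雄的黎明.mp3	Red + gold	Fast flicker + zoom in	Clenched fist, eyes wide
悲伤 (sad)	emotion=4,rate=80	./bgm/悲伤_月光下的凤尾竹.mp3	Gray + blue	Slow fade-in + downward drift	Head lowered, in tears
紧张 (tense)	emotion=2,rate=120	./bgm/紧张_碟中谍主题曲.mp3	Black + dark red	Shake + fade to dark	Frowning, holding breath
平和 (calm)	emotion=1,rate=95	./bgm/平和_卡农.mp3	White + light green	Steady fade-in	Smiling, level gaze
疑惑 (puzzled)	emotion=3,rate=100	./bgm/疑惑_神秘园.mp3	Purple + gray	Left-right sway + blur	Head tilted, frowning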
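
To make the table usable from code, the TTS column has to be split into separate parameters. The sketch below is one possible loader, assuming the CSV has an English header row (`emotion`, `tts_params`, `bgm_path`, ...) and that the `emotion=…,rate=…` cell is quoted because it contains a comma; both are assumptions about a file the post does not show.

```python
import csv
import io

# Assumed file layout: English headers, TTS cell quoted because it contains a comma.
SAMPLE_CSV = '''emotion,tts_params,bgm_path,bg_color,subtitle_effect,expression
激昂,"emotion=0,rate=110",./bgm/激昂_英雄的黎明.mp3,红+金,快速闪烁+放大,握拳睁眼
悲伤,"emotion=4,rate=80",./bgm/悲伤_月光下的凤尾竹.mp3,灰+蓝,缓慢淡入+下移,低头垂泪
'''


def parse_tts_params(raw: str) -> dict:
    """Turn 'emotion=4,rate=80' into {'emotion': 4, 'rate': 80}."""
    params = {}
    for part in raw.split(","):
        key, _, value = part.partition("=")
        params[key.strip()] = int(value)
    return params


def load_emotion_resources(csv_text: str) -> dict:
    """Index the rows by emotion label, with TTS parameters already parsed."""
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["tts_params"] = parse_tts_params(row["tts_params"])
        table[row["emotion"]] = row
    return table


emotion_table = load_emotion_resources(SAMPLE_CSV)
print(emotion_table["悲伤"]["tts_params"], emotion_table["悲伤"]["bgm_path"])
```

Indexing by label keeps the later lookup in the emotion anchoring node to a single dictionary access.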

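Core resource construction: plugins, prompts, knowledge base

Plugin integration and configuration (visual operations in Coze): open Coze → enter the workflow editor → "Plugins" on the left → "Plugin Store"; search for the target plugin → click "Add to my agent" → choose the space. Key plugin configuration examples: 「语音合成」 (speech synthesis): default voices (host → "沉稳男声", book → "温和知性声"); 「视频合成_剪映小助手」: preset resolution (1080P) and frame rate (30 fps).

Plugin	Core parameters	Configuration notes
语音合成 (speech synthesis)	text, voice_type (voice)	Pass the lines through variables; switch voice_type dynamically per role
视频合成_剪映小助手	draft_url (draft address), audio_infos (audio config), caption_infos (subtitle config)	Must strictly match the JSON format output by the data-integration node
画板组件 (canvas component)	text (watermark text), width/height (size)	Place the watermark at the bottom right; font size 24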

Prompt template engineering (the main productivity lever). Dialogue script generation template (core template):

# 角色
你是擅长书籍拟人化对话的文案师，能将书籍核心内容转化为“主持人+书籍”的访谈式文案。

# 要求
1. 角色固定：主持人（提问方）、书籍名称（回答方，拟人化称呼如“富爸爸老师”）
2. 内容结构：每个核心知识点对应1个问题，包含“提问→回答→现实案例→行动建议”4部分
3. 格式规范：
   - 台词≤10字/短句，用逗号拆分长句
   - 至少生成10个问题，总字数≥1000字
   - 关键知识点（如“资产”“被动收入”）需明确提及
   - 主持人需添加惊讶语气台词（如“啊！原来我一直搞错了？”）增强互动感
4. 输出格式：
{
  "role_list": [{"role_name": "主持人"}, {"role_name": "[书籍名称]"}],
  "text_list": [
    {"order": 1, "role_name": "主持人", "line": "台词内容"},
    {"order": 2, "role_name": "[书籍名称]", "line": "台词内容"}
  ]
}

# 输入信息
书籍名称：{{book_name}}
书籍核心信息：{{book_kb}}

Keyword extraction template

从以下对话文案中提取10个核心关键词，需包含书籍核心概念、痛点、解决方案：
对话文案：{{dialog_text}}
输出格式：["关键词1", "关键词2", ...]

Role-to-voice assignment template

根据以下台词列表，按角色分配音色：
- 主持人：沉稳男声，语速正常
- [书籍名称]：温和知性声，语速稍慢
台词列表：{{text_list}}
输出格式：[{"order": 1, "role_name": "主持人", "text": "台词", "voice_type": "沉稳男声"}, ...]

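Knowledge base configuration (lightweight RAG). No separate vector store deployment is needed; use Coze's "file retrieval" feature instead: upload the prepared book JSON file (book_info.json) to Coze "Resource Management" → "File Storage"; enable "file retrieval" in the workflow's LLM node and select the target book file; set the retrieval parameter top_k=3 (return 3 core entries) so that the LLM grounds the script in the book's actual content, which reduces hallucination.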

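Workflow construction

This section came out of conversations with a large language model. It is mainly an analysis of agent-workflow development ideas and a learning record: how to design and decompose requirements, and how to link the purpose of a requirement to its technical implementation. The feasibility of what follows has not yet been fully implemented or analyzed.
Node 1: Input parameter configuration (workflow entry)
- Receives user input and standardizes the task parameters. Use a Coze "input node" with 3 parameters: book_name (required, text, prompt "输入书籍名称"); watermark (optional, text, default "我的书籍解读"); video_style (optional, dropdown with the options "科普风", "访谈风", "趣味风").
Node 2: Book knowledge base retrieval (RAG enhancement)
- Retrieves the book's core information (author, themes, cases). Coze "LLM node" → enable "file retrieval" → select the matching book JSON file. Prompt: "提取《{{book_name}}》的作者、核心主题、3 个用户痛点、2 个现实案例，输出 JSON 格式"
Node 3: Dialogue script generation (core content production)
- Generates the "host + book" interview-style script. Coze "LLM node" → select "豆包 1.5 Pro-128K" → call the dialogue script generation template. Inputs: book_name (node 1 output), book_kb (node 2 output).
Node 4: Keyword extraction (preparation for subtitle highlighting)
- Extracts the script's core keywords, which are later used to mark up subtitles. Coze "LLM node" → select "DeepSeek-V3" → call the keyword extraction template. Input: dialog_text (the text_list field of node 3's output).
Node 5: Per-role audio generation and duration calculation (multi-threaded, in parallel)
- Generates audio per role and obtains each clip's duration (for audio-video sync). Configuration: first call the role-to-voice assignment template through an "LLM node" to output a line list annotated with voices; add a "loop node" that iterates over the list and calls the speech synthesis plugin (input: text = line content, voice_type = the role's voice; output: the URL of each audio clip); then call the "get_audio_duration" function of "视频合成_剪映小助手" to get each clip's duration (seconds).
Node 6: Background and character image generation (multimodal resources)
- Generates the video background and character avatars. Background: Coze canvas component → pass in the watermark parameter → generate a 1920×1080 px background (watermark at the bottom right). Character images: reference the pre-prepared host / anthropomorphized-book image URLs directly (or generate them through the Stable Diffusion API).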

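Node 7: Data integration and timeline generation (core coding node)

Integrates the audio, subtitle and image information into timeline data that Jianying can read. Use a Coze "code node" with JavaScript (core logic below). Intended functions: merge the audio list with the duration list and compute each clip's start / end time; add keyword-highlight styling to the subtitles (keywords in red, enlarged font); build the display timeline for character images, in sync with that character's lines. The sample below covers the audio, subtitle and background timelines.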

async function main({ params }) {
  const { audio_list, duration_list, text_list, keywords, bg_image_url } = params;
  let audioStartTime = 0;
  const audioData = [];
  const captions = [];

  // 处理音频时间轴
  for (let i = 0; i < audio_list.length; i++) {
    const duration = duration_list[i];
    audioData.push({
      audio_url: audio_list[i],
      start: audioStartTime,
      end: audioStartTime + duration,
      volume: 2
    });

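    // 处理字幕（关键词高亮）
    const text = text_list[i];
    let captionText = text.line;
    const matchedKeys = keywords.filter(k => captionText.includes(k));
    if (matchedKeys.length > 0) {
      // Escape regex metacharacters so keywords such as "C++" are matched literally
      const escaped = matchedKeys.map(k => k.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
      captionText = captionText.replace(new RegExp(`(${escaped.join('|')})`, 'g'), '<span style="color:#fe8a80;font-size:10px;">$1</span>');
    }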

    captions.push({
      start: audioStartTime,
      end: audioStartTime + duration,
      text: captionText,
      in_animation: "羽化向右擦开"
    });

    audioStartTime += duration;
  }

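  // 处理背景图片
  // NOTE: "+ 1000" is in the same unit as duration_list. If those values are seconds,
  // as described for node 5, this pads the background by 1000 s; adjust it to the
  // time unit the plugin actually expects.
  const bgImageData = [{
    image_url: bg_image_url,
    start: 0,
    end: audioStartTime + 1000,
    width: 1920,
    height: 1080
  }];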

  return { audioData: JSON.stringify(audioData), captions: JSON.stringify(captions), bgImageData: JSON.stringify(bgImageData) };
}

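Node 8: Jianying draft generation and export (end of the workflow)
- Generates a JSON draft that Jianying can import, completing the video composition. Configuration: call the functions of "视频合成_剪映小助手" in this order:
- create_draft: create a Jianying draft (parameters: width=1920, height=1080)
- add_images: add the background image (parameters: draft_url = value returned by create_draft, image_infos = bgImageData from node 7)
- add_audios: add the audio (parameters: draft_url, audio_infos = audioData from node 7)
- add_captions: add the subtitles (parameters: draft_url, caption_infos = captions from node 7)
- save_draft: save the draft (returns draft_url)
Overall flow: input parameters → knowledge base retrieval → dialogue script generation → keyword extraction → per-role audio generation → background / character image generation → data integration (code) → Jianying draft export → editing and export in the Jianying client.
Agents and LLMs are clearly complementary in function and in where they apply. An agent is the broader concept, covering the whole loop from perception through decision to action, whereas an LLM focuses on understanding and generating natural language. Combining the two makes it possible to build systems that are more capable, efficient and user-friendly across complex tasks and scenarios.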

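Book AI video workflow

This design targets the core requirement "book title / PDF in → vivid AI video out" around two goals: emotional delivery and structured knowledge. Through the progressive logic "multi-granularity summary decomposition → emotion and style anchoring → structured mind map generation → emotion-driven multimodal generation", it deepens the model's understanding of the book and aims to output a video that is "style-matched, emotionally rich and clear in its knowledge", so that viewers can "grasp a book's core emotions and the links between its key points in 10 minutes".

Tool type	Basic free option	Advanced paid option (better quality)	Core purpose	Configuration requirements
Book parsing	PyPDF2 (PDF parsing) + Douban API (metadata)	Adobe Acrobat Pro (accurate PDF extraction)	Parse PDF text / extract book metadata (author / type / rating)	The free option needs OCR for scanned PDFs: pair it with the Baidu OCR API (100 free calls per day)
LLM platform	Doubao 1.5 Pro (built into Coze)	Doubao Ultra + GPT-4o (dedicated to sentiment analysis)	Summary decomposition / sentiment analysis / script generation	Bind an API key; set the context window ≥ 128K (for long PDF text)
Speech synthesis (TTS)	iFlytek TTS free tier (emotional synthesis)	ElevenLabs (custom character voices)	Generate emotional character speech (host / anthropomorphized book)	The free tier needs "emotion parameters" configured (e.g., "excitement = 0.8", "speed = 0.9")
Visual generation	Stable Diffusion WebUI (local deployment)	MidJourney (stylized illustration) + Canva Pro	Generate stylized backgrounds / character illustrations / knowledge visualizations	Local deployment needs ≥ 8 GB of VRAM; MidJourney requires joining its Discord channel and binding a payment method
Video compositing	MoviePy (Python library) + Jianying free edition	Adobe Premiere Pro + AE (motion effects)	Audio-video sync / subtitle animation / key-point highlighting	MoviePy needs ffmpeg installed (with the environment variable configured); Jianying must be upgraded to the "Pro edition" (unlocks keyframes)
Knowledge base	Chroma (local vector store)	Pinecone (cloud vector store)	Store chunked book text / mind map nodes / emotion labels	Local Chroma needs Python ≥ 3.8; set `persist_directory="./book_kb"` at initialization
Workflow engine	Python + Airflow (local scheduling)	Coze Pro (visual orchestration + scheduled triggers)	Chain the 12 nodes; manage state and retries	Airflow needs a DAG file; set `retries=3` (retry failed nodes)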
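
The retry requirement in the last row can be illustrated without a scheduler. This is a minimal sketch of the idea in plain Python, assuming the Airflow-style meaning of `retries` (extra attempts after the first failure); `run_with_retries` and the simulated `flaky_tts` task are hypothetical names, not part of Airflow or Coze.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(task: Callable[[], T], retries: int = 3, delay_seconds: float = 0.0) -> T:
    """Run task once, then retry up to `retries` more times if it raises."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception as exc:  # a real node would catch narrower error types
            last_error = exc
            if attempt < retries:
                time.sleep(delay_seconds)
    raise RuntimeError(f"task failed after {retries + 1} attempts") from last_error


# Simulated audio-generation node that fails twice before succeeding.
calls = {"count": 0}


def flaky_tts() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated TTS failure")
    return "audio_url"


result = run_with_retries(flaky_tts, retries=3)
print(result, calls["count"])
```

In a real deployment the same policy would normally be expressed through the engine's own retry setting rather than a hand-written loop.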

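Fine-grained resource library. The book type → style mapping library (configured manually, reusable) is the core fix for the "style fit" problem: it presets the visual / audio style per book type and is stored as a JSON file (path: ./config/style_mapping.json):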

{
  "小说": {
    "视觉风格": {
      "背景类型": "场景插画",
      "色调": "暖色调（文学小说）/冷色调（科幻小说）",
      "角色插画风格": "写实风（历史小说）/动漫风（青春小说）",
      "知识点可视化": "人物关系图（用虚线连接次要关系）+ 情节时间轴"
    },
    "听觉风格": {
      "主持人音色": "沉稳男声（历史）/清亮女声（青春）",
      "背景音乐类型": "纯音乐（钢琴+小提琴）",
      "情感音效": "场景匹配（如战争小说加炮火背景音）"
    },
    "文案风格": "故事化叙事（含角色对话引用），情感词占比≥30%"
  },
  "科普": {
    "视觉风格": {
      "背景类型": "科技感网格",
      "色调": "蓝白冷色调",
      "知识点可视化": "流程图（因果关系）+ 数据图表（对比类知识）",
      "动效": "知识点出现时用“缩放+渐显”"
    },
    "听觉风格": {
      "主持人音色": "清晰中性声",
      "背景音乐类型": "轻电子音效（节奏≤100BPM）",
      "情感音效": "知识点强调时用“叮”提示音"
    },
    "文案风格": "逻辑化拆解（分点但不直白说“第一”），专业术语解释占比≥20%"
  },
  "历史": {
    "视觉风格": {
      "背景类型": "古卷纹理",
      "色调": "复古黄+褐色",
      "知识点可视化": "时间轴（标重点事件）+ 人物关系树",
      "动效": "画面切换用“翻页”效果"
    },
    "听觉风格": {
      "主持人音色": "浑厚男声",
      "背景音乐类型": "古筝+鼓点（慢节奏）",
      "情感音效": "朝代更替时用“钟鸣”音效"
    },
    "文案风格": "时空叙事（含“公元XX年”时间锚点），历史背景补充占比≥25%"
  }
}

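Pre-processed resource pack (prepared in bulk to save time). Common elements: a fixed host figure (3 poses: explaining face-on / pointing at the screen from the side / looking down at a book), transition animations (5 basic types: fade / push-pull / rotate / page turn / zoom), and subtitle templates (3 sets by book type: rounded speech bubbles for fiction / square frames for popular science / scroll borders for history). Tool script: preprocess_resource.py (batch-converts images to 1920×1080 and audio to MP3).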

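Core knowledge reinforcement

Multi-granularity summary decomposition (3 progressive layers, to avoid missing content). Decomposition logic: split by "book → chapter → paragraph" and attach an emotion label at each layer.

Granularity	Core goal	Output format (JSON)	Analysis tools	Manual check points
Book level	Capture core themes + overall emotional tone	`{"title":"三体","core_theme":["黑暗森林法则","人性与文明冲突"],"emotion_tone":"紧张+悲凉","author_view":"对技术失控的警惕"}`	LLM (Doubao Ultra) + Douban review analysis	Are the themes complete? Is the emotional tone accurate?
Chapter level	Capture the core event + emotional shifts	`{"chapter_id":"c01","title":"红岸基地","core_event":"叶文洁触发信号","emotion_change":["压抑→震惊→绝望"],"key_character":["叶文洁"]}`	LLM + TextRank keyword extraction	Was any core event missed? Are the emotional shifts plausible?
Paragraph level	Capture the key sentence + local emotion	`{"paragraph_id":"p03","text":"叶文洁按下按钮的瞬间...","key_sentence":"人类文明的命运在此刻转向","local_emotion":"绝望","relevance":0.92}`	LLM + SnowNLP sentiment analysis	Is the key sentence precise? Does the local emotion match?

Implementation code (./core/multi_granularity_summary.py)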

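import json
from datetime import datetime  # used for create_time in batch_multi_granularity_summary
import jieba
from snownlp import SnowNLP
from textrank4zh import TextRank4Keyword
# NOTE: ChatDoubao is kept from the original listing; confirm that your LangChain
# version (or a Doubao integration package) actually provides this class.
from langchain.chat_models import ChatDoubao

# 初始化工具
llm = ChatDoubao(model="doubao-pro-128k", api_key="key")
tr4k = TextRank4Keyword()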

def parse_book_level_summary(book_text: str, book_name: str) -> dict:
    """书籍级摘要：核心主题+情感基调"""
    prompt = f"""
    分析书籍《{book_name}》的以下核心信息：
    1. 3个以内核心主题（用短语，避免宽泛）
    2. 1个整体情感基调（如“紧张+悲凉”，限2个词）
    3. 作者对核心主题的核心观点（1句话）
    输入文本：{book_text[:10000]}（截取前10000字，保证覆盖核心）
    输出严格JSON，无多余内容。
    """
    response = llm.predict(prompt)
    book_summary = json.loads(response)
    # 补充豆瓣书评情感验证（可选，提升准确性）
    # douban_reviews = get_douban_reviews(book_name)  # 需实现豆瓣API调用
    # book_summary["emotion_tone"] = verify_emotion_with_reviews(book_summary["emotion_tone"], douban_reviews)
    return book_summary

def parse_chapter_level_summary(chapter_text: str, chapter_id: str, chapter_title: str) -> dict:
    """章节级摘要：核心事件+情感转折"""
    # 提取核心事件（TextRank关键词辅助）
    tr4k.analyze(text=chapter_text, lower=True, window=2)
    keywords = [item.word for item in tr4k.get_keywords(5, word_min_len=2)]
    
    prompt = f"""
    分析章节《{chapter_title}》的以下信息：
    1. 1个核心事件（含人物+动作+结果）
    2. 情感转折（按顺序，限3个词，如“压抑→震惊→绝望”）
    3. 关键人物（限2个以内）
    输入文本：{chapter_text}
    关键词参考：{keywords}
    输出严格JSON，无多余内容。
    """
    response = llm.predict(prompt)
    chapter_summary = json.loads(response)
    chapter_summary["chapter_id"] = chapter_id
    chapter_summary["title"] = chapter_title
    return chapter_summary

def parse_paragraph_level_summary(paragraph_text: str, paragraph_id: str, chapter_id: str) -> dict:
    """段落级摘要：关键句+局部情感"""
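    # 关键句提取（TextRank）: get_key_sentences belongs to TextRank4Sentence,
    # not TextRank4Keyword, so a sentence-level analyzer is created here.
    from textrank4zh import TextRank4Sentence  # local import keeps this function self-contained
    tr4s = TextRank4Sentence()
    tr4s.analyze(text=paragraph_text, lower=True)
    key_sentences = [item.sentence for item in tr4s.get_key_sentences(num=1)]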
    
    # 局部情感分析（SnowNLP，0-1分，0=极悲，1=极喜）
    s = SnowNLP(paragraph_text)
    emotion_score = s.sentiments
    # 映射情感标签
    if emotion_score < 0.3:
        local_emotion = "悲伤"
    elif 0.3 <= emotion_score < 0.5:
        local_emotion = "压抑"
    elif 0.5 <= emotion_score < 0.7:
        local_emotion = "平和"
    elif 0.7 <= emotion_score < 0.9:
        local_emotion = "愉悦"
    else:
        local_emotion = "激昂"
    
    # Resolve the fallback once so an empty TextRank result cannot raise IndexError below
    key_sentence = key_sentences[0] if key_sentences else paragraph_text[:50]
    return {
        "paragraph_id": paragraph_id,
        "chapter_id": chapter_id,
        "text": paragraph_text,
        "key_sentence": key_sentence,
        "local_emotion": local_emotion,
        "emotion_score": round(emotion_score, 2),
        "relevance": round(min(1.0, len(key_sentence) / len(paragraph_text) * 1.2), 2)  # 相关性评分
    }

# 批量处理示例（调用入口）
def batch_multi_granularity_summary(book_pdf_path: str = None, book_name: str = None) -> dict:
    """
    批量生成3级摘要：
    输入：二选一（PDF路径/书籍名，PDF优先，书籍名需调用API获取文本）
    输出：包含书籍/章节/段落级摘要的完整字典
    """
    # 1. 解析书籍文本（PDF或API获取）
    if book_pdf_path:
        book_text, chapters = parse_pdf_to_text(book_pdf_path)  # 需实现PDF解析函数
    else:
        book_text, chapters = get_book_text_from_api(book_name)  # 需实现书籍API调用
    
    # 2. 生成3级摘要
    book_summary = parse_book_level_summary(book_text, book_name or chapters[0]["title"])
    chapter_summaries = []
    paragraph_summaries = []
    
    for chapter in chapters:
        chapter_summary = parse_chapter_level_summary(chapter["text"], chapter["chapter_id"], chapter["title"])
        chapter_summaries.append(chapter_summary)
        
        # 拆分段落（按换行符）
        paragraphs = [p.strip() for p in chapter["text"].split("\n") if p.strip()]
        for i, para in enumerate(paragraphs):
            para_id = f"p{chapter['chapter_id'][1:]}_{i+1}"  # 格式：p01_03（第1章第3段）
            para_summary = parse_paragraph_level_summary(para, para_id, chapter["chapter_id"])
            paragraph_summaries.append(para_summary)
    
    # 3. 关联3级摘要（添加父ID）
    book_summary["chapter_ids"] = [c["chapter_id"] for c in chapter_summaries]
    for c in chapter_summaries:
        c["paragraph_ids"] = [p["paragraph_id"] for p in paragraph_summaries if p["chapter_id"] == c["chapter_id"]]
    
    return {
        "book_level": book_summary,
        "chapter_level": chapter_summaries,
        "paragraph_level": paragraph_summaries,
        "create_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    }
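
The summary functions above pass the model's reply straight to `json.loads`, which fails whenever the model wraps its answer in a code fence or adds a sentence around it. The helper below is a hedged sketch of a more tolerant parser; `parse_llm_json` is a hypothetical name, and the two wrapping patterns it handles are assumptions about typical model output rather than documented Doubao behavior.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built here to keep this listing readable


def parse_llm_json(response: str) -> dict:
    """Extract the JSON object from a model reply that may contain extra text."""
    text = response.strip()
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, flags=re.DOTALL)
    if fenced:
        # Case 1: the reply is wrapped in a Markdown code fence.
        text = fenced.group(1).strip()
    else:
        # Case 2: prose surrounds the object; keep the outermost braces.
        start, end = text.find("{"), text.rfind("}")
        if start != -1 and end > start:
            text = text[start:end + 1]
    return json.loads(text)


wrapped = FENCE + 'json\n{"emotion_tone": "紧张+悲凉"}\n' + FENCE
print(parse_llm_json(wrapped))
print(parse_llm_json('结果如下：{"core_theme": ["黑暗森林法则"]} 以上。'))
```

A reply that still cannot be parsed raises `json.JSONDecodeError`, which is a natural trigger for the retry policy described for the workflow engine.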

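Structured mind map generation (clarify relationships, attach emotions). The mind map is organized along three dimensions: a knowledge layer, an emotion layer and a relation layer.

Dimension	Hierarchy	Linked content	Visualization rules
Knowledge layer	Root → core theme → sub-theme → knowledge point	Book-level themes → chapter-level events → paragraph-level key sentences	Root (book title, red) → core theme (blue) → sub-theme (green) → knowledge point (black)
Emotion layer	Core tone → chapter emotional shifts → paragraph local emotion	Book-level tone → chapter shifts → paragraph emotion labels	Marked with "emotion icons" (激昂 = 🔥 / 悲伤 = 💧 / 紧张 = ⚠️)
Relation layer	Character relationships / knowledge links	Key characters → interactions; knowledge points → causal relations	Characters use "avatar icons"; causal relations use arrows (forward = → / reverse = ←)

Implementation code (./core/mindmap_generator.py, consumes the summary results)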

import json
import svgwrite
from graphviz import Digraph
from typing import List, Dict

def generate_structured_mindmap(summary_data: dict, output_format: str = "svg+json") -> dict:
    """
    生成结构化思维导图：
    输入：多粒度摘要数据（batch_multi_granularity_summary的输出）
    输出：JSON结构+可视化文件（SVG/PNG）
    """
    book_summary = summary_data["book_level"]
    chapter_summaries = summary_data["chapter_level"]
    paragraph_summaries = summary_data["paragraph_level"]
    
    # 1. 构建思维导图JSON结构（知识+情感+关系）
    mindmap_json = {
        "root": {
            "id": "root",
            "label": book_summary["title"],
            "type": "book",
            "emotion_tone": book_summary["emotion_tone"],
            "children": []  # 核心主题节点
        }
    }
    
    # 1.1 构建知识层（核心主题→子主题→知识点）
    core_themes = book_summary["core_theme"]
    for theme_idx, theme in enumerate(core_themes):
        theme_node = {
            "id": f"theme_{theme_idx+1}",
            "label": theme,
            "type": "core_theme",
            "children": []  # 子主题（章节级事件）
        }
        mindmap_json["root"]["children"].append(theme_node)
        
        # 关联章节级子主题（核心事件）
        for chapter in chapter_summaries:
            # 匹配主题与章节（大模型判断关联性）
            relevance = judge_relevance(theme, chapter["core_event"])  # 需实现关联性判断函数
            if relevance > 0.6:
                chapter_node = {
                    "id": chapter["chapter_id"],
                    "label": chapter["core_event"],
                    "type": "chapter_event",
                    "emotion_change": chapter["emotion_change"],
                    "children": []  # 知识点（段落级关键句）
                }
                theme_node["children"].append(chapter_node)
                
                # 关联段落级知识点（关键句）
                for para in paragraph_summaries:
                    if para["chapter_id"] == chapter["chapter_id"] and para["relevance"] > 0.8:
                        para_node = {
                            "id": para["paragraph_id"],
                            "label": para["key_sentence"],
                            "type": "paragraph_key",
                            "local_emotion": para["local_emotion"],
                            "emotion_score": para["emotion_score"]
                        }
                        chapter_node["children"].append(para_node)
    
    # 1.2 构建关系层（人物关系/知识点关联）
    # 提取关键人物
    key_characters = list(set([c for chapter in chapter_summaries for c in chapter.get("key_character", [])]))
    # 生成人物关系（大模型分析）
    character_relations = analyze_character_relations(key_characters, summary_data)  # 需实现人物关系分析
    # 生成知识点关联（大模型分析因果）
    knowledge_relations = analyze_knowledge_relations(mindmap_json, summary_data)  # 需实现知识点关联分析
    
    mindmap_json["relations"] = {
        "character": character_relations,
        "knowledge": knowledge_relations
    }
    
    # 2. 生成可视化文件（SVG+Graphviz PNG）
    outputs = {"json": mindmap_json}
    
    if "svg" in output_format:
        svg_path = f"./output/mindmap_{book_summary['title'].replace(' ', '_')}.svg"
        generate_mindmap_svg(mindmap_json, svg_path)
        outputs["svg_path"] = svg_path
    
    if "png" in output_format:
        png_path = f"./output/mindmap_{book_summary['title'].replace(' ', '_')}.png"
        generate_mindmap_png(mindmap_json, png_path)
        outputs["png_path"] = png_path
    
    return outputs

def generate_mindmap_svg(mindmap_json: dict, output_path: str):
    """生成可交互SVG思维导图（支持点击展开/折叠）"""
    dwg = svgwrite.Drawing(output_path, profile='full', size=('1200px', '800px'))
    # 根节点位置（中心）
    root_x, root_y = 600, 100
    # 绘制根节点
    dwg.add(dwg.circle(center=(root_x, root_y), r=30, fill="#E74C3C", stroke="#000", stroke_width=2))
    dwg.add(dwg.text(mindmap_json["root"]["label"], insert=(root_x-50, root_y+5), font_size=14, fill="white", text_anchor="middle"))
    
    # 递归绘制子节点（核心主题→章节→知识点）
    def draw_child_nodes(parent_node, parent_x, parent_y, level: int, direction: str):
        """
        level：层级（1=核心主题，2=章节，3=知识点）
        direction：方向（left/right，避免重叠）
        """
        child_count = len(parent_node.get("children", []))
        if child_count == 0:
            return
        # 层级间距
        level_spacing = 200 if level == 1 else 150 if level == 2 else 100
        # 子节点间距
        child_spacing = 120 if level == 1 else 80 if level == 2 else 50
        # 起始位置
        start_y = parent_y - (child_count - 1) * child_spacing / 2
        
        for i, child in enumerate(parent_node["children"]):
            child_y = start_y + i * child_spacing
            child_x = parent_x - level_spacing if direction == "left" else parent_x + level_spacing
            
            # 节点颜色（按类型）
            color_map = {"core_theme": "#3498DB", "chapter_event": "#2ECC71", "paragraph_key": "#95A5A6"}
            fill_color = color_map.get(child["type"], "#BDC3C7")
            
            # 绘制连接线
            dwg.add(dwg.line(start=(parent_x, parent_y), end=(child_x, child_y), stroke="#000", stroke_width=1.5))
            # 绘制节点
            dwg.add(dwg.circle(center=(child_x, child_y), r=20 if level < 3 else 15, fill=fill_color, stroke="#000", stroke_width=1))
            # 节点文本（截断长文本）
            label = child["label"][:15] + "..." if len(child["label"]) > 15 else child["label"]
            dwg.add(dwg.text(label, insert=(child_x-30, child_y+3), font_size=12 if level < 3 else 10, fill="white", text_anchor="middle"))
            
            # 绘制情感图标（如果有）
            if "emotion_tone" in child or "local_emotion" in child:
                emotion = child.get("emotion_tone") or child.get("local_emotion")
                emoji = {"激昂": "🔥", "悲伤": "💧", "紧张": "⚠️", "平和": "😌", "疑惑": "❓"}.get(emotion, "❓")
                dwg.add(dwg.text(emoji, insert=(child_x+15, child_y-15), font_size=12))
            
            # 递归绘制下一级（交替方向）
            next_direction = "right" if direction == "left" else "left"
            if level < 3:  # 只画3级，避免过于复杂
                draw_child_nodes(child, child_x, child_y, level + 1, next_direction)
    
    # 绘制核心主题（左右分栏）
    core_themes = mindmap_json["root"]["children"]
    left_themes = core_themes[:len(core_themes)//2]
    right_themes = core_themes[len(core_themes)//2:]
    
    # Pass each group as the root's children so the theme nodes themselves are drawn
    # at level 1; passing a theme as the parent would draw only its children.
    draw_child_nodes({"children": left_themes}, root_x, root_y, level=1, direction="left")
    draw_child_nodes({"children": right_themes}, root_x, root_y, level=1, direction="right")
    
    # 保存SVG
    dwg.save()

def generate_mindmap_png(mindmap_json: dict, output_path: str):
    """生成Graphviz PNG（适合打印，结构清晰）"""
    dot = Digraph(comment=mindmap_json["root"]["label"], format="png")
    dot.attr(size='10,8', rankdir='TB', bgcolor='white')
    
    # 根节点
    dot.node("root", mindmap_json["root"]["label"], shape="ellipse", style="filled", color="#E74C3C", fontcolor="white")
    
    # 递归添加节点和边
    def add_nodes_edges(parent_id: str, parent_node: dict):
        for child in parent_node.get("children", []):
            # 节点样式（按类型）
            style_map = {
                "core_theme": "filled,rounded",
                "chapter_event": "filled,rounded",
                "paragraph_key": "rounded"
            }
            color_map = {
                "core_theme": "#3498DB",
                "chapter_event": "#2ECC71",
                "paragraph_key": "#95A5A6"
            }
            # 节点标签（含情感）
            label = child["label"]
            if "local_emotion" in child:
                label += f"\n({child['local_emotion']})"
            dot.node(child["id"], label, style=style_map.get(child["type"], "rounded"), color=color_map.get(child["type"], "#BDC3C7"), fontcolor="white" if child["type"] != "paragraph_key" else "black")
            # 边
            dot.edge(parent_id, child["id"], style="solid")
            # 递归
            add_nodes_edges(child["id"], child)
    
    # 添加知识层节点
    add_nodes_edges("root", mindmap_json["root"])
    
    # 添加关系层（人物关系用虚线）
    for rel in mindmap_json["relations"]["character"]:
        dot.edge(rel["from"], rel["to"], label=rel["relation"], style="dashed", color="#F39C12")
    
    # 保存PNG
    dot.render(output_path.replace(".png", ""), view=False)  # 自动添加.png后缀

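Workflow overview (full input → output chain)

Node 1: Book parsing and text extraction (basic input handling)
- Converts a book title / PDF into structured text (split by chapter), solving the "inconsistent input format" problem. Input: book title (string) / PDF path (string, preferred). Output: structured text dictionary ({"book_name":"三体","total_text":"完整文本","chapters":[{"chapter_id":"c01","title":"红岸基地","text":"章节文本","page_range":"1-25"}]})
- Implementation: code (Python). PDF parsing: read the text with PyPDF2; for scanned files use pytesseract + PIL for OCR (requires the Tesseract Chinese language pack). Title lookup: call the Douban API + Google Books API to obtain the text, preferring editions with chapter divisions. Core code: ./core/book_parser.py (PDF parsing, API calls, chapter splitting)
Node 2: Multi-granularity summary decomposition (core of knowledge reinforcement)
- Generates 3-level "book → chapter → paragraph" summaries with emotion labels, to avoid missing content. Input: the structured text dictionary from node 1. Output: 3-level summary dictionary. Implementation: code (Python) + LLM calls; core code: ./core/multi_granularity_summary.py.
Node 3: Book type and style matching (core of style fit)
- Determines the book type from the text / metadata and matches it against the preset style library (visual + audio + copy). Input: the book title / metadata from node 1 and the book-level summary (core themes) from node 2. Output: style match result dictionary ({"book_type":"科幻小说","visual_style":{"背景类型":"场景插画","色调":"冷色调"},"audio_style":{"主持人音色":"沉稳男声"},"copy_style":"故事化叙事"}). Implementation: code (Python) + rule matching. Type detection: keyword matching (e.g., "宇宙", "星球" → science fiction; "公元", "朝代" → history) plus LLM-assisted correction. Style matching: read ./config/style_mapping.json and extract the style parameters by type. Core code: ./core/style_matcher.py.
Node 4: Structured mind map generation (core of knowledge visualization)
- Generates a three-dimensional "knowledge + emotion + relation" mind map (JSON + SVG + PNG). Input: the 3-level summary dictionary from node 2. Output: mind map result dictionary (JSON structure, SVG path, PNG path). Implementation: code (Python) + visualization libraries; core code: ./core/mindmap_generator.py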
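
The keyword rule described for node 3 can be sketched in a few lines. The example below assumes only the four keywords the text gives as examples and a hypothetical default type of "小说"; the LLM-assisted correction step is left out, and `guess_book_type` is an illustrative name rather than the contents of style_matcher.py.

```python
# Keyword lists taken from the examples in the text; a real table would be longer.
TYPE_KEYWORDS = {
    "科幻": ["宇宙", "星球"],
    "历史": ["公元", "朝代"],
}


def guess_book_type(text: str, default: str = "小说") -> str:
    """Pick the type whose keywords occur most often; fall back to a default."""
    scores = {
        book_type: sum(text.count(keyword) for keyword in keywords)
        for book_type, keywords in TYPE_KEYWORDS.items()
    }
    best_type, best_score = max(scores.items(), key=lambda item: item[1])
    return best_type if best_score > 0 else default


print(guess_book_type("公元前221年，朝代更替，天下一统。"))
print(guess_book_type("飞船离开了这颗星球，驶向宇宙深处。"))
print(guess_book_type("她在雨天遇见了他。"))
```

Counting raw keyword hits is deliberately crude, which is why the text pairs it with a model-based correction pass.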

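Node 5: Emotion and content anchoring (core of emotional delivery)

Builds the mapping "summary paragraph → emotion label → multimodal resources" to keep the emotion consistent. Input: the paragraph-level summaries from node 2 (with local emotion), the style result from node 3, and the mind map from node 4 (emotion layer). Output: emotion-content mapping table (JSON):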

{
  "mapping_list": [
    {
      "paragraph_id": "p01_03",
      "key_sentence": "叶文洁按下了发射按钮",
      "local_emotion": "绝望",
      "tts_params": {"emotion":4,"rate":80,"voice":"沉稳女声"},
      "bgm_params": {"path":"./bgm/悲伤_月光下的凤尾竹.mp3","volume":0.3},
      "visual_params": {"bg_color":"灰+蓝","illustration":"叶文洁低头按按钮（悲伤表情）","motion":"缓慢淡入"},
      "mindmap_node_id": "p01_03"
    }
  ],
  "emotion_statistics": {"绝望":3,"紧张":5,"平和":2}  // 情感分布，用于整体节奏把控
}

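Implementation: code (Python) + rule matching. Read ./config/emotion_resource.csv and match resource parameters by each paragraph's emotion label, then fine-tune with the style result (e.g., for science fiction, adjust the "悲伤" background palette to "深蓝 + 灰"). Code: ./core/emotion_content_anchor.py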
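
One detail the lookup has to handle: the labels produced upstream (for example 压抑 and 愉悦 from the SnowNLP mapping, or 绝望 and 好奇 in the sample data) do not all appear among the five rows of the emotion resource table. A minimal sketch of a fallback lookup follows; the `FALLBACK` mapping is an assumption made for illustration, chosen so that 绝望 resolves to the same TTS parameters as the sample mapping entry.

```python
# The five labels and TTS parameters from the emotion resource table.
EMOTION_TABLE = {
    "激昂": {"emotion": 0, "rate": 110},
    "悲伤": {"emotion": 4, "rate": 80},
    "紧张": {"emotion": 2, "rate": 120},
    "平和": {"emotion": 1, "rate": 95},
    "疑惑": {"emotion": 3, "rate": 100},
}

# Assumption: nearest-neighbour labels for emotions that have no row of their own.
FALLBACK = {"绝望": "悲伤", "压抑": "悲伤", "愉悦": "平和", "好奇": "疑惑"}


def resolve_tts_params(label: str, default: str = "平和") -> dict:
    """Map any upstream emotion label onto a row that exists in the table."""
    canonical = label if label in EMOTION_TABLE else FALLBACK.get(label, default)
    return {"label": canonical, **EMOTION_TABLE[canonical]}


print(resolve_tts_params("绝望"))
print(resolve_tts_params("紧张"))
print(resolve_tts_params("未知情感"))
```

Keeping the fallback in one place makes it easy to review which labels are being approximated and to add dedicated rows later.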

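Node 6: Emotion-driven script generation (core of content presentation)

Generates the "host + anthropomorphized book" interview-style script, bound to the emotional rhythm and the knowledge points. Input: the emotion-content mapping table from node 5, the mind map from node 4 (knowledge layer), and the style result from node 3. Output: structured script dictionary (roles, lines, emotion labels, linked knowledge points):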

{
  "role_list": [
    {"role_id":"host","name":"主持人","style":"科幻风：理性中带共情","voice_type":"沉稳男声"},
    {"role_id":"book","name":"《三体》","style":"拟人化：经历者口吻","voice_type":"温和女声（带沧桑感）"}
  ],
  "script_list": [
    {
      "script_id":"s01",
      "order":1,
      "role_id":"host",
      "line":"大家好，今天我们邀请到《三体》，聊聊红岸基地的故事。",
      "emotion":"平和",
      "related_mindmap_node":"root",
      "duration":5  // 预估朗读时长（秒）
    },
    {
      "script_id":"s02",
      "order":2,
      "role_id":"book",
      "line":"1967年的大兴安岭，我见证了改变文明的一刻。",
      "emotion":"压抑",
      "related_mindmap_node":"c01",
      "duration":4
    },
    {
      "script_id":"s03",
      "order":3,
      "role_id":"host",
      "line":"是叶文洁按下发射按钮的瞬间吗？那刻您是什么感受？",
      "emotion":"好奇",
      "related_mindmap_node":"p01_03",
      "duration":6
    },
    {
      "script_id":"s04",
      "order":4,
      "role_id":"book",
      "line":"那不是勇气，是绝望后的破罐破摔。",
      "emotion":"绝望",
      "related_mindmap_node":"p01_03",
      "duration":5
    }
  ],
  "knowledge_check": {"覆盖核心主题数":2,"遗漏知识点":[]}  // 知识点覆盖校验
}

Implementation: LLM call (Doubao Ultra) + prompt engineering. Core prompt template (./config/prompt/script_generator.txt):

角色：
1. 主持人：按【{style.copy_style}】风格提问，引导书籍讲出核心知识点，每3句加1个情感互动（如“那刻一定很绝望吧？”）
2. 书籍拟人化：以“经历者”口吻回答，引用【{key_sentence}】，带【{local_emotion}】情感，补充1句背景细节

要求：
1. 结构：按“引入→知识点1→情感互动→知识点2→总结”流转，每知识点对应1组问答
2. 情感节奏：参考【{emotion_statistics}】，避免连续3句同一情感，高潮部分（如紧张）集中在中间1/3
3. 知识点：必须覆盖思维导图【{core_theme}】，关联节点ID标注在related_mindmap_node
4. 格式：严格JSON，无多余内容，line字段每句≤15字，用口语化表达

输入：
情感-内容映射表：{mapping_list}
核心主题：{core_theme}
风格：{style}

Code logic: ./core/script_generator.py (load the template → fill in the parameters → call the LLM → parse the JSON → verify knowledge-point coverage)
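
The last step of that chain, the coverage check, might look like the sketch below. It assumes that a theme counts as covered when its text appears verbatim in some line, which is a simplification (a paraphrased theme would be reported as missing), and it adds a length check for the per-sentence limit in the prompt. Both function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def check_knowledge_coverage(script_list: List[Dict], core_themes: List[str]) -> Dict:
    """Report which core themes never appear verbatim in any line."""
    all_lines = "".join(item["line"] for item in script_list)
    missing = [theme for theme in core_themes if theme not in all_lines]
    return {"覆盖核心主题数": len(core_themes) - len(missing), "遗漏知识点": missing}


def find_overlong_sentences(script_list: List[Dict], max_chars: int = 15) -> List[Tuple[str, str]]:
    """Return (script_id, sentence) pairs whose sentence exceeds max_chars."""
    overlong = []
    for item in script_list:
        # Split on Chinese and ASCII sentence punctuation.
        for sentence in re.split(r"[，。？！；,.?!;]", item["line"]):
            if len(sentence) > max_chars:
                overlong.append((item["script_id"], sentence))
    return overlong


script = [
    {"script_id": "s01", "line": "黑暗森林法则是什么？"},
    {"script_id": "s02", "line": "宇宙就是一座黑暗森林，每个文明都是带枪的猎人。"},
]
report = check_knowledge_coverage(script, ["黑暗森林法则", "人性与文明冲突"])
print(report, find_overlong_sentences(script))
```

A non-empty "遗漏知识点" list or any overlong sentence is a reasonable signal to regenerate the script before moving on to TTS.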

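Node 7: Multi-role TTS generation (emotional audio delivery)
- Generates emotional speech per role + emotion label, so that both the voice and the emotion match. Input: the structured script dictionary from node 6 and the emotion-content mapping table from node 5. Output: multi-role audio dictionary (segmented audio + full audio):
```
{
  "audio_list": [
    {"script_id":"s01","role_id":"host","audio_path":"./audio/s01_host.mp3","duration":4.8,"emotion":"平和"},
    {"script_id":"s02","role_id":"book","audio_path":"./audio/s02_book.mp3","duration":3.9,"emotion":"压抑"}
  ],
  "total_audio_path":"./audio/total_audio.mp3",
  "emotion_check": {"匹配度":0.92,"不匹配项":[{"script_id":"s05","reason":"情感为“激昂”但语速不足"}]}
}
```
- Implementation: code (Python) + TTS API calls (iFlytek / ElevenLabs). Iterate over the script list → look up the TTS parameters by role + emotion label (e.g., iFlytek emotion=4 stands for sad) → call the API to generate the audio → merge the clips into one full audio file with pydub → check that each clip's duration deviates from its estimate by ≤ 0.5 seconds. Code: ./core/tts_generator.py. Use a fixed voice ID for each role to keep it consistent; raise the volume by 10% at emotional climaxes.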
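
The final check in that chain compares each generated clip with the duration estimated in the script. A minimal sketch, assuming the `script_id` and `duration` field names from the sample dictionaries and the 0.5-second tolerance from the text; `find_duration_mismatches` is a hypothetical helper name.

```python
from typing import Dict, List


def find_duration_mismatches(
    script_list: List[Dict], audio_list: List[Dict], tolerance: float = 0.5
) -> List[Dict]:
    """List clips whose real duration differs from the estimate by more than tolerance."""
    estimated = {item["script_id"]: item["duration"] for item in script_list}
    mismatches = []
    for audio in audio_list:
        expected = estimated.get(audio["script_id"])
        if expected is None:
            continue  # no estimate for this clip, nothing to compare
        deviation = abs(audio["duration"] - expected)
        if deviation > tolerance:
            mismatches.append({"script_id": audio["script_id"], "deviation": round(deviation, 2)})
    return mismatches


scripts = [
    {"script_id": "s01", "duration": 5},
    {"script_id": "s02", "duration": 4},
    {"script_id": "s03", "duration": 6},
]
audios = [
    {"script_id": "s01", "duration": 4.8},
    {"script_id": "s02", "duration": 3.9},
    {"script_id": "s03", "duration": 7.2},
]
mismatches = find_duration_mismatches(scripts, audios)
print(mismatches)
```

Clips that end up in the mismatch list are candidates for regeneration with an adjusted speech rate.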

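Node 8: Stylized visual asset generation (visual delivery of style)

Generates "background + character illustration + knowledge visualization" assets that match the book's style and emotion. Input: the style result from node 3, the mind map from node 4 (visualization files), and the emotion-content mapping table from node 5. Output: visual asset dictionary (paths + linked script IDs):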

{
  "background_list": [
    {"scene_id":"bg01","emotion":"平和","path":"./visual/bg01_peaceful.png","style":"科幻风：深蓝网格背景"},
    {"scene_id":"bg02","emotion":"绝望","path":"./visual/bg02_desperate.png","style":"科幻风：灰蓝废墟背景"}
  ],
  "illustration_list": [
    {"script_id":"s02","role":"《三体》拟人化","path":"./visual/ill01_book.png","style":"动漫风：女性轮廓+星空背景","emotion":"压抑"},
    {"script_id":"s04","role":"叶文洁","path":"./visual/ill02_ye.png","style":"写实风：低头按按钮","emotion":"绝望"}
  ],
  "knowledge_visual_list": [
    {"mindmap_node_id":"theme_1","path":"./visual/knowledge01_dark_forest.png","type":"流程图","style":"科幻风：黑色背景+白色线条"}
  ]
}

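Implementation: AI image generation API (Stable Diffusion / MidJourney) + code. Prompt engineering: build each prompt in the structure "style + scene + emotion + details" (e.g., a despairing sci-fi background: "科幻风格，废墟红岸基地，灰蓝色调，下雨，昏暗灯光，细节丰富，8k"). Code: ./core/visual_generator.py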
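
The "style + scene + emotion + details" structure lends itself to a small builder function. The sketch below only assembles the prompt string and makes no image API call; `build_image_prompt` and its default detail tags are assumptions based on the single example in the text.

```python
from typing import Iterable, Sequence


def build_image_prompt(
    style: str,
    scene: str,
    emotion_cues: Iterable[str],
    details: Sequence[str] = ("细节丰富", "8k"),
) -> str:
    """Join the four prompt parts in a fixed order, skipping empty ones."""
    parts = [style, scene, *emotion_cues, *details]
    return "，".join(part for part in parts if part)


prompt = build_image_prompt("科幻风格", "废墟红岸基地", ["灰蓝色调", "下雨", "昏暗灯光"])
print(prompt)
```

Fixing the order of the parts keeps prompts comparable across scenes, so a change in output can be traced to one component.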

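Node 9: Subtitle generation and emotional animation binding (visualized emotional delivery)

Generates subtitles with emotional animations, tied to the audio durations to reinforce emotional delivery. Input: the structured script dictionary from node 6, the audio list from node 7, and the emotion-content mapping table from node 5. Output: subtitle files (SRT + JSON animation config). SRT file (./subtitle/script.srt): standard subtitle format with a timeline. Animation config (./subtitle/effect_config.json):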

{
  "subtitle_list": [
    {
      "script_id":"s01",
      "start_time":"00:00:00,000",
      "end_time":"00:00:04,800",
      "text":"大家好，今天我们邀请到《三体》。",
      "emotion":"平和",
      "effect": {"type":"淡入","duration":0.5,"color":"#FFFFFF","font_size":24},
      "position":"bottom-center"
    },
    {
      "script_id":"s04",
      "start_time":"00:00:14,700",
      "end_time":"00:00:19,700",
      "text":"那不是勇气，是绝望后的破罐破摔。",
      "emotion":"绝望",
      "effect": {"type":"缓慢下移+淡暗","duration":0.8,"color":"#FF6B6B","font_size":26,"bold":true},
      "position":"bottom-center"
    }
  ]
}

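Implementation: code (Python). Iterate over the script → compute the subtitle timeline from the audio durations (start = end of the previous segment; end = start + audio duration) → match animations by emotion label (read the subtitle animation column of ./config/emotion_resource.csv) → generate the SRT file and the JSON config. Code: ./core/subtitle_generator.py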
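
The timeline rule above (each subtitle starts where the previous one ends) and the SRT timestamp format can be sketched as follows. This is an illustrative helper rather than the contents of subtitle_generator.py; it takes plain `(text, duration)` pairs and ignores animations.

```python
from typing import List, Tuple


def to_srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(entries: List[Tuple[str, float]]) -> str:
    """Build SRT text from (subtitle text, audio duration in seconds) pairs."""
    blocks = []
    current = 0.0
    for index, (text, duration) in enumerate(entries, start=1):
        start, end = current, current + duration
        blocks.append(f"{index}\n{to_srt_timestamp(start)} --> {to_srt_timestamp(end)}\n{text}\n")
        current = end  # the next subtitle starts where this one ends
    return "\n".join(blocks)


srt_text = build_srt([
    ("大家好，今天我们邀请到《三体》。", 4.8),
    ("1967年的大兴安岭，我见证了改变文明的一刻。", 3.9),
])
print(srt_text)
```

Rounding to whole milliseconds before splitting into fields avoids floating-point drift showing up in the timestamps.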

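Node 10: Audio-video sync and rough cut (core compositing step)

Synchronizes the audio (emotional TTS), visuals (stylized assets) and subtitles (emotional animations) precisely, and produces an editable rough-cut video plus a project file.
Input:
- Audio list: audio_list from node 7 (segment audio paths + durations)
- Visual assets: visual_dict from node 8 (background / illustration / knowledge image paths + linked script IDs)
- Subtitle config: subtitle_config from node 9 (SRT file + animation JSON)
- Script dictionary: script_dict from node 6 (script order + emotion labels)
- Style config: style_dict from node 3 (transition / volume / resolution parameters)
Output:
- Rough-cut video: ./output/rough_cut/[书籍名]_rough.mp4 (1080P, 30 fps)
- Project file: ./output/project/[书籍名]_project.xml (importable into Jianying / Pr)
- Sync check report: ./output/log/[书籍名]_sync_report.json (records audio-video sync deviations)
Code: ./core/video_rough_cut.py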

import os
import json
import xml.etree.ElementTree as ET
from xml.dom import minidom
# moviepy 1.x import path; only names that moviepy.editor exports are listed
from moviepy.editor import (
    VideoClip, AudioFileClip, ImageClip, CompositeVideoClip,
    TextClip, concatenate_videoclips, ColorClip
)
from moviepy.video.fx.all import resize, fadein, fadeout
from typing import Dict, List

# 全局配置（与风格配置联动）
GLOBAL_CONFIG = {
    "resolution": (1920, 1080),  # 固定1080P
    "fps": 30,
    "transition_duration": 0.5,  # 转场时长（秒）
    "bgm_volume_ratio": 0.2,     # 背景音乐音量占比
    "font_path": "./resources/fonts/SimHei.ttf",  # 中文字体路径（需提前放置）
    "temp_dir": "./temp/rough_cut",  # 临时文件目录
}
os.makedirs(GLOBAL_CONFIG["temp_dir"], exist_ok=True)

def load_resources(audio_list: List[Dict], visual_dict: Dict, subtitle_config: Dict) -> Dict:
    """加载并预处理所有音视频资源（统一格式+缓存）"""
    resources = {"audio": {}, "visual": {}, "subtitle": {}}
    
    # 1. 加载音频（统一MP3格式，缓存时长）
    for audio in audio_list:
        script_id = audio["script_id"]
        clip = AudioFileClip(audio["audio_path"]).set_fps(44100)
        resources["audio"][script_id] = {
            "clip": clip,
            "duration": clip.duration,
            "start_time": 0.0,  # 后续计算时间轴
            "end_time": 0.0
        }
    
    # 2. 加载视觉素材（统一分辨率，预处理动效）
    # 背景素材（按情感分类缓存）
    resources["visual"]["background"] = {}
    for bg in visual_dict["background_list"]:
        emotion = bg["emotion"]
        clip = ImageClip(bg["path"]).resize(GLOBAL_CONFIG["resolution"]).set_fps(GLOBAL_CONFIG["fps"])
        resources["visual"]["background"][emotion] = clip
    
    # 插画素材（按脚本ID关联）
    resources["visual"]["illustration"] = {}
    for ill in visual_dict["illustration_list"]:
        script_id = ill.get("script_id")
        if script_id:
            clip = ImageClip(ill["path"]).resize((400, 400)).set_fps(GLOBAL_CONFIG["fps"])
            # 预处理入场动效（淡入+轻微缩放）
            clip = clip.fadein(0.3).resize(lambda t: 1 + 0.05 * t if t < 1 else 1.05)
            resources["visual"]["illustration"][script_id] = clip
    
    # 知识点图（按思维导图节点ID关联）
    resources["visual"]["knowledge"] = {}
    for kv in visual_dict["knowledge_visual_list"]:
        node_id = kv["mindmap_node_id"]
        clip = ImageClip(kv["path"]).resize((800, 600)).set_fps(GLOBAL_CONFIG["fps"])
        resources["visual"]["knowledge"][node_id] = clip
    
    # 3. 加载字幕（关联脚本ID，缓存动效参数）
    for sub in subtitle_config["subtitle_list"]:
        script_id = sub["script_id"]
        resources["subtitle"][script_id] = sub
    
    return resources

def calculate_timeline(script_list: List[Dict], audio_resources: Dict) -> List[Dict]:
    """计算每段脚本的时间轴（确保音频+字幕+视觉同步）"""
    timeline = []
    current_time = 0.0  # 起始时间（秒）
    
    for script in script_list:
        script_id = script["script_id"]
        emotion = script["emotion"]
        related_node = script["related_mindmap_node"]
        
        # 匹配音频并更新时间轴
        audio = audio_resources.get(script_id)
        if not audio:
            raise ValueError(f"脚本{script_id}未找到对应音频")
        
        audio["start_time"] = current_time
        audio["end_time"] = current_time + audio["duration"]
        
        # 组装时间轴信息
        timeline.append({
            "script_id": script_id,
            "emotion": emotion,
            "related_node": related_node,
            "audio_start": audio["start_time"],
            "audio_end": audio["end_time"],
            "duration": audio["duration"]
        })
        
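        # Advance the cursor: add the transition gap after every segment except the last.
        # Identity comparison avoids a field-by-field dict comparison.
        if script is not script_list[-1]:
            current_time = audio["end_time"] + GLOBAL_CONFIG["transition_duration"]
        else:
            current_time = audio["end_time"]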
    
    return timeline

def generate_script_clip(
    timeline_item: Dict,
    resources: Dict,
    style_dict: Dict
) -> CompositeVideoClip:
    """生成单段脚本的视频片段（背景+插画+字幕+音频）"""
    script_id = timeline_item["script_id"]
    emotion = timeline_item["emotion"]
    duration = timeline_item["duration"]
    related_node = timeline_item["related_node"]
    
    # 1. 背景层（按情感匹配）
    bg_clip = resources["visual"]["background"].get(emotion)
    if not bg_clip:
        bg_clip = resources["visual"]["background"]["平和"]  # 默认平和背景
    bg_clip = bg_clip.set_duration(duration)
    
    # 2. 插画层（按脚本ID匹配，无则跳过）
    ill_clip = resources["visual"]["illustration"].get(script_id)
    visual_clips = [bg_clip]
    if ill_clip:
        # 位置：右侧中间，避免遮挡字幕
        ill_clip = ill_clip.set_duration(duration).set_position(("right", "center"))
        visual_clips.append(ill_clip)
    
    # 3. 知识点图层（按思维导图节点匹配，核心主题才显示）
    if related_node.startswith("theme_"):
        kv_clip = resources["visual"]["knowledge"].get(related_node)
        if kv_clip:
            kv_clip = kv_clip.set_duration(duration).set_position(("left", "center"))
            visual_clips.append(kv_clip)
    
    # 4. 字幕层（带情感动效）
    sub = resources["subtitle"].get(script_id)
    if sub:
        # 基础字幕配置
        text_clip = TextClip(
            sub["text"],
            fontsize=sub["effect"]["font_size"],
            color=sub["effect"]["color"],
            font=GLOBAL_CONFIG["font_path"],
            size=(1600, 60),
            method="label"  # renders the text as a single label ("caption" would wrap it inside the given size)
        ).set_duration(duration).set_position(sub["position"])
        
        # 绑定情感动效
        effect_type = sub["effect"]["type"]
        if effect_type == "淡入":
            text_clip = text_clip.fadein(sub["effect"]["duration"])
        elif effect_type == "缓慢下移+淡暗":
            text_clip = text_clip.fadein(0.3).set_position(
                lambda t: (960, 800 + 20 * t) if t < 2 else (960, 840)
            ).fadeout(sub["effect"]["duration"])
        elif effect_type == "快速闪烁+放大":
            text_clip = text_clip.resize(lambda t: 1 + 0.1 * (t % 1)).fadein(0.2)
        
        visual_clips.append(text_clip)
    
    # 5. 合成视觉片段
    visual_clip = CompositeVideoClip(visual_clips, size=GLOBAL_CONFIG["resolution"])
    
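    # 6. Mix audio (narration + background music)
    # Local import keeps this fix self-contained; move it to the module imports if preferred
    from moviepy.editor import CompositeAudioClip
    
    audio_clip = resources["audio"][script_id]["clip"]
    # Style-matched background music (e.g. electronic for sci-fi, classical for history)
    bgm_path = style_dict["audio_style"].get("bgm_path", "./resources/bgm/default.mp3")
    bgm_clip = AudioFileClip(bgm_path)
    # Trim the music to the segment length (never past the track's own length) and lower its volume
    bgm_clip = bgm_clip.subclip(0, min(duration, bgm_clip.duration)).volumex(GLOBAL_CONFIG["bgm_volume_ratio"])
    # Narration stays at full volume; the music sits underneath it
    final_audio = CompositeAudioClip([audio_clip, bgm_clip]).set_duration(duration)
    
    # 7. Attach the mixed audio to the visual clip
    clip = visual_clip.set_audio(final_audio)
    return clip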

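def generate_project_xml(timeline: List[Dict], resources: Dict, book_name: str) -> str:
    """Generate a simplified xmeml-style XML project file for manual fine-tuning (the element layout is a reduced form of the schema, so check that your editor can import it)"""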
    root = ET.Element("xmeml", version="5")
    project = ET.SubElement(root, "project")
    ET.SubElement(project, "name").text = f"{book_name}_AI视频工程"
    
    # 序列配置（匹配1080P/30帧）
    sequence = ET.SubElement(project, "sequence")
    ET.SubElement(sequence, "name").text = "主序列"
    settings = ET.SubElement(sequence, "settings")
    ET.SubElement(settings, "width").text = str(GLOBAL_CONFIG["resolution"][0])
    ET.SubElement(settings, "height").text = str(GLOBAL_CONFIG["resolution"][1])
    ET.SubElement(settings, "frameRate").text = str(GLOBAL_CONFIG["fps"])
    
    # 轨道配置（视频3轨+音频2轨）
    tracks = ET.SubElement(sequence, "tracks")
    # 视频轨1（背景）、轨2（插画）、轨3（字幕+知识点图）
    for i in range(3):
        video_track = ET.SubElement(tracks, "track", type="video")
        ET.SubElement(video_track, "name").text = f"视频轨{i+1}"
    # 音频轨1（角色音频）、轨2（背景音乐）
    for i in range(2):
        audio_track = ET.SubElement(tracks, "track", type="audio")
        ET.SubElement(audio_track, "name").text = f"音频轨{i+1}"
    
    # 按时间轴添加素材到轨道
    current_time = 0.0
    for item in timeline:
        script_id = item["script_id"]
        duration = item["duration"]
        start_time = current_time
        
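        # 1. Video track 1: background (same "平和" fallback as generate_script_clip,
        #    so an unmapped emotion does not raise KeyError)
        backgrounds = resources["visual"]["background"]
        bg_clip = backgrounds.get(item["emotion"]) or backgrounds["平和"]
        bg_path = bg_clip.filename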
        video1_clip = ET.SubElement(tracks[0], "clip")
        ET.SubElement(video1_clip, "name").text = f"背景_{script_id}"
        ET.SubElement(video1_clip, "start").text = str(start_time)
        ET.SubElement(video1_clip, "duration").text = str(duration)
        ET.SubElement(video1_clip, "path").text = bg_path
        
        # 2. 视频轨2：插画（如有）
        if script_id in resources["visual"]["illustration"]:
            ill_path = resources["visual"]["illustration"][script_id].filename
            video2_clip = ET.SubElement(tracks[1], "clip")
            ET.SubElement(video2_clip, "name").text = f"插画_{script_id}"
            ET.SubElement(video2_clip, "start").text = str(start_time)
            ET.SubElement(video2_clip, "duration").text = str(duration)
            ET.SubElement(video2_clip, "path").text = ill_path
        
        # 3. 音频轨1：角色音频
        audio_path = resources["audio"][script_id]["clip"].filename
        audio1_clip = ET.SubElement(tracks[3], "clip")
        ET.SubElement(audio1_clip, "name").text = f"音频_{script_id}"
        ET.SubElement(audio1_clip, "start").text = str(start_time)
        ET.SubElement(audio1_clip, "duration").text = str(duration)
        ET.SubElement(audio1_clip, "path").text = audio_path
        
        # 更新当前时间（叠加转场）
        if item != timeline[-1]:
            current_time = start_time + duration + GLOBAL_CONFIG["transition_duration"]
        else:
            current_time = start_time + duration
    
    # 美化XML格式并保存
    xml_str = minidom.parseString(ET.tostring(root)).toprettyxml(indent="  ")
    xml_path = f"./output/project/{book_name}_project.xml"
    os.makedirs(os.path.dirname(xml_path), exist_ok=True)
    with open(xml_path, "w", encoding="utf-8") as f:
        f.write(xml_str)
    
    return xml_path

def video_rough_cut(
    audio_list: List[Dict],
    visual_dict: Dict,
    subtitle_config: Dict,
    script_dict: Dict,
    style_dict: Dict,
    book_name: str
) -> Dict:
    """
    视频粗剪主函数
    返回：粗剪视频路径、工程文件路径、同步校验报告
    """
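    # 1. Create the output directories before the try block so that log_dir is
    #    always defined when the except branch writes the error log
    output_dir = "./output/rough_cut"
    os.makedirs(output_dir, exist_ok=True)
    log_dir = "./output/log"
    os.makedirs(log_dir, exist_ok=True)
    
    try: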
        
        # 2. 加载并预处理资源
        print(f"[节点10] 加载资源...")
        resources = load_resources(audio_list, visual_dict, subtitle_config)
        
        # 3. 计算时间轴
        print(f"[节点10] 计算时间轴...")
        timeline = calculate_timeline(script_dict["script_list"], resources["audio"])
        
        # 4. 生成单段脚本片段
        print(f"[节点10] 生成脚本片段...")
        script_clips = []
        for item in timeline:
            clip = generate_script_clip(item, resources, style_dict)
            script_clips.append(clip)
        
        # 5. Add transitions (matched to the book style)
        # moviepy has no TransitionClip class, so the fade is applied to the
        # segments themselves and the gap is created when concatenating.
        print(f"[节点10] 添加转场效果...")
        transition_type = style_dict["visual_style"].get("transition", "fade")
        gap = GLOBAL_CONFIG["transition_duration"]
        final_clips = []
        for i, clip in enumerate(script_clips):
            if transition_type == "fade":
                if i > 0:
                    clip = clip.fadein(gap / 2)   # fade in from the gap
                if i < len(script_clips) - 1:
                    clip = clip.fadeout(gap / 2)  # fade out into the gap
            # Other transition types (e.g. "slide") are not implemented here
            # and fall back to a plain cut across the gap
            final_clips.append(clip)
        
        # 6. Concatenate all segments; positive padding leaves the same gap
        #    between segments that calculate_timeline reserves for transitions
        final_clip = concatenate_videoclips(final_clips, method="compose", padding=gap)
        # Trim any trailing padding so the total length matches the timeline
        final_clip = final_clip.subclip(0, timeline[-1]["audio_end"])
        
        # 7. 导出粗剪视频
        rough_video_path = f"{output_dir}/{book_name}_rough.mp4"
        final_clip.write_videofile(
            rough_video_path,
            fps=GLOBAL_CONFIG["fps"],
            codec="libx264",  # H.264编码，兼容性强
            audio_codec="aac",
            bitrate="8000k",  # 保证画质
            threads=4  # 多线程加速
        )
        
        # 8. 生成工程文件
        project_xml_path = generate_project_xml(timeline, resources, book_name)
        
        # 9. 生成同步校验报告（检查音画偏差）
        sync_report = {
            "book_name": book_name,
            "total_clips": len(script_clips),
            "sync_errors": [],
            "total_duration": final_clip.duration
        }
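        for item, clip in zip(timeline, script_clips):
            script_id = item["script_id"]
            audio_duration = resources["audio"][script_id]["duration"]
            # Measure the rendered segment instead of re-reading the audio-derived
            # timeline value, which would always give a deviation of 0
            visual_duration = clip.duration
            deviation = abs(audio_duration - visual_duration)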
            if deviation > 0.5:  # 偏差>0.5秒记录为错误
                sync_report["sync_errors"].append({
                    "script_id": script_id,
                    "audio_duration": audio_duration,
                    "visual_duration": visual_duration,
                    "deviation": deviation,
                    "suggestion": "调整视觉素材时长或重新生成音频"
                })
        
        # 保存校验报告
        sync_report_path = f"{log_dir}/{book_name}_sync_report.json"
        with open(sync_report_path, "w", encoding="utf-8") as f:
            json.dump(sync_report, f, ensure_ascii=False, indent=2)
        
        print(f"[节点10] 粗剪完成：{rough_video_path}")
        return {
            "rough_video_path": rough_video_path,
            "project_xml_path": project_xml_path,
            "sync_report_path": sync_report_path,
            "status": "success"
        }
    
    except Exception as e:
        error_msg = f"[节点10] 粗剪失败：{str(e)}"
        print(error_msg)
        # 保存错误日志
        with open(f"{log_dir}/{book_name}_cut_error.log", "w", encoding="utf-8") as f:
            f.write(error_msg)
        return {
            "status": "failed",
            "error_msg": error_msg,
            "fallback_path": f"{GLOBAL_CONFIG['temp_dir']}/{book_name}_fallback.mp4"  # 降级输出
        }
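
To make the timing rule used by calculate_timeline concrete, here is a minimal, self-contained sketch of the same accumulation on plain floats. The function name `accumulate_offsets`, the sample durations, and the 0.5 s transition (standing in for GLOBAL_CONFIG["transition_duration"]) are illustrative assumptions, not part of the project code.

```python
from typing import List, Tuple


def accumulate_offsets(durations: List[float], transition: float) -> List[Tuple[float, float]]:
    """Return (start, end) for each segment, leaving a `transition` gap
    between consecutive segments and no gap after the last one."""
    offsets = []
    current = 0.0
    for duration in durations:
        start = current
        end = start + duration
        offsets.append((start, end))
        # The next segment starts after the transition gap
        current = end + transition
    return offsets


if __name__ == "__main__":
    for start, end in accumulate_offsets([4.0, 6.5, 3.0], transition=0.5):
        print(f"{start:.1f}s -> {end:.1f}s")
```

With three segments the total length is the sum of the durations plus two gaps (13.5 s + 1.0 s = 14.5 s), which is the value the rough cut and the project file should both agree on.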

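Node 11: Manual Review and Fine-Tuning (the core quality gate)

A reviewer uses a visual editor to check emotional fit, audio-video sync, and knowledge completeness, then makes fine-grained adjustments to catch what the automated pipeline missed.

Input: the rough-cut video (rough_video_path from node 10); the project file (project_xml_path from node 10); the sync report (sync_report_path from node 10); all intermediate resources (the output files of the first 9 nodes, kept so that changes can be traced back).

Output: the fine-tuned project file ./output/project/[book name]_project_finetuned.xml; the audit report ./output/log/[book name]_audit_report.json (including the change log); the confirmation flag ./output/flag/[book name]_audit_pass.flag (marks the audit as passed).

Helper script (./utils/audit_helper.py):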

import json
import os
from typing import Dict, List  # Dict is needed by record_audit_result's signature

def generate_audit_checklist(sync_report_path: str, mindmap_json_path: str) -> str:
    """生成审核清单（自动标记高风险项）"""
    # 加载同步报告
    with open(sync_report_path, "r", encoding="utf-8") as f:
        sync_report = json.load(f)
    # 加载思维导图（检查知识点覆盖）
    with open(mindmap_json_path, "r", encoding="utf-8") as f:
        mindmap = json.load(f)
    core_themes = [t["label"] for t in mindmap["root"]["children"]]
    
    # 生成清单
    checklist = {
        "high_risk_items": [],  # 高风险项（优先审核）
        "normal_items": []      # 常规项
    }
    
    # 高风险项：同步偏差>0.5秒
    if sync_report["sync_errors"]:
        for err in sync_report["sync_errors"]:
            checklist["high_risk_items"].append({
                "type": "sync_deviation",
                "desc": f"脚本{err['script_id']}音画偏差{err['deviation']:.2f}秒",
                "action": err["suggestion"]
            })
    
    # 高风险项：核心主题未覆盖（需人工确认）
    checklist["high_risk_items"].append({
        "type": "knowledge_coverage",
        "desc": f"需确认是否覆盖所有核心主题：{core_themes}",
        "action": "未覆盖则新增对应脚本片段"
    })
    
    # 常规项：情感/风格/字幕检查
    checklist["normal_items"].extend([
        {
            "type": "emotion_match",
            "desc": "检查TTS音色、字幕动效、背景是否与情感标签一致",
            "action": "不一致则替换对应资源"
        },
        {
            "type": "style_consistency",
            "desc": "检查所有素材风格是否与书籍类型匹配",
            "action": "批量调整风格参数"
        },
        {
            "type": "subtitle_readability",
            "desc": "检查字幕字体、大小、位置、动效是否影响阅读",
            "action": "在剪映中微调字幕属性"
        }
    ])
    
    # 保存清单
    book_name = sync_report["book_name"]
    checklist_path = f"./output/log/{book_name}_audit_checklist.json"
    with open(checklist_path, "w", encoding="utf-8") as f:
        json.dump(checklist, f, ensure_ascii=False, indent=2)
    
    return checklist_path

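def record_audit_result(book_name: str, changes: List[Dict]) -> str:
    """Record the review changes and write the audit report"""
    # Make sure the pass flag exists before reading its timestamp;
    # otherwise os.path.getmtime raises FileNotFoundError
    flag_path = f"./output/flag/{book_name}_audit_pass.flag"
    os.makedirs(os.path.dirname(flag_path), exist_ok=True)
    if not os.path.exists(flag_path):
        open(flag_path, "w", encoding="utf-8").close()
    
    audit_report = {
        "book_name": book_name,
        "audit_time": os.path.getmtime(flag_path),
        "changes": changes,
        "final_status": "passed"
    }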
    
    report_path = f"./output/log/{book_name}_audit_report.json"
    with open(report_path, "w", encoding="utf-8") as f:
        json.dump(audit_report, f, ensure_ascii=False, indent=2)
    
    return report_path

# 示例：生成审核清单
if __name__ == "__main__":
    sync_report_path = "./output/log/三体_sync_report.json"
    mindmap_json_path = "./output/mindmap_三体.json"
    generate_audit_checklist(sync_report_path, mindmap_json_path)
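
The checklist above leaves theme coverage entirely to the reviewer. As a hedged sketch, part of that check could be pre-computed by comparing the mind map's theme nodes with the `related_mindmap_node` of each script segment. The helper name `find_uncovered_themes` is hypothetical, and so is the "id" field on each theme node: the script above only reads "label", so adjust the key to whatever the mind map JSON actually contains.

```python
from typing import Dict, List


def find_uncovered_themes(mindmap: Dict, script_list: List[Dict]) -> List[str]:
    """Return the labels of core themes that no script segment references.

    Assumes each theme node carries an "id" (hypothetical field) next to the
    "label" that generate_audit_checklist already reads.
    """
    referenced = {script.get("related_mindmap_node") for script in script_list}
    return [
        theme["label"]
        for theme in mindmap["root"]["children"]
        if theme["id"] not in referenced
    ]


if __name__ == "__main__":
    sample_mindmap = {"root": {"children": [
        {"id": "theme_1", "label": "资产与负债"},
        {"id": "theme_2", "label": "被动收入"},
    ]}}
    sample_scripts = [{"script_id": "s1", "related_mindmap_node": "theme_1"}]
    print(find_uncovered_themes(sample_mindmap, sample_scripts))
```

An empty result would let the reviewer skip the coverage item; a non-empty one names exactly which themes still need a script segment.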

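Node 12: Video Rendering and Output (final delivery)

Render the fine-tuned project file into final videos adapted to multiple platforms, with batch export and format optimization.

Input: the fine-tuned project file (project_finetuned.xml from node 11); the audit report (audit_report_path from node 11); the style configuration (style_dict from node 3, carrying platform parameters).

Output, named after the platform key used in the code below: main HD video ./output/final/[book name]_bilibili.mp4 (1080P landscape, for Bilibili / YouTube); short-video version ./output/final/[book name]_douyin.mp4 (9:16 portrait, for Douyin / Xiaohongshu); compressed version ./output/final/[book name]_compressed.mp4 (720P for quick sharing, target size ≤ 100 MB); delivery manifest ./output/final/[book name]_delivery.json (paths of all output files).

Code (./core/video_render.py):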

import os
import json
import subprocess
import xml.etree.ElementTree as ET  # used by generate_ffmpeg_filelist
from typing import Dict, List

# 平台适配配置（可扩展）
PLATFORM_CONFIG = {
    "bilibili": {
        "resolution": (1920, 1080),
        "aspect_ratio": "16:9",
        "bitrate": "10000k",
        "format": "mp4",
        "codec": "libx265"  # 高效编码，体积小画质好
    },
    "douyin": {
        "resolution": (1080, 1920),
        "aspect_ratio": "9:16",
        "bitrate": "6000k",
        "format": "mp4",
        "codec": "libx264",
        "watermark": "./resources/watermark/douyin_wm.png"  # 平台水印
    },
    "compressed": {
        "resolution": (1280, 720),
        "aspect_ratio": "16:9",
        "bitrate": "3000k",
        "format": "mp4",
        "codec": "libx264"
    }
}

def render_from_xml(project_xml_path: str, output_dir: str, book_name: str, platform: str) -> str:
    """
    从XML工程文件渲染视频（支持多平台适配）
    依赖：剪映专业版（需安装并配置环境变量）或FFmpeg
    """
    if platform not in PLATFORM_CONFIG:
        raise ValueError(f"不支持的平台：{platform}，可选：{list(PLATFORM_CONFIG.keys())}")
    
    config = PLATFORM_CONFIG[platform]
    output_filename = f"{book_name}_{platform}.mp4"
    output_path = os.path.join(output_dir, output_filename)
    
    try:
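        # Option 1: render with a Jianying command-line tool (keeps all project effects).
        # The "jianying-cli" executable and its flags are assumed here; if it is not
        # installed, subprocess raises and the FFmpeg fallback below takes over.
        print(f"[节点12] 用剪映渲染{platform}平台视频...")
        cmd = [
            "jianying-cli",  # assumed command name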
            "render",
            "--project", project_xml_path,
            "--output", output_path,
            "--width", str(config["resolution"][0]),
            "--height", str(config["resolution"][1]),
            "--bitrate", config["bitrate"],
            "--codec", config["codec"],
            "--fps", "30"
        ]
        # 添加水印（如抖音平台）
        if "watermark" in config:
            cmd.extend(["--watermark", config["watermark"], "--watermark-pos", "bottom-right"])
        
        # 执行命令
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        print(f"[节点12] 剪映渲染完成：{output_path}")
        return output_path
    
    except Exception as e:
        # Option 2: fall back to FFmpeg (no Jianying dependency). Only the background
        # track and the narration track are rendered, and transition gaps are dropped.
        print(f"[节点12] 剪映渲染失败，使用FFmpeg降级渲染：{str(e)}")
        video_list_path, audio_list_path = generate_ffmpeg_filelist(project_xml_path, platform)
        cmd = [
            "ffmpeg",
            "-f", "concat", "-safe", "0", "-i", video_list_path,  # input 0: background images
            "-f", "concat", "-safe", "0", "-i", audio_list_path,  # input 1: narration audio
            "-map", "0:v", "-map", "1:a",
            "-s", f"{config['resolution'][0]}x{config['resolution'][1]}",
            "-b:v", config["bitrate"],
            "-c:v", config["codec"],
            "-pix_fmt", "yuv420p",
            "-c:a", "aac",
            "-fps_mode", "cfr",
            "-r", "30",
            "-shortest",
            "-y",  # overwrite an existing output file
            output_path
        ]
        subprocess.run(cmd, check=True, capture_output=True, text=True)
        print(f"[节点12] FFmpeg渲染完成：{output_path}")
        return output_path

def generate_ffmpeg_filelist(project_xml_path: str, platform: str) -> tuple:
    """Write the two concat-demuxer lists (background images, narration audio) and return their paths"""
    # Parse the XML project file and pull out the asset paths and durations
    tree = ET.parse(project_xml_path)
    root = tree.getroot()
    
    # Video track 1 (backgrounds) and audio track 1 (narration) are taken as the main material
    video_clips = root.findall(".//track[@type='video']")[0].findall("clip")
    audio_clips = root.findall(".//track[@type='audio']")[0].findall("clip")
    
    # Images and audio cannot share one concat list, so each gets its own.
    # Paths are made absolute because the concat demuxer resolves relative
    # paths against the directory of the list file, not the working directory.
    video_lines = []
    for video_clip in video_clips:
        video_path = os.path.abspath(video_clip.find("path").text)
        duration = float(video_clip.find("duration").text)
        video_lines.append(f"file '{video_path}'")
        video_lines.append(f"duration {duration}")  # how long the still image is shown
    if video_clips:
        # Repeat the last image so that its duration directive is honoured
        last_path = os.path.abspath(video_clips[-1].find("path").text)
        video_lines.append(f"file '{last_path}'")
    
    audio_lines = []
    for audio_clip in audio_clips:
        audio_path = os.path.abspath(audio_clip.find("path").text)
        audio_lines.append(f"file '{audio_path}'")
    
    # Save both lists
    os.makedirs("./temp", exist_ok=True)
    video_list_path = f"./temp/ffmpeg_video_{platform}.txt"
    audio_list_path = f"./temp/ffmpeg_audio_{platform}.txt"
    for list_path, lines in ((video_list_path, video_lines), (audio_list_path, audio_lines)):
        with open(list_path, "w", encoding="utf-8") as f:
            f.write("\n".join(lines))
    
    return video_list_path, audio_list_path

def batch_render(
    project_finetuned_path: str,
    audit_report_path: str,
    style_dict: Dict,
    book_name: str
) -> Dict:
    """批量渲染多平台视频，生成交付清单"""
    # 1. 初始化输出目录
    final_dir = f"./output/final"
    os.makedirs(final_dir, exist_ok=True)
    
    # 2. 读取审核报告（确认通过审核）
    with open(audit_report_path, "r", encoding="utf-8") as f:
        audit_report = json.load(f)
    if audit_report["final_status"] != "passed":
        raise ValueError("审核未通过，无法渲染最终视频")
    
    # 3. 批量渲染（默认渲染3个平台，可扩展）
    platforms = ["bilibili", "douyin", "compressed"]
    rendered_paths = {}
    for platform in platforms:
        rendered_path = render_from_xml(project_finetuned_path, final_dir, book_name, platform)
        rendered_paths[platform] = rendered_path
    
    # 4. Build the delivery manifest
    import hashlib  # local import; used only for the checksums below
    
    checksums = {}
    for platform, path in rendered_paths.items():
        # hashlib works on every OS, unlike shelling out to the md5sum command
        digest = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        checksums[platform] = digest.hexdigest()
    
    delivery_manifest = {
        "book_name": book_name,
        "audit_time": audit_report["audit_time"],
        "render_time": os.path.getmtime(rendered_paths["bilibili"]),
        "files": rendered_paths,
        "platform_info": {
            "bilibili": "1080P横屏，适合长视频平台",
            "douyin": "1080×1920竖屏，适合短视频平台",
            "compressed": "720P压缩版，适合快速分享"
        },
        "md5_checksum": checksums
    }
    
    # Save the delivery manifest
    delivery_path = f"{final_dir}/{book_name}_delivery.json"
    with open(delivery_path, "w", encoding="utf-8") as f:
        json.dump(delivery_manifest, f, ensure_ascii=False, indent=2)
    
    print(f"[节点12] 所有视频渲染完成，交付清单：{delivery_path}")
    return {
        "delivery_path": delivery_path,
        "rendered_files": rendered_paths,
        "status": "success"
    }

# 示例：单独渲染B站平台视频
if __name__ == "__main__":
    project_path = "./output/project/三体_project_finetuned.xml"
    output_dir = "./output/final"
    book_name = "三体"
    render_from_xml(project_path, output_dir, book_name, "bilibili")
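
The compressed deliverable is specified as ≤ 100 MB, while PLATFORM_CONFIG fixes its bitrate at 3000k, so whether the target is met depends on the video length. The following sketch estimates the highest video bitrate that fits a size budget. The function name `max_video_bitrate_kbps`, the 128 kbit/s audio rate, and the convention 1 MB = 1,000,000 bytes are assumptions made for illustration.

```python
def max_video_bitrate_kbps(duration_s: float, target_mb: float = 100.0, audio_kbps: float = 128.0) -> float:
    """Highest average video bitrate (kbit/s) that keeps the file under target_mb.

    Uses 1 MB = 1,000,000 bytes and ignores container overhead, so treat the
    result as an upper bound and leave some headroom.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    total_kbits = target_mb * 1_000_000 * 8 / 1000  # size budget in kilobits
    return total_kbits / duration_s - audio_kbps


if __name__ == "__main__":
    # A 10-minute video: 800,000 kbit / 600 s - 128 kbit/s of audio
    print(round(max_video_bitrate_kbps(600)))
```

Under these assumptions a 10-minute video needs roughly 1200 kbit/s or less, and the configured 3000k only stays within 100 MB for videos of about four minutes, so the bitrate may need to be derived from the duration rather than fixed.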

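Project Directory Structure

book_ai_video/
├── core/                  # Core node implementations (full code for the 12 nodes)
│   ├── book_parser.py     # Node 1: book parsing and text extraction
│   ├── multi_granularity_summary.py  # Node 2: multi-granularity summary decomposition
│   ├── style_matcher.py   # Node 3: book type and style matching
│   ├── mindmap_generator.py  # Node 4: structured mind map generation
│   ├── emotion_content_anchor.py  # Node 5: emotion and content anchoring
│   ├── script_generator.py  # Node 6: emotion-driven script generation
│   ├── tts_generator.py   # Node 7: multi-role TTS generation
│   ├── visual_generator.py  # Node 8: stylized visual asset generation
│   ├── subtitle_generator.py  # Node 9: subtitle generation and emotion-effect binding
│   ├── video_rough_cut.py  # Node 10: audio-video sync and rough cut
│   ├── audit_helper.py    # Node 11: manual review helper script
│   └── video_render.py    # Node 12: video rendering and output
├── config/                # Configuration files
│   ├── style_mapping.json  # Book type to style mapping
│   ├── emotion_resource.csv  # Emotion to multimodal resource mapping
│   ├── prompt/            # Prompt templates
│   │   └── script_generator.txt  # Script generation prompt
│   └── audit_standards.json  # Audit standards
├── resources/             # Static resources
│   ├── fonts/             # Chinese fonts (place SimHei.ttf here yourself)
│   ├── bgm/               # Background music
│   │   └── default.mp3    # Default background music
│   └── watermark/         # Platform watermarks
│       └── douyin_wm.png  # Douyin watermark
├── temp/                  # Temporary files (auto-generated)
├── output/                # Output directory (auto-generated)
│   ├── rough_cut/         # Rough-cut videos
│   ├── project/           # Project files
│   ├── final/             # Final videos
│   ├── flag/              # Audit pass flags (written by node 11)
│   └── log/               # Log files
├── utils/                 # Shared utility functions
│   ├── api_client.py      # API calls (Douban / Google Books / TTS)
│   ├── error_handler.py   # Global error handling
│   └── file_utils.py      # File read/write helpers
├── main.py                # Main entry script
└── requirements.txt       # Dependency list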
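
The tree marks temp/ and output/ as auto-generated. A minimal sketch of how a start-up step might create them is shown below; the helper name `ensure_project_dirs` and the `AUTO_DIRS` list are hypothetical, and output/flag is included because node 11 writes its pass flag there.

```python
import os
import tempfile
from typing import List

# Directories the tree marks as auto-generated, plus output/flag used by node 11
AUTO_DIRS = [
    "temp",
    "output/rough_cut",
    "output/project",
    "output/final",
    "output/log",
    "output/flag",
]


def ensure_project_dirs(base_dir: str) -> List[str]:
    """Create every auto-generated directory under base_dir and return the paths."""
    created = []
    for rel in AUTO_DIRS:
        path = os.path.join(base_dir, *rel.split("/"))
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created


if __name__ == "__main__":
    # Demonstrate in a throwaway directory instead of the real project root
    with tempfile.TemporaryDirectory() as base:
        for path in ensure_project_dirs(base):
            print(os.path.relpath(path, base))
```

Calling this once from main.py before the first node runs would remove the need for each node to create its own output folder.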

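The twelve nodes cover the full chain of input → knowledge reinforcement → emotion-driven scripting → multimodal generation → review → output. Combining multi-granularity summaries, a structured mind map, and emotion-bound effects lets the video convey knowledge while keeping emotional resonance. As next steps, the lightweight JSON knowledge base could be replaced with a Chroma or FAISS vector store to support precise retrieval over long PDF texts, and an AI digital human (for example, a presenter video generated with D-ID) could be added to make the video more engaging.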
