
whisper


# Whisper - Robust Speech Recognition

OpenAI's multilingual speech recognition model.

## When to Use Whisper

**Use when:**
- Speech-to-text transcription (99 languages)
- Podcast/video transcription
- Automated meeting notes
- Translation to English
- Transcription of noisy audio
- Multilingual audio processing

**Metrics**:
- **72,900+ GitHub stars**
- 99 languages supported
- Trained on 680,000 hours of audio
- MIT License

**Alternatives**:
- **AssemblyAI**: Managed API with speaker diarization
- **Deepgram**: Real-time streaming ASR
- **Google Speech-to-Text**: Cloud-based service

## Quick Start

### Installation

```bash
# Requires Python 3.8-3.11
pip install -U openai-whisper

# Requires ffmpeg
# macOS: brew install ffmpeg
# Ubuntu: sudo apt install ffmpeg
# Windows: choco install ffmpeg
```

### Basic Transcription

```python
import whisper

# Load the model
model = whisper.load_model("base")

# Transcribe
result = model.transcribe("audio.mp3")

# Print the full text
print(result["text"])

# Access individual segments with their timestamps
for segment in result["segments"]:
    print(f"[{segment['start']:.2f}s - {segment['end']:.2f}s] {segment['text']}")
```

## Model Sizes

```python
# Available models
models = ["tiny", "base", "small", "medium", "large", "turbo"]

# Load a specific model
model = whisper.load_model("turbo")  # Fastest, good quality
```

| Model | Parameters | English-only | Multilingual | Speed | VRAM |
|-------|------------|--------------|--------------|-------|------|
| tiny | 39M | ✓ | ✓ | ~32x | ~1 GB |
| base | 74M | ✓ | ✓ | ~16x | ~1 GB |
| small | 244M | ✓ | ✓ | ~6x | ~2 GB |
| medium | 769M | ✓ | ✓ | ~2x | ~5 GB |
| large | 1550M | ✗ | ✓ | 1x | ~10 GB |
| turbo | 809M | ✗ | ✓ | ~8x | ~6 GB |

Speed is relative to the `large` model.

**Recommendation**: Use `turbo` for the best speed/quality balance and `base` for prototyping.

## Transcription Options

### Language Specification

```python
# Auto-detect the language
result = model.transcribe("audio.mp3")

# Specify the language (faster)
result = model.transcribe("audio.mp3", language="en")

# Supported: en, es, fr, de, it, pt, ru, ja, ko, zh, and 89 more
```

### Task Selection

```python
# Transcribe (default)
result = model.transcribe("audio.mp3", task="transcribe")

# Translate to English
result = model.transcribe("spanish.mp3", task="translate")
# Input: Spanish audio → Output: English text
```

### Initial Prompt

```python
# Improve accuracy by supplying context
result = model.transcribe(
    "audio.mp3",
    initial_prompt="This is a technical podcast about machine learning and
```
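The segment dictionaries shown under Basic Transcription (each with `start`, `end`, and `text` keys) are often enough to build subtitles for the podcast and video use cases. The following is a minimal sketch, assuming only those three keys; the helper names `format_timestamp` and `segments_to_srt` are hypothetical and not part of the `whisper` package, and `sample_segments` is hand-written stand-in data rather than real model output.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render Whisper-style segments as SRT subtitle text."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        text = segment["text"].strip()  # segment text may carry leading spaces
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


# Hand-written stand-in for result["segments"]; not real model output.
sample_segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 3725.042, "text": " Thanks for listening."},
]

srt_text = segments_to_srt(sample_segments)
print(srt_text)
```

With a real transcription, `segments_to_srt(result["segments"])` should work the same way, provided the segments carry the keys assumed above.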

Source: claude-code-templates (MIT). The Chinese translation was generated by AI.
