Gemini 3.1 Flash TTS
Google today released Gemini 3.1 Flash TTS, a new text-to-speech model that can be directed using prompts.
It is available through the standard Gemini API under the model ID gemini-3.1-flash-tts-preview, but it can only output audio files.
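As a rough sketch of what a call might look like, the Python below uses only the standard library. Only the model ID comes from this post. Everything else is an assumption carried over from how earlier Gemini TTS preview models are documented: the REST endpoint shape, the `responseModalities` and `speechConfig` fields, the `Kore` voice name, and the 24 kHz 16-bit mono PCM response format. The helper names `build_request`, `pcm_to_wav`, and `synthesize` are made up for illustration.

```python
import base64
import io
import json
import urllib.request
import wave

MODEL_ID = "gemini-3.1-flash-tts-preview"  # model ID quoted in the post
# Assumption: same generateContent endpoint shape as other Gemini models.
API_URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL_ID}:generateContent"
)


def build_request(prompt: str, voice_name: str = "Kore") -> dict:
    """Build a request body asking for audio output only.

    Field names and the default voice are assumptions based on
    earlier Gemini TTS previews.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice_name}}
            },
        },
    }


def pcm_to_wav(pcm: bytes, rate: int = 24000) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV container (assumed output format)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def synthesize(prompt: str, api_key: str) -> bytes:
    """POST the prompt and return WAV bytes. Needs network access and a valid key."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    # Assumption: the audio arrives base64-encoded in the first response part.
    part = payload["candidates"][0]["content"]["parts"][0]
    return pcm_to_wav(base64.b64decode(part["inlineData"]["data"]))
```

Writing the bytes returned by `synthesize(prompt, api_key)` to a `.wav` file should produce something playable, provided the assumptions above hold for this model.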
The prompting guide is very interesting. Here is their example prompt for generating a short audio clip:
# AUDIO PROFILE: Jaz R.
## "The Morning Hype"
## THE SCENE: The London Studio
It is 10:00 PM in a glass-walled studio overlooking the moonlit London skyline, but inside, it is blindingly bright. The red "ON AIR" tally light is blazing. Jaz is standing up, not sitting, bouncing on the balls of their heels to the rhythm of a thumping backing track. Their hands fly across the faders on a massive mixing desk. It is a chaotic, caffeine-fueled cockpit designed to wake up an entire nation.
### DIRECTOR'S NOTES
Style:
* The "Vocal Smile": You must hear the grin in the audio. The soft palate is always raised to keep the tone bright, sunny, and explicitly inviting.
* Dynamics: High projection without shouting. Punchy consonants and elongated vowels on excitement words (e.g., "Beauuutiful morning").
Pace: Speaks at an energetic pace, keeping up with the fast music. Speaks with a "bouncing" cadence. High-speed delivery with fluid transitions — no dead air, no gaps.
Accent: Jaz is from Brixton, London
### SAMPLE CONTEXT
Jaz is the industry standard for Top 40 radio, high-octane event promos, or any script that requires a charismatic Estuary accent and 11/10 infectious energy.
#### TRANSCRIPT
[excitedly] Yes, massive vibes in the studio! You are locked in and it is absolutely popping off in London right now. If you're stuck on the tube, or just sat there pretending to work... stop it. Seriously, I see you.
[shouting] Turn this up! We've got the project roadmap landing in three, two... let's go!
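Here is what I got using that example prompt: [audio sample]
I then changed it to say "Jaz is from Newcastle" and "... requires a charismatic Newcastle accent", and got this result: [audio sample]
Here is the result for Exeter, Devon: [audio sample]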
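The accent swap described above amounts to two string substitutions in the prompt. A minimal sketch, where `swap_accent` is a hypothetical helper and `EXCERPT` holds only the two affected lines of the example prompt rather than the full text:

```python
# The two phrases in the example prompt that pin down the accent.
ORIGINAL_ORIGIN = "Jaz is from Brixton, London"
ORIGINAL_ACCENT = "charismatic Estuary accent"

# Excerpt of the example prompt; the real prompt contains much more.
EXCERPT = (
    "Accent: Jaz is from Brixton, London\n"
    "Jaz is the industry standard for Top 40 radio, high-octane event promos, "
    "or any script that requires a charismatic Estuary accent and 11/10 "
    "infectious energy.\n"
)


def swap_accent(prompt: str, hometown: str, accent: str) -> str:
    """Return a copy of the prompt with the hometown and accent replaced."""
    for phrase in (ORIGINAL_ORIGIN, ORIGINAL_ACCENT):
        if phrase not in prompt:
            raise ValueError(f"prompt does not contain {phrase!r}")
    return prompt.replace(ORIGINAL_ORIGIN, f"Jaz is from {hometown}").replace(
        ORIGINAL_ACCENT, f"charismatic {accent} accent"
    )


newcastle_prompt = swap_accent(EXCERPT, "Newcastle", "Newcastle")
print(newcastle_prompt)
```

Raising an error when either phrase is missing guards against silently sending an unmodified prompt.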
I built a Gemini 3.1 Pro style UI for trying it out.

Original article: Gemini 3.1 Flash TTS
Translated and compiled by 汇智网. Please credit the source when republishing.