Building an AI-Powered Podcast Generator

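The AI world moves too fast, and I wanted short audio summaries of the latest AI news that I could listen to in the shower, over breakfast, or while working out. By combining the Gemini 2.5 Pro model, Cloud Text-to-Speech, and Cloud Run, with some help from Gemini CLI, I built a system that turns RSS feeds into podcast-style audio briefings while I sleep :-)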

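In this article, I'll walk you through building a podcast generator that uses AI for content summarization and natural language processing. We'll look at how Gemini writes conversational summaries, how Text-to-Speech produces broadcast-quality audio, and how Cloud Run runs the whole automated pipeline.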

The complete code for this project is on GitHub: https://github.com/ggalloro/ai-news. The repository includes the Terraform code and instructions for deploying to your own Google Cloud project.

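1. Architecture: An AI-Powered Serverless Pipeline

The application is built around intelligent content processing and automated audio generation. Here is how the components work together: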

[Figure: AI news podcast generator architecture]

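Core AI pipeline:

Cloud Scheduler: triggers content generation automatically every day
Cloud Run Job: serverless Python processing with AI integration
Gemini 2.5 Pro API: advanced AI for intelligent, conversational summaries
Text-to-Speech API: high-quality neural speech synthesis
Cloud Storage: durable storage for the generated audio files
Cloud Run Service: a lightweight web interface for accessing episodes
Secret Manager: secure storage for API keys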

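AI processing flow:

RSS aggregation: fetches the latest articles from curated AI news feeds
Smart filtering: selects a balanced mix of content from multiple sources
AI summarization: Gemini turns technical articles into conversational summaries
Audio generation: Text-to-Speech creates natural, podcast-quality audio
Audio stitching: the intro, stories, and outro are joined programmatically into a polished final episode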

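2. How the Backend Works

The backend is a Python script designed to run as a Cloud Run Job. It performs a sequence of tasks to create the daily podcast.

Step 1: Fetching and Balancing the News

The first step is gathering material. The script fetches articles from a predefined list of RSS feeds. To make sure the podcast only contains new content, it first checks the last_processed_entries.json file in the Cloud Storage bucket. This file stores, for each feed, the timestamp of the last article processed. Only articles published after that timestamp are considered.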
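
The timestamp check described above can be sketched roughly as follows. This is a minimal illustration rather than the project's code: the helper names `entry_epoch`, `filter_new_entries`, and `newest_timestamp` are hypothetical, timestamps are assumed to be stored as epoch seconds, and entries are assumed to expose a UTC `published_parsed` field the way feedparser-style entries do.

```python
import calendar
import time
from types import SimpleNamespace


def entry_epoch(entry):
    """Convert an entry's UTC struct_time into seconds since the epoch."""
    return calendar.timegm(entry.published_parsed)


def filter_new_entries(entries, last_time):
    """Keep only entries published strictly after last_time (epoch seconds)."""
    return [e for e in entries if entry_epoch(e) > last_time]


def newest_timestamp(entries, last_time):
    """Return the timestamp to store for this feed before the next run."""
    return max([last_time] + [entry_epoch(e) for e in entries])


# Hypothetical entries standing in for parsed feed items.
old = SimpleNamespace(title="old", published_parsed=time.gmtime(1_000))
new = SimpleNamespace(title="new", published_parsed=time.gmtime(2_000))

fresh = filter_new_entries([old, new], last_time=1_500)
next_time = newest_timestamp(fresh, last_time=1_500)
```

Storing the maximum of the previous timestamp and the newest entry means a run that finds nothing new leaves the state unchanged.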

This is the feed list currently in the code. You can change it to anything you like, even content that has nothing to do with AI :-)

RSS_FEEDS = [
    "https://deepmind.google/blog/rss.xml",
    "https://raw.githubusercontent.com/Olshansk/rss-feeds/main/feeds/feed_anthropic_news.xml",
    "https://openai.com/blog/rss.xml",
    "https://simonwillison.net/atom/everything/"
]

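To keep the podcast varied rather than dominated by whichever source publishes most often, the logic takes at most the three newest articles from each feed and then sorts the merged list chronologically. Once processing is done, the job updates the timestamp file so that no article is processed twice.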

def get_new_rss_entries(feed_urls, last_times):
    """
    Fetches new entries from a list of RSS feeds, ensuring a balanced selection.
    """
    all_new_entries = []
    latest_times = last_times.copy()
    # ... (headers and other setup)

    for url in feed_urls:
        # ... (fetches and parses the feed)

        # Sort entries for this feed by date and take the most recent 3
        feed_entries.sort(key=lambda e: e.published_parsed, reverse=True)
        all_new_entries.extend(feed_entries[:3])

    # Sort all collected entries by date to ensure a chronological podcast
    all_new_entries.sort(key=lambda e: e.published_parsed)
    return all_new_entries, latest_times
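
To see the effect of the per-feed cap, here is a small self-contained sketch with made-up data. The function name `select_balanced`, the `published` integer field, and the two sample feeds are assumptions for illustration only; the project's function works on parsed feed entries instead.

```python
from types import SimpleNamespace


def select_balanced(entries_by_feed, per_feed=3):
    """Take the newest `per_feed` entries of each feed, then merge oldest-first."""
    selected = []
    for entries in entries_by_feed.values():
        newest_first = sorted(entries, key=lambda e: e.published, reverse=True)
        selected.extend(newest_first[:per_feed])
    return sorted(selected, key=lambda e: e.published)


# Hypothetical data: feed "a" publishes far more often than feed "b".
feeds = {
    "a": [SimpleNamespace(feed="a", published=t) for t in range(10)],
    "b": [SimpleNamespace(feed="b", published=t) for t in (2, 8)],
}
picked = select_balanced(feeds)
```

Feed "a" contributes only three of its ten entries, so the two stories from feed "b" still make it into the episode.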

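Step 2: Intelligent Summarization with Gemini

With the articles collected, the next step is writing the podcast script. For each article, we call the Gemini 2.5 Pro model. The key to a high-quality, listenable summary is the prompt. We instruct the model to act as a podcast host, which steers it toward conversational, engaging text and away from formatting artifacts that would sound robotic once converted to speech.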

from google import genai


def summarize_entries(entries, api_key):
    """Summarizes each RSS entry individually in English."""
    client = genai.Client(api_key=api_key)
    individual_summaries = []

    for entry in entries:
        content = entry.get('content', [{}])[0].get('value', entry.get('summary', ''))
        prompt = f"""
        Your role is a professional podcast host writing a script for an English-language audio briefing on Artificial Intelligence.
        Your task is to summarize the following article.

        Guidelines for the summary:
        - Write in a natural, conversational, and engaging podcast style.
        - The output must be a clean paragraph of plain text.
        - It must be suitable for direct text-to-speech conversion.

        **CRITICAL INSTRUCTIONS:**
        - **DO NOT** use any Markdown formatting.
        - **DO NOT** begin with conversational filler like "Of course, here is a summary...".
        - **DO NOT** announce what you are doing. Just provide the summary directly.

        Article to summarize:
        Title: {entry.title}
        Content: {content}
        """
        response = client.models.generate_content(
            model='gemini-2.5-pro',
            contents=prompt
        )
        individual_summaries.append({'title': entry.title, 'summary': response.text.strip()})

    return individual_summaries
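
Even with a strict prompt, a model can occasionally leave some Markdown in its output. A defensive cleanup pass before speech synthesis might look like the sketch below. The helper `clean_for_tts` is hypothetical and not part of the project; the regular expressions cover only a few common cases and will also strip a literal asterisk from ordinary text.

```python
import re


def clean_for_tts(text):
    """Strip common Markdown residue so the text reads cleanly when spoken."""
    # [label](url) -> label
    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)
    # Leading heading markers such as "## "
    text = re.sub(r"^\s{0,3}#{1,6}\s+", "", text, flags=re.MULTILINE)
    # Leading bullet markers such as "- " or "* "
    text = re.sub(r"^\s*[-*+]\s+", "", text, flags=re.MULTILINE)
    # Emphasis and inline-code markers
    text = re.sub(r"(\*\*|__|\*|`)", "", text)
    # Collapse newlines and repeated spaces into single spaces
    return re.sub(r"\s+", " ", text).strip()


cleaned = clean_for_tts("## Big news\n**Gemini** adds [tools](https://example.com).")
```

In the loop above, such a helper could wrap `response.text` before the summary is stored.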

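Step 3: Generating and Stitching the Audio

The last step turns the script into a polished audio file. The application iterates over the generated summaries and calls Google's Text-to-Speech API to create an audio segment for each one. It then uses the pydub library to join these segments with an intro and an outro, which the code below also synthesizes from fixed text, producing a single high-quality MP3 file that is uploaded to Cloud Storage.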

from google.cloud import texttospeech


def generate_and_upload_stitched_audio(summaries, bucket_name):
    """
    Generates audio for each summary, stitches them together, and uploads to GCS.
    """
    try:
        tts_client = texttospeech.TextToSpeechClient()
        voice = texttospeech.VoiceSelectionParams(language_code="en-US", name="en-US-Studio-O")
        audio_config = texttospeech.AudioConfig(audio_encoding=texttospeech.AudioEncoding.MP3)

        all_audio_segments = []

        # Generate and add the intro
        intro_text = "Good morning, and welcome to your AI briefing. Here is the latest news."
        intro_segment = text_to_audio_segment(intro_text, tts_client, ...)
        if intro_segment:
            all_audio_segments.append(intro_segment)

        # Loop through summaries, creating audio for the title and content
        for summary in summaries:
            title_text = f"The next story is titled: {summary['title']}."
            title_segment = text_to_audio_segment(title_text, tts_client, ...)
            if title_segment:
                all_audio_segments.append(title_segment)

            summary_segment = text_to_audio_segment(summary['summary'], tts_client, ...)
            if summary_segment:
                all_audio_segments.append(summary_segment)

        # Generate and add the outro
        outro_text = "And that's all for your briefing today. Thanks for listening."
        outro_segment = text_to_audio_segment(outro_text, tts_client, ...)
        if outro_segment:
            all_audio_segments.append(outro_segment)

        # Stitch everything together using pydub and upload
        final_audio = sum(all_audio_segments)

        # ... export and upload to GCS ...

        return "gs://<your-bucket-name>/summary-YYYY-MM-DD.mp3"

    except Exception as e:
        # ... error handling ...
        return None
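
One practical detail when synthesizing longer summaries: Text-to-Speech limits the size of the input text per request. The sketch below splits text at sentence boundaries so each chunk can be synthesized separately and then stitched like any other segment. The helper `split_for_tts` is hypothetical, and the 5000-byte figure is an assumption to verify against the current quota documentation.

```python
import re

MAX_TTS_BYTES = 5000  # assumed per-request input limit; check the quota docs


def split_for_tts(text, max_bytes=MAX_TTS_BYTES):
    """Split text at sentence boundaries into chunks that fit within max_bytes.

    Sizes are measured in UTF-8 bytes. A single sentence longer than
    max_bytes is kept whole, so callers may still need to handle that case.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
            continue
        if current:
            chunks.append(current)
        current = sentence
    if current:
        chunks.append(current)
    return chunks


parts = split_for_tts("One. Two. Three.", max_bytes=9)
```

Splitting on sentence boundaries keeps each audio segment ending on a natural pause, which makes the joins less noticeable.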

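3. Frontend: Simple Audio Delivery

The frontend is a Flask application that serves the generated podcast episodes through a simple... well, minimal interface.

[Figure: AI News web app UI]

The web application handles the following:

Episode list: shows all available daily briefings
Audio player: an HTML5 audio player for seamless listening
Clean UI: a minimal, podcast-focused interface
Mobile friendly: responsive design for listening on any device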

4. Secure Audio Delivery with Signed URLs

The application delivers audio through Google Cloud Storage signed URLs, so the audio files stay private while authenticated users get time-limited access.

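from datetime import timedelta

from google.auth import default, impersonated_credentials
from google.cloud import storage


def generate_signed_url(bucket_name, object_name, expiration_minutes=1440):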
    """Generate a signed URL using impersonated credentials."""
    try:
        # Get default credentials from Cloud Run metadata server
        source_credentials, project = default()

        # Create impersonated credentials for the target service account
        target_credentials = impersonated_credentials.Credentials(
            source_credentials=source_credentials,
            target_principal=SERVICE_ACCOUNT_EMAIL,
            target_scopes=['https://www.googleapis.com/auth/cloud-platform'],
        )

        # Create storage client with impersonated credentials
        storage_client = storage.Client(credentials=target_credentials)
        bucket = storage_client.bucket(bucket_name)
        blob = bucket.blob(object_name)

        # Generate signed URL (24 hours expiration)
        signed_url = blob.generate_signed_url(
            version="v4",
            expiration=timedelta(minutes=expiration_minutes),
            method="GET"
        )

        return signed_url

    except Exception as e:
        # ... error handling ...
        return None
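
Because `expiration_minutes` is a parameter, it can help to validate it before signing: V4 signed URLs can be valid for at most seven days. The following is a small sketch of such a check; the helper name `signed_url_lifetime` is hypothetical and not part of the project.

```python
from datetime import timedelta

MAX_V4_EXPIRATION = timedelta(days=7)  # V4 signed URLs cannot outlive seven days


def signed_url_lifetime(expiration_minutes=1440):
    """Return the URL lifetime as a timedelta, validated against the V4 limit."""
    if expiration_minutes <= 0:
        raise ValueError("expiration_minutes must be positive")
    lifetime = timedelta(minutes=expiration_minutes)
    if lifetime > MAX_V4_EXPIRATION:
        raise ValueError("V4 signed URLs support at most 7 days")
    return lifetime


default_lifetime = signed_url_lifetime()
```

The returned value could be passed as the `expiration` argument of `blob.generate_signed_url`, so a misconfigured value fails early with a clear message.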

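5. Episode Discovery and Rendering

The Flask application discovers available episodes automatically by scanning the Cloud Storage bucket, and lists them newest first: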

@app.route('/')
def index():
    """Main page showing available audio episodes."""
    try:
        storage_client = storage.Client()
        bucket = storage_client.bucket(GCS_BUCKET_NAME)

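        # List objects whose names start with "summary-"; names that do not
        # end in .mp3 are skipped below. No delimiter is passed, because a
        # delimiter would group matching names into prefixes instead of blobs.
        blobs = bucket.list_blobs(prefix="summary-")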
        episodes = []

        for blob in blobs:
            if blob.name.endswith('.mp3'):
                # Extract date from filename (summary-YYYY-MM-DD.mp3)
                date_str = blob.name.replace('summary-', '').replace('.mp3', '')

                # Generate secure signed URL for audio access
                signed_url = generate_signed_url(GCS_BUCKET_NAME, blob.name)

                if signed_url:
                    episodes.append({
                        'filename': blob.name,
                        'date': date_str,
                        'signed_url': signed_url,
                        'created': blob.time_created
                    })

        # Sort episodes by date (newest first)
        episodes.sort(key=lambda x: x['created'], reverse=True)

        return render_template('index.html', episodes=episodes)

    except Exception as e:
        # ... error handling ...
        return render_template('error.html')
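
The route above derives the episode date by string replacement, which assumes every object follows the summary-YYYY-MM-DD.mp3 pattern. A stricter variant could validate the name first, as in this sketch. The helper `episode_date` and the sample names are hypothetical.

```python
import re
from datetime import datetime

EPISODE_PATTERN = re.compile(r"^summary-(\d{4}-\d{2}-\d{2})\.mp3$")


def episode_date(object_name):
    """Return the date encoded in an episode name, or None if it does not match."""
    match = EPISODE_PATTERN.match(object_name)
    if not match:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y-%m-%d").date()
    except ValueError:  # digits in the right shape but not a real date
        return None


# Hypothetical object names, including one that is not an episode.
names = ["summary-2025-07-01.mp3", "summary-2025-07-03.mp3", "summary-notes.mp3"]
dated = sorted(
    (n for n in names if episode_date(n) is not None),
    key=episode_date,
    reverse=True,
)
```

Sorting on the parsed date rather than the upload time keeps the order stable even if an episode is regenerated later.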

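6. Security Features

The application includes the following built-in security features:

Private storage: audio files stay in private storage and are accessed through signed URLs
IAP integration: Identity-Aware Proxy is configured automatically for controlled access
Configurable access: email-based user authentication through Google accounts
Zero-config security: IAP is enabled automatically when user emails are specified in Terraform
Time-limited URLs: signed URLs expire after 24 hours for added security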
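
When IAP sits in front of an app, it forwards the signed-in user's identity in request headers. As an illustration only (the project does not show this), the sketch below reads the email from the `X-Goog-Authenticated-User-Email` header. The helper name `iap_user_email` is hypothetical, and for anything security-sensitive the signed `X-Goog-IAP-JWT-Assertion` header should be verified instead of trusting this plain header.

```python
IAP_EMAIL_HEADER = "X-Goog-Authenticated-User-Email"


def iap_user_email(headers):
    """Extract the user's email from IAP's identity header, or None if absent."""
    raw = headers.get(IAP_EMAIL_HEADER, "")
    # IAP prefixes the value with a namespace, e.g. "accounts.google.com:".
    _, sep, email = raw.partition(":")
    return email if sep and email else None


email = iap_user_email({IAP_EMAIL_HEADER: "accounts.google.com:me@example.com"})
```

In a Flask view, `request.headers` could be passed in directly, for example to show who is signed in.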

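7. Closing Thoughts

This project is one example of combining AI models with a serverless cloud architecture. By drawing on Gemini's natural language understanding and Google Cloud's Text-to-Speech, we built a system that generates high-quality podcast content automatically.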

Original article: Building an AI-Powered Podcast Generator with Gemini and Cloud Run

Translated and edited by 汇智网. Please credit the source when reposting.