Quickstart
Realtime agents enable voice conversations with your AI agents using the OpenAI Realtime API. This guide walks you through creating your first realtime voice agent.
Beta feature
Realtime agents are currently in beta. Expect breaking changes as we improve the implementation.
Prerequisites
- Python 3.9 or higher
- An OpenAI API key
- Basic familiarity with the OpenAI Agents SDK
Installation
If you haven't already, install the OpenAI Agents SDK:
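```bash
pip install openai-agents
```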
Create your first realtime agent
1. Import the required components
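```python
import asyncio

from agents.realtime import RealtimeAgent, RealtimeRunner
```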
2. Create a realtime agent
```python
agent = RealtimeAgent(
    name="Assistant",
    instructions="You are a helpful voice assistant. Keep your responses conversational and friendly.",
)
```
3. Set up the runner
```python
runner = RealtimeRunner(
    starting_agent=agent,
    config={
        "model_settings": {
            "model_name": "gpt-realtime",
            "voice": "ash",
            "modalities": ["audio"],
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            "input_audio_transcription": {"model": "gpt-4o-mini-transcribe"},
            "turn_detection": {"type": "semantic_vad", "interrupt_response": True},
        }
    },
)
```
4. Start a session
```python
# Start the session
session = await runner.run()

async with session:
    print("Session started! The agent will stream audio responses in real-time.")
    # Process events
    async for event in session:
        try:
            if event.type == "agent_start":
                print(f"Agent started: {event.agent.name}")
            elif event.type == "agent_end":
                print(f"Agent ended: {event.agent.name}")
            elif event.type == "handoff":
                print(f"Handoff from {event.from_agent.name} to {event.to_agent.name}")
            elif event.type == "tool_start":
                print(f"Tool started: {event.tool.name}")
            elif event.type == "tool_end":
                print(f"Tool ended: {event.tool.name}; output: {event.output}")
            elif event.type == "audio_end":
                print("Audio ended")
            elif event.type == "audio":
                # Enqueue audio for callback-based playback with metadata
                # Non-blocking put; queue is unbounded, so drops won't occur.
                pass
            elif event.type == "audio_interrupted":
                print("Audio interrupted")
                # Begin graceful fade + flush in the audio callback and rebuild jitter buffer.
            elif event.type == "error":
                print(f"Error: {event.error}")
            elif event.type == "history_updated":
                pass  # Skip these frequent events
            elif event.type == "history_added":
                pass  # Skip these frequent events
            elif event.type == "raw_model_event":
                print(f"Raw model event: {_truncate_str(str(event.data), 200)}")
            else:
                print(f"Unknown event type: {event.type}")
        except Exception as e:
            print(f"Error processing event: {_truncate_str(str(e), 200)}")

def _truncate_str(s: str, max_length: int) -> str:
    if len(s) > max_length:
        return s[:max_length] + "..."
    return s
```
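The `audio` branch above is intentionally left as a `pass`. Below is a minimal sketch of one way to fill it in, assuming the event exposes raw PCM16 bytes as `event.audio.data` (verify the exact attribute against your installed SDK version) and that a separate consumer task drains the queue into whatever audio output you use:

```python
import asyncio

# Unbounded queue of raw PCM16 chunks produced by the event loop above.
audio_queue: asyncio.Queue[bytes] = asyncio.Queue()

def enqueue_audio(event) -> None:
    # Non-blocking enqueue; an unbounded asyncio.Queue never raises QueueFull.
    # `event.audio.data` is an assumption about the event shape -- adjust as needed.
    audio_queue.put_nowait(event.audio.data)

async def playback_consumer() -> None:
    # Hypothetical consumer: write each chunk to your audio device, e.g. an
    # output stream opened for 24 kHz mono 16-bit PCM.
    while True:
        chunk = await audio_queue.get()
        ...  # write `chunk` to the output stream here
```

In the event loop, you would replace the `pass` under `event.type == "audio"` with `enqueue_audio(event)` and run `playback_consumer()` as a background task.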
Complete example
Here is a complete working example:
```python
import asyncio
from agents.realtime import RealtimeAgent, RealtimeRunner

async def main():
    # Create the agent
    agent = RealtimeAgent(
        name="Assistant",
        instructions="You are a helpful voice assistant. Keep responses brief and conversational.",
    )

    # Set up the runner with configuration
    runner = RealtimeRunner(
        starting_agent=agent,
        config={
            "model_settings": {
                "model_name": "gpt-realtime",
                "voice": "ash",
                "modalities": ["audio"],
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16",
                "input_audio_transcription": {"model": "gpt-4o-mini-transcribe"},
                "turn_detection": {"type": "semantic_vad", "interrupt_response": True},
            }
        },
    )

    # Start the session
    session = await runner.run()

    async with session:
        print("Session started! The agent will stream audio responses in real-time.")

        # Process events
        async for event in session:
            try:
                if event.type == "agent_start":
                    print(f"Agent started: {event.agent.name}")
                elif event.type == "agent_end":
                    print(f"Agent ended: {event.agent.name}")
                elif event.type == "handoff":
                    print(f"Handoff from {event.from_agent.name} to {event.to_agent.name}")
                elif event.type == "tool_start":
                    print(f"Tool started: {event.tool.name}")
                elif event.type == "tool_end":
                    print(f"Tool ended: {event.tool.name}; output: {event.output}")
                elif event.type == "audio_end":
                    print("Audio ended")
                elif event.type == "audio":
                    # Enqueue audio for callback-based playback with metadata
                    # Non-blocking put; queue is unbounded, so drops won't occur.
                    pass
                elif event.type == "audio_interrupted":
                    print("Audio interrupted")
                    # Begin graceful fade + flush in the audio callback and rebuild jitter buffer.
                elif event.type == "error":
                    print(f"Error: {event.error}")
                elif event.type == "history_updated":
                    pass  # Skip these frequent events
                elif event.type == "history_added":
                    pass  # Skip these frequent events
                elif event.type == "raw_model_event":
                    print(f"Raw model event: {_truncate_str(str(event.data), 200)}")
                else:
                    print(f"Unknown event type: {event.type}")
            except Exception as e:
                print(f"Error processing event: {_truncate_str(str(e), 200)}")

def _truncate_str(s: str, max_length: int) -> str:
    if len(s) > max_length:
        return s[:max_length] + "..."
    return s

if __name__ == "__main__":
    # Run the session
    asyncio.run(main())
```
Configuration options
Model settings
- `model_name`: Choose from the available realtime models (e.g. `gpt-realtime`)
- `voice`: Select a voice (`alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`)
- `modalities`: Enable text or audio (`["text"]` or `["audio"]`)
Audio settings
- `input_audio_format`: Format of the input audio (`pcm16`, `g711_ulaw`, `g711_alaw`)
- `output_audio_format`: Format of the output audio
- `input_audio_transcription`: Transcription configuration
Turn detection
- `type`: Detection method (`server_vad`, `semantic_vad`)
- `threshold`: Voice activity threshold (0.0-1.0)
- `silence_duration_ms`: Silence duration used to detect the end of a turn
- `prefix_padding_ms`: Audio padding kept from before speech starts
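As a sketch, a `server_vad` configuration using the fields above might look like this (the numeric values are illustrative, not recommendations):

```python
config = {
    "model_settings": {
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.5,            # voice-activity threshold (0.0-1.0)
            "silence_duration_ms": 500,  # silence that marks the end of a turn
            "prefix_padding_ms": 300,    # audio retained from just before speech starts
        },
    },
}
```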
Next steps
- Learn more about realtime agents
- Check out the working examples in the examples/realtime folder
- Add tools to your agent
- Implement handoffs between agents
- Set up guardrails for safety
Authentication
Make sure your OpenAI API key is set in your environment:
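```bash
export OPENAI_API_KEY="your-api-key-here"
```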
Or pass it directly when creating the session:
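A sketch, assuming the runner's `run()` accepts a `model_config` mapping with an `api_key` entry (verify against your installed SDK version):

```python
session = await runner.run(model_config={"api_key": "your-api-key"})
```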