SDK Examples
Streaming Responses
Implementing streaming output
What is a streaming response
A streaming response (streaming) lets the AI return results incrementally as content is generated, rather than waiting for generation to finish and returning everything at once. This can:
- Provide a better user experience (output appears in real time)
- Reduce time to first response
- Suit long-form text generation
Enabling streaming
Set stream: true in the request:
{
  "model": "gpt-4.1",
  "messages": [{ "role": "user", "content": "Write an article" }],
  "stream": true
}
Code examples
Python
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.smai.ai/v1"
)

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
print()  # trailing newline
JavaScript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-your-api-key',
  baseURL: 'https://api.smai.ai/v1',
});

const stream = await client.chat.completions.create({
  model: 'gpt-4.1',
  messages: [{ role: 'user', content: 'Write a poem' }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}
console.log(); // trailing newline
cURL
curl https://api.smai.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-api-key" \
  -d '{
    "model": "gpt-4.1",
    "messages": [{"role": "user", "content": "Write a poem"}],
    "stream": true
  }'
Streaming response format
Streaming responses use the Server-Sent Events (SSE) format. The first chunk carries the role, subsequent chunks carry content deltas, and the final chunk carries a finish_reason followed by a [DONE] sentinel:
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4.1","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4.1","choices":[{"index":0,"delta":{"content":"Spring"},"finish_reason":null}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4.1","choices":[{"index":0,"delta":{"content":" breeze"},"finish_reason":null}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4.1","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
Web frontend implementation
Using fetch
async function streamChat(message) {
  const response = await fetch('https://api.smai.ai/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: 'Bearer sk-your-api-key',
    },
    body: JSON.stringify({
      model: 'gpt-4.1',
      messages: [{ role: 'user', content: message }],
      stream: true,
    }),
  });
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // A network chunk can end mid-line, so buffer until a full line arrives
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep the trailing partial line for the next read
    for (const line of lines) {
      if (line.startsWith('data: ') && line !== 'data: [DONE]') {
        const data = JSON.parse(line.slice(6));
        const content = data.choices[0]?.delta?.content;
        if (content) {
          // Update the UI
          document.getElementById('output').textContent += content;
        }
      }
    }
  }
}
Using EventSource (not recommended for POST)
EventSource only supports GET requests; for APIs that require POST, use fetch with a ReadableStream instead, as shown above.
Collecting the full response
To display streamed output while also collecting the complete response:
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.smai.ai/v1"
)

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True
)

full_response = ""
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
        full_response += content
print()
print(f"\nFull response length: {len(full_response)} characters")
Notes
Token counting
Streaming responses do not include token usage in each chunk. If you need usage statistics, compute them separately after the stream ends, or use a non-streaming request.
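One way to sketch the "compute after the stream ends" approach is to accumulate the streamed fragments and run a tokenizer over the joined text once. The helper below and its whitespace "tokenizer" are illustrative assumptions, not part of the API; for realistic counts you would plug in a proper tokenizer (for example, tiktoken), and the result is still an estimate, not the provider's billed usage:

```python
from typing import Callable, Iterable


def collect_and_count(chunks: Iterable[str],
                      count_tokens: Callable[[str], int]) -> tuple[str, int]:
    # Join all non-empty streamed fragments, then count tokens once at the end
    full = "".join(c for c in chunks if c)
    return full, count_tokens(full)


# Stand-in tokenizer: whitespace split (swap in a real encoder for actual counts)
text, n = collect_and_count(["Hello ", "streaming ", "world"],
                            lambda s: len(s.split()))
```

Injecting the tokenizer as a parameter keeps the accumulation logic independent of any particular encoding library.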
Error handling
Errors in a streaming request can occur at any point, including mid-stream; make sure to handle connection interruptions and other failures gracefully.
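One recovery strategy is restart-and-retry: if the connection drops mid-stream, reopen the stream from scratch. This is a minimal sketch with hypothetical names, not the SDK's own mechanism; a production client might instead resume from partial output or surface the partial text to the user:

```python
import time


def stream_with_retry(open_stream, max_retries=3, delay=1.0):
    """Consume a stream of text fragments, restarting from scratch on failure.

    open_stream is a zero-argument callable that opens a fresh stream
    (e.g. wrapping client.chat.completions.create(..., stream=True)).
    """
    for attempt in range(max_retries):
        try:
            pieces = []
            for piece in open_stream():
                pieces.append(piece)
            return "".join(pieces)
        except ConnectionError:
            # Restarting discards and repeats output already received;
            # a real UI may need to clear the partial text before retrying.
            if attempt == max_retries - 1:
                raise
            time.sleep(delay)
```

Because each attempt rebuilds the full response, this pattern fits best when responses are short or repetition in the UI is acceptable.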
