SDK Examples

Streaming Responses

Implementing streaming output

What Is a Streaming Response?

Streaming allows the AI to return results incrementally as they are generated, instead of returning everything at once after generation finishes. This can:

  • Provide a better user experience (output appears in real time)
  • Reduce time to first response
  • Suit long-form text generation

Enabling Streaming

Set stream: true in the request body:

{
  "model": "gpt-4.1",
  "messages": [{ "role": "user", "content": "写一篇文章" }],
  "stream": true
}

Code Examples

Python

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.smai.ai/v1"
)

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

print()  # newline

JavaScript

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-your-api-key',
  baseURL: 'https://api.smai.ai/v1',
});

const stream = await client.chat.completions.create({
  model: 'gpt-4.1',
  messages: [{ role: 'user', content: 'Write a poem' }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}

console.log(); // newline

cURL

curl -N https://api.smai.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-api-key" \
  -d '{
    "model": "gpt-4.1",
    "messages": [{"role": "user", "content": "Write a poem"}],
    "stream": true
  }'

Streaming Response Format

Streaming responses use the Server-Sent Events (SSE) format:

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4.1","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4.1","choices":[{"index":0,"delta":{"content":"春"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4.1","choices":[{"index":0,"delta":{"content":"风"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4.1","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
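The format above can be parsed line by line. A minimal Python sketch (the helper name parse_sse_line is ours, not part of any SDK):

```python
import json

def parse_sse_line(line):
    """Return the content delta from one SSE line, or None for
    non-data lines, the [DONE] sentinel, and chunks without content."""
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None
    event = json.loads(payload)
    return event["choices"][0]["delta"].get("content")

print(parse_sse_line('data: {"choices":[{"index":0,"delta":{"content":"春"},"finish_reason":null}]}'))  # 春
print(parse_sse_line("data: [DONE]"))  # None
```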

Web Frontend Implementation

Using fetch

async function streamChat(message) {
  const response = await fetch('https://api.smai.ai/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: 'Bearer sk-your-api-key',
    },
    body: JSON.stringify({
      model: 'gpt-4.1',
      messages: [{ role: 'user', content: message }],
      stream: true,
    }),
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // A network read may end mid-line, so keep any trailing partial line in the buffer
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop();

    for (const line of lines) {
      if (line.startsWith('data: ') && line !== 'data: [DONE]') {
        const data = JSON.parse(line.slice(6));
        const content = data.choices[0]?.delta?.content;
        if (content) {
          // Update the UI
          document.getElementById('output').textContent += content;
        }
      }
    }
  }
}

Using EventSource (Not Recommended for POST)

Because EventSource only supports GET requests, use fetch + ReadableStream for APIs that require POST.

Collecting the Full Response

To display streaming output while also collecting the complete response:

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.smai.ai/v1"
)

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True
)

full_response = ""
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
        full_response += content

print()
print(f"\nFull response length: {len(full_response)} characters")

Notes

Token Usage

By default, streaming responses do not include token usage in each chunk. If you need usage statistics, count tokens after the stream ends or use a non-streaming request.
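If the gateway follows the OpenAI Chat Completions spec, you may be able to request a final usage chunk via stream_options (support on this endpoint is an assumption; verify before relying on it):

```json
{
  "model": "gpt-4.1",
  "messages": [{"role": "user", "content": "Write a poem"}],
  "stream": true,
  "stream_options": { "include_usage": true }
}
```

When supported, the last chunk before data: [DONE] carries a usage object with prompt and completion token counts.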

Error Handling

Errors in a streaming request can occur at any point, including mid-stream. Make sure you handle dropped connections and partial output.
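One way to structure this is to wrap the iteration so a mid-stream failure still returns the partial text. The helper and simulated stream below are our own sketch; with the real SDK you would catch the client library's connection errors (e.g. openai.APIConnectionError) instead of ConnectionError:

```python
def consume_stream(deltas, on_content):
    """Iterate over content deltas, returning whatever arrived
    even if the connection drops mid-stream."""
    parts = []
    try:
        for content in deltas:
            if content:
                on_content(content)
                parts.append(content)
    except ConnectionError as exc:
        # The stream died partway through; keep the partial result
        print(f"\n[stream interrupted: {exc}]")
    return "".join(parts)

def flaky_stream():
    """Simulated stream that fails after two chunks."""
    yield "Hello, "
    yield "world"
    raise ConnectionError("connection reset")

text = consume_stream(flaky_stream(), lambda c: print(c, end=""))
# text == "Hello, world" despite the mid-stream failure
```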
