Streaming Response
Implementing streaming output
What is Streaming Response
A streaming response lets the model return results incrementally as it generates them, rather than waiting for the entire generation to finish before returning everything at once. This can:
- Provide a better user experience (seeing output in real-time)
- Reduce initial response time
- Be suitable for long text generation scenarios
Enabling Streaming Response
Set stream: true in the request:
{
  "model": "gpt-4.1",
  "messages": [{"role": "user", "content": "Write an article"}],
  "stream": true
}
Code Examples
Python
from openai import OpenAI
client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.smai.ai/v1"
)

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

print()  # New line
JavaScript
import OpenAI from 'openai';
const client = new OpenAI({
  apiKey: 'sk-your-api-key',
  baseURL: 'https://api.smai.ai/v1'
});

const stream = await client.chat.completions.create({
  model: 'gpt-4.1',
  messages: [{ role: 'user', content: 'Write a poem' }],
  stream: true
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}
console.log(); // New line
cURL
curl https://api.smai.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-api-key" \
  -d '{
    "model": "gpt-4.1",
    "messages": [{"role": "user", "content": "Write a poem"}],
    "stream": true
  }'
Streaming Response Format
Streaming responses use the Server-Sent Events (SSE) format:
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4.1","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4.1","choices":[{"index":0,"delta":{"content":"Spring"},"finish_reason":null}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4.1","choices":[{"index":0,"delta":{"content":"Wind"},"finish_reason":null}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4.1","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
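Each data: line carries one JSON chunk whose delta field holds the incremental content, and data: [DONE] marks the end of the stream. If you are not using an SDK, you can parse this format directly. Below is a minimal sketch using Python's requests library (requests is an assumption here; the SDK and fetch examples in this guide already handle SSE parsing for you):

import json
import requests

# Minimal SSE consumption sketch (assumes the requests package is installed)
response = requests.post(
    "https://api.smai.ai/v1/chat/completions",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer sk-your-api-key",
    },
    json={
        "model": "gpt-4.1",
        "messages": [{"role": "user", "content": "Write a poem"}],
        "stream": True,
    },
    stream=True,  # read the response body incrementally
)

for line in response.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data: "):
        continue  # skip blank keep-alive lines
    payload = line[len("data: "):]
    if payload == "[DONE]":
        break
    chunk = json.loads(payload)
    content = chunk["choices"][0]["delta"].get("content")
    if content:
        print(content, end="", flush=True)
print()

Web Frontend Implementation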
Using fetch
async function streamChat(message) {
  const response = await fetch('https://api.smai.ai/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer sk-your-api-key'
    },
    body: JSON.stringify({
      model: 'gpt-4.1',
      messages: [{ role: 'user', content: message }],
      stream: true
    })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Note: for simplicity this assumes each network chunk contains whole
    // "data: ..." lines; production code should buffer partial lines.
    const chunk = decoder.decode(value, { stream: true });
    const lines = chunk.split('\n');
    for (const line of lines) {
      if (line.startsWith('data: ') && line !== 'data: [DONE]') {
        const data = JSON.parse(line.slice(6));
        const content = data.choices[0]?.delta?.content;
        if (content) {
          // Update UI
          document.getElementById('output').textContent += content;
        }
      }
    }
  }
}
Using EventSource (not recommended for POST)
EventSource only supports GET requests, so for APIs that require a POST body it is recommended to use fetch with a ReadableStream, as shown above.
Collecting Full Response
If you need to display streaming output while collecting the full response:
from openai import OpenAI
client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.smai.ai/v1"
)

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True
)

full_response = ""
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
        full_response += content

print()
print(f"\nFull response length: {len(full_response)} characters")
Notes
Token Calculation
Streaming responses do not include token usage in each chunk. If you need usage statistics, calculate them yourself after the stream ends or use a non-streaming request.
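If an approximate count is enough, one option is to estimate it locally after the stream finishes. Below is a minimal sketch using the tiktoken library (an assumption: tiktoken is not part of this API and must be installed separately, and o200k_base is only an approximation of the tokenizer the model actually uses):

import tiktoken

# Assumption: o200k_base approximates this model's tokenizer; adjust if needed.
encoding = tiktoken.get_encoding("o200k_base")

def estimate_tokens(text: str) -> int:
    """Return an approximate token count for a piece of text."""
    return len(encoding.encode(text))

# e.g. pass the full_response collected from the stream, as in the example above
print(f"Approximate completion tokens: {estimate_tokens('A sample completion text')}")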
Error Handling
Errors can occur at any point during a streaming request, so make sure your code handles connection interruptions and other failure modes gracefully.
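As an illustration, one way to guard against mid-stream failures with the Python SDK is to wrap the loop in try/except. The sketch below uses the exception classes exported by the official openai package; the handling logic is only a placeholder:

from openai import OpenAI, APIConnectionError, APIError

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.smai.ai/v1"
)

full_response = ""
try:
    stream = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": "Write a poem"}],
        stream=True
    )
    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)
            full_response += content
except APIConnectionError as e:
    # The connection dropped mid-stream; full_response holds the partial output
    print(f"\nConnection interrupted: {e}")
except APIError as e:
    # The API returned an error (rate limit, invalid request, etc.)
    print(f"\nAPI error: {e}")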
