Streaming Response
Implementing streaming output
What is Streaming Response
A streaming response lets the model return results incrementally as it generates them, rather than waiting for the entire generation to finish before returning everything at once. This can:
- Provide a better user experience (seeing output in real-time)
- Reduce initial response time
- Be suitable for long text generation scenarios
Enabling Streaming Response
Set stream: true in the request:
{
  "model": "gpt-4.1",
  "messages": [{"role": "user", "content": "Write an article"}],
  "stream": true
}
Code Examples
Python
from openai import OpenAI
client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.smai.ai/v1"
)

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

print()  # New line
JavaScript
import OpenAI from 'openai';
const client = new OpenAI({
  apiKey: 'sk-your-api-key',
  baseURL: 'https://api.smai.ai/v1'
});

const stream = await client.chat.completions.create({
  model: 'gpt-4.1',
  messages: [{ role: 'user', content: 'Write a poem' }],
  stream: true
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}
console.log(); // New line
cURL
curl https://api.smai.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-api-key" \
  -d '{
    "model": "gpt-4.1",
    "messages": [{"role": "user", "content": "Write a poem"}],
    "stream": true
  }'
Streaming Response Format
Streaming responses use the Server-Sent Events (SSE) format:
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4.1","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4.1","choices":[{"index":0,"delta":{"content":"Spring"},"finish_reason":null}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4.1","choices":[{"index":0,"delta":{"content":"Wind"},"finish_reason":null}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4.1","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
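Each data: line carries one JSON chunk whose delta field holds the incremental content, and data: [DONE] marks the end of the stream. If you are not using an SDK, you can parse this format directly. Below is a minimal sketch using Python's requests library (requests is an assumption here; the SDK and fetch examples in this guide already handle SSE parsing for you):

import json
import requests

# Minimal SSE consumption sketch (assumes the requests package is installed)
response = requests.post(
    "https://api.smai.ai/v1/chat/completions",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer sk-your-api-key",
    },
    json={
        "model": "gpt-4.1",
        "messages": [{"role": "user", "content": "Write a poem"}],
        "stream": True,
    },
    stream=True,  # read the response body incrementally
)

for line in response.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data: "):
        continue  # skip blank keep-alive lines
    payload = line[len("data: "):]
    if payload == "[DONE]":
        break
    chunk = json.loads(payload)
    content = chunk["choices"][0]["delta"].get("content")
    if content:
        print(content, end="", flush=True)
print()

Web Frontend Implementation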
Using fetch
async function streamChat(message) {
  const response = await fetch('https://api.smai.ai/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer sk-your-api-key'
    },
    body: JSON.stringify({
      model: 'gpt-4.1',
      messages: [{ role: 'user', content: message }],
      stream: true
    })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Note: for simplicity this assumes each network chunk contains whole
    // "data: ..." lines; production code should buffer partial lines.
    const chunk = decoder.decode(value, { stream: true });
    const lines = chunk.split('\n');
    for (const line of lines) {
      if (line.startsWith('data: ') && line !== 'data: [DONE]') {
        const data = JSON.parse(line.slice(6));
        const content = data.choices[0]?.delta?.content;
        if (content) {
          // Update UI
          document.getElementById('output').textContent += content;
        }
      }
    }
  }
}
Using EventSource (not recommended for POST)
EventSource only supports GET requests, so for APIs that require a POST body it is recommended to use fetch with a ReadableStream, as shown above.
Collecting Full Response
If you need to display streaming output while collecting the full response:
from openai import OpenAI
client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.smai.ai/v1"
)

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True
)

full_response = ""
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
        full_response += content

print()
print(f"\nFull response length: {len(full_response)} characters")
Notes
Token Calculation
Streaming responses do not include token usage in each chunk. If you need usage statistics, calculate them yourself after the stream ends or use a non-streaming request.
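If an approximate count is enough, one option is to estimate it locally after the stream finishes. Below is a minimal sketch using the tiktoken library (an assumption: tiktoken is not part of this API and must be installed separately, and o200k_base is only an approximation of the tokenizer the model actually uses):

import tiktoken

# Assumption: o200k_base approximates this model's tokenizer; adjust if needed.
encoding = tiktoken.get_encoding("o200k_base")

def estimate_tokens(text: str) -> int:
    """Return an approximate token count for a piece of text."""
    return len(encoding.encode(text))

# e.g. pass the full_response collected from the stream, as in the example above
print(f"Approximate completion tokens: {estimate_tokens('A sample completion text')}")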
Error Handling
Errors can occur at any point during a streaming request, so make sure your code handles connection interruptions and other failure modes gracefully.
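As an illustration, one way to guard against mid-stream failures with the Python SDK is to wrap the loop in try/except. The sketch below uses the exception classes exported by the official openai package; the handling logic is only a placeholder:

from openai import OpenAI, APIConnectionError, APIError

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.smai.ai/v1"
)

full_response = ""
try:
    stream = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": "Write a poem"}],
        stream=True
    )
    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)
            full_response += content
except APIConnectionError as e:
    # The connection dropped mid-stream; full_response holds the partial output
    print(f"\nConnection interrupted: {e}")
except APIError as e:
    # The API returned an error (rate limit, invalid request, etc.)
    print(f"\nAPI error: {e}")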
