API Documentation
Responses API
An API endpoint designed specifically for reasoning models
Overview
The Responses API is an endpoint designed specifically for reasoning models, supporting OpenAI's o-series models and other reasoning-type models.
POST https://api.smai.ai/v1/responses
Models that must use this endpoint
The following models must use the Responses API:
- gpt-5.2-pro
- o3-pro
- o3-mini
- o1-pro
- o1-mini
- Other reasoning-type models
Differences from Chat Completions
| Feature | Chat Completions | Responses API |
|---|---|---|
| Endpoint | /v1/chat/completions | /v1/responses |
| Applicable Models | General models | Reasoning models |
| Reasoning Process | Not visible | Optionally displayed |
| Response Format | Standard format | Extended format |
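In practice, the main difference a caller sees is the request shape. A minimal sketch in Python of the two payloads (the gpt-4o model name is illustrative only and does not appear in this document; the Responses payload follows the shapes documented below):

# Chat Completions: the prompt travels in a "messages" array (standard format).
chat_payload = {
    "model": "gpt-4o",  # illustrative general-model name, not from this doc
    "messages": [{"role": "user", "content": "Hello"}],
}

# Responses API: the prompt travels in "input", with optional reasoning config.
responses_payload = {
    "model": "o3-pro",
    "input": "Hello",
    "reasoning": {"effort": "medium"},
}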
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Reasoning model name |
| input | string/array | Yes | Input content |
| reasoning | object | No | Reasoning configuration |
| max_output_tokens | integer | No | Maximum output tokens |
| stream | boolean | No | Whether to stream output |
Request Example
curl https://api.smai.ai/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-your-api-key" \
-d '{
"model": "o3-pro",
"input": "Explain the principle of quantum entanglement",
"reasoning": {
"effort": "high"
}
}'
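The same request in Python using the requests library (a minimal sketch; the endpoint and payload mirror the curl example above, and sk-your-api-key is a placeholder):

import requests

# Mirrors the curl example above; sk-your-api-key is a placeholder.
response = requests.post(
    "https://api.smai.ai/v1/responses",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer sk-your-api-key",
    },
    json={
        "model": "o3-pro",
        "input": "Explain the principle of quantum entanglement",
        "reasoning": {"effort": "high"},
    },
)
response.raise_for_status()
print(response.json())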
Reasoning Configuration
The reasoning parameter is used to control reasoning behavior:
{
  "reasoning": {
    "effort": "high",    // Level of reasoning effort: low, medium, high
    "summary": "auto"    // Whether to return a reasoning summary: auto, always, never
  }
}
effort Parameter
| Value | Description | Applicable Scenarios |
|---|---|---|
| low | Quick reasoning | Simple questions |
| medium | Balanced mode | General questions |
| high | Deep reasoning | Complex questions |
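Putting the two reasoning options together, a request body combining effort and summary looks like this (a sketch; the field values come from the tables above):

payload = {
    "model": "o3-pro",
    "input": "Explain the principle of quantum entanglement",
    "reasoning": {
        "effort": "high",   # low / medium / high
        "summary": "auto",  # auto / always / never
    },
}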
Response Format
{
"id": "resp-xxx",
"object": "response",
"created": 1234567890,
"model": "o3-pro",
"output": [
{
"type": "message",
"content": "Quantum entanglement is a phenomenon in quantum mechanics..."
}
],
"usage": {
"input_tokens": 10,
"output_tokens": 500,
"reasoning_tokens": 1000,
"total_tokens": 1510
}
}
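A minimal sketch of reading this format in Python, assuming response is the object returned by the request example above and that the output array contains at least one message item:

data = response.json()

# Pull the text of the first message item from the output array.
message = next(item for item in data["output"] if item["type"] == "message")
print(message["content"])

# reasoning_tokens are counted separately and contribute to total_tokens.
usage = data["usage"]
print(usage["input_tokens"], usage["output_tokens"],
      usage["reasoning_tokens"], usage["total_tokens"])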
Supported Models
| Model | Description |
|---|---|
| gpt-5.2-pro | OpenAI's latest reasoning model |
| o3-pro | OpenAI o3, professional version |
| o3-mini | OpenAI o3, lightweight version |
| o1-pro | OpenAI o1, professional version |
| o1-mini | OpenAI o1, lightweight version |
Notes
Token Consumption
Reasoning models consume additional reasoning_tokens, which are also included in billing.
Response Time
Reasoning models usually take longer to respond than general models, especially in effort: high mode.
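One practical consequence on the client side is to allow a generous HTTP timeout for high-effort requests. A sketch using requests (the 300-second read timeout is an assumed value, not a documented limit):

import requests

response = requests.post(
    "https://api.smai.ai/v1/responses",
    headers={"Authorization": "Bearer sk-your-api-key"},
    json={
        "model": "o3-pro",
        "input": "Explain the principle of quantum entanglement",
        "reasoning": {"effort": "high"},
    },
    timeout=(10, 300),  # (connect, read) in seconds; assumed values, tune as needed
)

Alternatively, set stream to true (see Request Parameters) to receive output incrementally instead of waiting for the full response.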
