API Documentation
Responses API
An API endpoint designed specifically for reasoning models
Overview
The Responses API is an endpoint designed specifically for reasoning models, supporting OpenAI's o-series models and other reasoning-type models.
POST https://api.smai.ai/v1/responses
Models that must use this endpoint
The following models must use the Responses API:
- gpt-5.2-pro
- o3-pro
- o3-mini
- o1-pro
- o1-mini
- Other reasoning-type models
Differences from Chat Completions
| Feature | Chat Completions | Responses API |
|---|---|---|
| Endpoint | /v1/chat/completions | /v1/responses |
| Applicable Models | General models | Reasoning models |
| Reasoning Process | Not visible | Optionally displayed |
| Response Format | Standard format | Extended format |
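In practice, the main difference a caller sees is the request shape. A minimal sketch in Python of the two payloads (the gpt-4o model name is illustrative only and does not appear in this document; the Responses payload follows the shapes documented below):

# Chat Completions: the prompt travels in a "messages" array (standard format).
chat_payload = {
    "model": "gpt-4o",  # illustrative general-model name, not from this doc
    "messages": [{"role": "user", "content": "Hello"}],
}

# Responses API: the prompt travels in "input", with optional reasoning config.
responses_payload = {
    "model": "o3-pro",
    "input": "Hello",
    "reasoning": {"effort": "medium"},
}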
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Reasoning model name |
| input | string/array | Yes | Input content |
| reasoning | object | No | Reasoning configuration |
| max_output_tokens | integer | No | Maximum output tokens |
| stream | boolean | No | Whether to stream output |
Request Example
curl https://api.smai.ai/v1/responses \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-your-api-key" \
-d '{
"model": "o3-pro",
"input": "Explain the principle of quantum entanglement",
"reasoning": {
"effort": "high"
}
}'
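The same request in Python using the requests library (a minimal sketch; the endpoint and payload mirror the curl example above, and sk-your-api-key is a placeholder):

import requests

# Mirrors the curl example above; sk-your-api-key is a placeholder.
response = requests.post(
    "https://api.smai.ai/v1/responses",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer sk-your-api-key",
    },
    json={
        "model": "o3-pro",
        "input": "Explain the principle of quantum entanglement",
        "reasoning": {"effort": "high"},
    },
)
response.raise_for_status()
print(response.json())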
Reasoning Configuration
The reasoning parameter is used to control reasoning behavior:
{
  "reasoning": {
    "effort": "high",    // Level of reasoning effort: low, medium, high
    "summary": "auto"    // Whether to return a reasoning summary: auto, always, never
  }
}
effort Parameter
| Value | Description | Applicable Scenarios |
|---|---|---|
| low | Quick reasoning | Simple questions |
| medium | Balanced mode | General questions |
| high | Deep reasoning | Complex questions |
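Putting the two reasoning options together, a request body combining effort and summary looks like this (a sketch; the field values come from the tables above):

payload = {
    "model": "o3-pro",
    "input": "Explain the principle of quantum entanglement",
    "reasoning": {
        "effort": "high",   # low / medium / high
        "summary": "auto",  # auto / always / never
    },
}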
Response Format
{
"id": "resp-xxx",
"object": "response",
"created": 1234567890,
"model": "o3-pro",
"output": [
{
"type": "message",
"content": "Quantum entanglement is a phenomenon in quantum mechanics..."
}
],
"usage": {
"input_tokens": 10,
"output_tokens": 500,
"reasoning_tokens": 1000,
"total_tokens": 1510
}
}
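A minimal sketch of reading this format in Python, assuming response is the object returned by the request example above and that the output array contains at least one message item:

data = response.json()

# Pull the text of the first message item from the output array.
message = next(item for item in data["output"] if item["type"] == "message")
print(message["content"])

# reasoning_tokens are counted separately and contribute to total_tokens.
usage = data["usage"]
print(usage["input_tokens"], usage["output_tokens"],
      usage["reasoning_tokens"], usage["total_tokens"])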
Supported Models
| Model | Description |
|---|---|
| gpt-5.2-pro | OpenAI's latest reasoning model |
| o3-pro | OpenAI o3, professional version |
| o3-mini | OpenAI o3, lightweight version |
| o1-pro | OpenAI o1, professional version |
| o1-mini | OpenAI o1, lightweight version |
Notes
Token Consumption
Reasoning models consume additional reasoning_tokens, which are also included in billing.
Response Time
Reasoning models usually take longer to respond than general models, especially in effort: high mode.
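One practical consequence on the client side is to allow a generous HTTP timeout for high-effort requests. A sketch using requests (the 300-second read timeout is an assumed value, not a documented limit):

import requests

response = requests.post(
    "https://api.smai.ai/v1/responses",
    headers={"Authorization": "Bearer sk-your-api-key"},
    json={
        "model": "o3-pro",
        "input": "Explain the principle of quantum entanglement",
        "reasoning": {"effort": "high"},
    },
    timeout=(10, 300),  # (connect, read) in seconds; assumed values, tune as needed
)

Alternatively, set stream to true (see Request Parameters) to receive output incrementally instead of waiting for the full response.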
