API Documentation

Responses API

An API endpoint designed specifically for reasoning models

Overview

The Responses API is an endpoint designed specifically for reasoning models, supporting OpenAI's o-series models and other reasoning-type models.

POST https://api.smai.ai/v1/responses

Models That Must Use This Endpoint

The following models must use the Responses API:

- gpt-5.2-pro
- o3-pro
- o3-mini
- o1-pro
- o1-mini
- Other reasoning-type models

Differences from Chat Completions

| Feature | Chat Completions | Responses API |
| --- | --- | --- |
| Endpoint | /v1/chat/completions | /v1/responses |
| Applicable Models | General models | Reasoning models |
| Reasoning Process | Not visible | Optionally displayed |
| Response Format | Standard format | Extended format |
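
To make the difference concrete, here is how the same question would be shaped for each endpoint (a minimal sketch in Python; the Chat Completions messages format and the gpt-4o model name are assumptions, not taken from this page):

# Chat Completions: POST /v1/chat/completions, general models
chat_payload = {
    "model": "gpt-4o",  # assumed example of a general model
    "messages": [
        {"role": "user", "content": "Explain the principle of quantum entanglement"}
    ],
}

# Responses API: POST /v1/responses, reasoning models
responses_payload = {
    "model": "o3-mini",
    "input": "Explain the principle of quantum entanglement",
    "reasoning": {"effort": "medium"},
}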

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Reasoning model name |
| input | string/array | Yes | Input content |
| reasoning | object | No | Reasoning configuration |
| max_output_tokens | integer | No | Maximum number of output tokens |
| stream | boolean | No | Whether to stream the output |

Request Example

curl https://api.smai.ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-api-key" \
  -d '{
    "model": "o3-pro",
    "input": "Explain the principle of quantum entanglement",
    "reasoning": {
      "effort": "high"
    }
  }'
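
The same request in Python (a minimal sketch using the third-party requests library; the timeout value is an assumption, not a documented requirement):

import requests

API_KEY = "sk-your-api-key"  # replace with your actual key

response = requests.post(
    "https://api.smai.ai/v1/responses",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    json={
        "model": "o3-pro",
        "input": "Explain the principle of quantum entanglement",
        "reasoning": {"effort": "high"},
    },
    timeout=600,  # reasoning models can be slow; see Response Time below
)
response.raise_for_status()
data = response.json()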

Reasoning Configuration

The reasoning parameter controls reasoning behavior:

{
  "reasoning": {
    "effort": "high",   // Reasoning effort level: low, medium, high
    "summary": "auto"   // Whether to return a reasoning summary: auto, always, never
  }
}

effort Parameter

| Value | Description | Applicable Scenarios |
| --- | --- | --- |
| low | Quick reasoning | Simple questions |
| medium | Balanced mode | General questions |
| high | Deep reasoning | Complex questions |

Response Format

{
  "id": "resp-xxx",
  "object": "response",
  "created": 1234567890,
  "model": "o3-pro",
  "output": [
    {
      "type": "message",
      "content": "Quantum entanglement is a phenomenon in quantum mechanics..."
    }
  ],
  "usage": {
    "input_tokens": 10,
    "output_tokens": 500,
    "reasoning_tokens": 1000,
    "total_tokens": 1510
  }
}
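
A sketch of pulling the answer text and token usage out of the parsed response (it assumes the single-message output shape shown above; responses with multiple output items would simply be concatenated):

def extract_answer(data: dict) -> str:
    """Concatenate the content of all message items in the output array."""
    return "".join(
        item["content"] for item in data["output"] if item["type"] == "message"
    )

# `data` is the parsed JSON from the Python request example above
print(extract_answer(data))

usage = data["usage"]
print(
    f"input={usage['input_tokens']} output={usage['output_tokens']} "
    f"reasoning={usage['reasoning_tokens']} total={usage['total_tokens']}"
)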

Supported Models

| Model | Description |
| --- | --- |
| gpt-5.2-pro | OpenAI's latest reasoning model |
| o3-pro | OpenAI o3, professional version |
| o3-mini | OpenAI o3, lightweight version |
| o1-pro | OpenAI o1, professional version |
| o1-mini | OpenAI o1, lightweight version |

Notes

Token Consumption

Reasoning models consume additional reasoning_tokens, which are also billed.
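
In the usage block above, for instance, the 1,000 reasoning_tokens are counted into total_tokens (10 + 500 + 1000 = 1510) even though they never appear in the answer. A quick sanity check of that accounting:

usage = {
    "input_tokens": 10,
    "output_tokens": 500,
    "reasoning_tokens": 1000,
    "total_tokens": 1510,
}

# reasoning_tokens are part of total_tokens, so they count toward the bill
assert usage["total_tokens"] == (
    usage["input_tokens"] + usage["output_tokens"] + usage["reasoning_tokens"]
)

# Here the hidden reasoning used twice as many tokens as the visible answer
print(usage["reasoning_tokens"] / usage["output_tokens"])  # 2.0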

Response Time

Reasoning models usually take longer to respond than general models, especially with effort: high.
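
To keep long requests from timing out client-side, raise the HTTP timeout or set stream to true (a sketch; the exact streaming event format is not documented on this page, so the raw line printing below is a placeholder):

import requests

with requests.post(
    "https://api.smai.ai/v1/responses",
    headers={"Authorization": "Bearer sk-your-api-key"},
    json={
        "model": "o3-mini",
        "input": "Explain the principle of quantum entanglement",
        "reasoning": {"effort": "high"},
        "stream": True,  # ask the server to stream the output
    },
    stream=True,  # tell requests not to buffer the whole body
    timeout=600,  # generous read timeout for high-effort reasoning
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:
            print(line.decode("utf-8"))  # raw event lines; parse as needed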
