sparklesGateway API

Get an API Key

chevron-rightTeam API Keyhashtag

Within workspace, navigate to:

Settings → Teams tab → Select team → API Keys → Generate API Key

Make sure you save this API Key - you won't be able to access it again after closing the dialog.

circle-info

Each team can have its API Key with custom settings: models enabled, fallbacks, etc.

If needed, you can rotate the API Key within API keys tab settings. This will rotate the API Key so the old one is deprecated, and you can change it to a new one.

chevron-rightUser API Keyhashtag

Within workspace, navigate to: Left side bar Settings → API Keys → Generate API Key

Make sure you save this API Key - you won't be able to access it again after closing the dialog.

circle-info

Your API Key is set with custom settings: models enabled, fallbacks, etc. based on the team settings (made by org owner).

If needed, you can rotate the API Key within API keys tab settings. This will rotate the API Key so the old one is deprecated, and you can change it to a new one.

Chat

Chat Completion

post

Generate a response from conversation messages.

Authorizations
OAuth2clientCredentialsRequired
Query parameters
fallbacksstring · enumOptional

To use or not fallbacks in chat completion request

Possible values:
Body
modelstringRequired

Model name, UUID, or agent UUID.

Example: 6948fe4d-98ce-4f36-bc49-5f652cc07b65
storebooleanOptional

Whether or not to store the output of this chat completion request

frequency_penaltynumber · min: -2 · max: 2Optional

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

Default: 0
logprobsbooleanOptional

Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message.

top_logprobsinteger · max: 20Optional

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.

max_tokensinteger · nullableOptionalDeprecated

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

This value is now deprecated in favor of max_completion_tokens, and is not compatible with o1 series models.

max_completion_tokensinteger · nullableOptional

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

ninteger · min: 1 · max: 128Optional

How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.

Default: 1Example: 1
predictionone ofOptional

Configuration for a Predicted Output, which can greatly improve response times when large parts of the model response are known ahead of time. This is most common when you are regenerating a file with only minor changes to most of the content.

presence_penaltynumber · min: -2 · max: 2Optional

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

Default: 0
response_formatone ofOptional

An object specifying the format that the model must output. Compatible with GPT-4o, GPT-4o mini, GPT-4 Turbo and all GPT-3.5 Turbo models newer than gpt-3.5-turbo-1106.

Setting to { "type": "json_schema", "json_schema": {...} } enables Structured Outputs which ensures the model will match your supplied JSON schema.

Setting to { "type": "json_object" } enables JSON mode, which ensures the message the model generates is valid JSON.

Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length.

or
or
seedinteger · min: -9223372036854776000 · max: 9223372036854776000Optional

This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.

service_tierstring · enumOptional

Specifies the processing type used for serving the request. If set to 'default' or 'auto', then the request will be processed with the standard pricing and performance for the selected model. If set to 'flex' or 'priority', then the request will be processed with the corresponding service tier. When not set, the default behavior is 'auto'. When the service_tier parameter is set, the response body will include the service_tier value based on the processing mode actually used to serve the request. This response value may be different from the value set in the parameter.

Default: autoPossible values:
stopone ofOptional

Up to 4 sequences where the API will stop generating further tokens.

stringOptional
or
string[] · min: 1 · max: 4Optional
streambooleanOptional

If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.

temperaturenumber · max: 2Optional

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

We generally recommend altering this or top_p but not both.

Default: 1Example: 1
top_pnumber · max: 1Optional

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.

Default: 1Example: 1
tool_choiceone ofOptional

Controls which (if any) tool is called by the model. none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools. Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool.

none is the default when no tools are present. auto is the default if tools are present.

string · enumOptional

none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.

Possible values:
or
parallel_tool_callsbooleanOptional

Whether to enable parallel function calling during tool use.

Default: true
function_callone ofOptionalDeprecated

Deprecated in favor of tool_choice.

Controls which (if any) function is called by the model. none means the model will not call a function and instead generates a message. auto means the model can pick between generating a message or calling a function. Specifying a particular function via {"name": "my_function"} forces the model to call that function.

none is the default when no functions are present. auto is the default if functions are present.

string · enumOptional

none means the model will not call a function and instead generates a message. auto means the model can pick between generating a message or calling a function.

Possible values:
or
reasoning_effortstring · enumOptional

Reasoning effort for models that support reasoning.

Possible values:
verbositystring · enumOptional

Constrains the verbosity of the model's response.

Possible values:
Responses
chevron-right
200

Represents a chat completion response returned by model, based on the provided input.

application/json
post
/v1/chat/completions

Audio

Audio Speech

post

Generate speech audio from text.

Authorizations
OAuth2clientCredentialsRequired
Body
modelstringRequiredExample: 6948fe4d-98ce-4f36-bc49-5f652cc07b65
inputstring · max: 4096Required

The text to generate audio for.

voicestring · enumRequired

The voice to use when generating the audio.

Possible values:
response_formatstring · enumOptional

The format to output audio in.

Default: mp3Possible values:
speednumber · min: 0.25 · max: 4Optional

The speed of the generated audio.

Default: 1
Responses
chevron-right
200

Successful response with an audio speech.

application/octet-stream
string · binaryOptional
post
/v1/audio/speech

Audio Transcriptions

post

Transcribe audio to text.

Authorizations
OAuth2clientCredentialsRequired
Body
filestring · binaryRequired

The audio file object (not file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.

modelstringRequiredExample: 6948fe4d-98ce-4f36-bc49-5f652cc07b65
languagestringOptional

The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.

promptstringOptional

An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.

response_formatstring · enumOptional

The format of the transcript output.

Default: jsonPossible values:
temperaturenumberOptional

The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.

Default: 0
Responses
chevron-right
200

OK

application/json
or
post
/v1/audio/transcriptions

Audio Translations

post

Translate audio to English text.

Authorizations
OAuth2clientCredentialsRequired
Body
filestring · binaryRequired

The audio file object (not file name) translate, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.

modelstringRequiredExample: 6948fe4d-98ce-4f36-bc49-5f652cc07b65
promptstringOptional

An optional text to guide the model's style or continue a previous audio segment. The prompt should be in English.

response_formatstring · enumOptional

The format of the translated transcript output.

Default: jsonPossible values:
temperaturenumberOptional

The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.

Default: 0
Responses
chevron-right
200

OK

application/json
or
post
/v1/audio/translations

Images

Images Generations

post

Generate images from a text prompt.

Authorizations
OAuth2clientCredentialsRequired
Body
promptstringRequired

A text description of the desired image(s).

modelstringRequiredExample: 6948fe4d-98ce-4f36-bc49-5f652cc07b65
ninteger · min: 1 · max: 10 · nullableOptional

The number of images to generate. Must be between 1 and 10.

Example: 1
qualitystring · enumOptional

The quality of the generated images.

Default: standardPossible values:
response_formatstring · enum · nullableOptional

The format of the generated images.

Default: urlPossible values:
sizestring · enum · nullableOptional

The size of the generated images.

Possible values:
stylestring · enum · nullableOptional

The style of the generated images.

Default: vividPossible values:
Responses
chevron-right
200

Represents an image response returned by model, based on the provided input.

application/json
createdintegerRequired

The Unix timestamp (in seconds) of when the images were created.

post
/v1/images/generations

Embeddings

Embeddings

post

Get a vector representation of a given input that can be easily consumed by machine learning models and algorithms.

Authorizations
OAuth2clientCredentialsRequired
Body
modelstringRequiredExample: 6948fe4d-98ce-4f36-bc49-5f652cc07b65
inputone ofOptional

Input text to get embeddings for.

stringOptionalExample: The food was delicious and the waiter...
or
string[]OptionalExample: ["The food was delicious","The waiter was friendly"]
or
integer[]OptionalExample: [1,2,3,4,5]
or
encoding_formatstring · enumOptional

The format to return the embeddings in. Can be either float or base64.

Default: floatPossible values:
dimensionsinteger · min: 1Optional

The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.

Example: 1536
Responses
chevron-right
200

Represents an embedding response returned by model, based on the provided input.

application/json
objectstringOptional

The object type, which is always "list"

Example: list
modelstringOptional

The model used for generating embeddings

post
/v1/embeddings

Storage

List Uploaded Media Files

get

List uploaded files.

Authorizations
OAuth2clientCredentialsRequired
Query parameters
afterstringOptional

A cursor for use in pagination. after is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include after=obj_foo in order to fetch the next page of the list.

limitinteger · uint32Optional

A limit on the number of objects to be returned. Limit can range between 1 and 10,000, and the default is 10,000.

Default: 10000
orderstring · enumOptional

Sort order by the created_at timestamp of the objects. asc for ascending order and desc for descending order.

Default: descPossible values:
purposestring · enumOptional

The intended purpose for a file.

Possible values:
Responses
chevron-right
200

Successful response with list of files

application/json

A list of uploaded files.

objectstring · enumOptionalPossible values:
get
/v1/storage

Upload Media File

post

Upload a media file for later use.

Authorizations
OAuth2clientCredentialsRequired
Body
filestring · binaryRequired

The File object (not file name) to be uploaded.

purposestring · enumRequired

The intended purpose for a file.

Possible values:
Responses
chevron-right
200

Successful response with file upload details

application/json

The File object represents a document that has been uploaded to OpenAI.

bytesintegerOptional

The size of the file, in bytes.

created_atintegerOptional

The Unix timestamp (in seconds) for when the file was created.

filenamestringOptional

The name of the file.

idstringOptional

The file identifier, which can be referenced in the API endpoints.

purposestring · enumOptional

The intended purpose of the file.

Possible values:
statusstring · enumOptional

The status of the file.

Possible values:
objectstring · enumOptionalPossible values:
post
/v1/storage

Get Storage File

get

Retrieve file details by ID.

Authorizations
OAuth2clientCredentialsRequired
Path parameters
file_idstringRequired

The ID of the file to use for this request.

Responses
chevron-right
200

Successful response with file details

application/json

The File object represents a document that has been uploaded to OpenAI.

bytesintegerOptional

The size of the file, in bytes.

created_atintegerOptional

The Unix timestamp (in seconds) for when the file was created.

filenamestringOptional

The name of the file.

idstringOptional

The file identifier, which can be referenced in the API endpoints.

purposestring · enumOptional

The intended purpose of the file.

Possible values:
statusstring · enumOptional

The status of the file.

Possible values:
objectstring · enumOptionalPossible values:
get
/v1/storage/{file_id}

Delete Storage File

delete

Delete a file by ID.

Authorizations
OAuth2clientCredentialsRequired
Path parameters
file_idstringRequired
Responses
chevron-right
200

Successful response with deletion status.

application/json

The status of a file deletion.

idstringRequired

The file identifier.

deletedbooleanRequired

Indicates whether the file has been deleted.

delete
/v1/storage/{file_id}

Get Storage File Contents

get

Download file contents by ID.

Authorizations
OAuth2clientCredentialsRequired
Path parameters
file_idstringRequired
Responses
chevron-right
200

Successful response with file contents

application/octet-stream
string · binaryOptional
get
/v1/storage/{file_id}/content

Models

List models

get

List all models available to the user.

Authorizations
OAuth2clientCredentialsRequired
Responses
chevron-right
200

A list of models available to use for the current user.

application/json
objectstringRequiredExample: list
totalintegerRequiredExample: 1
get
/v1/models

Get fallbacks configuration for particular model

get

Fallbacks configuration for particular model. Fallbacks go model by model through by the order of model ids list being supplied

Authorizations
OAuth2clientCredentialsRequired
Path parameters
modelstringRequired

The ID of the model.

Responses
chevron-right
200

Model fallbacks configuration found.

application/json
idstringRequired

Unique identifier for the model

Example: 123e4567-e89b-12d3-a456-426614174000
fallback_idsstring · uuid[]Optional

The ID of the model to associate with the company.

Example: 987f6543-d21c-45e7-b678-123456789abc
created_atstring · date-timeOptional

The time the model fallbacks configuration was created.

Example: 2024-11-19T10:00:00Z
updated_atstring · date-timeOptional

The time the model fallbacks configuration was last updated.

Example: 2024-11-19T12:00:00Z
get
/v1/management/models/{model}/fallbacks

Add fallbacks configuration for particular model

post

Fallbacks configuration for particular model. Fallback go model by model through by the order of model ids list being supplied

Authorizations
OAuth2clientCredentialsRequired
Path parameters
modelstringRequired

The ID of the model.

Body
modelsstring · uuid[]Required

The ID of the model to associate with the company.

Example: 987f6543-d21c-45e7-b678-123456789abc
Responses
post
/v1/management/models/{model}/fallbacks

Delete model fallbacks configuration

delete

Delete model fallbacks configuration

Authorizations
OAuth2clientCredentialsRequired
Path parameters
modelstringRequired

The ID of the model.

Responses
delete
/v1/management/models/{model}/fallbacks

No content

Add fallbacks configuration for particular model

patch

Fallbacks configuration for particular model. Fallback go model by model through by the order of model ids list being supplied

Authorizations
OAuth2clientCredentialsRequired
Path parameters
modelstringRequired

The ID of the model.

Body
modelsstring · uuid[]Required

The ID of the model to associate with the company.

Example: 987f6543-d21c-45e7-b678-123456789abc
Responses
chevron-right
200

Model fallbacks configuration successfully updated.

application/json
idstringRequired

Unique identifier for the model

Example: 123e4567-e89b-12d3-a456-426614174000
fallback_idsstring · uuid[]Optional

The ID of the model to associate with the company.

Example: 987f6543-d21c-45e7-b678-123456789abc
created_atstring · date-timeOptional

The time the model fallbacks configuration was created.

Example: 2024-11-19T10:00:00Z
updated_atstring · date-timeOptional

The time the model fallbacks configuration was last updated.

Example: 2024-11-19T12:00:00Z
patch
/v1/management/models/{model}/fallbacks

Team Management

List team API Keys

get
Authorizations
OAuth2clientCredentialsRequired
Path parameters
teamIdstringRequired

The ID of the team.

Responses
chevron-right
200

List of team API keys

application/json
api_keystringRequired

API Key for use with completions API

Example: nexos-3gfdtu46uiotry456iupo3240v22
idstringRequired

Unique identifier for the API key

Example: 123e4567-e89b-12d3-a456-426614174000
namestringRequired

Display name for the API key

Example: Production API Key
created_atstring · date-timeOptional

The time the key was created.

Example: 2024-11-19T10:00:00Z
updated_atstring · date-timeOptional

The time the key was last updated (only name can be updated).

Example: 2024-11-19T12:00:00Z
get
/v1/teams/{teamId}/api_keys
200

List of team API keys

Create team API Key

post
Authorizations
OAuth2clientCredentialsRequired
Path parameters
teamIdstringRequired

The ID of the team.

Body
namestringRequired

The name of the key (needs to be unique).

Example: My API Key
Responses
post
/v1/teams/{teamId}/api_keys
201

Team API Key

Delete team API Key

delete
Authorizations
OAuth2clientCredentialsRequired
Path parameters
teamIdstringRequired

The ID of the team.

keyIdstringRequired

The ID of the key.

Responses
delete
/v1/teams/{teamId}/api_keys/{keyId}

No content

Update team API Key

patch
Authorizations
OAuth2clientCredentialsRequired
Path parameters
teamIdstringRequired

The ID of the team.

keyIdstringRequired

The ID of the key.

Body
namestringRequired

The name of the key (needs to be unique).

Example: My API Key
Responses
chevron-right
200

Team API Key

application/json
api_keystringRequired

API Key for use with completions API

Example: nexos-3gfdtu46uiotry456iupo3240v22
idstringRequired

Unique identifier for the API key

Example: 123e4567-e89b-12d3-a456-426614174000
namestringRequired

Display name for the API key

Example: Production API Key
created_atstring · date-timeOptional

The time the key was created.

Example: 2024-11-19T10:00:00Z
updated_atstring · date-timeOptional

The time the key was last updated (only name can be updated).

Example: 2024-11-19T12:00:00Z
patch
/v1/teams/{teamId}/api_keys/{keyId}

Rotate API Key

patch

Rotates the API key used to authenticate this request. Returns a new key value.

Authorizations
AuthorizationstringRequired
Bearer authentication header of the form Bearer <token>.
Responses
chevron-right
200

Rotated API Key

application/json
api_keystringRequired

API Key for use with completions API

Example: nexos-3gfdtu46uiotry456iupo3240v22
idstringRequired

Unique identifier for the API key

Example: 123e4567-e89b-12d3-a456-426614174000
namestringRequired

Display name for the API key

Example: Production API Key
created_atstring · date-timeOptional

The time the key was created.

Example: 2024-11-19T10:00:00Z
updated_atstring · date-timeOptional

The time the key was last updated (only name can be updated).

Example: 2024-11-19T12:00:00Z
patch
/v1/apikey/rotate
Deprecated

Regenerate team API Key

patch
Authorizations
OAuth2clientCredentialsRequired
Path parameters
teamIdstringRequired

The ID of the team.

keyIdstringRequired

The ID of the key.

Responses
chevron-right
200

Team API Key

application/json
api_keystringRequired

API Key for use with completions API

Example: nexos-3gfdtu46uiotry456iupo3240v22
idstringRequired

Unique identifier for the API key

Example: 123e4567-e89b-12d3-a456-426614174000
namestringRequired

Display name for the API key

Example: Production API Key
created_atstring · date-timeOptional

The time the key was created.

Example: 2024-11-19T10:00:00Z
updated_atstring · date-timeOptional

The time the key was last updated (only name can be updated).

Example: 2024-11-19T12:00:00Z
patch
/v1/teams/{teamId}/api_keys/{keyId}/regenerate

User Management

List user API Keys

get
Authorizations
OAuth2clientCredentialsRequired
Responses
chevron-right
200

List of user API keys

application/json
api_keystringRequired

API Key for use with completions API

Example: nexos-3gfdtu46uiotry456iupo3240v22
idstringRequired

Unique identifier for the API key

Example: 123e4567-e89b-12d3-a456-426614174000
namestringRequired

Display name for the API key

Example: Production API Key
created_atstring · date-timeOptional

The time the key was created.

Example: 2024-11-19T10:00:00Z
updated_atstring · date-timeOptional

The time the key was last updated (only name can be updated).

Example: 2024-11-19T12:00:00Z
get
/v1/user/api_keys
200

List of user API keys

Create user API Key

post
Authorizations
OAuth2clientCredentialsRequired
Body
namestringRequired

The name of the key (needs to be unique).

Example: My API Key
Responses
post
/v1/user/api_keys
201

User API Key

Delete user API Key

delete
Authorizations
OAuth2clientCredentialsRequired
Path parameters
keyIdstringRequired

The ID of the key.

Responses
delete
/v1/user/api_keys/{keyId}

No content

Update user API Key

patch
Authorizations
OAuth2clientCredentialsRequired
Path parameters
keyIdstringRequired

The ID of the key.

Body
namestringRequired

The name of the key (needs to be unique).

Example: My API Key
Responses
chevron-right
200

User API Key

application/json
api_keystringRequired

API Key for use with completions API

Example: nexos-3gfdtu46uiotry456iupo3240v22
idstringRequired

Unique identifier for the API key

Example: 123e4567-e89b-12d3-a456-426614174000
namestringRequired

Display name for the API key

Example: Production API Key
created_atstring · date-timeOptional

The time the key was created.

Example: 2024-11-19T10:00:00Z
updated_atstring · date-timeOptional

The time the key was last updated (only name can be updated).

Example: 2024-11-19T12:00:00Z
patch
/v1/user/api_keys/{keyId}

Rotate API Key

patch

Rotates the API key used to authenticate this request. Returns a new key value.

Authorizations
AuthorizationstringRequired
Bearer authentication header of the form Bearer <token>.
Responses
chevron-right
200

Rotated API Key

application/json
api_keystringRequired

API Key for use with completions API

Example: nexos-3gfdtu46uiotry456iupo3240v22
idstringRequired

Unique identifier for the API key

Example: 123e4567-e89b-12d3-a456-426614174000
namestringRequired

Display name for the API key

Example: Production API Key
created_atstring · date-timeOptional

The time the key was created.

Example: 2024-11-19T10:00:00Z
updated_atstring · date-timeOptional

The time the key was last updated (only name can be updated).

Example: 2024-11-19T12:00:00Z
patch
/v1/apikey/rotate
Deprecated

Regenerate user API Key

patch
Authorizations
OAuth2clientCredentialsRequired
Path parameters
keyIdstringRequired

The ID of the key.

Responses
chevron-right
200

User API Key

application/json
api_keystringRequired

API Key for use with completions API

Example: nexos-3gfdtu46uiotry456iupo3240v22
idstringRequired

Unique identifier for the API key

Example: 123e4567-e89b-12d3-a456-426614174000
namestringRequired

Display name for the API key

Example: Production API Key
created_atstring · date-timeOptional

The time the key was created.

Example: 2024-11-19T10:00:00Z
updated_atstring · date-timeOptional

The time the key was last updated (only name can be updated).

Example: 2024-11-19T12:00:00Z
patch
/v1/user/api_keys/{keyId}/regenerate

Assistant Management

Deprecated

Get assistants for a specific user. Deprecated, use /v1/agents instead.

get

Deprecated: Use /v1/agents instead. Retrieve list of assistants for a specific user.

Authorizations
OAuth2clientCredentialsRequired
Query parameters
limitinteger · int64 · min: 1 · max: 200Optional

The number of items to return.

Default: 100
offsetinteger · int64Optional

The number of items to skip.

sort_by.fieldstring · enumOptional

Field to sort by, must be used together with sort_by.order.

Possible values:
sort_by.orderstring · enumOptional

Sort order, must be used together with sort_by.field.

Possible values:
include_sharedbooleanOptional

Whether to include assistants shared with the user.

Default: false
Responses
chevron-right
200

A list of assistants.

application/json
totalinteger · int64OptionalExample: 1
get
/v1/assistants

Responses

Create a model response

post

Creates a model response.

Body
top_logprobsinteger · max: 20Optional

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability.

temperaturenumber · max: 2 · nullableOptional

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

Default: 1Example: 1
top_pnumber · max: 1 · nullableOptional

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.

Default: 1Example: 1
userstringOptionalDeprecated

Deprecated in favor of safety_identifier and prompt_cache_key; use prompt_cache_key to maintain caching. A stable end-user identifier to improve cache hit rates and help detect abuse.

Example: user-1234
safety_identifierstringOptional

A stable identifier used to help detect users who may violate usage policies. Use a unique per-user string (e.g., a hash of username or email) to avoid sending identifying information.

Example: safety-identifier-1234
prompt_cache_keystringOptional

Used to cache responses for similar requests and improve cache hit rates. Replaces the user field.

Example: prompt-cache-key-1234
service_tierstring · enum · nullableOptional

Specifies the processing tier for the request. The response includes the actual tier used, which may differ from the requested value.

Default: autoPossible values:
prompt_cache_retentionstring · enum · nullableOptional

Retention policy for the prompt cache. Set to 24h to keep cached prefixes active longer (up to 24 hours).

Possible values:
previous_response_idstring · nullableOptional

The unique ID of the previous response to the model. Use this to create multi-turn conversations. Cannot be used with conversation.

modelstringRequired

Model ID used to generate the response (e.g., gpt-4o or o3). See your provider's model guide for available options.

backgroundboolean · nullableOptional

Whether to run the model response in the background.

Default: false
max_output_tokensinteger · nullableOptional

An upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.

max_tool_callsinteger · nullableOptional

The maximum number of total calls to built-in tools that can be processed in a response. This maximum number applies across all built-in tool calls, not per individual tool. Any further attempts to call a tool by the model will be ignored.

tool_choiceany ofOptional

How the model should select which tool (or tools) to use when generating a response. See the tools parameter to see how to specify which tools the model can call.

string · enumOptional

Controls which (if any) tool is called by the model.

none means the model will not call any tool and instead generates a message.

auto means the model can pick between generating a message or calling one or more tools.

required means the model must call one or more tools.

Possible values:
or
or
or
or
or
or
or
truncationstring · enum · nullableOptional

The truncation strategy to use for the model response.

  • auto: If the input to this Response exceeds the model's context window size, the model will truncate the response to fit the context window by dropping items from the beginning of the conversation.
  • disabled (default): If the input size will exceed the context window size for a model, the request will fail with a 400 error.
Default: disabledPossible values:
inputany ofOptional

Text, image, or file inputs used to generate a response. Use this to provide content the model should consider.

stringOptional

A text input to the model, equivalent to a text input with the user role.

or
parallel_tool_callsboolean · nullableOptional

Whether to allow the model to run tool calls in parallel.

Default: true
storeboolean · nullableOptional

Whether to store the generated model response for later retrieval via API.

Default: true
instructionsstring · nullableOptional

A system (or developer) message inserted into the model's context.

When using along with previous_response_id, the instructions from a previous response will not be carried over to the next response. This makes it simple to swap out system (or developer) messages in new responses.

streamboolean · nullableOptional

If set to true, the model response data will be streamed to the client as it is generated using server-sent events.

Default: false
conversationany of · nullableOptional

The conversation that this response belongs to. Items from this conversation are prepended to input_items for this response request. Input items and output items from this response are automatically added to this conversation after this response completes.

stringOptional

The unique ID of the conversation.

or
Responses
chevron-right
200

OK

top_logprobsinteger · max: 20 · nullableOptional

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability.

temperaturenumber · max: 2 · nullableRequired

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

Default: 1Example: 1
top_pnumber · max: 1 · nullableRequired

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.

Default: 1Example: 1
userstringOptionalDeprecated

Deprecated in favor of safety_identifier and prompt_cache_key; use prompt_cache_key to maintain caching. A stable end-user identifier to improve cache hit rates and help detect abuse.

Example: user-1234
safety_identifierstringOptional

A stable identifier used to help detect users who may violate usage policies. Use a unique per-user string (e.g., a hash of username or email) to avoid sending identifying information.

Example: safety-identifier-1234
prompt_cache_keystringOptional

Used to cache responses for similar requests and improve cache hit rates. Replaces the user field.

Example: prompt-cache-key-1234
service_tierstring · enum · nullableOptional

Specifies the processing tier for the request. The response includes the actual tier used, which may differ from the requested value.

Default: autoPossible values:
prompt_cache_retentionstring · enum · nullableOptional

Retention policy for the prompt cache. Set to 24h to keep cached prefixes active longer (up to 24 hours).

Possible values:
previous_response_idstring · nullableOptional

The unique ID of the previous response to the model. Use this to create multi-turn conversations. Cannot be used with conversation.

modelstringRequired

Model ID used to generate the response (e.g., gpt-4o or o3). See your provider's model guide for available options.

backgroundboolean · nullableOptional

Whether to run the model response in the background.

Default: false
max_output_tokensinteger · nullableOptional

An upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.

max_tool_callsinteger · nullableOptional

The maximum number of total calls to built-in tools that can be processed in a response. This maximum number applies across all built-in tool calls, not per individual tool. Any further attempts to call a tool by the model will be ignored.

tool_choiceany ofRequired

How the model should select which tool (or tools) to use when generating a response. See the tools parameter to see how to specify which tools the model can call.

string · enumOptional

Controls which (if any) tool is called by the model.

none means the model will not call any tool and instead generates a message.

auto means the model can pick between generating a message or calling one or more tools.

required means the model must call one or more tools.

Possible values:
or
or
or
or
or
or
or
truncationstring · enum · nullableOptional

The truncation strategy to use for the model response.

  • auto: If the input to this Response exceeds the model's context window size, the model will truncate the response to fit the context window by dropping items from the beginning of the conversation.
  • disabled (default): If the input size will exceed the context window size for a model, the request will fail with a 400 error.
Default: disabledPossible values:
idstringRequired

Unique identifier for this Response.

objectstring · enumRequired

The object type of this resource - always set to response.

Possible values:
statusstring · enumOptional

The status of the response generation. One of completed, failed, in_progress, cancelled, queued, or incomplete.

Possible values:
created_atnumberRequired

Unix timestamp (in seconds) of when this Response was created.

instructionsany of · nullableRequired

A system (or developer) message inserted into the model's context.

When using along with previous_response_id, the instructions from a previous response will not be carried over to the next response. This makes it simple to swap out system (or developer) messages in new responses.

stringOptional

A text input to the model, equivalent to a text input with the developer role.

or
output_textstring · nullableOptional

SDK-only convenience property that contains the aggregated text output from all output_text items in the output array, if any are present. Supported in the Python and JavaScript SDKs.

parallel_tool_callsbooleanRequired

Whether to allow the model to run tool calls in parallel.

Default: true
post
/v1/responses
200

OK

Get a model response

get

Retrieves a model response with the given ID.

Path parameters
response_idstringRequired

The ID of the response to retrieve.

Example: resp_677efb5139a88190b512bc3fef8e535d
Query parameters
streambooleanOptional

If set to true, the model response data will be streamed to the client as it is generated using server-sent events.

starting_afterintegerOptional

The sequence number of the event after which to start streaming.

include_obfuscationbooleanOptional

When true, stream obfuscation will be enabled. Stream obfuscation adds random characters to an obfuscation field on streaming delta events to normalize payload sizes as a mitigation to certain side-channel attacks. These obfuscation fields are included by default, but add a small amount of overhead to the data stream.

Responses
chevron-right
200

OK

application/json
top_logprobsinteger · max: 20 · nullableOptional

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability.

temperaturenumber · max: 2 · nullableRequired

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

Default: 1Example: 1
top_pnumber · max: 1 · nullableRequired

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.

Default: 1Example: 1
userstringOptionalDeprecated

Deprecated in favor of safety_identifier and prompt_cache_key; use prompt_cache_key to maintain caching. A stable end-user identifier to improve cache hit rates and help detect abuse.

Example: user-1234
safety_identifierstringOptional

A stable identifier used to help detect users who may violate usage policies. Use a unique per-user string (e.g., a hash of username or email) to avoid sending identifying information.

Example: safety-identifier-1234
prompt_cache_keystringOptional

Used to cache responses for similar requests and improve cache hit rates. Replaces the user field.

Example: prompt-cache-key-1234
service_tierstring · enum · nullableOptional

Specifies the processing tier for the request. The response includes the actual tier used, which may differ from the requested value.

Default: autoPossible values:
prompt_cache_retentionstring · enum · nullableOptional

Retention policy for the prompt cache. Set to 24h to keep cached prefixes active longer (up to 24 hours).

Possible values:
previous_response_idstring · nullableOptional

The unique ID of the previous response to the model. Use this to create multi-turn conversations. Cannot be used with conversation.

modelstringRequired

Model ID used to generate the response (e.g., gpt-4o or o3). See your provider's model guide for available options.

backgroundboolean · nullableOptional

Whether to run the model response in the background.

Default: false
max_output_tokensinteger · nullableOptional

An upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.

max_tool_callsinteger · nullableOptional

The maximum number of total calls to built-in tools that can be processed in a response. This maximum number applies across all built-in tool calls, not per individual tool. Any further attempts to call a tool by the model will be ignored.

tool_choiceany ofRequired

How the model should select which tool (or tools) to use when generating a response. See the tools parameter to see how to specify which tools the model can call.

string · enumOptional

Controls which (if any) tool is called by the model.

none means the model will not call any tool and instead generates a message.

auto means the model can pick between generating a message or calling one or more tools.

required means the model must call one or more tools.

Possible values:
or
or
or
or
or
or
or
truncationstring · enum · nullableOptional

The truncation strategy to use for the model response.

  • auto: If the input to this Response exceeds the model's context window size, the model will truncate the response to fit the context window by dropping items from the beginning of the conversation.
  • disabled (default): If the input size will exceed the context window size for a model, the request will fail with a 400 error.
Default: disabledPossible values:
idstringRequired

Unique identifier for this Response.

objectstring · enumRequired

The object type of this resource - always set to response.

Possible values:
statusstring · enumOptional

The status of the response generation. One of completed, failed, in_progress, cancelled, queued, or incomplete.

Possible values:
created_atnumberRequired

Unix timestamp (in seconds) of when this Response was created.

instructionsany of · nullableRequired

A system (or developer) message inserted into the model's context.

When using along with previous_response_id, the instructions from a previous response will not be carried over to the next response. This makes it simple to swap out system (or developer) messages in new responses.

stringOptional

A text input to the model, equivalent to a text input with the developer role.

or
output_textstring · nullableOptional

SDK-only convenience property that contains the aggregated text output from all output_text items in the output array, if any are present. Supported in the Python and JavaScript SDKs.

parallel_tool_callsbooleanRequired

Whether to allow the model to run tool calls in parallel.

Default: true
get
/v1/responses/{response_id}
200

OK

Delete a model response

delete

Deletes a model response with the given ID.

Path parameters
response_idstringRequired

The ID of the response to delete.

Example: resp_677efb5139a88190b512bc3fef8e535d
Responses
chevron-right
200

OK

No content

delete
/v1/responses/{response_id}

No content

List input items

get

Returns a list of input items for a given response.

Path parameters
response_idstringRequired

The ID of the response to retrieve input items for.

Query parameters
limitintegerOptional

A limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.

Default: 20
orderstring · enumOptional

The order to return the input items in. Default is desc.

  • asc: Return the input items in ascending order.
  • desc: Return the input items in descending order.
Possible values:
afterstringOptional

An item ID to list items after, used in pagination.

Responses
chevron-right
200

OK

application/json

A list of Response items.

objectconst: listRequired

The type of object returned, must be list.

has_morebooleanRequired

Whether there are more items available.

first_idstringRequired

The ID of the first item in the list.

last_idstringRequired

The ID of the last item in the list.

get
/v1/responses/{response_id}/input_items
200

OK

Cancel a response

post

Cancels a model response with the given ID. Only responses created with the background parameter set to true can be cancelled.

Path parameters
response_idstringRequired

The ID of the response to cancel.

Example: resp_677efb5139a88190b512bc3fef8e535d
Responses
chevron-right
200

OK

application/json
top_logprobsinteger · max: 20 · nullableOptional

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability.

temperaturenumber · max: 2 · nullableRequired

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

Default: 1Example: 1
top_pnumber · max: 1 · nullableRequired

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.

Default: 1Example: 1
userstringOptionalDeprecated

Deprecated in favor of safety_identifier and prompt_cache_key; use prompt_cache_key to maintain caching. A stable end-user identifier to improve cache hit rates and help detect abuse.

Example: user-1234
safety_identifierstringOptional

A stable identifier used to help detect users who may violate usage policies. Use a unique per-user string (e.g., a hash of username or email) to avoid sending identifying information.

Example: safety-identifier-1234
prompt_cache_keystringOptional

Used to cache responses for similar requests and improve cache hit rates. Replaces the user field.

Example: prompt-cache-key-1234
service_tierstring · enum · nullableOptional

Specifies the processing tier for the request. The response includes the actual tier used, which may differ from the requested value.

Default: autoPossible values:
prompt_cache_retentionstring · enum · nullableOptional

Retention policy for the prompt cache. Set to 24h to keep cached prefixes active longer (up to 24 hours).

Possible values:
previous_response_idstring · nullableOptional

The unique ID of the previous response to the model. Use this to create multi-turn conversations. Cannot be used with conversation.

modelstringRequired

Model ID used to generate the response (e.g., gpt-4o or o3). See your provider's model guide for available options.

backgroundboolean · nullableOptional

Whether to run the model response in the background.

Default: false
max_output_tokensinteger · nullableOptional

An upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.

max_tool_callsinteger · nullableOptional

The maximum number of total calls to built-in tools that can be processed in a response. This maximum number applies across all built-in tool calls, not per individual tool. Any further attempts to call a tool by the model will be ignored.

tool_choiceany ofRequired

How the model should select which tool (or tools) to use when generating a response. See the tools parameter to see how to specify which tools the model can call.

string · enumOptional

Controls which (if any) tool is called by the model.

none means the model will not call any tool and instead generates a message.

auto means the model can pick between generating a message or calling one or more tools.

required means the model must call one or more tools.

Possible values:
or
or
or
or
or
or
or
truncationstring · enum · nullableOptional

The truncation strategy to use for the model response.

  • auto: If the input to this Response exceeds the model's context window size, the model will truncate the response to fit the context window by dropping items from the beginning of the conversation.
  • disabled (default): If the input size will exceed the context window size for a model, the request will fail with a 400 error.
Default: disabledPossible values:
idstringRequired

Unique identifier for this Response.

objectstring · enumRequired

The object type of this resource - always set to response.

Possible values:
statusstring · enumOptional

The status of the response generation. One of completed, failed, in_progress, cancelled, queued, or incomplete.

Possible values:
created_atnumberRequired

Unix timestamp (in seconds) of when this Response was created.

instructionsany of · nullableRequired

A system (or developer) message inserted into the model's context.

When using along with previous_response_id, the instructions from a previous response will not be carried over to the next response. This makes it simple to swap out system (or developer) messages in new responses.

stringOptional

A text input to the model, equivalent to a text input with the developer role.

or
output_textstring · nullableOptional

SDK-only convenience property that contains the aggregated text output from all output_text items in the output array, if any are present. Supported in the Python and JavaScript SDKs.

parallel_tool_callsbooleanRequired

Whether to allow the model to run tool calls in parallel.

Default: true
post
/v1/responses/{response_id}/cancel

Compact a response

post

Compact conversation

Body
modelstringOptional

Model ID used to generate the response, like gpt-5 or o3.

inputany of · nullableOptional

Text, image, or file inputs to the model, used to generate a response

string · max: 10485760Optional

A text input to the model, equivalent to a text input with the user role.

or
previous_response_idstring · nullableOptional

The unique ID of the previous response to the model. Use this to create multi-turn conversations.

Example: resp_123
instructionsstring · nullableOptional

A system (or developer) message inserted into the model's context. When used along with previous_response_id, the instructions from a previous response will not be carried over to the next response. This makes it simple to swap out system (or developer) messages in new responses.

Responses
chevron-right
200

Success

application/json
idstringRequired

The unique identifier for the compacted response.

objectstring · enumRequired

The object type. Always response.compaction.

Default: response.compactionPossible values:
created_atintegerRequired

Unix timestamp (in seconds) when the compacted conversation was created.

post
/v1/responses/compact
200

Success

Get input token counts

post

Get input token counts

Body
modelstring · nullableOptional

Model ID used to generate the response, like gpt-4o or o3.

inputany of · nullableOptional

Text, image, or file inputs to the model, used to generate a response

string · max: 10485760Optional

A text input to the model, equivalent to a text input with the user role.

or
previous_response_idstring · nullableOptional

The unique ID of the previous response to the model. Use this to create multi-turn conversations.

Example: resp_123
truncationstring · enumOptional

The truncation strategy to use for the model response. - auto: If the input to this Response exceeds the model's context window size, the model will truncate the response to fit the context window by dropping items from the beginning of the conversation. - disabled (default): If the input size will exceed the context window size for a model, the request will fail with a 400 error.

Possible values:
instructionsstring · nullableOptional

A system (or developer) message inserted into the model's context. When used along with previous_response_id, the instructions from a previous response will not be carried over to the next response. This makes it simple to swap out system (or developer) messages in new responses.

conversationany of · nullableOptional

The conversation that this response belongs to. Items from this conversation are prepended to input_items for this response request. Input items and output items from this response are automatically added to this conversation after this response completes.

stringOptional

The unique ID of the conversation.

or
tool_choiceany of · nullableOptional

How the model should select which tool (or tools) to use when generating a response. See the tools parameter to see how to specify which tools the model can call.

string · enumOptional

Controls which (if any) tool is called by the model.

none means the model will not call any tool and instead generates a message.

auto means the model can pick between generating a message or calling one or more tools.

required means the model must call one or more tools.

Possible values:
or
or
or
or
or
or
or
parallel_tool_callsboolean · nullableOptional

Whether to allow the model to run tool calls in parallel.

Responses
chevron-right
200

Success

application/json
objectstring · enumRequiredDefault: response.input_tokensPossible values:
input_tokensintegerRequired
post
/v1/responses/input_tokens
200

Success

Last updated