Gateway API
Get an API Key
Chat
To use or not fallbacks in chat completion request
Model name, UUID, or agent UUID.
6948fe4d-98ce-4f36-bc49-5f652cc07b65Whether or not to store the output of this chat completion request
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
0Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message.
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used.
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
This value is now deprecated in favor of max_completion_tokens, and is not compatible with o1 series models.
An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep n as 1 to minimize costs.
1Example: 1Configuration for a Predicted Output, which can greatly improve response times when large parts of the model response are known ahead of time. This is most common when you are regenerating a file with only minor changes to most of the content.
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
0An object specifying the format that the model must output. Compatible with GPT-4o, GPT-4o mini, GPT-4 Turbo and all GPT-3.5 Turbo models newer than gpt-3.5-turbo-1106.
Setting to { "type": "json_schema", "json_schema": {...} } enables Structured Outputs which ensures the model will match your supplied JSON schema.
Setting to { "type": "json_object" } enables JSON mode, which ensures the message the model generates is valid JSON.
Important: when using JSON mode, you must also instruct the model to produce JSON yourself via a system or user message. Without this, the model may generate an unending stream of whitespace until the generation reaches the token limit, resulting in a long-running and seemingly "stuck" request. Also note that the message content may be partially cut off if finish_reason="length", which indicates the generation exceeded max_tokens or the conversation exceeded the max context length.
This feature is in Beta.
If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
Specifies the processing type used for serving the request.
If set to 'default' or 'auto', then the request will be processed with the standard pricing and performance for the selected model.
If set to 'flex' or 'priority', then the request will be processed with the corresponding service tier.
When not set, the default behavior is 'auto'.
When the service_tier parameter is set, the response body will include the service_tier value based on the processing mode actually used to serve the request. This response value may be different from the value set in the parameter.
autoPossible values: Up to 4 sequences where the API will stop generating further tokens.
If set, partial message deltas will be sent, like in ChatGPT. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
We generally recommend altering this or top_p but not both.
1Example: 1An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
We generally recommend altering this or temperature but not both.
1Example: 1Controls which (if any) tool is called by the model.
none means the model will not call any tool and instead generates a message.
auto means the model can pick between generating a message or calling one or more tools.
required means the model must call one or more tools.
Specifying a particular tool via {"type": "function", "function": {"name": "my_function"}} forces the model to call that tool.
none is the default when no tools are present. auto is the default if tools are present.
none means the model will not call any tool and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools.
Whether to enable parallel function calling during tool use.
trueDeprecated in favor of tool_choice.
Controls which (if any) function is called by the model.
none means the model will not call a function and instead generates a message.
auto means the model can pick between generating a message or calling a function.
Specifying a particular function via {"name": "my_function"} forces the model to call that function.
none is the default when no functions are present. auto is the default if functions are present.
none means the model will not call a function and instead generates a message. auto means the model can pick between generating a message or calling a function.
Reasoning effort for models that support reasoning.
Constrains the verbosity of the model's response.
Represents a chat completion response returned by model, based on the provided input.
Invalid request data.
Server error.
Audio
6948fe4d-98ce-4f36-bc49-5f652cc07b65The text to generate audio for.
The voice to use when generating the audio.
The format to output audio in.
mp3Possible values: The speed of the generated audio.
1Successful response with an audio speech.
Invalid request data.
Server error.
The audio file object (not file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
6948fe4d-98ce-4f36-bc49-5f652cc07b65The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.
An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.
The format of the transcript output.
jsonPossible values: The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.
0OK
Invalid request data.
Server error.
The audio file object (not file name) translate, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
6948fe4d-98ce-4f36-bc49-5f652cc07b65An optional text to guide the model's style or continue a previous audio segment. The prompt should be in English.
The format of the translated transcript output.
jsonPossible values: The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.
0OK
Invalid request data.
Server error.
Images
A text description of the desired image(s).
6948fe4d-98ce-4f36-bc49-5f652cc07b65The number of images to generate. Must be between 1 and 10.
1The quality of the generated images.
standardPossible values: The format of the generated images.
urlPossible values: The size of the generated images.
The style of the generated images.
vividPossible values: Represents an image response returned by model, based on the provided input.
The Unix timestamp (in seconds) of when the images were created.
Invalid request data.
Server error.
Embeddings
6948fe4d-98ce-4f36-bc49-5f652cc07b65Input text to get embeddings for.
The food was delicious and the waiter...["The food was delicious","The waiter was friendly"][1,2,3,4,5]The format to return the embeddings in. Can be either float or base64.
floatPossible values: The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.
1536Represents an embedding response returned by model, based on the provided input.
The object type, which is always "list"
listThe model used for generating embeddings
Invalid request data.
Server error.
Storage
A cursor for use in pagination. after is an object ID that defines your place in the list. For instance, if you make a list request and receive 100 objects, ending with obj_foo, your subsequent call can include after=obj_foo in order to fetch the next page of the list.
A limit on the number of objects to be returned. Limit can range between 1 and 10,000, and the default is 10,000.
10000Sort order by the created_at timestamp of the objects. asc for ascending order and desc for descending order.
descPossible values: The intended purpose for a file.
Successful response with list of files
A list of uploaded files.
Server error.
The File object (not file name) to be uploaded.
The intended purpose for a file.
Successful response with file upload details
The File object represents a document that has been uploaded to OpenAI.
The size of the file, in bytes.
The Unix timestamp (in seconds) for when the file was created.
The name of the file.
The file identifier, which can be referenced in the API endpoints.
The intended purpose of the file.
The status of the file.
Invalid request data.
Server error.
The ID of the file to use for this request.
Successful response with file details
The File object represents a document that has been uploaded to OpenAI.
The size of the file, in bytes.
The Unix timestamp (in seconds) for when the file was created.
The name of the file.
The file identifier, which can be referenced in the API endpoints.
The intended purpose of the file.
The status of the file.
File not found.
Server error.
Successful response with deletion status.
The status of a file deletion.
The file identifier.
Indicates whether the file has been deleted.
File not found.
Server error.
Models
The ID of the model.
Model fallbacks configuration found.
Unique identifier for the model
123e4567-e89b-12d3-a456-426614174000The ID of the model to associate with the company.
987f6543-d21c-45e7-b678-123456789abcThe time the model fallbacks configuration was created.
2024-11-19T10:00:00ZThe time the model fallbacks configuration was last updated.
2024-11-19T12:00:00ZHouston, we have a problem
The ID of the model.
The ID of the model to associate with the company.
987f6543-d21c-45e7-b678-123456789abcModel fallbacks configuration successfully created.
Invalid request or fallback configurations already associated with the model.
Model not found.
The ID of the model.
The ID of the model to associate with the company.
987f6543-d21c-45e7-b678-123456789abcModel fallbacks configuration successfully updated.
Unique identifier for the model
123e4567-e89b-12d3-a456-426614174000The ID of the model to associate with the company.
987f6543-d21c-45e7-b678-123456789abcThe time the model fallbacks configuration was created.
2024-11-19T10:00:00ZThe time the model fallbacks configuration was last updated.
2024-11-19T12:00:00ZHouston, we have a problem
Team Management
The ID of the team.
List of team API keys
API Key for use with completions API
nexos-3gfdtu46uiotry456iupo3240v22Unique identifier for the API key
123e4567-e89b-12d3-a456-426614174000Display name for the API key
Production API KeyThe time the key was created.
2024-11-19T10:00:00ZThe time the key was last updated (only name can be updated).
2024-11-19T12:00:00ZList of team API keys
The ID of the team.
The ID of the key.
The name of the key (needs to be unique).
My API KeyTeam API Key
API Key for use with completions API
nexos-3gfdtu46uiotry456iupo3240v22Unique identifier for the API key
123e4567-e89b-12d3-a456-426614174000Display name for the API key
Production API KeyThe time the key was created.
2024-11-19T10:00:00ZThe time the key was last updated (only name can be updated).
2024-11-19T12:00:00ZInvalid request.
Team/Key not found.
Rotated API Key
API Key for use with completions API
nexos-3gfdtu46uiotry456iupo3240v22Unique identifier for the API key
123e4567-e89b-12d3-a456-426614174000Display name for the API key
Production API KeyThe time the key was created.
2024-11-19T10:00:00ZThe time the key was last updated (only name can be updated).
2024-11-19T12:00:00ZUnauthorized — request not authenticated via API key.
Internal server error.
The ID of the team.
The ID of the key.
Team API Key
API Key for use with completions API
nexos-3gfdtu46uiotry456iupo3240v22Unique identifier for the API key
123e4567-e89b-12d3-a456-426614174000Display name for the API key
Production API KeyThe time the key was created.
2024-11-19T10:00:00ZThe time the key was last updated (only name can be updated).
2024-11-19T12:00:00ZInvalid request.
Team/Key not found.
User Management
List of user API keys
API Key for use with completions API
nexos-3gfdtu46uiotry456iupo3240v22Unique identifier for the API key
123e4567-e89b-12d3-a456-426614174000Display name for the API key
Production API KeyThe time the key was created.
2024-11-19T10:00:00ZThe time the key was last updated (only name can be updated).
2024-11-19T12:00:00ZList of user API keys
The ID of the key.
The name of the key (needs to be unique).
My API KeyUser API Key
API Key for use with completions API
nexos-3gfdtu46uiotry456iupo3240v22Unique identifier for the API key
123e4567-e89b-12d3-a456-426614174000Display name for the API key
Production API KeyThe time the key was created.
2024-11-19T10:00:00ZThe time the key was last updated (only name can be updated).
2024-11-19T12:00:00ZInvalid request.
User/Key not found.
Rotated API Key
API Key for use with completions API
nexos-3gfdtu46uiotry456iupo3240v22Unique identifier for the API key
123e4567-e89b-12d3-a456-426614174000Display name for the API key
Production API KeyThe time the key was created.
2024-11-19T10:00:00ZThe time the key was last updated (only name can be updated).
2024-11-19T12:00:00ZUnauthorized — request not authenticated via API key.
Internal server error.
The ID of the key.
User API Key
API Key for use with completions API
nexos-3gfdtu46uiotry456iupo3240v22Unique identifier for the API key
123e4567-e89b-12d3-a456-426614174000Display name for the API key
Production API KeyThe time the key was created.
2024-11-19T10:00:00ZThe time the key was last updated (only name can be updated).
2024-11-19T12:00:00ZInvalid request.
User/Key not found.
Assistant Management
The number of items to return.
100The number of items to skip.
Field to sort by, must be used together with sort_by.order.
Sort order, must be used together with sort_by.field.
Whether to include assistants shared with the user.
falseA list of assistants.
1Assistant not found.
Responses
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability.
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
We generally recommend altering this or top_p but not both.
1Example: 1An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
We generally recommend altering this or temperature but not both.
1Example: 1Deprecated in favor of safety_identifier and prompt_cache_key; use prompt_cache_key to maintain caching. A stable end-user identifier to improve cache hit rates and help detect abuse.
user-1234A stable identifier used to help detect users who may violate usage policies. Use a unique per-user string (e.g., a hash of username or email) to avoid sending identifying information.
safety-identifier-1234Used to cache responses for similar requests and improve cache hit rates. Replaces the user field.
prompt-cache-key-1234Specifies the processing tier for the request. The response includes the actual tier used, which may differ from the requested value.
autoPossible values: Retention policy for the prompt cache. Set to 24h to keep cached prefixes active longer (up to 24 hours).
The unique ID of the previous response to the model. Use this to create multi-turn conversations. Cannot be used with conversation.
Model ID used to generate the response (e.g., gpt-4o or o3). See your provider's model guide for available options.
Whether to run the model response in the background.
falseAn upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.
The maximum number of total calls to built-in tools that can be processed in a response. This maximum number applies across all built-in tool calls, not per individual tool. Any further attempts to call a tool by the model will be ignored.
How the model should select which tool (or tools) to use when generating
a response. See the tools parameter to see how to specify which tools
the model can call.
Controls which (if any) tool is called by the model.
none means the model will not call any tool and instead generates a message.
auto means the model can pick between generating a message or calling one or
more tools.
required means the model must call one or more tools.
The truncation strategy to use for the model response.
auto: If the input to this Response exceeds the model's context window size, the model will truncate the response to fit the context window by dropping items from the beginning of the conversation.disabled(default): If the input size will exceed the context window size for a model, the request will fail with a 400 error.
disabledPossible values: Text, image, or file inputs used to generate a response. Use this to provide content the model should consider.
A text input to the model, equivalent to a text input with the
user role.
Whether to allow the model to run tool calls in parallel.
trueWhether to store the generated model response for later retrieval via API.
trueA system (or developer) message inserted into the model's context.
When using along with previous_response_id, the instructions from a previous
response will not be carried over to the next response. This makes it simple
to swap out system (or developer) messages in new responses.
If set to true, the model response data will be streamed to the client as it is generated using server-sent events.
falseThe conversation that this response belongs to. Items from this conversation are prepended to input_items for this response request.
Input items and output items from this response are automatically added to this conversation after this response completes.
The unique ID of the conversation.
OK
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability.
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
We generally recommend altering this or top_p but not both.
1Example: 1An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
We generally recommend altering this or temperature but not both.
1Example: 1Deprecated in favor of safety_identifier and prompt_cache_key; use prompt_cache_key to maintain caching. A stable end-user identifier to improve cache hit rates and help detect abuse.
user-1234A stable identifier used to help detect users who may violate usage policies. Use a unique per-user string (e.g., a hash of username or email) to avoid sending identifying information.
safety-identifier-1234Used to cache responses for similar requests and improve cache hit rates. Replaces the user field.
prompt-cache-key-1234Specifies the processing tier for the request. The response includes the actual tier used, which may differ from the requested value.
autoPossible values: Retention policy for the prompt cache. Set to 24h to keep cached prefixes active longer (up to 24 hours).
The unique ID of the previous response to the model. Use this to create multi-turn conversations. Cannot be used with conversation.
Model ID used to generate the response (e.g., gpt-4o or o3). See your provider's model guide for available options.
Whether to run the model response in the background.
falseAn upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.
The maximum number of total calls to built-in tools that can be processed in a response. This maximum number applies across all built-in tool calls, not per individual tool. Any further attempts to call a tool by the model will be ignored.
How the model should select which tool (or tools) to use when generating
a response. See the tools parameter to see how to specify which tools
the model can call.
Controls which (if any) tool is called by the model.
none means the model will not call any tool and instead generates a message.
auto means the model can pick between generating a message or calling one or
more tools.
required means the model must call one or more tools.
The truncation strategy to use for the model response.
auto: If the input to this Response exceeds the model's context window size, the model will truncate the response to fit the context window by dropping items from the beginning of the conversation.disabled(default): If the input size will exceed the context window size for a model, the request will fail with a 400 error.
disabledPossible values: Unique identifier for this Response.
The object type of this resource - always set to response.
The status of the response generation. One of completed, failed,
in_progress, cancelled, queued, or incomplete.
Unix timestamp (in seconds) of when this Response was created.
A system (or developer) message inserted into the model's context.
When using along with previous_response_id, the instructions from a previous
response will not be carried over to the next response. This makes it simple
to swap out system (or developer) messages in new responses.
A text input to the model, equivalent to a text input with the
developer role.
SDK-only convenience property that contains the aggregated text output
from all output_text items in the output array, if any are present.
Supported in the Python and JavaScript SDKs.
Whether to allow the model to run tool calls in parallel.
trueOK
The ID of the response to retrieve.
resp_677efb5139a88190b512bc3fef8e535dIf set to true, the model response data will be streamed to the client as it is generated using server-sent events.
The sequence number of the event after which to start streaming.
When true, stream obfuscation will be enabled. Stream obfuscation adds
random characters to an obfuscation field on streaming delta events
to normalize payload sizes as a mitigation to certain side-channel
attacks. These obfuscation fields are included by default, but add a
small amount of overhead to the data stream.
OK
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability.
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
We generally recommend altering this or top_p but not both.
1Example: 1An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
We generally recommend altering this or temperature but not both.
1Example: 1Deprecated in favor of safety_identifier and prompt_cache_key; use prompt_cache_key to maintain caching. A stable end-user identifier to improve cache hit rates and help detect abuse.
user-1234A stable identifier used to help detect users who may violate usage policies. Use a unique per-user string (e.g., a hash of username or email) to avoid sending identifying information.
safety-identifier-1234Used to cache responses for similar requests and improve cache hit rates. Replaces the user field.
prompt-cache-key-1234Specifies the processing tier for the request. The response includes the actual tier used, which may differ from the requested value.
autoPossible values: Retention policy for the prompt cache. Set to 24h to keep cached prefixes active longer (up to 24 hours).
The unique ID of the previous response to the model. Use this to create multi-turn conversations. Cannot be used with conversation.
Model ID used to generate the response (e.g., gpt-4o or o3). See your provider's model guide for available options.
Whether to run the model response in the background.
falseAn upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.
The maximum number of total calls to built-in tools that can be processed in a response. This maximum number applies across all built-in tool calls, not per individual tool. Any further attempts to call a tool by the model will be ignored.
How the model should select which tool (or tools) to use when generating
a response. See the tools parameter to see how to specify which tools
the model can call.
Controls which (if any) tool is called by the model.
none means the model will not call any tool and instead generates a message.
auto means the model can pick between generating a message or calling one or
more tools.
required means the model must call one or more tools.
The truncation strategy to use for the model response.
auto: If the input to this Response exceeds the model's context window size, the model will truncate the response to fit the context window by dropping items from the beginning of the conversation.disabled(default): If the input size will exceed the context window size for a model, the request will fail with a 400 error.
disabledPossible values: Unique identifier for this Response.
The object type of this resource - always set to response.
The status of the response generation. One of completed, failed,
in_progress, cancelled, queued, or incomplete.
Unix timestamp (in seconds) of when this Response was created.
A system (or developer) message inserted into the model's context.
When using along with previous_response_id, the instructions from a previous
response will not be carried over to the next response. This makes it simple
to swap out system (or developer) messages in new responses.
A text input to the model, equivalent to a text input with the
developer role.
SDK-only convenience property that contains the aggregated text output
from all output_text items in the output array, if any are present.
Supported in the Python and JavaScript SDKs.
Whether to allow the model to run tool calls in parallel.
trueOK
The ID of the response to retrieve input items for.
A limit on the number of objects to be returned. Limit can range between 1 and 100, and the default is 20.
20The order to return the input items in. Default is desc.
asc: Return the input items in ascending order.desc: Return the input items in descending order.
An item ID to list items after, used in pagination.
OK
A list of Response items.
The type of object returned, must be list.
Whether there are more items available.
The ID of the first item in the list.
The ID of the last item in the list.
OK
The ID of the response to cancel.
resp_677efb5139a88190b512bc3fef8e535dOK
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability.
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
We generally recommend altering this or top_p but not both.
1Example: 1An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
We generally recommend altering this or temperature but not both.
1Example: 1Deprecated in favor of safety_identifier and prompt_cache_key; use prompt_cache_key to maintain caching. A stable end-user identifier to improve cache hit rates and help detect abuse.
user-1234A stable identifier used to help detect users who may violate usage policies. Use a unique per-user string (e.g., a hash of username or email) to avoid sending identifying information.
safety-identifier-1234Used to cache responses for similar requests and improve cache hit rates. Replaces the user field.
prompt-cache-key-1234Specifies the processing tier for the request. The response includes the actual tier used, which may differ from the requested value.
autoPossible values: Retention policy for the prompt cache. Set to 24h to keep cached prefixes active longer (up to 24 hours).
The unique ID of the previous response to the model. Use this to create multi-turn conversations. Cannot be used with conversation.
Model ID used to generate the response (e.g., gpt-4o or o3). See your provider's model guide for available options.
Whether to run the model response in the background.
falseAn upper bound for the number of tokens that can be generated for a response, including visible output tokens and reasoning tokens.
The maximum number of total calls to built-in tools that can be processed in a response. This maximum number applies across all built-in tool calls, not per individual tool. Any further attempts to call a tool by the model will be ignored.
How the model should select which tool (or tools) to use when generating
a response. See the tools parameter to see how to specify which tools
the model can call.
Controls which (if any) tool is called by the model.
none means the model will not call any tool and instead generates a message.
auto means the model can pick between generating a message or calling one or
more tools.
required means the model must call one or more tools.
The truncation strategy to use for the model response.
auto: If the input to this Response exceeds the model's context window size, the model will truncate the response to fit the context window by dropping items from the beginning of the conversation.disabled(default): If the input size will exceed the context window size for a model, the request will fail with a 400 error.
disabledPossible values: Unique identifier for this Response.
The object type of this resource - always set to response.
The status of the response generation. One of completed, failed,
in_progress, cancelled, queued, or incomplete.
Unix timestamp (in seconds) of when this Response was created.
A system (or developer) message inserted into the model's context.
When using along with previous_response_id, the instructions from a previous
response will not be carried over to the next response. This makes it simple
to swap out system (or developer) messages in new responses.
A text input to the model, equivalent to a text input with the
developer role.
SDK-only convenience property that contains the aggregated text output
from all output_text items in the output array, if any are present.
Supported in the Python and JavaScript SDKs.
Whether to allow the model to run tool calls in parallel.
trueNot Found
Model ID used to generate the response, like gpt-5 or o3.
Text, image, or file inputs to the model, used to generate a response
A text input to the model, equivalent to a text input with the user role.
The unique ID of the previous response to the model. Use this to create multi-turn conversations.
resp_123A system (or developer) message inserted into the model's context.
When used along with previous_response_id, the instructions from a previous response will not be carried over to the next response. This makes it simple to swap out system (or developer) messages in new responses.
Success
The unique identifier for the compacted response.
The object type. Always response.compaction.
response.compactionPossible values: Unix timestamp (in seconds) when the compacted conversation was created.
Success
Model ID used to generate the response, like gpt-4o or o3.
Text, image, or file inputs to the model, used to generate a response
A text input to the model, equivalent to a text input with the user role.
The unique ID of the previous response to the model. Use this to create multi-turn conversations.
resp_123The truncation strategy to use for the model response. - auto: If the input to this Response exceeds the model's context window size, the model will truncate the response to fit the context window by dropping items from the beginning of the conversation. - disabled (default): If the input size will exceed the context window size for a model, the request will fail with a 400 error.
A system (or developer) message inserted into the model's context.
When used along with previous_response_id, the instructions from a previous response will not be carried over to the next response. This makes it simple to swap out system (or developer) messages in new responses.
The conversation that this response belongs to. Items from this conversation are prepended to input_items for this response request.
Input items and output items from this response are automatically added to this conversation after this response completes.
The unique ID of the conversation.
How the model should select which tool (or tools) to use when generating
a response. See the tools parameter to see how to specify which tools
the model can call.
Controls which (if any) tool is called by the model.
none means the model will not call any tool and instead generates a message.
auto means the model can pick between generating a message or calling one or
more tools.
required means the model must call one or more tools.
Whether to allow the model to run tool calls in parallel.
Success
response.input_tokensPossible values: Success
Last updated

