
/responses [Beta]

LiteLLM provides a BETA endpoint that follows the spec of OpenAI's /responses API.

| Feature | Supported | Notes |
|---|---|---|
| Cost Tracking | ✅ | Works with all supported models |
| Logging | ✅ | Works across all integrations |
| End-user Tracking | ✅ | |
| Streaming | ✅ | |
| Fallbacks | ✅ | Works between supported models |
| Loadbalancing | ✅ | Works between supported models |
| Supported LiteLLM Versions | 1.63.8+ | |
| Supported LLM providers | All LiteLLM supported providers | openai, anthropic, bedrock, vertex_ai, gemini, azure, azure_ai, etc. |

Usage

LiteLLM Python SDK

Non-streaming

OpenAI Non-streaming Response
import litellm

# Non-streaming response
response = litellm.responses(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    max_output_tokens=100
)

print(response)
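
LiteLLM typically exposes async variants of its endpoints with an "a" prefix (for example acompletion). Assuming the same pattern holds for the Responses API, an async call would use aresponses with the same parameters as the synchronous call above. A minimal sketch under that assumption:

OpenAI Async Non-streaming Response (sketch)
import asyncio
import litellm

async def main():
    # Assumption: litellm.aresponses mirrors litellm.responses
    response = await litellm.aresponses(
        model="openai/o1-pro",
        input="Tell me a three sentence bedtime story about a unicorn.",
        max_output_tokens=100
    )
    print(response)

asyncio.run(main())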

Streaming

OpenAI Streaming Response
import litellm

# Streaming response
response = litellm.responses(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

for event in response:
    print(event)
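
Each item yielded by the stream is an event object. Assuming the events mirror OpenAI's Responses streaming spec (text chunks arrive as response.output_text.delta events carrying a delta field), you can print just the generated text as it arrives. A sketch under that assumption:

Streaming Text Deltas (sketch)
import litellm

response = litellm.responses(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

for event in response:
    # Assumption: event types follow the OpenAI Responses API streaming spec
    if getattr(event, "type", None) == "response.output_text.delta":
        print(event.delta, end="", flush=True)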

LiteLLM Proxy with OpenAI SDK

First, add this to your litellm proxy config.yaml:

OpenAI Proxy Configuration
model_list:
  - model_name: openai/o1-pro
    litellm_params:
      model: openai/o1-pro
      api_key: os.environ/OPENAI_API_KEY

Then start your LiteLLM proxy server:

Start LiteLLM Proxy Server
litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000

Non-streaming

OpenAI Proxy Non-streaming Response
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="http://localhost:4000",  # Your proxy URL
    api_key="your-api-key"  # Your proxy API key
)

# Non-streaming response
response = client.responses.create(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn."
)

print(response)
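
The async OpenAI client works against the proxy in the same way. A minimal sketch, assuming the same proxy URL and API key as above (AsyncOpenAI ships with the openai package):

OpenAI Proxy Async Non-streaming Response (sketch)
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="http://localhost:4000",  # Your proxy URL
    api_key="your-api-key"  # Your proxy API key
)

async def main():
    response = await client.responses.create(
        model="openai/o1-pro",
        input="Tell me a three sentence bedtime story about a unicorn."
    )
    print(response)

asyncio.run(main())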

Streaming

OpenAI Proxy Streaming Response
from openai import OpenAI

# Initialize client with your proxy URL
client = OpenAI(
    base_url="http://localhost:4000",  # Your proxy URL
    api_key="your-api-key"  # Your proxy API key
)

# Streaming response
response = client.responses.create(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

for event in response:
    print(event)
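
To reassemble the full generated text from the stream, accumulate the delta events. This sketch assumes the proxy passes through OpenAI's Responses streaming event types (text chunks as response.output_text.delta events):

OpenAI Proxy Streaming Accumulation (sketch)
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",  # Your proxy URL
    api_key="your-api-key"  # Your proxy API key
)

stream = client.responses.create(
    model="openai/o1-pro",
    input="Tell me a three sentence bedtime story about a unicorn.",
    stream=True
)

full_text = ""
for event in stream:
    # Assumption: event types follow the OpenAI Responses API streaming spec
    if event.type == "response.output_text.delta":
        full_text += event.delta

print(full_text)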

Supported Responses API Parameters

| Provider | Supported Parameters |
|---|---|
| openai | All Responses API parameters are supported |
| azure | All Responses API parameters are supported |
| anthropic | See supported parameters here |
| bedrock | See supported parameters here |
| gemini | See supported parameters here |
| vertex_ai | See supported parameters here |
| azure_ai | See supported parameters here |
| All other LLM API providers | See supported parameters here |
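
Per the table above, non-OpenAI providers are supported with a subset of the Responses API parameters, and they are called through the same interface; only the model string changes. A hedged sketch with an Anthropic model (the model id is illustrative, and ANTHROPIC_API_KEY must be set in your environment):

Anthropic Non-streaming Response (sketch)
import litellm

# Illustrative model id; use any Anthropic model your key has access to
response = litellm.responses(
    model="anthropic/claude-3-5-sonnet-20240620",
    input="Tell me a three sentence bedtime story about a unicorn."
)

print(response)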