Model Platform
Route LLM traffic to multiple providers through one OpenAI-compatible gateway.
The Chalk AI Router is a managed, OpenAI-compatible LLM gateway hosted alongside your Chalk deployment. Point any OpenAI-compatible client at the router and it forwards your request to the right provider — OpenAI, Anthropic, or Google Gemini — behind a single API. Because the router sits in front of every provider, it is also where you centralize the things you do not want scattered across application code: API keys and budgets, rate limits, automatic fallback between models, and provider credentials.
The router is hosted with the Chalk API and requires no installation or self-hosting.
The router exposes the standard OpenAI-compatible paths under the base URL https://api.chalk.ai/v1/router:
| Path | Purpose |
|---|---|
| /chat/completions | Chat completions (streams via Server-Sent Events). |
| /embeddings | Text embeddings. |
| /images/generations | Image generation. |
| /models | List the models available to your key. |
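Streaming responses from /chat/completions arrive as Server-Sent Events. Because the router is OpenAI-compatible, this sketch assumes each event follows OpenAI's streaming format: a data: line carrying a JSON chunk whose choices[0].delta.content holds the next piece of text, with data: [DONE] marking the end. collect_stream_text is a hypothetical helper that reassembles the text from lines you have already received; the openai client does the same work for you when you pass stream=True to client.chat.completions.create.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join delta content from OpenAI-style SSE lines (hypothetical helper, assumes n=1)."""
    parts: list[str] = []
    for raw in lines:
        line = raw.strip()
        # Skip blank keep-alive lines and any non-data SSE fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices:
            content = choices[0].get("delta", {}).get("content")
            if content:  # role-only and null deltas carry no text
                parts.append(content)
    return "".join(parts)


example = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(example))  # Hello
```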
If you are on a dedicated or self-hosted Chalk deployment, replace api.chalk.ai with your own
API server host. You can find it in the apiServer field of chalk config --format json.
Requests are routed to the calling environment’s router, selected by the X-Chalk-Env-Id header
described below.
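If you build URLs yourself rather than through an SDK, each endpoint is the API host plus the router base path plus one of the paths in the table. A minimal sketch, assuming the /v1/router base path used in the client examples on this page; router_url is a hypothetical helper, and for a dedicated deployment you would pass the host from apiServer instead of api.chalk.ai.

```python
from urllib.parse import urlsplit

ROUTER_BASE_PATH = "/v1/router"  # assumption, taken from the client examples on this page
ROUTER_PATHS = ("/chat/completions", "/embeddings", "/images/generations", "/models")


def router_url(api_host: str, path: str) -> str:
    """Build a full router endpoint URL from an API host (hypothetical helper)."""
    if path not in ROUTER_PATHS:
        raise ValueError(f"not a router path: {path!r}")
    parts = urlsplit(api_host)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected a URL like https://api.chalk.ai, got {api_host!r}")
    # A trailing slash on the host is tolerated; any other host path is kept.
    return api_host.rstrip("/") + ROUTER_BASE_PATH + path


for path in ROUTER_PATHS:
    print(router_url("https://api.chalk.ai", path))
```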
Every request carries an issued router API key as a bearer token, plus the environment to route to:
| Header | Value |
|---|---|
| Authorization | Bearer <ROUTER_API_KEY> |
| X-Chalk-Env-Id | The environment to route the request to |
Router API keys are issued and revoked from the dashboard — see API keys below. A key is shown only once when it is issued, so copy it immediately.
Router API keys are sensitive: they spend against your configured providers. Treat them like any other secret, scope them with a model allow-list and daily budget, and revoke any key you suspect is compromised.
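Attaching both headers needs nothing beyond the Python standard library. The sketch below builds, but does not send, a chat completion request. It reads the key from an environment variable so the secret stays out of source code; CHALK_ROUTER_API_KEY matches the curl example on this page, while CHALK_ENV_ID and build_chat_request are assumptions for illustration.

```python
import json
import os
import urllib.request


def build_chat_request(host: str, api_key: str, env_id: str,
                       model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a router chat completion request (hypothetical helper)."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode()
    return urllib.request.Request(
        f"{host.rstrip('/')}/v1/router/chat/completions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # issued router API key
            "X-Chalk-Env-Id": env_id,  # environment to route the request to
            "Content-Type": "application/json",
        },
    )


req = build_chat_request(
    "https://api.chalk.ai",
    os.environ.get("CHALK_ROUTER_API_KEY", "<ROUTER_API_KEY>"),
    os.environ.get("CHALK_ENV_ID", "<env-id>"),  # assumed variable name
    model="gpt-4o",
    prompt="Hello",
)
# urllib.request.urlopen(req) would send it. Note that urllib normalizes header-name
# case (X-Chalk-Env-Id is stored as X-chalk-env-id); HTTP header names are case-insensitive.
print(req.full_url, req.get_method())
```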
The router speaks the OpenAI API, so any OpenAI-compatible client works with no code changes beyond the base URL, key, and environment header.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-chalk-api-host>/v1/router",
    api_key="<YOUR_ROUTER_API_KEY>",
    default_headers={"X-Chalk-Env-Id": "<env-id>"},
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```

The same request with curl:
```bash
curl https://<your-chalk-api-host>/v1/router/chat/completions \
  -H "Authorization: Bearer $CHALK_ROUTER_API_KEY" \
  -H "X-Chalk-Env-Id: <env-id>" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
```

The model you request is resolved against the providers you have configured. A key's provider
restriction, model allow-list, team, and daily token budget are all enforced on every request.
Because the router is OpenAI-compatible, you can also call it from inside feature computation with
chalk.functions: F.openai_complete issues a chat completion while a query runs. To route it
through the AI Router rather than calling OpenAI directly, set api_server to your Chalk API host,
or set the OPENAI_BASE_URL environment variable on the execution host and omit the argument.
```python
import chalk.functions as F
from chalk.features import _
from chalkdf import DataFrame

df = DataFrame({"questions": ["Recommend some movies like High and Low by Akira Kurosawa"]})

(
    df.with_columns(
        F.openai_complete(
            prompt=_.questions,  # reference the column defined above
            model="o4-mini",
            service_tier="flex",
            api_server="https://api.chalk.ai",  # route through the AI Router
        ).alias("result")
    )
    .run()
    .to_arrow()
)
```

F.openai_complete takes the following arguments:
| Argument | Description |
|---|---|
| prompt | The prompt text to send to the model. |
| model | The model to use. Defaults to gpt-3.5-turbo when omitted. |
| api_server | Base URL of an OpenAI-compatible endpoint; /chat/completions is appended. Falls back to the OPENAI_BASE_URL env var, then to the default OpenAI endpoint. Point it at your Chalk API host to use the AI Router. |
| api_key | API key for authentication. Falls back to the OPENAI_API_KEY env var, so the secret does not have to be threaded through feature data. |
| max_tokens | Maximum number of tokens to generate. |
| temperature | Sampling temperature between 0 and 2. |
| service_tier | Optional OpenAI service tier: "flex" (cheaper, higher-latency), "priority", or "auto". "flex" is only supported on reasoning models (o3, o4-mini, gpt-5-class) and uses a longer request timeout; passing it with an unsupported model returns null. |
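The precedence for api_server can be modeled as a small resolution function. This is an illustrative sketch of the documented order (explicit argument, then OPENAI_BASE_URL, then the default OpenAI endpoint), not Chalk's implementation; resolve_chat_url is hypothetical and the default URL https://api.openai.com/v1 is an assumption.

```python
import os
from typing import Mapping, Optional

DEFAULT_OPENAI_BASE = "https://api.openai.com/v1"  # assumed default endpoint


def resolve_chat_url(api_server: Optional[str] = None,
                     env: Mapping[str, str] = os.environ) -> str:
    """Mirror the documented precedence: argument, then OPENAI_BASE_URL, then default."""
    base = api_server or env.get("OPENAI_BASE_URL") or DEFAULT_OPENAI_BASE
    # /chat/completions is appended to whichever base wins.
    return base.rstrip("/") + "/chat/completions"


print(resolve_chat_url(env={}))
print(resolve_chat_url(env={"OPENAI_BASE_URL": "https://gateway.example/v1"}))
print(resolve_chat_url("https://override.example/v1", env={"OPENAI_BASE_URL": "ignored"}))
```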
It returns a struct with the completion text plus prompt_tokens, completion_tokens,
total_tokens, model, finish_reason, and the upstream ratelimit_remaining_tokens /
ratelimit_remaining_requests headers.
LLM calls are blocking and metered, so throttle them with the policy modifiers chained onto the
expression. with_rate_limit caps how often the call may run across every expression that shares
its key:
```python
import chalk.functions as F
from chalk.features import _
from chalkdf import DataFrame

df = DataFrame({"x": ["Recommend some movies like Buzzard by Joel Potrykus"]})

(
    df.with_columns(
        F.openai_complete(prompt=_.x, model="o4-mini", service_tier="flex")
        .with_rate_limit(rate=3, key="openai", per="minute")  # 3 calls/minute for key "openai"
        .alias("result")
    )
    .run()
    .to_arrow()
)
```

The same modifier works on a feature defined in a @features class: chain .with_rate_limit(...)
before selecting the .completion field.
with_rate_limit takes:
| Argument | Description |
|---|---|
| rate | Number of calls allowed per window. |
| per | Window length: "second" (default), "minute", or "hour". |
| key | Bucket name; all expressions sharing a key draw from the same budget. |
| enforce_globally | When True, enforce the limit across all workers rather than per-worker. Defaults to False. |
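To make the rate, per, and key semantics concrete, here is a conceptual per-key fixed-window limiter in plain Python. It is not Chalk's implementation (the page does not describe the actual algorithm or how enforce_globally coordinates workers); it only shows why every call using the same key draws from one budget. KeyedRateLimiter is a hypothetical name, and the clock is injectable so the behavior can be tested without waiting.

```python
import time
from typing import Callable

WINDOW_SECONDS = {"second": 1.0, "minute": 60.0, "hour": 3600.0}


class KeyedRateLimiter:
    """Fixed-window limiter: at most `rate` calls per window for each key."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (window start, calls counted in this window)
        self._windows: dict[str, tuple[float, int]] = {}

    def try_acquire(self, key: str, rate: int, per: str = "second") -> bool:
        window = WINDOW_SECONDS[per]
        now = self._clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= window:  # window elapsed: start a fresh one
            start, count = now, 0
        if count >= rate:
            return False  # budget for this key is spent until the window rolls over
        self._windows[key] = (start, count + 1)
        return True


fake_now = [0.0]
limiter = KeyedRateLimiter(clock=lambda: fake_now[0])
# Every call using key "openai" draws from the same 3-per-minute budget.
results = [limiter.try_acquire("openai", rate=3, per="minute") for _ in range(4)]
print(results)  # [True, True, True, False]
fake_now[0] = 60.0
print(limiter.try_acquire("openai", rate=3, per="minute"))  # True
```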
These policy modifiers compose — chain with_concurrency, with_rate_limit, and with_retry on
the same expression to bound in-flight calls, cap the call rate, and retry transient failures with
backoff:
```python
import chalk.functions as F
from chalk.features import _

(
    F.openai_complete(prompt=_.x, model="o4-mini", service_tier="flex")
    .with_concurrency(max_concurrent=4, key="my_api")  # at most 4 calls in flight
    .with_rate_limit(rate=100, key="my_api")  # 100 calls per second (default per)
    .with_retry(max_retries=3, key="my_api")  # retry transient failures up to 3 times
)
```

Reusing one key ties the policies to the same logical resource, so every expression that uses the
"my_api" key shares a single budget: here, at most 4 concurrent calls and 100 calls per second
(per defaults to "second"), with each failed call retried up to 3 times.
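The effect of with_concurrency and with_retry can be modeled with asyncio: a shared semaphore bounds in-flight calls, and a retry loop with exponential backoff and jitter re-attempts failures. This is a conceptual sketch, not Chalk's implementation; call_with_policies and flaky_call are hypothetical, the backoff base is an assumption, and the rate limit is left out for brevity.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def call_with_policies(
    fn: Callable[[], Awaitable[T]],
    semaphore: asyncio.Semaphore,
    max_retries: int = 3,
    base_delay: float = 0.01,
) -> T:
    """Run fn under a shared concurrency bound, retrying failures with backoff."""
    for attempt in range(max_retries + 1):
        async with semaphore:  # bounds in-flight calls across everyone sharing it
            try:
                return await fn()
            except Exception:  # a real policy would retry only transient errors
                if attempt == max_retries:
                    raise  # out of retries: surface the last error
        # Back off outside the semaphore so a waiting call can take the slot.
        await asyncio.sleep(base_delay * 2**attempt * (1 + random.random()))
    raise AssertionError("unreachable")


async def demo() -> tuple[list[str], int]:
    semaphore = asyncio.Semaphore(4)  # one semaphore per key, e.g. "my_api"
    in_flight = peak = 0
    failures_left = 2  # simulate two transient errors

    async def flaky_call() -> str:
        nonlocal in_flight, peak, failures_left
        in_flight += 1
        peak = max(peak, in_flight)
        try:
            await asyncio.sleep(0.01)
            if failures_left > 0:
                failures_left -= 1
                raise ConnectionError("transient failure")
            return "ok"
        finally:
            in_flight -= 1

    results = await asyncio.gather(
        *(call_with_policies(flaky_call, semaphore) for _ in range(10))
    )
    return list(results), peak


results, peak = asyncio.run(demo())
print(results.count("ok"), peak <= 4)  # 10 True
```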
Issue and revoke router keys from the AI Router → API keys tab of the Chalk dashboard. Each key can be scoped at issue time so that a leaked or over-eager client cannot do more than you intend:
| Setting | Effect |
|---|---|
| Description | A human-readable label for the key. |
| Team | Associates the key with a team, so team-scoped rate limits apply. |
| Provider | Restricts the key to a single provider (for example, openai). |
| Model allow-list | Restricts the key to specific models (for example, gpt-4o, gpt-4o-mini). |
| Daily token budget | Caps the tokens the key may consume per day. |
| Labels | Custom key/value metadata. |
| Cost tags | Tags used to attribute spend for billing and reporting. |
Each key tracks its total token usage, and revoking a key takes effect immediately.
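The scope settings amount to a few checks made on each request. The sketch below mirrors the provider restriction, model allow-list, and daily token budget client-side, which can help you validate a request before spending a call. KeyScope and check_request are hypothetical, treating an empty allow-list as "any model" is an assumption, and the router remains the authority on what it accepts.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KeyScope:
    """Hypothetical mirror of a router key's scope settings."""
    provider: Optional[str] = None  # e.g. "openai"; None means any provider
    model_allow_list: set[str] = field(default_factory=set)  # assumed: empty means any model
    daily_token_budget: Optional[int] = None  # None means uncapped
    tokens_used_today: int = 0


def check_request(scope: KeyScope, provider: str, model: str,
                  estimated_tokens: int) -> Optional[str]:
    """Return a rejection reason, or None if the request fits the key's scope."""
    if scope.provider is not None and provider != scope.provider:
        return f"key is restricted to provider {scope.provider!r}"
    if scope.model_allow_list and model not in scope.model_allow_list:
        return f"model {model!r} is not on the key's allow-list"
    if (scope.daily_token_budget is not None
            and scope.tokens_used_today + estimated_tokens > scope.daily_token_budget):
        return "daily token budget would be exceeded"
    return None


scope = KeyScope(provider="openai", model_allow_list={"gpt-4o", "gpt-4o-mini"},
                 daily_token_budget=100_000, tokens_used_today=99_500)
print(check_request(scope, "openai", "gpt-4o-mini", 200))  # None
print(check_request(scope, "openai", "o4-mini", 200))  # model 'o4-mini' is not on the key's allow-list
print(check_request(scope, "openai", "gpt-4o", 1_000))  # daily token budget would be exceeded
```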
Teams group API keys so you can manage and limit access by team rather than key by key. Create a team, then assign keys to it at issue time. Rate limits can be scoped to a team so the whole team shares a single budget. Deleting a team leaves its keys working but removes their team association.
Rate limit policies cap throughput and protect against runaway spend. Each policy has a limit type, an optional target, and a scope.
Policies can be enabled or disabled without deleting them.
A fallback policy keeps requests succeeding when a primary model is unavailable. For a primary model you define an ordered list of fallbacks; if a request against the primary fails, the router retries the fallbacks in order. For example:
gpt-4o → [gpt-4o-mini, claude-3-5-sonnet]
If gpt-4o is unavailable, the router tries gpt-4o-mini, then claude-3-5-sonnet, before
giving up. Edit fallback rules in the AI Router → Fallback tab.
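The fallback rule can be sketched as a loop over the ordered chain. This client-side model assumes a request "fails" when the provider call raises; which errors actually trigger a fallback in the router is not specified here. complete_with_fallback and fake_send are hypothetical.

```python
from typing import Callable, Sequence

FALLBACKS = {"gpt-4o": ["gpt-4o-mini", "claude-3-5-sonnet"]}  # the example rule above


def complete_with_fallback(
    primary: str,
    send: Callable[[str], str],
    fallbacks: dict[str, Sequence[str]] = FALLBACKS,
) -> tuple[str, str]:
    """Try the primary model, then each fallback in order; return (model, result)."""
    errors: list[str] = []
    for model in [primary, *fallbacks.get(primary, [])]:
        try:
            return model, send(model)
        except Exception as exc:  # assumption: any error moves on to the next model
            errors.append(f"{model}: {exc}")
    # Every model in the chain failed: give up and report each attempt.
    raise RuntimeError("all models failed; " + "; ".join(errors))


def fake_send(model: str) -> str:
    if model == "gpt-4o":
        raise ConnectionError("primary unavailable")
    return f"answer from {model}"


print(complete_with_fallback("gpt-4o", fake_send))
# ('gpt-4o-mini', 'answer from gpt-4o-mini')
```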
Configure provider credentials in the AI Router → Settings tab. The router supports:
| Provider | ID |
|---|---|
| OpenAI | openai |
| Anthropic | anthropic |
| Google Gemini | gemini |
For each provider you set an API key and, optionally, a base URL override to point at a custom or OpenAI-compatible endpoint (for example, Gemini’s OpenAI-compatible API). Credentials are applied at runtime — no redeploy is required — and each provider shows a connected or not-configured status.
The AI Router → Playground tab is an in-dashboard tester for the router. It authenticates with your Chalk session, so no separate API key is needed, and lets you exercise the router across three tabs.
The model picker is populated from the router’s /models endpoint, so it reflects the providers
you have configured.
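Outside the dashboard, you can read the same list from GET /models. Assuming the router returns OpenAI's list format ({"object": "list", "data": [{"id": ...}, ...]}), this sketch turns a response body into sorted model IDs; model_ids and the sample payload are illustrative. With the openai client, client.models.list() performs the request and iterates over the same entries.

```python
import json


def model_ids(body: str) -> list[str]:
    """Extract sorted, de-duplicated model IDs from an OpenAI-style /models response."""
    payload = json.loads(body)
    return sorted({entry["id"] for entry in payload.get("data", []) if "id" in entry})


sample = json.dumps({  # illustrative payload, not real router output
    "object": "list",
    "data": [
        {"id": "gpt-4o", "object": "model"},
        {"id": "claude-3-5-sonnet", "object": "model"},
        {"id": "gpt-4o-mini", "object": "model"},
    ],
})
print(model_ids(sample))  # ['claude-3-5-sonnet', 'gpt-4o', 'gpt-4o-mini']
```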