AI Router

The Chalk AI Router is a managed, OpenAI-compatible LLM gateway hosted alongside your Chalk deployment. Point any OpenAI-compatible client at the router and it forwards your request to the right provider — OpenAI, Anthropic, or Google Gemini — behind a single API. Because the router sits in front of every provider, it is also where you centralize the things you do not want scattered across application code: API keys and budgets, rate limits, automatic fallback between models, and provider credentials.

The router is hosted with the Chalk API and requires no installation or self-hosting.

Endpoint

POST https://api.chalk.ai/v1/router

The router exposes the standard OpenAI-compatible paths under this base URL:

Path	Purpose
`/chat/completions`	Chat completions (streams via Server-Sent Events).
`/embeddings`	Text embeddings.
`/images/generations`	Image generation.
`/models`	List the models available to your key.

If you are on a dedicated or self-hosted Chalk deployment, replace api.chalk.ai with your own API server host. You can find it in the apiServer field of chalk config --format json. Requests are routed to the calling environment’s router, selected by the X-Chalk-Env-Id header described below.
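As a minimal sketch of deriving the router base URL for a dedicated deployment, the helper below reads the apiServer field from the JSON printed by chalk config --format json. The function name router_base_url is hypothetical, and the sketch assumes apiServer is a top-level key that may or may not include a scheme; check your own output before relying on it.

```python
import json


def router_base_url(chalk_config_json: str) -> str:
    """Return the router base URL for the host named in the config JSON."""
    config = json.loads(chalk_config_json)
    # Assumption: `apiServer` is a top-level key in the config output.
    api_server = config["apiServer"].rstrip("/")
    # Assumption: add a scheme when the field holds a bare host name.
    if "://" not in api_server:
        api_server = "https://" + api_server
    return api_server + "/v1/router"


# Sample input standing in for the output of `chalk config --format json`.
sample = '{"apiServer": "https://chalk.internal.example.com/"}'
print(router_base_url(sample))
```

The returned string can be passed as the base URL of any OpenAI-compatible client.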

Authentication

Every request carries an issued router API key as a bearer token, plus the environment to route to:

Header	Value
`Authorization`	`Bearer <ROUTER_API_KEY>`
`X-Chalk-Env-Id`	The environment to route the request to

Router API keys are issued and revoked from the dashboard — see API keys below. A key is shown only once when it is issued, so copy it immediately.

Router API keys are sensitive: they spend against your configured providers. Treat them like any other secret, scope them with a model allow-list and daily budget, and revoke any key you suspect is compromised.
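One way to keep the key out of source code is to read it from the environment when building a request. The sketch below prepares (but does not send) a request to the /models path using only the standard library. The helper name build_models_request and the CHALK_ENV_ID variable are assumptions made for illustration; CHALK_ROUTER_API_KEY matches the variable used in the curl example on this page.

```python
import os
import urllib.request


def build_models_request(base_url, api_key, env_id):
    """Prepare a GET request for the router's /models path with both headers."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/models",
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Chalk-Env-Id": env_id,
        },
        method="GET",
    )


request = build_models_request(
    base_url="https://api.chalk.ai/v1/router",
    # Read secrets from the environment; the fallbacks are placeholders only.
    api_key=os.environ.get("CHALK_ROUTER_API_KEY", "<ROUTER_API_KEY>"),
    env_id=os.environ.get("CHALK_ENV_ID", "<env-id>"),  # hypothetical variable name
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```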

Using the router

The router speaks the OpenAI API, so any OpenAI-compatible client works with no code changes beyond the base URL, key, and environment header.

from openai import OpenAI

client = OpenAI(
    base_url="https://<your-chalk-api-host>/v1/router",
    api_key="<YOUR_ROUTER_API_KEY>",
    default_headers={"X-Chalk-Env-Id": "<env-id>"},
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

The same request with curl:

curl https://<your-chalk-api-host>/v1/router/chat/completions \
  -H "Authorization: Bearer $CHALK_ROUTER_API_KEY" \
  -H "X-Chalk-Env-Id: <env-id>" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'

The model you request is resolved against the providers you have configured. A key’s provider restriction, model allow-list, team, and daily token budget are all enforced on every request.
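Since /chat/completions streams via Server-Sent Events, an OpenAI-compatible client can consume the reply incrementally by passing stream=True. The sketch below assumes the router emits chunks in the standard OpenAI streaming shape; stream_reply is a hypothetical helper, and the stand-in client exists only so the example runs without network access.

```python
from types import SimpleNamespace


def stream_reply(client, model, prompt):
    """Yield text fragments from a streamed chat completion."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip those.
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


def _fake_create(**kwargs):
    """Stand-in for the real client call: returns three canned chunks."""
    def chunk(text):
        delta = SimpleNamespace(content=text)
        return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])
    return iter([chunk("Hel"), chunk("lo"), chunk(None)])


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)
print("".join(stream_reply(fake_client, "gpt-4o", "Hello")))  # prints "Hello"
```

With a real OpenAI client configured as shown earlier in this section, pass that client in place of fake_client.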

Calling the router from Chalk

Because the router is OpenAI-compatible, you can also call it from inside feature computation with chalk.functions. F.openai_complete issues a chat completion while a query runs. To route it through the AI Router rather than calling OpenAI directly, set api_server to your Chalk API host — or set the OPENAI_BASE_URL environment variable on the execution host and omit the argument.

import chalk.functions as F
from chalk.features import _
from chalkdf import DataFrame

df = DataFrame({"questions": ["Recommend some movies like High and Low by Akira Kurosawa"]})
(
    df.with_columns(
        F.openai_complete(
            prompt=_.questions,  # must match the DataFrame column name
            model="o4-mini",
            service_tier="flex",
            api_server="https://api.chalk.ai",  # route through the AI Router
        ).alias("result")
    )
    .run()
    .to_arrow()
)

F.openai_complete takes the following arguments:

Argument	Description
`prompt`	The prompt text to send to the model.
`model`	The model to use. Defaults to `gpt-3.5-turbo` when omitted.
`api_server`	Base URL of an OpenAI-compatible endpoint; `/chat/completions` is appended. Falls back to the `OPENAI_BASE_URL` env var, then to the default OpenAI endpoint. Point it at your Chalk API host to use the AI Router.
`api_key`	API key for authentication. Falls back to the `OPENAI_API_KEY` env var, so the secret does not have to be threaded through feature data.
`max_tokens`	Maximum number of tokens to generate.
`temperature`	Sampling temperature between 0 and 2.
`service_tier`	Optional OpenAI service tier — `"flex"` (cheaper, higher-latency), `"priority"`, or `"auto"`. `"flex"` is only supported on reasoning models (`o3`, `o4-mini`, `gpt-5`-class) and uses a longer request timeout; passing it with an unsupported model returns `null`.

It returns a struct with the completion text plus prompt_tokens, completion_tokens, total_tokens, model, finish_reason, and the upstream ratelimit_remaining_tokens / ratelimit_remaining_requests headers.

Throttling with rate limits

LLM calls are blocking and metered, so throttle them with the policy modifiers chained onto the expression. with_rate_limit caps how often the call may run across every expression that shares its key:

import chalk.functions as F
from chalk.features import _
from chalkdf import DataFrame

df = DataFrame({"x": ["Recommend some movies like Buzzard by Joel Potrykus"]})
(
    df.with_columns(
        F.openai_complete(prompt=_.x, model="o4-mini", service_tier="flex")
        .with_rate_limit(rate=3, key="openai", per="minute")
        .alias("result")
    )
    .run()
    .to_arrow()
)

The same modifier works on a feature defined in a @features class — chain .with_rate_limit(...) before selecting the .completion field.

with_rate_limit takes:

Argument	Description
`rate`	Number of calls allowed per window.
`per`	Window length — `"second"` (default), `"minute"`, or `"hour"`.
`key`	Bucket name; all expressions sharing a `key` draw from the same budget.
`enforce_globally`	When `True`, enforce the limit across all workers rather than per-worker. Defaults to `False`.
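To make the shared-key behavior concrete, here is a conceptual model in plain Python. It is not Chalk's implementation: the text does not say which algorithm with_rate_limit uses, so the fixed-window counter and the SharedRateLimiter class are assumptions chosen for simplicity.

```python
import time
from collections import defaultdict

WINDOW_SECONDS = {"second": 1.0, "minute": 60.0, "hour": 3600.0}


class SharedRateLimiter:
    """Fixed-window counter: at most `rate` calls per window for each key."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        # key -> (window index, calls used in that window)
        self._windows = defaultdict(lambda: (None, 0))

    def try_acquire(self, key, rate, per="second"):
        """Return True and count the call if the key still has budget."""
        window = int(self._clock() // WINDOW_SECONDS[per])
        current, used = self._windows[key]
        if current != window:
            used = 0  # a new window started, so the budget resets
        if used >= rate:
            self._windows[key] = (window, used)
            return False
        self._windows[key] = (window, used + 1)
        return True


# A fake clock keeps the demonstration deterministic.
now = [0.0]
limiter = SharedRateLimiter(clock=lambda: now[0])
results = [limiter.try_acquire("openai", rate=3, per="minute") for _ in range(4)]
print(results)  # [True, True, True, False]
now[0] = 60.0  # move into the next one-minute window
print(limiter.try_acquire("openai", rate=3, per="minute"))  # True
```

Every caller that passes the same key draws from the same counter, which is the property the key argument describes.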

These policy modifiers compose — chain with_concurrency, with_rate_limit, and with_retry on the same expression to bound in-flight calls, cap the call rate, and retry transient failures with backoff:

(
    F.openai_complete(prompt=_.x, model="o4-mini", service_tier="flex")
    .with_concurrency(max_concurrent=4, key="my_api")
    .with_rate_limit(rate=100, key="my_api")
    .with_retry(max_retries=3, key="my_api")
)

Reusing one key ties the policies to the same logical resource, so every expression that uses the key "my_api" shares a single budget: at most 4 concurrent calls and 100 calls per second (per defaults to "second"), with each failed call retried up to 3 times.
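The retry part of that chain can be pictured with a small stand-alone sketch. This is an illustration of retry with backoff in general, not Chalk's code: the text does not give the backoff schedule used by with_retry, so the exponential delays, the base_delay parameter, and the call_with_retry helper are all assumptions.

```python
import time


def call_with_retry(call, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Run `call`, retrying up to `max_retries` times with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted, surface the last error
            sleep(base_delay * 2 ** attempt)  # assumed schedule: 0.5s, 1s, 2s


attempts = []
delays = []


def flaky():
    """Fail twice with a transient error, then succeed."""
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("transient")
    return "ok"


# Recording the delays instead of sleeping keeps the demonstration fast.
print(call_with_retry(flaky, sleep=delays.append), delays)  # ok [0.5, 1.0]
```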

API keys

Issue and revoke router keys from the AI Router → API keys tab of the Chalk dashboard. Each key can be scoped at issue time so that a leaked or over-eager client cannot do more than you intend:

Setting	Effect
Description	A human-readable label for the key.
Team	Associates the key with a team, so team-scoped rate limits apply.
Provider	Restricts the key to a single provider (for example, `openai`).
Model allow-list	Restricts the key to specific models (for example, `gpt-4o`, `gpt-4o-mini`).
Daily token budget	Caps the tokens the key may consume per day.
Labels	Custom key/value metadata.
Cost tags	Tags used to attribute spend for billing and reporting.

Each key tracks its total token usage, and revoking a key takes effect immediately.
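The scoping settings can be read as a series of checks applied to each request. The sketch below models that idea in plain Python; it is illustrative only, and the KeyScope class, the rejection_reason helper, and the reason strings are hypothetical rather than anything the router exposes.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class KeyScope:
    """Restrictions attached to a key; None means the setting is unrestricted."""
    provider: Optional[str] = None
    allowed_models: Optional[FrozenSet[str]] = None
    daily_token_budget: Optional[int] = None


def rejection_reason(scope, provider, model, tokens_used_today):
    """Return why a request would be refused, or None if it is within scope."""
    if scope.provider is not None and provider != scope.provider:
        return "provider not allowed"
    if scope.allowed_models is not None and model not in scope.allowed_models:
        return "model not in allow-list"
    if (
        scope.daily_token_budget is not None
        and tokens_used_today >= scope.daily_token_budget
    ):
        return "daily token budget exhausted"
    return None


scope = KeyScope(
    provider="openai",
    allowed_models=frozenset({"gpt-4o", "gpt-4o-mini"}),
    daily_token_budget=100_000,
)
print(rejection_reason(scope, "openai", "gpt-4o", tokens_used_today=10))
print(rejection_reason(scope, "anthropic", "claude-3-5-sonnet", tokens_used_today=10))
```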

Teams

Teams group API keys so you can manage and limit access by team rather than key by key. Create a team, then assign keys to it at issue time. Rate limits can be scoped to a team so the whole team shares a single budget. Deleting a team leaves its keys working but removes their team association.

Rate limits

Rate limit policies cap throughput and protect against runaway spend. Each policy has a limit type, an optional target, and a scope:

Limit type — tokens per minute, requests per minute, or concurrent requests.
Target (optional) — narrow the policy to a specific provider or model. Leave it unset to apply across all traffic.
Scope — apply the limit per individual key (token) or shared across a team.

Policies can be enabled or disabled without deleting them.

Fallback policies

A fallback policy keeps requests succeeding when a primary model is unavailable. For a primary model you define an ordered list of fallbacks; if a request against the primary fails, the router retries the fallbacks in order. For example:

gpt-4o → [gpt-4o-mini, claude-3-5-sonnet]

If gpt-4o is unavailable, the router tries gpt-4o-mini, then claude-3-5-sonnet, before giving up. Edit fallback rules in the AI Router → Fallback tab.
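The ordering rule can be sketched as a loop over the primary and its fallbacks. The router performs this server-side, so client code does not need it; the sketch only illustrates the order of attempts. The complete_with_fallback helper and the fake_call stand-in are hypothetical.

```python
def complete_with_fallback(call_model, primary, fallbacks):
    """Try `primary`, then each fallback in order; return (model, result)."""
    last_error = None
    for model in [primary, *fallbacks]:
        try:
            return model, call_model(model)
        except Exception as error:
            last_error = error  # remember the failure and try the next model
    raise RuntimeError(f"all models failed, starting with {primary}") from last_error


# The fallback rule from the example above.
FALLBACKS = {"gpt-4o": ["gpt-4o-mini", "claude-3-5-sonnet"]}


def fake_call(model):
    """Stand-in for a provider call: the first two models are unavailable."""
    if model in {"gpt-4o", "gpt-4o-mini"}:
        raise ConnectionError(f"{model} unavailable")
    return f"reply from {model}"


print(complete_with_fallback(fake_call, "gpt-4o", FALLBACKS["gpt-4o"]))
# ('claude-3-5-sonnet', 'reply from claude-3-5-sonnet')
```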

Providers

Configure provider credentials in the AI Router → Settings tab. The router supports:

Provider	ID
OpenAI	`openai`
Anthropic	`anthropic`
Google Gemini	`gemini`

For each provider you set an API key and, optionally, a base URL override to point at a custom or OpenAI-compatible endpoint (for example, Gemini’s OpenAI-compatible API). Credentials are applied at runtime — no redeploy is required — and each provider shows a connected or not-configured status.

Playground

The AI Router → Playground tab is an in-dashboard tester for the router. It authenticates with your Chalk session — no separate API key needed — and lets you exercise the router across three tabs:

Chat — send streaming chat completions with a configurable system prompt, temperature, and max tokens.
Embeddings — generate embeddings and inspect their dimensions and token counts.
Images — generate images with configurable size, quality, and count.

The model picker is populated from the router’s /models endpoint, so it reflects the providers you have configured.
