The Chalk AI Router is a managed, OpenAI-compatible LLM gateway hosted alongside your Chalk deployment. Point any OpenAI-compatible client at the router and it forwards your request to the right provider — OpenAI, Anthropic, or Google Gemini — behind a single API. Because the router sits in front of every provider, it is also where you centralize the things you do not want scattered across application code: API keys and budgets, rate limits, automatic fallback between models, and provider credentials.

The router is hosted with the Chalk API and requires no installation or self-hosting.


Endpoint

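Base URL: https://api.chalk.ai/v1/router

The router exposes the standard OpenAI-compatible paths under this base URL. As in the OpenAI API, /models is a GET request and the other paths are POST requests:

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /chat/completions | Chat completions (supports streaming via Server-Sent Events). |
| POST | /embeddings | Text embeddings. |
| POST | /images/generations | Image generation. |
| GET | /models | List the models available to your key. |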

If you are on a dedicated or self-hosted Chalk deployment, replace api.chalk.ai with your own API server host. You can find it in the apiServer field of chalk config --format json. Requests are routed to the calling environment’s router, selected by the X-Chalk-Env-Id header described below.
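As a small illustration of how the base URL and paths combine, the helper below joins an API server host (api.chalk.ai, or the apiServer value of a dedicated deployment) with /v1/router and one of the router paths. router_url is a hypothetical helper, not part of any Chalk SDK, and the HTTP methods are assumed from the OpenAI API conventions the router follows.

```python
# Hypothetical helper (not part of any Chalk SDK): compose router URLs.
# Methods are assumed from the OpenAI API: GET to list models, POST otherwise.
ROUTER_PATHS = {
    "/chat/completions": "POST",
    "/embeddings": "POST",
    "/images/generations": "POST",
    "/models": "GET",
}


def router_url(api_server: str, path: str) -> tuple[str, str]:
    """Return (method, url) for a router path on the given API server host."""
    if path not in ROUTER_PATHS:
        raise ValueError(f"unknown router path: {path}")
    base = api_server.rstrip("/") + "/v1/router"
    return ROUTER_PATHS[path], base + path


# Hosted Chalk, and a hypothetical dedicated-deployment host.
print(router_url("https://api.chalk.ai", "/chat/completions"))
print(router_url("https://chalk.example.internal/", "/models"))
```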


Authentication

Every request carries an issued router API key as a bearer token, plus the environment to route to:

| Header | Value |
| --- | --- |
| Authorization | Bearer <ROUTER_API_KEY> |
| X-Chalk-Env-Id | The ID of the environment to route the request to. |

Router API keys are issued and revoked from the dashboard — see API keys below. A key is shown only once when it is issued, so copy it immediately.

Router API keys are sensitive: they spend against your configured providers. Treat them like any other secret, scope them with a model allow-list and daily budget, and revoke any key you suspect is compromised.
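To keep the key out of source code, a client can read both header values from the environment. In this minimal standard-library sketch, router_headers is a hypothetical helper; CHALK_ROUTER_API_KEY mirrors the variable used in the curl example, while CHALK_ENV_ID is an assumed variable name. Constructing the urllib Request only assembles the request; nothing is sent until it is passed to urlopen.

```python
import os
import urllib.request
from collections.abc import Mapping


def router_headers(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build the two router auth headers without hard-coding the secret.

    CHALK_ENV_ID is a hypothetical variable name for the environment ID.
    """
    key = environ.get("CHALK_ROUTER_API_KEY")
    env_id = environ.get("CHALK_ENV_ID")
    if not key or not env_id:
        raise RuntimeError("set CHALK_ROUTER_API_KEY and CHALK_ENV_ID")
    return {"Authorization": f"Bearer {key}", "X-Chalk-Env-Id": env_id}


# Placeholder values; in practice, read them from the real environment.
headers = router_headers(
    {"CHALK_ROUTER_API_KEY": "rk-placeholder", "CHALK_ENV_ID": "env-placeholder"}
)
# Building the Request does not send anything; urlopen(req) would.
req = urllib.request.Request(
    "https://api.chalk.ai/v1/router/models", headers=headers, method="GET"
)
```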


Using the router

The router speaks the OpenAI API, so any OpenAI-compatible client works with no code changes beyond the base URL, key, and environment header.

from openai import OpenAI

client = OpenAI(
    base_url="https://<your-chalk-api-host>/v1/router",
    api_key="<YOUR_ROUTER_API_KEY>",
    default_headers={"X-Chalk-Env-Id": "<env-id>"},
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

The same request with curl:

curl https://<your-chalk-api-host>/v1/router/chat/completions \
  -H "Authorization: Bearer $CHALK_ROUTER_API_KEY" \
  -H "X-Chalk-Env-Id: <env-id>" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'

The model you request is resolved against the providers you have configured. On every request, the router enforces the key's provider restriction, model allow-list, and daily token budget, along with any rate limits that apply to the key or its team.
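When a chat completion is requested with "stream": true, OpenAI-compatible servers return Server-Sent Events: each data: line carries a JSON chunk whose choices[0].delta holds the next piece of text, and the stream ends with data: [DONE]. Assuming the router relays this standard OpenAI chunk format, the sketch below reassembles the reply from raw event lines. collect_stream is a hypothetical helper; with the OpenAI client you would normally pass stream=True and iterate over the returned chunks instead.

```python
import json
from collections.abc import Iterable


def collect_stream(lines: Iterable[str]) -> str:
    """Join the text deltas from OpenAI-style streamed chat completion events."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank event separators and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # OpenAI's end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []  # usage-only chunks have no choices
        if choices:
            content = choices[0].get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# Abbreviated events; real chunks also carry id, model, and other fields.
events = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]
print(collect_stream(events))  # Hello
```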


Calling the router from Chalk

Because the router is OpenAI-compatible, you can also call it from inside feature computation with chalk.functions. F.openai_complete issues a chat completion while a query runs. To route it through the AI Router rather than calling OpenAI directly, set api_server to your Chalk API host — or set the OPENAI_BASE_URL environment variable on the execution host and omit the argument.

import chalk.functions as F
from chalk.features import _
from chalkdf import DataFrame

df = DataFrame({"x": ["Recommend some movies like High and Low by Akira Kurosawa"]})
(
    df.with_columns(
        F.openai_complete(
            prompt=_.x,
            model="o4-mini",
            service_tier="flex",
            api_server="https://api.chalk.ai",  # route through the AI Router
        ).alias("result")
    )
    .run()
    .to_arrow()
)

F.openai_complete takes the following arguments:

| Argument | Description |
| --- | --- |
| prompt | The prompt text to send to the model. |
| model | The model to use. Defaults to gpt-3.5-turbo when omitted. |
| api_server | Base URL of an OpenAI-compatible endpoint; /chat/completions is appended. Falls back to the OPENAI_BASE_URL environment variable, then to the default OpenAI endpoint. Point it at your Chalk API host to use the AI Router. |
| api_key | API key for authentication. Falls back to the OPENAI_API_KEY environment variable, so the secret does not have to be threaded through feature data. |
| max_tokens | Maximum number of tokens to generate. |
| temperature | Sampling temperature, between 0 and 2. |
| service_tier | Optional OpenAI service tier: "flex" (cheaper, higher latency), "priority", or "auto". "flex" is supported only on reasoning models (o3, o4-mini, gpt-5-class) and uses a longer request timeout; passing it with an unsupported model returns null. |
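The precedence described for api_server and api_key can be restated as a small pure function. It illustrates the documented fallback order and is not Chalk's implementation: resolve_openai_target is a hypothetical name, and https://api.openai.com/v1 is assumed as the default OpenAI base URL.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Assumed default; the endpoint F.openai_complete actually falls back to may differ.
DEFAULT_OPENAI_BASE = "https://api.openai.com/v1"


def resolve_openai_target(
    api_server: Optional[str] = None,
    api_key: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> tuple[str, Optional[str]]:
    """Return (chat_completions_url, api_key): explicit argument first,
    then the environment variable, then the default."""
    base = api_server or environ.get("OPENAI_BASE_URL") or DEFAULT_OPENAI_BASE
    key = api_key or environ.get("OPENAI_API_KEY")
    return base.rstrip("/") + "/chat/completions", key


# Hypothetical environment and hosts, for illustration only.
env = {"OPENAI_BASE_URL": "https://llm-proxy.example.com/v1/", "OPENAI_API_KEY": "sk-env"}
print(resolve_openai_target(environ=env))
print(resolve_openai_target("https://other.example.com", "sk-arg", environ=env))
print(resolve_openai_target(environ={}))
```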

It returns a struct containing the completion text (in the completion field) along with prompt_tokens, completion_tokens, total_tokens, model, finish_reason, and the upstream ratelimit_remaining_tokens and ratelimit_remaining_requests headers.

Throttling with rate limits

LLM calls are blocking and metered, so throttle them with the policy modifiers chained onto the expression. with_rate_limit caps how often the call may run across every expression that shares its key:

import chalk.functions as F
from chalk.features import _
from chalkdf import DataFrame

df = DataFrame({"x": ["Recommend some movies like Buzzard by Joel Potrykus"]})
(
    df.with_columns(
        F.openai_complete(prompt=_.x, model="o4-mini", service_tier="flex")
        .with_rate_limit(rate=3, key="openai", per="minute")
        .alias("result")
    )
    .run()
    .to_arrow()
)

The same modifier works on a feature defined in a @features class — chain .with_rate_limit(...) before selecting the .completion field.

with_rate_limit takes:

| Argument | Description |
| --- | --- |
| rate | Number of calls allowed per window. |
| per | Window length: "second" (default), "minute", or "hour". |
| key | Bucket name; all expressions sharing a key draw from the same budget. |
| enforce_globally | When True, enforce the limit across all workers rather than per worker. Defaults to False. |
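To make the rate, per, and key semantics concrete, here is a conceptual, single-process fixed-window counter: every call that names the same key draws from one budget of rate calls per window. This is not how Chalk enforces with_rate_limit; KeyedWindowLimiter is hypothetical, it rejects excess calls rather than delaying them, and it does not model enforce_globally.

```python
import time
from collections import defaultdict
from typing import Callable

WINDOW_SECONDS = {"second": 1.0, "minute": 60.0, "hour": 3600.0}


class KeyedWindowLimiter:
    """Single-process fixed-window limiter: `rate` calls per `per` window per key."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> [current window index, calls made in that window]
        self._state: dict = defaultdict(lambda: [None, 0])

    def try_acquire(self, key: str, rate: int, per: str = "second") -> bool:
        index = int(self._clock() // WINDOW_SECONDS[per])
        state = self._state[key]
        if state[0] != index:
            state[0], state[1] = index, 0  # a new window starts with a fresh budget
        if state[1] >= rate:
            return False  # budget for this key and window is used up
        state[1] += 1
        return True


# A fake clock makes the example deterministic.
now = [0.0]
limiter = KeyedWindowLimiter(clock=lambda: now[0])
first_minute = [limiter.try_acquire("openai", rate=3, per="minute") for _ in range(4)]
now[0] = 60.0  # move into the next one-minute window
next_minute = limiter.try_acquire("openai", rate=3, per="minute")
print(first_minute, next_minute)  # [True, True, True, False] True
```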

These policy modifiers compose — chain with_concurrency, with_rate_limit, and with_retry on the same expression to bound in-flight calls, cap the call rate, and retry transient failures with backoff:

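import chalk.functions as F
from chalk.features import _

# Every policy uses the "my_api" key, so all three bound the same resource.
limited_completion = (
    F.openai_complete(prompt=_.x, model="o4-mini", service_tier="flex")
    .with_concurrency(max_concurrent=4, key="my_api")
    .with_rate_limit(rate=100, key="my_api")
    .with_retry(max_retries=3, key="my_api")
)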

Reusing one key ties the policies to the same logical resource, so every expression that uses the "my_api" key shares a single budget: here, at most 4 concurrent calls and 100 calls per second (per defaults to "second"), with each failed call retried up to 3 times.
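The composed policies can be approximated in plain Python with a per-key semaphore and a retry loop with exponential backoff. This sketch is an analogy rather than Chalk's behavior: call_with_policies and the 0.1 s doubling backoff are assumptions, every exception is treated as transient, and the rate-limit step is left out for brevity. The sleep function is injectable so the example runs instantly.

```python
import threading
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# One semaphore per key, so every caller sharing a key shares the cap.
_semaphores: dict = {}
_registry_lock = threading.Lock()


def _semaphore_for(key: str, max_concurrent: int) -> threading.BoundedSemaphore:
    with _registry_lock:
        if key not in _semaphores:
            _semaphores[key] = threading.BoundedSemaphore(max_concurrent)
        return _semaphores[key]


def call_with_policies(
    fn: Callable[[], T],
    key: str,
    max_concurrent: int = 4,
    max_retries: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run fn under a per-key concurrency cap, retrying failures with
    exponential backoff (one initial attempt plus up to max_retries retries)."""
    sem = _semaphore_for(key, max_concurrent)
    for attempt in range(max_retries + 1):
        with sem:  # hold a slot only while the call is in flight
            try:
                return fn()
            except Exception:
                if attempt == max_retries:
                    raise  # retries exhausted: surface the last error
        sleep(base_delay * 2**attempt)  # back off without holding a slot
    raise AssertionError("unreachable")


# Hypothetical flaky call that succeeds on its third attempt.
attempts = []


def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays = []
result = call_with_policies(flaky, key="my_api", sleep=delays.append)
print(result, len(attempts), delays)  # ok 3 [0.1, 0.2]
```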


API keys

Issue and revoke router keys from the AI Router → API keys tab of the Chalk dashboard. Each key can be scoped at issue time so that a leaked or over-eager client cannot do more than you intend:

| Setting | Effect |
| --- | --- |
| Description | A human-readable label for the key. |
| Team | Associates the key with a team, so team-scoped rate limits apply. |
| Provider | Restricts the key to a single provider (for example, openai). |
| Model allow-list | Restricts the key to specific models (for example, gpt-4o and gpt-4o-mini). |
| Daily token budget | Caps the number of tokens the key may consume per day. |
| Labels | Custom key/value metadata. |
| Cost tags | Tags used to attribute spend for billing and reporting. |

Each key tracks its total token usage, and revoking a key takes effect immediately.
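Conceptually, each scoping setting acts as a check before a request is forwarded. The sketch below models revocation, the provider restriction, the model allow-list, and the daily token budget with a hypothetical RouterKey record; the field names, the check order, and comparing tokens already used today against the budget are assumptions for illustration, not the router's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RouterKey:
    """Hypothetical model of a scoped router key (not the router's schema)."""

    description: str
    provider: Optional[str] = None  # None: any provider
    model_allow_list: frozenset = frozenset()  # empty: any model
    daily_token_budget: Optional[int] = None  # None: no daily cap
    tokens_used_today: int = 0
    revoked: bool = False


def check_request(key: RouterKey, provider: str, model: str) -> Optional[str]:
    """Return why the request is rejected, or None if the key allows it."""
    if key.revoked:
        return "key revoked"
    if key.provider is not None and provider != key.provider:
        return f"provider {provider!r} not allowed"
    if key.model_allow_list and model not in key.model_allow_list:
        return f"model {model!r} not in allow-list"
    if (
        key.daily_token_budget is not None
        and key.tokens_used_today >= key.daily_token_budget
    ):
        return "daily token budget exhausted"
    return None


key = RouterKey(
    "batch jobs",
    provider="openai",
    model_allow_list=frozenset({"gpt-4o", "gpt-4o-mini"}),
    daily_token_budget=100_000,
    tokens_used_today=99_000,
)
print(check_request(key, "openai", "gpt-4o-mini"))  # None
print(check_request(key, "anthropic", "claude-3-5-sonnet"))
print(check_request(key, "openai", "o4-mini"))
```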


Teams

Teams group API keys so you can manage and limit access by team rather than key by key. Create a team, then assign keys to it at issue time. Rate limits can be scoped to a team so the whole team shares a single budget. Deleting a team leaves its keys working but removes their team association.


Rate limits

Rate limit policies cap throughput and protect against runaway spend. Each policy has a limit type, an optional target, and a scope:

  • Limit type — tokens per minute, requests per minute, or concurrent requests.
  • Target (optional) — narrow the policy to a specific provider or model. Leave it unset to apply across all traffic.
  • Scope — apply the limit per individual key (token) or shared across a team.

Policies can be enabled or disabled without deleting them.
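One way to picture how a policy's target and scope combine is to compute which counter a request draws from. A policy with no target matches all traffic, a key-scoped policy gives each key its own counter, and a team-scoped policy makes every key in the team share one. RateLimitPolicy and bucket_for are a hypothetical sketch, not the router's schema, and skipping team-scoped policies for keys without a team is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RateLimitPolicy:
    """Hypothetical shape of a router rate-limit policy."""

    limit_type: str  # "tokens_per_minute", "requests_per_minute", or "concurrent_requests"
    limit: int
    scope: str  # "key" (one counter per key) or "team" (one shared counter)
    provider: Optional[str] = None  # optional target; None matches all providers
    model: Optional[str] = None  # optional target; None matches all models
    enabled: bool = True


def bucket_for(
    policy: RateLimitPolicy,
    key_id: str,
    team_id: Optional[str],
    provider: str,
    model: str,
) -> Optional[str]:
    """Return the counter a request draws from, or None if the policy doesn't apply."""
    if not policy.enabled:
        return None
    if policy.provider is not None and policy.provider != provider:
        return None
    if policy.model is not None and policy.model != model:
        return None
    if policy.scope == "team":
        if team_id is None:
            return None  # assumption: team-scoped policies skip keys with no team
        owner = f"team:{team_id}"
    else:
        owner = f"key:{key_id}"
    target = f"{policy.provider or '*'}/{policy.model or '*'}"
    return f"{policy.limit_type}:{target}:{owner}"


rpm = RateLimitPolicy("requests_per_minute", 600, scope="team", provider="openai")
print(bucket_for(rpm, "key-a", "ml-team", "openai", "gpt-4o"))
print(bucket_for(rpm, "key-b", "ml-team", "openai", "gpt-4o-mini"))  # same counter
print(bucket_for(rpm, "key-a", "ml-team", "anthropic", "claude-3-5-sonnet"))  # None
```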


Fallback policies

A fallback policy keeps requests succeeding when a primary model is unavailable. For a primary model you define an ordered list of fallbacks; if a request against the primary fails, the router retries the fallbacks in order. For example:

gpt-4o → [gpt-4o-mini, claude-3-5-sonnet]

If gpt-4o is unavailable, the router tries gpt-4o-mini, then claude-3-5-sonnet, before giving up. Edit fallback rules in the AI Router → Fallback tab.
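The fallback behavior amounts to trying each model in order until one succeeds. In this sketch, FALLBACKS, complete_with_fallback, and fake_call are hypothetical, and treating any exception as an unavailable model is an assumption; which failures actually trigger a fallback in the router may differ.

```python
from typing import Callable

# Hypothetical rule mirroring the example above: primary -> ordered fallbacks.
FALLBACKS: dict[str, list[str]] = {"gpt-4o": ["gpt-4o-mini", "claude-3-5-sonnet"]}


def complete_with_fallback(model: str, call: Callable[[str], str]) -> tuple[str, str]:
    """Try the primary, then each fallback in order; return (model_used, result)."""
    errors = []
    for candidate in [model, *FALLBACKS.get(model, [])]:
        try:
            return candidate, call(candidate)
        except Exception as exc:  # assumption: any failure moves on to the next model
            errors.append(f"{candidate}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def fake_call(model: str) -> str:
    """Stand-in for a provider call in which gpt-4o is unavailable."""
    if model == "gpt-4o":
        raise ConnectionError("unavailable")
    return f"answer from {model}"


print(complete_with_fallback("gpt-4o", fake_call))
# ('gpt-4o-mini', 'answer from gpt-4o-mini')
```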


Providers

Configure provider credentials in the AI Router → Settings tab. The router supports:

| Provider | ID |
| --- | --- |
| OpenAI | openai |
| Anthropic | anthropic |
| Google Gemini | gemini |

For each provider you set an API key and, optionally, a base URL override to point at a custom or OpenAI-compatible endpoint (for example, Gemini’s OpenAI-compatible API). Credentials are applied at runtime — no redeploy is required — and each provider shows a connected or not-configured status.


Playground

The AI Router → Playground tab is an in-dashboard tester for the router. It authenticates with your Chalk session — no separate API key needed — and lets you exercise the router across three tabs:

  • Chat — send streaming chat completions with a configurable system prompt, temperature, and max tokens.
  • Embeddings — generate embeddings and inspect their dimensions and token counts.
  • Images — generate images with configurable size, quality, and count.

The model picker is populated from the router’s /models endpoint, so it reflects the providers you have configured.


See also

  • Authentication — service credentials and RBAC for the Chalk API.
  • MCP Gateway — govern the external MCP servers your agents reach.
  • LLM Toolchain — building AI features and agents on Chalk.