# AI Router
source: https://docs.chalk.ai/docs/ai-router

## Route LLM traffic to multiple providers through one OpenAI-compatible gateway.

The Chalk AI Router is a managed, OpenAI-compatible LLM gateway hosted alongside your Chalk
deployment. Point any OpenAI-compatible client at the router and it forwards your request to the
right provider — OpenAI, Anthropic, or Google Gemini — behind a single API. Because the router
sits in front of every provider, it is also where you centralize the things you do not want
scattered across application code: API keys and budgets, rate limits, automatic fallback between
models, and provider credentials.

The router is hosted with the Chalk API and requires no installation or self-hosting.

### Endpoint

The router exposes the standard OpenAI-compatible paths under its base URL, `https://<your-chalk-api-host>/v1/router` (the form used in the client examples below):

| Path                  | Purpose                                            |
| --------------------- | -------------------------------------------------- |
| `/chat/completions`   | Chat completions (streams via Server-Sent Events). |
| `/embeddings`         | Text embeddings.                                   |
| `/images/generations` | Image generation.                                  |
| `/models`             | List the models available to your key.             |

If you are on a dedicated or self-hosted Chalk deployment, replace `api.chalk.ai` with your own
API server host. You can find it in the `apiServer` field of `chalk config --format json`.
Requests are routed to the calling environment's router, selected by the `X-Chalk-Env-Id` header
described below.
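As a minimal sketch of the host lookup described above, the helper below builds a router base URL from the JSON printed by `chalk config --format json`. It assumes `apiServer` is a top-level field of that output and that the router lives under the `/v1/router` prefix used by the client examples on this page; the function names and the sample host are hypothetical.

```python
import json

# Prefix taken from the client examples on this page (an assumption for other deployments).
ROUTER_PREFIX = "/v1/router"


def router_base_url(chalk_config_json: str) -> str:
    """Build the router base URL from `chalk config --format json` output."""
    config = json.loads(chalk_config_json)
    # Assumed: `apiServer` is a top-level field of the config JSON.
    api_server = config["apiServer"].rstrip("/")
    if not api_server.startswith(("http://", "https://")):
        api_server = "https://" + api_server
    return api_server + ROUTER_PREFIX


def router_url(base_url: str, path: str) -> str:
    """Join the base URL with one of the paths from the endpoint table."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


# Hypothetical config output for a self-hosted deployment.
sample = '{"apiServer": "https://chalk.example.internal"}'
base = router_base_url(sample)
print(router_url(base, "/chat/completions"))
```

In practice you would pass the real command output (for example, captured with `subprocess.run`) instead of the sample string.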

### Authentication

Every request carries an issued router API key as a bearer token, plus the environment to route
to:

| Header           | Value                                   |
| ---------------- | --------------------------------------- |
| `Authorization`  | `Bearer <YOUR_ROUTER_API_KEY>`          |
| `X-Chalk-Env-Id` | The environment to route the request to |

Router API keys are issued and revoked from the dashboard; see [API keys](#api-keys) below. A
key is shown only once when it is issued, so copy it immediately.
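The two headers can be assembled with nothing but the standard library. The sketch below builds (but does not send) a `GET` request for the `/models` path; the helper names and the example host are hypothetical, and the base URL form is taken from the client examples on this page.

```python
import urllib.request


def router_headers(api_key: str, env_id: str) -> dict:
    """Headers every router request carries, per the table above."""
    return {
        "Authorization": f"Bearer {api_key}",
        "X-Chalk-Env-Id": env_id,
    }


def build_models_request(base_url: str, api_key: str, env_id: str) -> urllib.request.Request:
    """Prepare a request that lists the models available to the key."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/models",
        headers=router_headers(api_key, env_id),
        method="GET",
    )


req = build_models_request(
    "https://chalk.example.internal/v1/router",  # hypothetical host
    "<YOUR_ROUTER_API_KEY>",
    "<env-id>",
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Keeping the key in an environment variable or secret store, rather than in source, avoids leaking a value that the dashboard shows only once.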

### Using the router

The router speaks the OpenAI API, so any OpenAI-compatible client works with no code changes
beyond the base URL, key, and environment header.

```
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-chalk-api-host>/v1/router",
    api_key="<YOUR_ROUTER_API_KEY>",
    default_headers={"X-Chalk-Env-Id": "<env-id>"},
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```

The same request with curl:

```
curl https://<your-chalk-api-host>/v1/router/chat/completions \
  -H "Authorization: Bearer $CHALK_ROUTER_API_KEY" \
  -H "X-Chalk-Env-Id: <env-id>" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
```

The model you request is resolved against the providers you have configured. A key's provider
restriction, model allow-list, team, and daily token budget are all enforced on every request.
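The endpoint table notes that chat completions stream via Server-Sent Events. As a rough sketch, assuming the router passes through OpenAI's streaming chunk format (`data: {...}` lines terminated by `data: [DONE]`), the text deltas can be reassembled like this. The sample lines are made up for illustration, and `iter_stream_text` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Fabricated sample of what a streamed response body might look like.
sample_stream = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample_stream)))
```

OpenAI-compatible clients handle this parsing for you when you pass `stream=True`; the sketch is only meant to show what travels over the wire.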

### Calling the router from Chalk

Because the router is OpenAI-compatible, you can also call it from inside feature computation with
`chalk.functions`. `F.openai_complete` issues a chat completion while a query
runs. To route it through the AI Router rather than calling OpenAI directly, set `api_server` to
your Chalk API host, or set the `OPENAI_BASE_URL` environment variable on the execution host and
omit the argument.

```
import chalk.functions as F
from chalk.features import _
from chalkdf import DataFrame

df = DataFrame({"questions": ["Recommend some movies like High and Low by Akira Kurosawa"]})
(
    df.with_columns(
        F.openai_complete(
            prompt=_.questions,  # must match the DataFrame column name
            model="o4-mini",
            service_tier="flex",
            api_server="https://api.chalk.ai",  # route through the AI Router
        ).alias("result")
    )
    .run()
    .to_arrow()
)
```

`F.openai_complete` takes the following arguments:

| Argument       | Description                                                                                                                                                                                                                                                           |
| -------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `prompt`       | The prompt text to send to the model.                                                                                                                                                                                                                                 |
| `model`        | The model to use. Defaults to `gpt-3.5-turbo` when omitted.                                                                                                                                                                                                           |
| `api_server`   | Base URL of an OpenAI-compatible endpoint; `/chat/completions` is appended. Falls back to the `OPENAI_BASE_URL` env var, then to the default OpenAI endpoint. Point it at your Chalk API host to use the AI Router.                                                   |
| `api_key`      | API key for authentication. Falls back to the `OPENAI_API_KEY` env var, so the secret does not have to be threaded through feature data.                                                                                                                              |
| `max_tokens`   | Maximum number of tokens to generate.                                                                                                                                                                                                                                 |
| `temperature`  | Sampling temperature between 0 and 2.                                                                                                                                                                                                                                 |
| `service_tier` | Optional OpenAI service tier — `"flex"` (cheaper, higher-latency), `"priority"`, or `"auto"`. `"flex"` is only supported on reasoning models (`o3`, `o4-mini`, `gpt-5`-class) and uses a longer request timeout; passing it with an unsupported model returns `null`. |

It returns a struct with the `completion` text plus `prompt_tokens`, `completion_tokens`,
`total_tokens`, `model`, `finish_reason`, and the upstream `ratelimit_remaining_tokens` /
`ratelimit_remaining_requests` headers.
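The `api_server` fallback order from the argument table can be sketched as a small resolver. This only mirrors the order the table describes (explicit argument, then `OPENAI_BASE_URL`, then the default OpenAI endpoint); it is not Chalk's implementation, and `resolve_chat_url` is a hypothetical name.

```python
import os
from typing import Mapping, Optional

# OpenAI's public API base URL, used when nothing else is configured.
DEFAULT_OPENAI_BASE_URL = "https://api.openai.com/v1"


def resolve_chat_url(
    api_server: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the base URL in the documented order, then append /chat/completions."""
    env = os.environ if env is None else env
    base = api_server or env.get("OPENAI_BASE_URL") or DEFAULT_OPENAI_BASE_URL
    return base.rstrip("/") + "/chat/completions"


# Explicit argument wins over the environment variable.
print(resolve_chat_url("https://api.chalk.ai", env={"OPENAI_BASE_URL": "https://other.example"}))
# Environment variable is used when the argument is omitted.
print(resolve_chat_url(env={"OPENAI_BASE_URL": "https://other.example"}))
# Neither set: the default OpenAI endpoint.
print(resolve_chat_url(env={}))
```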

### Throttling with rate limits

LLM calls are blocking and metered, so throttle them with the policy modifiers chained onto the
expression. `with_rate_limit` caps how often the call may run across every expression that shares
its `key`:

```
import chalk.functions as F
from chalk.features import _
from chalkdf import DataFrame

df = DataFrame({"x": ["Recommend some movies like Buzzard by Joel Potrykus"]})
(
    df.with_columns(
        F.openai_complete(prompt=_.x, model="o4-mini", service_tier="flex")
        .with_rate_limit(rate=3, key="openai", per="minute")
        .alias("result")
    )
    .run()
    .to_arrow()
)
```

The same modifier works on a feature defined in a `@features` class: chain `.with_rate_limit(...)`
before selecting the `.completion` field.

`with_rate_limit` takes:

| Argument           | Description                                                                                    |
| ------------------ | ---------------------------------------------------------------------------------------------- |
| `rate`             | Number of calls allowed per window.                                                            |
| `per`              | Window length — `"second"` (default), `"minute"`, or `"hour"`.                                 |
| `key`              | Bucket name; all expressions sharing a `key` draw from the same budget.                        |
| `enforce_globally` | When `True`, enforce the limit across all workers rather than per-worker. Defaults to `False`. |
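To make the shared-`key` semantics concrete, here is a toy fixed-window limiter in plain Python. It is an illustration of the idea only: Chalk's actual algorithm is not described here, and `SharedRateLimiter` and its injectable clock are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

WINDOW_SECONDS = {"second": 1.0, "minute": 60.0, "hour": 3600.0}


class SharedRateLimiter:
    """Fixed-window counter in which every caller using the same key shares one budget."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (window index, calls used in that window)
        self._used: Dict[str, Tuple[int, int]] = {}

    def try_acquire(self, key: str, rate: int, per: str = "second") -> bool:
        window = int(self._clock() // WINDOW_SECONDS[per])
        seen_window, used = self._used.get(key, (window, 0))
        if seen_window != window:
            used = 0  # a new window started, so the budget resets
        if used >= rate:
            self._used[key] = (window, used)
            return False
        self._used[key] = (window, used + 1)
        return True


# Drive the limiter with a fake clock so the demo is deterministic.
now = [0.0]
limiter = SharedRateLimiter(clock=lambda: now[0])
results = [limiter.try_acquire("openai", rate=3, per="minute") for _ in range(4)]
print(results)  # the fourth call in the same minute is rejected
now[0] = 60.0   # move into the next one-minute window
print(limiter.try_acquire("openai", rate=3, per="minute"))
```

A limiter like this lives in one process, which loosely corresponds to per-worker enforcement; enforcing across workers needs shared state, which is what `enforce_globally=True` asks Chalk to handle.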

These policy modifiers compose: chain `with_concurrency`, `with_rate_limit`, and `with_retry` on
the same expression to bound in-flight calls, cap the call rate, and retry transient failures with
backoff:

```
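import chalk.functions as F
from chalk.features import _

(
    F.openai_complete(prompt=_.x, model="o4-mini", service_tier="flex")
    .with_concurrency(max_concurrent=4, key="my_api")  # at most 4 calls in flight
    .with_rate_limit(rate=100, key="my_api")  # 100 calls per second (default window)
    .with_retry(max_retries=3, key="my_api")  # retry each failed call up to 3 times
)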
```

Reusing one `key` ties the policies to the same logical resource, so every expression that calls
`"my_api"` shares a single budget: here, at most 4 concurrent calls and 100 calls per second
(`per` defaults to `"second"`), with each failed call retried up to 3 times.
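For intuition about what "retried up to 3 times" with backoff means, the sketch below implements a generic retry loop in plain Python. The doubling delay schedule and the `base_delay` parameter are assumptions made for illustration; the source only says retries use backoff, and `call_with_retry` is not part of Chalk's API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.5,  # assumed starting delay, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying up to max_retries times with exponentially growing delays."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted, surface the last error
            sleep(base_delay * (2 ** attempt))
            attempt += 1


# A hypothetical call that fails twice before succeeding.
calls = {"count": 0}


def flaky() -> str:
    calls["count"] += 1
    if calls["count"] <= 2:
        raise RuntimeError("transient failure")
    return "ok"


delays = []  # record delays instead of actually sleeping
result = call_with_retry(flaky, max_retries=3, sleep=delays.append)
print(result, delays)
```

With `max_retries=3` a call is attempted at most four times in total: the original attempt plus three retries.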

### API keys

Issue and revoke router keys from the AI Router → API keys tab of the Chalk dashboard. Each
key can be scoped at issue time so that a leaked or over-eager client cannot do more than you
intend:

| Setting                | Effect                                                                       |
| ---------------------- | ---------------------------------------------------------------------------- |
| **Description**        | A human-readable label for the key.                                          |
| **Team**               | Associates the key with a [team](#teams), so team-scoped rate limits apply.  |
| **Provider**           | Restricts the key to a single provider (for example, `openai`).              |
| **Model allow-list**   | Restricts the key to specific models (for example, `gpt-4o`, `gpt-4o-mini`). |
| **Daily token budget** | Caps the tokens the key may consume per day.                                 |
| **Labels**             | Custom key/value metadata.                                                   |
| **Cost tags**          | Tags used to attribute spend for billing and reporting.                      |

Each key tracks its total token usage, and revoking a key takes effect immediately.
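The scoping settings can be pictured as a series of checks applied to each request. The sketch below is a hypothetical model of that logic, not the router's code: the `RouterKey` fields, the `check_request` function, and the way token usage is estimated against the budget are all assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class RouterKey:
    """Hypothetical in-memory view of a key's scope settings."""
    description: str
    provider: Optional[str] = None                   # None means any provider
    allowed_models: Optional[FrozenSet[str]] = None  # None means any model
    daily_token_budget: Optional[int] = None         # None means no cap
    tokens_used_today: int = 0


def check_request(key: RouterKey, provider: str, model: str, estimated_tokens: int) -> Optional[str]:
    """Return None if the request is within scope, else a reason for rejecting it."""
    if key.provider is not None and provider != key.provider:
        return "provider not allowed"
    if key.allowed_models is not None and model not in key.allowed_models:
        return "model not allowed"
    if (
        key.daily_token_budget is not None
        and key.tokens_used_today + estimated_tokens > key.daily_token_budget
    ):
        return "daily token budget exceeded"
    return None


key = RouterKey(
    description="batch summarizer",
    provider="openai",
    allowed_models=frozenset({"gpt-4o", "gpt-4o-mini"}),
    daily_token_budget=10_000,
    tokens_used_today=9_500,
)
print(check_request(key, "openai", "gpt-4o-mini", 200))
print(check_request(key, "anthropic", "claude-3-5-sonnet", 200))
print(check_request(key, "openai", "gpt-4o", 1_000))
```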

### Teams

Teams group API keys so you can manage and limit access by team rather than key by key. Create
a team, then assign keys to it at issue time. Rate limits can be scoped to a
team so the whole team shares a single budget. Deleting a team leaves its keys working but
removes their team association.

### Rate limits

Rate limit policies cap throughput and protect against runaway spend. Each policy has a limit
type, an optional target, and a scope:

- Limit type — tokens per minute, requests per minute, or concurrent requests.
- Target (optional) — narrow the policy to a specific provider or model. Leave it unset to
apply across all traffic.
- Scope — apply the limit per individual key (token) or shared across a team.

Policies can be enabled or disabled without deleting them.
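One way to picture how limit type, target, and scope combine is the small model below. It is a sketch under stated assumptions: the field names, the string values for limit types and scopes, and the bucket naming are hypothetical and do not reflect the router's configuration format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RateLimitPolicy:
    limit_type: str                 # "tokens_per_minute", "requests_per_minute", or "concurrent_requests"
    limit: int
    scope: str                      # "key" (per individual key) or "team" (shared by a team)
    provider: Optional[str] = None  # optional target; None applies to all providers
    model: Optional[str] = None     # optional target; None applies to all models
    enabled: bool = True


def applies_to(policy: RateLimitPolicy, provider: str, model: str) -> bool:
    """A disabled policy never applies; an unset target matches all traffic."""
    if not policy.enabled:
        return False
    if policy.provider is not None and policy.provider != provider:
        return False
    if policy.model is not None and policy.model != model:
        return False
    return True


def bucket_id(policy: RateLimitPolicy, key_id: str, team_id: Optional[str]) -> Optional[str]:
    """Name the counter a request draws from: one per key, or one shared by the team."""
    if policy.scope == "team":
        return f"team:{team_id}" if team_id is not None else None
    return f"key:{key_id}"


team_tpm = RateLimitPolicy("tokens_per_minute", 50_000, scope="team", provider="openai")
print(applies_to(team_tpm, "openai", "gpt-4o"), bucket_id(team_tpm, "key-1", "search"))
print(applies_to(team_tpm, "anthropic", "claude-3-5-sonnet"))
```

Two keys on the same team resolve to the same bucket under a team-scoped policy, which is what makes the budget shared.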

### Fallback policies

A fallback policy keeps requests succeeding when a primary model is unavailable. For a primary
model you define an ordered list of fallbacks; if a request against the primary fails, the
router retries the fallbacks in order. For example:

```
gpt-4o → [gpt-4o-mini, claude-3-5-sonnet]
```

If `gpt-4o` is unavailable, the router tries `gpt-4o-mini`, then `claude-3-5-sonnet`, before
giving up. Edit fallback rules in the AI Router → Fallback tab.
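The ordered-fallback behavior can be sketched as a loop over the primary model followed by its fallbacks. This is an illustration only: the router performs this server-side, which failures trigger a fallback is not specified here, and `ModelUnavailable`, `complete_with_fallback`, and `fake_call` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Tuple

# The example policy from above: primary model -> ordered fallbacks.
FALLBACKS: Dict[str, List[str]] = {"gpt-4o": ["gpt-4o-mini", "claude-3-5-sonnet"]}


class ModelUnavailable(Exception):
    """Stand-in for whatever failure makes the router move to the next model."""


def complete_with_fallback(
    model: str,
    call: Callable[[str], str],
    fallbacks: Dict[str, List[str]] = FALLBACKS,
) -> Tuple[str, str]:
    """Try the primary model, then each fallback in order; return (model used, result)."""
    last_error: Optional[ModelUnavailable] = None
    for candidate in [model] + fallbacks.get(model, []):
        try:
            return candidate, call(candidate)
        except ModelUnavailable as exc:
            last_error = exc  # remember the failure and try the next model
    assert last_error is not None
    raise last_error


# Simulate an outage of the first two models in the chain.
down = {"gpt-4o", "gpt-4o-mini"}


def fake_call(model: str) -> str:
    if model in down:
        raise ModelUnavailable(model)
    return f"response from {model}"


print(complete_with_fallback("gpt-4o", fake_call))
```

If every model in the chain fails, the last error is raised, matching the "before giving up" behavior described above.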

### Providers

Configure provider credentials in the AI Router → Settings tab. The router supports:

| Provider      | ID          |
| ------------- | ----------- |
| OpenAI        | `openai`    |
| Anthropic     | `anthropic` |
| Google Gemini | `gemini`    |

For each provider you set an API key and, optionally, a base URL override to point at a custom
or OpenAI-compatible endpoint (for example, Gemini's OpenAI-compatible API). Credentials are
applied at runtime — no redeploy is required — and each provider shows a connected or
not-configured status.

### Playground

The AI Router → Playground tab is an in-dashboard tester for the router. It authenticates
with your Chalk session — no separate API key needed — and lets you exercise the router across
three tabs:

- Chat — send streaming chat completions with a configurable system prompt, temperature,
and max tokens.
- Embeddings — generate embeddings and inspect their dimensions and token counts.
- Images — generate images with configurable size, quality, and count.

The model picker is populated from the router's `/models` endpoint, so it reflects the providers
you have configured.

### See also

- Authentication — service credentials and RBAC for the Chalk API.
- MCP Gateway — govern the external MCP servers your agents reach.
- LLM Toolchain — building AI features and agents on Chalk.