Model Platform
Learn how to deploy and manage machine learning models in Chalk
With Chalk, you can deploy machine learning models as isolated services running in dedicated scaling groups. This approach allows your models to run with their own compute resources, auto-scaling policies, and independent lifecycle management—separate from the Chalk engine itself.
This is different from the traditional approach of including models directly in Chalk feature resolvers. Instead of embedding model inference within your feature computation, model deployments host your models as standalone services that can be called from resolvers or external applications.
Model deployments are ideal when you want to:
To deploy models to scaling groups, register them with a container image instead of local model files or Python model objects. You can either provide a chalkcompute.Image object and let Chalk build the image for you, or reference a pre-built Docker image directly.
With a chalkcompute.Image, you define your image configuration in Python and Chalk handles building and managing the container image:
from chalk.client import ChalkClient
from chalkcompute import Image
import pyarrow as pa

client = ChalkClient()

image = (
    Image.debian_slim("3.11")
    .pip_install(["chalk-remote-call-python", "pyarrow"])
    .add_local_file("model.py", "/app/model.py", strategy="copy")
    .env({"PYTHONPATH": "/app"})
    .workdir("/app")
    .entrypoint(
        [
            "chalk-remote-call",
            "--handler",
            "model.handler",
            "--port",
            "8080",
        ]
    )
)

client.register_model_version(
    name="my-model",
    input_schema={"x": pa.float64()},
    output_schema={"y": pa.float64()},
    model_image=image,
)

Alternatively, you can register a pre-built Docker image by passing a string reference:
from chalk.client import ChalkClient
import pyarrow as pa

client = ChalkClient()

client.register_model_version(
    name="my-model",
    input_schema={"x": pa.float64()},
    output_schema={"y": pa.float64()},
    model_image="my-model-image:latest",
)

See the Docker Image Requirements section below for details on building compatible images.
When your model files are large (e.g. multi-gigabyte weight files), baking them into the container image is impractical—it slows down builds, increases image pull times, and wastes storage. Instead, you can upload model artifacts to a volume that gets mounted into your container at runtime.
After registering the model version, use upload_model_to_volume to upload your files. You must use chalk_handler_volume_name to format the volume name—the deploy path uses this deterministic name to find and mount the volume. In this example, the model artifacts are stored in model.json:
from chalk.client import ChalkClient
from chalk.client.model_image import chalk_handler_volume_name, upload_model_to_volume
from chalkcompute import Image
import pyarrow as pa

client = ChalkClient()

image = (
    Image.debian_slim("3.11")
    .pip_install(["chalk-remote-call-python", "joblib"])
    .add_local_file("handler.py", "/app/handler.py", strategy="copy")
    .env({"PYTHONPATH": "/app"})
    .workdir("/app")
    .entrypoint(
        [
            "chalk-remote-call",
            "--handler",
            "handler.handler",
            "--on-startup",
            "handler.on_startup",
            "--port",
            "8080",
        ]
    )
)

response = client.register_model_version(
    name="my-large-model",
    input_schema={"x": pa.float64()},
    output_schema={"y": pa.float64()},
    model_image=image,
)

# Upload model files to a volume
upload_model_to_volume(
    volume_name=chalk_handler_volume_name("my-large-model", response.model_version),
    model_filename="model.json",
    model_file_path="./model.json",
    chalk_client=client,
)

The uploaded artifacts are mounted at /app/artifacts/ inside the container. Use on_startup to load the model once when the container starts:
import json
import pyarrow as pa
import pyarrow.compute as pc

model = None


def on_startup():
    global model
    with open("/app/artifacts/model.json") as f:
        model = json.load(f)


def handler(event: dict[str, pa.Array], context: dict) -> pa.Array:
    factor = model["factor"]
    return pc.multiply(event["x"], pa.scalar(factor, type=pa.float64()))

Your handler is the function that runs inference. It receives a dictionary of PyArrow Arrays and returns a PyArrow Array.
import pyarrow as pa
import pyarrow.compute as pc


def handler(event: dict[str, pa.Array], context: dict) -> pa.Array:
    return pc.multiply(event["x"], pa.scalar(2.0, type=pa.float64()))

Define an on_startup function to initialize resources before serving requests, and pass it via --on-startup in your entrypoint:
import json
import pyarrow as pa
import pyarrow.compute as pc

model = None


def on_startup():
    # Runs once before the first request is served
    global model
    with open("/app/artifacts/model.json") as f:
        model = json.load(f)


def handler(event: dict[str, pa.Array], context: dict) -> pa.Array:
    factor = model["factor"]
    return pc.multiply(event["x"], pa.scalar(factor, type=pa.float64()))

chalk-remote-call --handler model.handler --on-startup model.on_startup --port 8080
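Before building an image, it can help to exercise the startup-then-serve lifecycle locally. The sketch below is only an illustration: it uses plain Python lists in place of PyArrow arrays so that it runs without extra dependencies, and ARTIFACT_DIR and run_locally are hypothetical names that are not part of chalk-remote-call.

```python
import json
import os
import tempfile

# Hypothetical override so the sketch can run outside a container;
# in a deployment the artifacts are mounted at /app/artifacts/.
ARTIFACT_DIR = os.environ.get("ARTIFACT_DIR", "/app/artifacts")

model = None


def on_startup():
    """Load the model once, before any request is served."""
    global model
    with open(os.path.join(ARTIFACT_DIR, "model.json")) as f:
        model = json.load(f)


def handler(event, context):
    """Multiply each input by the loaded factor, passing nulls through."""
    factor = model["factor"]
    return [None if x is None else x * factor for x in event["x"]]


def run_locally(batches):
    """Mimic the lifecycle: one on_startup call, then one handler call per batch."""
    on_startup()
    return [handler({"x": batch}, {}) for batch in batches]


# Write a throwaway artifact and point the sketch at it.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "model.json"), "w") as f:
        json.dump({"factor": 2.0}, f)
    ARTIFACT_DIR = tmp
    outputs = run_locally([[1.0, 2.0], [None, 4.5]])
```

A real handler would receive and return PyArrow arrays, so treat this as a check of the loading logic rather than of serialization.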
Once registered, deploy a model version to a scaling group with resource specifications and auto-scaling policies.
from chalk.client import ChalkClient
from chalk.scalinggroup import AutoScalingSpec, ScalingGroupResourceRequest

client = ChalkClient()

# Deploy the model version to a scaling group
client.deploy_model_version_to_scaling_group(
    name="my-model-sg",
    model_name="my-model",
    model_version=1,
    handler="model.handler",
    scaling=AutoScalingSpec(
        min_replicas=1,
        max_replicas=2,
        target_cpu_utilization_percentage=70,
    ),
    resources=ScalingGroupResourceRequest(
        cpu="2",
        memory="4Gi",
    ),
)

Control how your model deployment scales based on demand using AutoScalingSpec.
from chalk.scalinggroup import AutoScalingSpec

# Configure auto-scaling behavior
scaling = AutoScalingSpec(
    min_replicas=1,  # Minimum number of replicas
    max_replicas=5,  # Maximum number of replicas
    target_cpu_utilization_percentage=70,  # Target CPU utilization (optional)
)

Chalk automatically scales the number of replicas based on inference request load and CPU utilization, staying within your min/max bounds. This lets your models absorb traffic spikes without holding idle replicas during quiet periods.
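Chalk's exact scaling algorithm is not described here. As a rough mental model only, the sketch below applies the proportional rule used by the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × observed / target)) and clamps the result to the min/max bounds. The function name desired_replicas is hypothetical, and the assumption that Chalk behaves this way is ours.

```python
import math


def desired_replicas(current, observed_cpu_pct, target_cpu_pct, min_replicas, max_replicas):
    """Proportional scaling rule (assumed, HPA-style), clamped to the bounds."""
    raw = math.ceil(current * observed_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, raw))


# Two replicas at 140% utilization against a 70% target ask for four replicas.
print(desired_replicas(2, 140, 70, min_replicas=1, max_replicas=5))  # 4
# A large spike is capped at max_replicas.
print(desired_replicas(2, 400, 70, min_replicas=1, max_replicas=5))  # 5
# Quiet periods never drop below min_replicas.
print(desired_replicas(2, 5, 70, min_replicas=1, max_replicas=5))  # 1
```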
Specify CPU, memory, and GPU resources for each replica of your model using ScalingGroupResourceRequest.
from chalk.scalinggroup import ScalingGroupResourceRequest

# Request resources per replica
resources = ScalingGroupResourceRequest(
    cpu="2",  # CPU allocation per replica
    memory="4Gi",  # Memory allocation per replica
    gpu="nvidia-tesla-t4:1",  # Optional: GPU type and count
)

Each replica gets the specified resources. When Chalk scales from 1 to 3 replicas, total resource usage is multiplied accordingly (e.g., 3 replicas × 2 CPU = 6 CPU total).
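To estimate the worst-case footprint of a deployment, the same multiplication can be applied at max_replicas. The sketch below is a hypothetical helper, not a Chalk API; it assumes whole or decimal CPU counts such as "2" and Kubernetes-style binary memory suffixes such as "4Gi".

```python
# Binary suffixes as used in Kubernetes-style quantities (assumed format).
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def memory_bytes(quantity):
    """Convert a string such as "4Gi" to bytes; plain numbers are bytes."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)


def total_footprint(cpu, memory, replicas):
    """Return (total CPU cores, total memory in GiB) for a replica count."""
    total_cpu = float(cpu) * replicas
    total_mem_gib = memory_bytes(memory) * replicas / 2**30
    return total_cpu, total_mem_gib


print(total_footprint("2", "4Gi", replicas=3))  # (6.0, 12.0)
```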
Models deployed to scaling groups can be called from Chalk feature resolvers using the catalog_call function with the scaling group name.
from chalk.features import features, _
from chalk import functions as F


@features
class MyModel:
    id: int
    x: float
    y: float = F.catalog_call(
        "model.my-model-sg",
        _.x
    )

The catalog call format is: model.{scaling_group_name}
You can pass multiple inputs by providing them as additional arguments:
from chalk.features import features, _
from chalk import functions as F


@features
class MyModel:
    id: int
    x_1: float
    x_2: float
    y: float = F.catalog_call(
        "model.my-model-sg",
        _.x_1,
        _.x_2
    )

The order of arguments must match the order of fields in your model's input_schema.
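Because the arguments are positional, swapping two inputs of the same type is unlikely to raise an error. One defensive pattern, sketched here in plain Python, is to derive the argument order from the schema instead of typing it by hand. ordered_inputs is a hypothetical helper that is not part of Chalk, and the string type names stand in for PyArrow types.

```python
def ordered_inputs(input_schema, values):
    """Return values in schema order, failing loudly on a mismatch."""
    missing = [name for name in input_schema if name not in values]
    extra = [name for name in values if name not in input_schema]
    if missing or extra:
        raise ValueError(f"missing={missing}, unexpected={extra}")
    return [values[name] for name in input_schema]


# Type names are placeholders; a real schema would map names to PyArrow types.
input_schema = {"x_1": "float64", "x_2": "float64"}

# Keys may arrive in any order; the result follows the schema.
print(ordered_inputs(input_schema, {"x_2": 0.5, "x_1": 3.0}))  # [3.0, 0.5]
```

This relies on Python dictionaries preserving insertion order, so the schema should be written in the order the model expects.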
Deploy a new version of a model to an existing scaling group:
from chalk.client import ChalkClient
from chalkcompute import Image
import pyarrow as pa

client = ChalkClient()

# Register a new model version with an updated Chalk image
new_version = client.register_model_version(
    name="my-model",
    input_schema={"x": pa.float64()},
    output_schema={"y": pa.float64()},
    model_image=(
        Image.debian_slim("3.11")
        .pip_install(["chalk-remote-call-python", "pyarrow"])
        .add_local_file("model_v2.py", "/app/model.py", strategy="copy")
        .env({"PYTHONPATH": "/app"})
        .workdir("/app")
        .entrypoint(
            [
                "chalk-remote-call",
                "--handler",
                "model.handler",
                "--port",
                "8080",
            ]
        )
    ),
)

# Update the scaling group with the new version
client.deploy_model_version_to_scaling_group(
    name="my-model-sg",
    model_name="my-model",
    model_version=new_version.model_version,
    handler="model.handler",
)

For more information on listing, inspecting, and deleting scaling groups, see the Scaling Groups page.
Model registration and deployment should be controlled manually and separately from your feature definitions. One way to do this is to list your model scripts in .chalkignore to prevent them from running during chalk apply.
Your chalk apply will fail if it tries to run model registration and deployment code.
Organize your project to keep model management separate from feature definitions:
my-chalk-project/
|- models/                  # Model deployment code (add to .chalkignore)
|  |- model.py
|  `- deploy_model.py       # Registration + deployment script
|
|- features/                # Feature definitions (synced with chalk apply)
|  |- __init__.py
|  `- user_features.py
|
|- .chalkignore
`- chalk.yaml

Put the following line in your .chalkignore so chalk apply skips everything under models/:

models/

When using a chalkcompute.Image, Chalk builds and manages the container for you. Your image definition should:
Here’s a complete example using spaCy for named entity recognition:
Image definition:
from chalkcompute import Image

image = (
    Image.debian_slim("3.11")
    .pip_install(["chalk-remote-call-python", "spacy"])
    .run_commands(["python -m spacy download en_core_web_sm"])
    .add_local_file("model.py", "/app/model.py", strategy="copy")
    .env({"PYTHONPATH": "/app"})
    .workdir("/app")
    .entrypoint(
        [
            "chalk-remote-call",
            "--handler",
            "model.handler",
            # model.py loads spaCy in on_startup, so register it here
            "--on-startup",
            "model.on_startup",
            "--port",
            "8080",
        ]
    )
)

model.py:
import json
import pyarrow as pa
import spacy

nlp = None


def on_startup():
    # Load the spaCy pipeline once, before serving requests
    global nlp
    nlp = spacy.load("en_core_web_sm")


def handler(event: dict[str, pa.Array], context: dict) -> pa.Array:
    texts = event["text"].to_pylist()
    # nlp.pipe cannot process None, so substitute an empty string
    # and emit a null for those rows below
    docs = nlp.pipe([text or "" for text in texts], batch_size=32)
    results = []
    for text, doc in zip(texts, docs):
        if text is None:
            results.append(None)
            continue
        entities = [
            {
                "text": ent.text,
                "label": ent.label_,
                "start": ent.start_char,
                "end": ent.end_char,
            }
            for ent in doc.ents
        ]
        results.append(json.dumps({"text": text, "entities": entities}))
    return pa.array(results, type=pa.utf8())

.base(image): Use a custom base Docker image
.debian_slim(python_version): Base image with a slim Debian OS and the specified Python version
.pip_install(packages): Install Python packages
.run_commands(commands): Run arbitrary shell commands during the build
.add_local_file(src, dest, strategy): Copy a local file into the image
.add_local_dir(src, dest, strategy): Copy a local directory into the image
.env(vars): Set environment variables
.workdir(path): Set the working directory
.entrypoint(command): Set the container entrypoint

Model deployments use the chalk-remote-call-python shim to handle request routing and PyArrow serialization. Your Docker image should:
FROM python:3.11-slim
WORKDIR /app
RUN pip install --no-cache-dir chalk-remote-call-python spacy
RUN python -m spacy download en_core_web_sm
COPY model.py /app/model.py
ENV PYTHONPATH=/app
EXPOSE 8080
# model.py loads spaCy in on_startup, so register it alongside the handler
ENTRYPOINT ["chalk-remote-call", "--handler", "model.handler", "--on-startup", "model.on_startup", "--port", "8080"]

Build and push to a registry:
docker build --platform linux/amd64 -t my-model:latest .
docker push my-model:latest