Customer Cloud installation with Helm
Chalk offers a hosted model (“Chalk Cloud”) and a customer-hosted model (“Customer Cloud”). Most companies choose to run Chalk in their own cloud using the Customer Cloud model. This page discusses the self-managed Customer Cloud Deployment (sometimes operated in “Air Gapped” style).
A Chalk deployment consists of a Metadata Plane and one or more Data Planes:
The Metadata Plane is a single control-plane installation that manages deployments, authentication, billing, and orchestration. Each Data Plane is an EKS cluster, controlled by the Metadata Plane, that runs the feature engineering workloads for one or more Chalk environments. The Metadata Plane and a Data Plane may share a cluster, or each Data Plane may live in its own cluster.
This guide will walk through deploying a Chalk instance in your cloud environment using Helm, an open-source package manager for Kubernetes.
This guide requires a few pieces of software to be installed on your machine.
- Run helm list to verify that helm is installed.
The Metadata Plane is the Chalk control plane. It manages deployments, authentication, and billing, and orchestrates resources across one or more Data Planes. The instructions in this section describe an AWS EKS installation of the Metadata Plane.
Before you can deploy the Metadata Plane, you will need to provision the underlying AWS resources. Work with your Chalk support team to create the following components.
The Metadata Plane stores its configuration, deployment history, and team/project/environment metadata in PostgreSQL.
- Extensions: auto_explain, pg_stat_statements, pg_cron
- Database name (chalk)
- Database user (chalk)
The Metadata Plane uses SQS to drive asynchronous workflows. Provision the following queues:
| Queue | Purpose |
|---|---|
| metric-check-trigger | Metric validation workflows |
| scheduled-resolver-trigger | Scheduled resolver execution |
| scheduled-query-trigger | Scheduled query execution |
| batch-status | Batch processing status updates |
| argo-builds | Workflow build notifications |
| batch-report | Batch processing reports |
| heartbeat | Service health monitoring |
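The queues in the table above could be created with a short loop over the AWS CLI. This is a minimal sketch, not Chalk's official provisioning method: the `QUEUE_PREFIX` variable, the default region, and the `run` dry-run helper are assumptions introduced for illustration, and your Chalk support team may require different names or queue attributes.

```shell
#!/bin/sh
# Sketch: create the seven Metadata Plane queues from the table.
# QUEUE_PREFIX is a hypothetical naming convention (empty by default).
QUEUE_PREFIX="${QUEUE_PREFIX:-}"
AWS_REGION="${AWS_REGION:-us-east-1}"

# Print each command by default; set DRY_RUN=0 to call the AWS CLI for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_chalk_queues() {
  for queue in metric-check-trigger scheduled-resolver-trigger \
               scheduled-query-trigger batch-status argo-builds \
               batch-report heartbeat; do
    run aws sqs create-queue \
      --queue-name "${QUEUE_PREFIX}${queue}" \
      --region "$AWS_REGION"
  done
}

create_chalk_queues
```

Running the script as-is only prints the `aws sqs create-queue` commands, which makes it easy to review the names before re-running with `DRY_RUN=0`.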
The Metadata Plane API server requires an IAM role bound to its Kubernetes service account via IRSA (IAM Roles for Service Accounts).
Trust policy: bind the role to the EKS OIDC provider with the following service account:
- Namespace: chalk-metadata
- Service account: chalk-metadata-plane
- Audience: sts.amazonaws.com
Permissions: the role requires the following actions:
- s3:* (access source code and datasets)
- ecr:* (view and pull container images)
- sqs:* (interact with the deployment queues listed above)
- sts:AssumeRole (assume customer roles for accessing Data Plane resources)
Before installing the Helm chart, create the namespace and the secret that holds the database connection information.
Create a namespace for the Metadata Plane:
kubectl create namespace chalk-metadata
The Metadata Plane reads its database connection information from a Kubernetes secret named metadata-plane-secrets in the chalk-metadata namespace. Create a file named metadata-plane-secrets.env (do not check it in) with the following keys:
POSTGRES_USER=chalk
POSTGRES_PASSWORD=<your-rds-password>
POSTGRES_HOST=<your-rds-endpoint>
POSTGRES_DB=chalk
Then create the secret from the file:
kubectl create secret generic metadata-plane-secrets \
  --namespace chalk-metadata \
  --from-env-file metadata-plane-secrets.env
Next, authenticate to the Chalk Private Helm Registry so that you can access Chalk Helm charts.
To authenticate in AWS using an IAM role, run the following command:
aws ecr get-login-password --region us-east-1 \
  | helm registry login --username AWS --password-stdin 754784422779.dkr.ecr.us-east-1.amazonaws.com
For GCP, configure your gcloud CLI by following the Google documentation, using the following location:
us-docker.pkg.dev
To verify that you are properly authenticated, perform a dry run of templating the Chalk Metadata Plane Helm chart. The command will print an error because the chart's required values are not yet configured, but an error at the rendering stage shows that the chart was fetched successfully.
To check on AWS, run:
helm template chalk-metadata-plane oci://754784422779.dkr.ecr.us-east-1.amazonaws.com/charts/chalk-metadata-plane
On GCP, run:
helm template chalk-metadata-plane oci://us-docker.pkg.dev/chalk-prod/charts/chalk-metadata-plane
These commands will fail with a message like this:
Pulled: 754784422779.dkr.ecr.us-east-1.amazonaws.com/charts/chalk-metadata-plane:0.1.2
Digest: sha256:15774ef462c772af0496e1af768529e13503c5b7a5513b5a4d2f75359bddc7ea
Error: execution error at (chalk-metadata-plane/templates/frontend/deployment.yaml:2:4): Value chalk.metadata.frontend.image is required
This error is expected, as we have not yet configured the chart. If you see this error, you are ready to proceed.
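To tell the expected rendering error apart from a genuine authentication failure, the output of helm template can be checked for the two markers shown above. The following is a small sketch that assumes the output keeps the format of the sample message (a "Pulled:" line followed by an "is required" error); the function name `check_chart_access` is hypothetical.

```shell
#!/bin/sh
# Sketch: read `helm template` output on stdin and report whether the chart
# was fetched. Assumes the "Pulled:" / "is required" format shown above.
check_chart_access() {
  output=$(cat)
  case "$output" in
    *"Pulled: "*"is required"*)
      echo "ok: chart pulled; rendering failed only on unset values" ;;
    *"Pulled: "*)
      echo "ok: chart pulled" ;;
    *)
      echo "failed: chart was not pulled; check registry login"
      return 1 ;;
  esac
}

# Usage (stderr is merged so that both the pull log and the error are seen):
#   helm template chalk-metadata-plane oci://<registry>/charts/chalk-metadata-plane 2>&1 \
#     | check_chart_access
```

The check only looks at text markers, so it is a convenience for scripts rather than a substitute for reading the actual error.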
Next, we will configure the values file for the Chalk Metadata Plane. This file will contain all the necessary configuration for your Chalk deployment.
Create a file named values.yaml and copy the following contents into it:
chalk:
  metadata:
    # Your API host. Chalk's default host is api.chalk.ai,
    # but you will need to configure one for your instance.
    api_host: <YOUR API HOST, e.g. https://api.chalk.ai>
    # Your frontend host. Chalk's default host is chalk.ai,
    # but you will need to configure one for your instance.
    frontend_host: <YOUR FRONTEND HOST, e.g. https://chalk.ai>
    frontend:
      image: <YOUR FRONTEND IMAGE>
Note: <YOUR FRONTEND IMAGE> will be provided by Chalk.
Next, we will configure the database seeding for the Chalk Metadata Plane. This file contains the team, project, and environment configuration, and initial users for your Chalk deployment.
Note: this is just a skeleton for the initial bootstrap of the system - please use the Chalk Terraform Provider to define environments, projects, and other Chalk-to-cloud infrastructure bindings for your data planes.
Create a file named seed.yaml and copy the following contents into it:
chalk:
  metadata:
    seed:
      teams:
        # lowercase, less than 10 characters, no spaces or special characters.
        - id: teamshortid
          name: Your Company Name
      projects:
        # lowercase, less than 10 characters, no spaces or special characters.
        - id: projectshortid
          name: Your Project Name
          team_id: teamshortid
      environments:
        # lowercase, less than 10 characters, no spaces or special characters.
        - id: envshortid
          name: Development
          project_id: projectshortid
          team_id: teamshortid
      team_invites:
        - id: "seed_invite_1"
          team: teamshortid
          email: "your@email.com"
          role: owner
Chalk supports various forms of SSO, including OIDC and SAML. For this guide, we will configure Google OIDC.
Create a file named oidc.env and add the following contents. Do not check this file in:
GOOGLE_CLIENT_ID=YOUR_GOOGLE_CLIENT_ID
GOOGLE_CLIENT_SECRET=YOUR_GOOGLE_CLIENT_SECRET
Then, create a Kubernetes secret from this file:
kubectl create secret generic chalk-frontend-secrets \
  --namespace chalk-metadata \
  --from-env-file oidc.env
Now that you have configured your values file and your database seeding, you can deploy the Chalk Metadata Plane.
helm install chalk-metadata-plane oci://317932201237.dkr.ecr.us-east-1.amazonaws.com/charts/chalk-metadata-plane \
  --namespace chalk-metadata \
  --values values.yaml \
  --values seed.yaml
To verify that the installation was successful, run the following command:
kubectl get pods -n chalk-metadata
You should see pods starting up in your namespace. If you see any errors, run kubectl describe pod <podname> -n chalk-metadata
to get more information.
Once your pods are started, visit the frontend_host you configured in your values.yaml file to see the Chalk
frontend. You should be able to log in.
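Rather than polling kubectl get pods by hand, a script can block until the pods report Ready. The sketch below wraps kubectl wait; the function name `wait_for_metadata_plane`, the `KUBECTL` override, and the 300s default timeout are assumptions for illustration. Note that kubectl wait with --all also waits on pods that run to completion (for example one-off jobs), which never become Ready, so a label selector may be needed in practice.

```shell
#!/bin/sh
# KUBECTL can be overridden, e.g. KUBECTL="kubectl --context my-context".
KUBECTL="${KUBECTL:-kubectl}"

# Wait for every pod in the namespace to become Ready; on failure, list the
# pods so the failing one can be inspected with `kubectl describe pod`.
wait_for_metadata_plane() {
  namespace="${1:-chalk-metadata}"
  timeout="${2:-300s}"
  if ! $KUBECTL wait pods --all --namespace "$namespace" \
        --for=condition=Ready --timeout="$timeout"; then
    $KUBECTL get pods --namespace "$namespace"
    return 1
  fi
}

# Usage:
#   wait_for_metadata_plane                 # chalk-metadata, 300s
#   wait_for_metadata_plane chalk-metadata 600s
```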
A Data Plane is an EKS cluster that runs Chalk feature engineering workloads (resolver execution, query serving, model inference). Each Chalk environment typically resides in a single Data Plane cluster, but an account may map Data Plane clusters to Chalk environments in any way.
A Data Plane cluster may be the same EKS cluster that hosts the Metadata Plane, or it may be a separate cluster. The Data Plane cluster is managed by the Metadata Plane: once the cluster and supporting AWS resources exist and the Metadata Plane is granted appropriate access, the Metadata Plane will provision per-environment IAM roles, IRSA bindings, and other resources automatically. There is no environment-level IaC to maintain.
Configure an AWS role for Chalk according to the AWS Cloud Deployment guide. This role must have a cluster admin EKS access entry for the Data Plane cluster, so that the Metadata Plane can manage Kubernetes resources within the cluster.
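One way to grant the cluster admin access entry is through the EKS access entry API in the AWS CLI. The sketch below is an illustration only: the cluster name and role ARN are hypothetical placeholders, the `run` helper is a dry-run convenience, and the cluster is assumed to use an authentication mode that supports access entries (API or API_AND_CONFIG_MAP).

```shell
#!/bin/sh
# Hypothetical placeholders; substitute your cluster name and Chalk role ARN.
CLUSTER_NAME="${CLUSTER_NAME:-my-data-plane}"
CHALK_ROLE_ARN="${CHALK_ROLE_ARN:-arn:aws:iam::111122223333:role/chalk-management}"
# AWS-managed access policy that grants cluster-wide admin.
ADMIN_POLICY_ARN="arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"

# Print each command by default; set DRY_RUN=0 to call the AWS CLI for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

grant_cluster_admin() {
  # Register the role as a principal on the cluster.
  run aws eks create-access-entry \
    --cluster-name "$CLUSTER_NAME" \
    --principal-arn "$CHALK_ROLE_ARN"
  # Attach the cluster admin policy at cluster scope.
  run aws eks associate-access-policy \
    --cluster-name "$CLUSTER_NAME" \
    --principal-arn "$CHALK_ROLE_ARN" \
    --policy-arn "$ADMIN_POLICY_ARN" \
    --access-scope type=cluster
}

grant_cluster_admin
```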
Provision five S3 buckets for the Data Plane:
- data: feature data
- source: deployed source code
- dataset: materialized datasets
- model: model artifacts
- stages: query plan stages
Configure CORS on all five buckets to allow GET requests from api.chalk.ai and chalk.ai (or the equivalent hosts you configured for your Metadata Plane).
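The CORS rule described above can be expressed as a JSON document in the shape accepted by aws s3api put-bucket-cors. This is a sketch under assumptions: the origins are written with an https scheme, the `AllowedHeaders` wildcard is a guess at what the frontend needs, and the bucket naming in the usage comment is hypothetical.

```shell
#!/bin/sh
# Origins allowed to issue GET requests; replace with your Metadata Plane hosts.
API_ORIGIN="${API_ORIGIN:-https://api.chalk.ai}"
FRONTEND_ORIGIN="${FRONTEND_ORIGIN:-https://chalk.ai}"

# Emit a CORS configuration that permits GET from the two origins.
cors_config() {
  cat <<EOF
{
  "CORSRules": [
    {
      "AllowedOrigins": ["$API_ORIGIN", "$FRONTEND_ORIGIN"],
      "AllowedMethods": ["GET"],
      "AllowedHeaders": ["*"]
    }
  ]
}
EOF
}

cors_config

# Usage (bucket names are hypothetical):
#   cors_config > cors.json
#   for name in data source dataset model stages; do
#     aws s3api put-bucket-cors --bucket "<your-prefix>-$name" \
#       --cors-configuration file://cors.json
#   done
```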
- t3.medium instances to run background OSS controllers
Chalk uses standard EKS with Karpenter for scheduling, on AL2023 nodes. EKS Autopilot is supported but has been buggy in practice; because Autopilot is a fork of upstream EKS, it is markedly harder to troubleshoot, so standard EKS is recommended.
Provision a Route 53 hosted zone per cluster. Each cluster needs its own
zone so that it can manage cluster-level DNS records. Chalk uses
external-dns and cert-manager to automate DNS and certificate
management, and routes traffic via Envoy Gateways with Let’s Encrypt
signed certs.
Chalk uses MSK for background message processing. An MSK cluster may be shared across multiple Chalk Data Plane clusters; each cluster will use its own set of topics, and must be able to route to the MSK cluster.
The Data Plane cluster relies on a number of open-source Helm charts. Install the following charts:
| Chart | Version | Purpose |
|---|---|---|
| ArgoWorkflows | 0.45.27 | In-cluster workflows |
| KEDA | 2.11.1 | Event-driven autoscaling |
| Metrics Server | 3.12.2 | Resource metrics |
| S3 CSI Driver | 2.0.0 | S3 volume mounting |
| Envoy Gateway | 1.6.0 | API gateway |
| Cert Manager | latest | TLS certificates |
| External DNS | 1.17.0 | DNS automation |
| CloudNativePG | 0.26.0 | PostgreSQL operator |
| Karpenter | 1.0.0+ | Node autoscaling |
| EBS CSI Driver | latest | EBS volumes |
| AWS Load Balancer Controller | latest | Network load balancing |
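Pinning chart versions at install time keeps the cluster aligned with the table above. The sketch below shows the pattern for three of the charts using their public upstream Helm repositories; the release names, namespaces, and the `run` and `install_chart` helpers are assumptions for illustration, and the remaining charts would follow the same shape with their own repositories and values.

```shell
#!/bin/sh
# Print each command by default; set DRY_RUN=0 to call helm for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Arguments: release name, chart name, repository URL, chart version, namespace.
install_chart() {
  run helm upgrade --install "$1" "$2" \
    --repo "$3" --version "$4" \
    --namespace "$5" --create-namespace
}

# Namespaces here are hypothetical choices, not Chalk requirements.
install_chart keda keda https://kedacore.github.io/charts 2.11.1 keda
install_chart metrics-server metrics-server https://kubernetes-sigs.github.io/metrics-server/ 3.12.2 kube-system
install_chart external-dns external-dns https://kubernetes-sigs.github.io/external-dns/ 1.17.0 external-dns
```

Using helm upgrade --install makes the script safe to re-run: it installs a release that is missing and upgrades one that already exists.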
A few notes on configuration:
cert-manager and external-dns must be configured to support the Gateway API and to watch XRoute resources. They also need Route 53 permissions to manage DNS records for the Chalk Data Plane gateway.
In each Data Plane cluster, configure:
- ClusterIssuer using a DNS-01 challenge via Route 53
- EC2NodeClass resources tied to the appropriate VPCs
The Background Persistence component runs in the Data Plane cluster and writes query results to online and offline storage. See the Background Persistence Installation guide for the full configuration walkthrough. At minimum, provision:
background-persistence)
jsonencode({
  Statement = [
    {
      Action = [
        "s3:*",                       // pull parquet files from S3
        "dynamodb:*",                 // used if the customer has a DynamoDB online store
        "secretsmanager:*",           // load secrets from AWS Secrets Manager
        "ecr:BatchGetImage",          // download persistence base images from the Chalk registry
        "ecr:GetAuthorizationToken",  // download persistence base images from the Chalk registry
        "ecr:GetDownloadUrlForLayer", // download persistence base images from the Chalk registry
        "kms:GenerateDataKey",
        "glue:*"                      // Iceberg offline store
      ]
      Effect   = "Allow"
      Resource = "*"
    },
  ]
})
Each Chalk environment is provisioned and managed by the Metadata Plane, so per-environment infrastructure does not require IaC. When an environment is created from the Chalk UI, the Metadata Plane will automatically provision the IAM role and IRSA binding required for the environment to function within the Data Plane cluster.
For private deployments, the Metadata Plane ingress must be configured to allow access from the Metadata Plane to the Data Plane clusters. This involves creating a PrivateLink gateway pointed at the Envoy Gateway service in the Data Plane cluster. Because the Metadata Plane bootstraps the Envoy Gateway via the Kubernetes API, this step is performed after the Data Plane cluster has been initially provisioned and the Metadata Plane has reconciled it.
Now that you have deployed the Chalk Metadata Plane and configured a Data Plane cluster, you can configure your local environment to interact with your Chalk instance.