Metadata Plane & Data Plane Communication

Chalk’s service architecture is divided into two planes: the Metadata Plane and the Data Plane. Understanding what flows between them is critical for security-conscious deployments, especially when configuring data residency controls or auditing what can transit outside your cloud boundary.

Key principle: The Metadata Plane orchestrates the Data Plane but never stores customer feature values. All production query traffic flows directly from your API clients to the Data Plane.

Summary of Data Flows

Flow	Direction	Contains Customer Data?	Required?	Can Be Disabled?
Logs	Data → Metadata	No (metadata only)	No	Yes
Metrics	Data → Metadata	No	No	Yes
Query Execution	Metadata → Data	Yes (feature values)	No	Yes
EKS API Access	Metadata → Data	No	Yes	No
Container Images (ECR/Artifact Registry)	Metadata → Data	No	Yes	No
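
For audit purposes it can help to hold the table above as data and filter it. The following is a minimal sketch, not part of any Chalk API: the `Flow` class, field names, and helper functions are hypothetical, and the rows are transcribed from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One row of the data-flow summary table (hypothetical structure)."""
    name: str
    direction: str              # "data->metadata" or "metadata->data"
    carries_customer_data: bool
    required: bool
    can_be_disabled: bool


# Rows transcribed from the summary table.
FLOWS = [
    Flow("Logs", "data->metadata", False, False, True),
    Flow("Metrics", "data->metadata", False, False, True),
    Flow("Query Execution", "metadata->data", True, False, True),
    Flow("EKS API Access", "metadata->data", False, True, False),
    Flow("Container Images", "metadata->data", False, True, False),
]


def flows_to_review(flows):
    """Return the names of flows that can carry customer data."""
    return [f.name for f in flows if f.carries_customer_data]


def optional_flows(flows):
    """Return the names of flows that may be switched off."""
    return [f.name for f in flows if f.can_be_disabled]
```

Filtering this way makes the key point of the table explicit: only one flow can carry customer data, and that flow is optional.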

Data Flows In Detail

Logs

Direction: Data Plane → Metadata Plane

Logs are collected by an OpenTelemetry Collector running in the Data Plane EKS/GKE cluster and optionally forwarded to the Metadata Plane for centralized dashboarding.

What’s included:

Application logs from query servers, stream workers, and batch workers
Resolver execution logs (which features were computed and when)
Error logs and exception traces
Query audit logs

What’s NOT included: Actual feature values or customer PII. Logs contain metadata about computations (e.g. resolver name, latency, error type), not the data itself.

Disabling: Logs can be kept exclusively within the Data Plane by configuring the OpenTelemetry Collector to export to your own observability tooling (e.g. Dynatrace, Datadog) instead of forwarding to the Metadata Plane.
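
One way to confirm that logs stay inside your boundary is to inspect the Collector configuration after loading it into a dictionary. The sketch below assumes the standard Collector layout (`exporters` plus `service.pipelines`); the exporter names, hostnames, and the `exporters_leaving_boundary` helper are hypothetical, and how the configuration is loaded is left to you.

```python
from urllib.parse import urlparse


def exporters_leaving_boundary(config, signal, allowed_hosts):
    """List exporters on `signal` pipelines whose endpoint host is not allowed.

    Exporters without a parseable endpoint are flagged as well, so that
    they get a manual review rather than a silent pass.
    """
    exporters = config.get("exporters", {})
    pipelines = config.get("service", {}).get("pipelines", {})
    flagged = []
    for pipeline_name, pipeline in pipelines.items():
        # Pipeline keys look like "logs" or "logs/audit"; compare the type part.
        if pipeline_name.split("/")[0] != signal:
            continue
        for exporter_name in pipeline.get("exporters", []):
            endpoint = (exporters.get(exporter_name) or {}).get("endpoint", "")
            # Endpoints such as "host:4317" have no scheme; add "//" so the
            # host is parsed as a network location.
            if endpoint and "://" not in endpoint:
                endpoint = "//" + endpoint
            host = urlparse(endpoint).hostname
            if host not in allowed_hosts and exporter_name not in flagged:
                flagged.append(exporter_name)
    return flagged


# Hypothetical configuration, already parsed into a dict.
EXAMPLE = {
    "exporters": {
        "otlphttp/internal": {"endpoint": "https://otel.internal.example.com:4318"},
        "otlphttp/metadata": {"endpoint": "https://metadata-plane.example.net"},
    },
    "service": {
        "pipelines": {
            "logs": {
                "receivers": ["otlp"],
                "exporters": ["otlphttp/internal", "otlphttp/metadata"],
            },
            "metrics": {"receivers": ["otlp"], "exporters": ["otlphttp/internal"]},
        }
    },
}
```

Calling `exporters_leaving_boundary(EXAMPLE, "logs", {"otel.internal.example.com"})` flags the exporter that points outside the allowed host set. The same check can be run with `"metrics"` as the signal.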

Metrics

Direction: Data Plane → Metadata Plane

Performance and operational metrics are emitted by the Data Plane and optionally forwarded for centralized monitoring.

Metrics collected:

Query latency (P50, P95, P99)
Query success and error rates
Ingestion delay (time from event to feature availability in the online store)
Kafka consumer lag
Cache hit rates
Resource utilization (CPU, memory, disk)

Metrics do not contain customer data—only aggregated operational statistics.

Disabling: Like logs, metrics can be routed exclusively to your own monitoring systems via OpenTelemetry exporter configuration.

Query Execution

Direction: Metadata Plane → Data Plane

This flow is what lets the Chalk web UI execute queries interactively against your live Data Plane: the UI sends a request to the Metadata Plane, which acts as an API client, forwards the request to your Data Plane, and returns the results to the UI.

What’s included: Query inputs, feature outputs, and execution plan metadata. This flow can transmit customer feature values (including PII).

Disabling: This flow requires a VPC Endpoint (VPCE / PrivateLink) between the Metadata Plane and Data Plane. Removing that connection disables it entirely, ensuring customer data never transits to the Metadata Plane.

What you lose by disabling this flow

Disabling query execution connectivity is not a simple on/off toggle—it removes a significant portion of Chalk’s product capabilities:

Planning and backfill engines — aggregate backfill planning, historical feature computation scheduling, and incremental update strategies require the Metadata Plane to be able to query the Data Plane
Web UI query testing — interactive query debugging and feature value inspection
Real-time monitoring dashboards — feature freshness views and query plan visualization
Data quality tooling — automated health checks, feature assertions, and resolver performance analysis

What is never affected

Regardless of how you configure Metadata-to-Data-Plane connectivity, production online query traffic is not affected. When using Named Queries, your API clients talk directly to the Data Plane and never route through the Metadata Plane. OAuth token exchange still occurs via the Metadata Plane, but no feature values transit it.

EKS API Access

Direction: Metadata Plane → Data Plane

The Metadata Plane needs access to your Data Plane’s Kubernetes API server to manage the lifecycle of your Chalk deployment.

What it’s used for:

Deploying updated container images (built by Argo Image Builder)
Scaling Kubernetes workloads (query servers, stream workers, batch workers)
Executing rolling updates with zero downtime
Monitoring pod health and readiness states

What’s NOT included: No customer feature data is exposed via the Kubernetes API—only infrastructure metadata (pod status, deployment state, etc.).

Disabling: This flow is required. Without it, Chalk cannot deploy code changes, scale resources, or perform health monitoring.

EKS API access patterns

There are two main options for how the Metadata Plane connects to the Data Plane Kubernetes API:

Option A: Public EKS API with IP allowlisting (recommended)

The EKS API server endpoint is publicly accessible, but access is restricted to the Chalk Metadata Plane’s IP ranges via an allowlist. All traffic is encrypted in transit (TLS), and AWS IAM authentication is required for all API calls.

Benefits: simpler setup, zero-downtime deployments guaranteed, no dependency on VPC Endpoint infrastructure.
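
The IP restriction in Option A amounts to a CIDR membership test on the caller's source address. The sketch below illustrates that logic with Python's standard `ipaddress` module; the `is_allowed` helper is hypothetical and the CIDR ranges are documentation placeholders, not Chalk's actual IP ranges.

```python
import ipaddress


def is_allowed(source_ip, allowlist_cidrs):
    """Return True if `source_ip` falls inside any of the allowed CIDR blocks."""
    addr = ipaddress.ip_address(source_ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed-version lists are handled without extra checks.
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowlist_cidrs)


# Placeholder ranges from the documentation address space (RFC 5737).
EXAMPLE_ALLOWLIST = ["203.0.113.0/24", "198.51.100.16/28"]
```

In practice the cloud provider enforces this rule on the endpoint; a helper like this is only useful for checking a proposed allowlist against known source addresses before applying it.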

Option B: Fully private EKS API

The EKS API server is only accessible from within the VPC. If the Metadata Plane and Data Plane are in separate VPCs, this requires VPC peering or Transit Gateway. See Private EKS API Server Connectivity for the full topology and configuration walkthrough.

Drawbacks: operationally complex, and zero-downtime deployments cannot be guaranteed because EKS API endpoints don’t have stable IP addresses—a change in endpoint IP can break connectivity until network rules are manually updated (estimated recovery: 15+ minutes).

This option is typically only warranted when regulatory requirements mandate a fully private control plane.

Container Images

Direction: Metadata Plane → Data Plane

When you deploy a new version of your Chalk project, the Metadata Plane’s Argo Image Builder builds Docker container images and pushes them to your ECR (AWS) or Artifact Registry (GCP) repository. The Data Plane then pulls these images when deploying.

What’s included: Docker container images for Chalk services. No customer feature data.

Disabling: This flow is required for deployments.

Direct Communication with Data Plane

Direction: Customer API Client → Data Plane (with auth token exchange via Metadata Plane)

Production query traffic from customer-owned API clients does not flow through the Metadata Plane. Instead, customer-owned API clients:

Exchange an auth token with the Metadata Plane. The client authenticates against the Metadata Plane (typically via OAuth client credentials) and receives an access token scoped to the target environment. Clients are expected to cache the token and re-fetch it only as it nears expiry, so the Metadata Plane is in the request path only on the occasional token refresh, never on the hot path of queries or ingestion.
Speak directly to the Data Plane’s load balancers. Using that token, the client issues query and ingestion requests directly to the Data Plane. The load balancer fronting the Data Plane may be either private (VPC-internal, accessed via PrivateLink, VPC peering, or on-prem connectivity) or public (internet-facing with TLS) — the choice is up to the customer based on their network and compliance posture. Public load balancers can additionally restrict access via IPv4 allowlists (e.g. AWS security groups or WAF IP set rules) to limit which client networks can reach the Data Plane.

This means feature values flow only between the customer’s API clients and their own Data Plane; the Metadata Plane sees the auth handshake but not the query payloads or results.
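
The token-caching behavior described above can be sketched as a small client-side cache. This is an illustration under stated assumptions, not Chalk's client implementation: `TokenCache`, the `fetch_token` callable (assumed to return a token and its lifetime in seconds), and the 60-second refresh margin are all hypothetical.

```python
import time


class TokenCache:
    """Caches an access token and refreshes it shortly before expiry."""

    def __init__(self, fetch_token, refresh_margin_s=60.0, clock=time.monotonic):
        # fetch_token is a hypothetical callable returning (token, ttl_seconds);
        # in a real client it would perform the exchange with the Metadata Plane.
        self._fetch_token = fetch_token
        self._refresh_margin_s = refresh_margin_s
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a cached token, fetching a new one only when near expiry."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._refresh_margin_s:
            token, ttl_s = self._fetch_token()
            self._token = token
            self._expires_at = now + ttl_s
        return self._token
```

Every query or ingestion request calls `get()`, but the fetch callable runs only on the first call and again when the token is within the refresh margin of expiring. The clock is injectable so the behavior can be tested without waiting.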

Online query traffic

Online queries are synchronous: the client posts inputs to the Data Plane and receives the computed feature values in the response. The auth token is exchanged once with the Metadata Plane (and cached), then reused across many queries against the Data Plane load balancer.

[Figure: Online query sequence diagram]

Offline query traffic

Offline (batch) queries are asynchronous. The client submits a job to the Data Plane, which enqueues it onto a job queue; an offline worker pod consumes the queue, runs the query against the offline store, and writes the result dataset. The client polls the Data Plane for status using the same cached token, and finally fetches the materialized dataset.

[Figure: Offline query sequence diagram]
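
The submit, poll, and fetch pattern for offline queries can be sketched as a generic loop. The function below is hypothetical and does not use Chalk's client library: `submit`, `get_status`, and `fetch_dataset` stand in for the corresponding Data Plane calls, and the status strings, poll interval, and timeout are assumptions.

```python
import time


def run_offline_query(submit, get_status, fetch_dataset,
                      poll_interval_s=5.0, timeout_s=3600.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Submit a job, poll until it finishes, then fetch the result dataset."""
    job_id = submit()
    deadline = clock() + timeout_s
    while True:
        # Hypothetical status values: "queued", "running", "succeeded", "failed".
        status = get_status(job_id)
        if status == "succeeded":
            return fetch_dataset(job_id)
        if status == "failed":
            raise RuntimeError(f"offline query {job_id} failed")
        if clock() >= deadline:
            raise TimeoutError(f"offline query {job_id} did not finish in {timeout_s}s")
        sleep(poll_interval_s)
```

All three callables would reuse the same cached auth token, so the Metadata Plane is not contacted during polling.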

Direct ingestion (upload_features)

Producers can push feature values into the online store without going through a resolver by calling the Data Plane’s upload_features endpoint. As with queries, the client first exchanges an auth token with the Metadata Plane (cached until it nears expiry), then sends rows directly to the Data Plane load balancer. Background persistence asynchronously flushes the same rows to the offline store.

[Figure: Direct ingestion sequence diagram]

Configuration Options

Full connectivity (recommended)

Establish a VPC Endpoint between the Metadata Plane and Data Plane, and rely on Chalk’s RBAC system for access control. This gives you full product functionality: planning engines, web UI query testing, real-time dashboards, and data quality tooling.

In a Customer Cloud deployment or Air-Gapped deployment where both planes run within your cloud boundary, this is the recommended configuration. Your data never leaves your infrastructure, and Chalk RBAC provides granular user-level access control over what can be queried via the UI.

Restricted connectivity (maximum data isolation)

Do not establish a VPCE connection from the Metadata Plane to the Data Plane. This ensures customer data can never transit the Metadata Plane under any circumstances.

Trade-offs:

Loss of planning engines for backfills and historical computation
No web UI query testing or feature value inspection
Limited operational dashboards
Teams need alternative tooling for feature validation and debugging

This configuration is appropriate for organizations with strict data residency requirements where no customer data—even via authorized queries—can touch infrastructure outside a defined boundary.
