Chalk’s service architecture is divided into two planes: the Metadata Plane and the Data Plane. Understanding what flows between them is critical for security-conscious deployments, especially when configuring data residency controls or auditing what can transit outside your cloud boundary.

Key principle: The Metadata Plane orchestrates the Data Plane but never stores customer feature values. All production query traffic flows directly from your API clients to the Data Plane.


Summary of Data Flows

Flow                                      | Direction       | Contains Customer Data? | Required? | Can Be Disabled?
Logs                                      | Data → Metadata | No (metadata only)      | No        | Yes
Metrics                                   | Data → Metadata | No                      | No        | Yes
Query Execution                           | Metadata → Data | Yes (feature values)    | No        | Yes
EKS API Access                            | Metadata → Data | No                      | Yes       | No
Container Images (ECR/Artifact Registry)  | Metadata → Data | No                      | Yes       | No
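
For audit scripts, it can help to hold the summary table as data. The sketch below transcribes the five rows into a small Python structure and answers two questions: which flows can carry customer data, and which are optional. The `Flow` class and the helper names are illustrative, not part of any Chalk API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    name: str
    direction: str               # "data->metadata" or "metadata->data"
    contains_customer_data: bool
    required: bool
    can_be_disabled: bool


# Rows transcribed from the summary table above.
FLOWS = [
    Flow("Logs", "data->metadata", False, False, True),
    Flow("Metrics", "data->metadata", False, False, True),
    Flow("Query Execution", "metadata->data", True, False, True),
    Flow("EKS API Access", "metadata->data", False, True, False),
    Flow("Container Images", "metadata->data", False, True, False),
]


def flows_with_customer_data(flows):
    """Names of flows that can carry customer feature values."""
    return [f.name for f in flows if f.contains_customer_data]


def optional_flows(flows):
    """Names of flows that are not required and can be disabled."""
    return [f.name for f in flows if not f.required and f.can_be_disabled]


if __name__ == "__main__":
    print(flows_with_customer_data(FLOWS))  # ['Query Execution']
    print(optional_flows(FLOWS))            # ['Logs', 'Metrics', 'Query Execution']
```

The only flow that both carries customer data and can be disabled is Query Execution, which is why the configuration choice later in this page centers on it.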

Data Flows In Detail

Logs

Direction: Data Plane → Metadata Plane

Logs are collected by an OpenTelemetry Collector running in the Data Plane EKS/GKE cluster and optionally forwarded to the Metadata Plane for centralized dashboarding.

What’s included:

  • Application logs from query servers, stream workers, and batch workers
  • Resolver execution logs (which features were computed and when)
  • Error logs and exception traces
  • Query audit logs

What’s NOT included: Actual feature values or customer PII. Logs contain metadata about computations (e.g. resolver name, latency, error type), not the data itself.

Disabling: Logs can be kept exclusively within the Data Plane by configuring the OpenTelemetry Collector to export to your own observability tooling (e.g. Dynatrace, Datadog) instead of forwarding to the Metadata Plane.


Metrics

Direction: Data Plane → Metadata Plane

Performance and operational metrics are emitted by the Data Plane and optionally forwarded to the Metadata Plane for centralized monitoring.

Metrics collected:

  • Query latency (P50, P95, P99)
  • Query success and error rates
  • Ingestion delay (time from event to feature availability in the online store)
  • Kafka consumer lag
  • Cache hit rates
  • Resource utilization (CPU, memory, disk)

Metrics do not contain customer data—only aggregated operational statistics.
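
To illustrate what "aggregated" means here, the sketch below reduces a list of raw per-query latencies to P50, P95, and P99 values. It uses the nearest-rank percentile definition, which is an assumption made for the example; this page does not state how Chalk computes its percentiles. The sample latencies are hypothetical.

```python
import math


def nearest_rank_percentile(samples, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


def summarize_latency(latencies_ms):
    """Reduce raw per-query latencies to three aggregate values."""
    return {
        "p50": nearest_rank_percentile(latencies_ms, 50),
        "p95": nearest_rank_percentile(latencies_ms, 95),
        "p99": nearest_rank_percentile(latencies_ms, 99),
    }


# Hypothetical latencies: 1 ms, 2 ms, ..., 100 ms.
latencies = list(range(1, 101))
summary = summarize_latency(latencies)
print(summary)  # {'p50': 50, 'p95': 95, 'p99': 99}
```

The summary holds three numbers regardless of how many queries ran, and none of them refers to query inputs or feature outputs.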

Disabling: Like logs, metrics can be routed exclusively to your own monitoring systems via OpenTelemetry exporter configuration.
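
One way to confirm that logs and metrics stay inside the Data Plane is to check which exporters each Collector pipeline uses. The sketch below assumes the Collector configuration has been loaded into a Python dict that mirrors the standard OpenTelemetry Collector layout (a top-level `exporters` section and `service.pipelines`). The exporter names and endpoint are hypothetical placeholders; the actual configuration shipped with a Chalk deployment may differ.

```python
# Collector configuration as a dict mirroring the YAML layout.
# Exporter names and the endpoint are hypothetical placeholders.
collector_config = {
    "exporters": {
        "otlphttp/own_observability": {"endpoint": "https://otel.example.internal"},
    },
    "service": {
        "pipelines": {
            "logs": {
                "receivers": ["otlp"],
                "exporters": ["otlphttp/own_observability"],
            },
            "metrics": {
                "receivers": ["otlp"],
                "exporters": ["otlphttp/own_observability"],
            },
        }
    },
}


def pipelines_using_unapproved_exporters(config, approved_exporters):
    """Map each pipeline name to the exporters it uses that are not approved."""
    violations = {}
    pipelines = config.get("service", {}).get("pipelines", {})
    for name, pipeline in pipelines.items():
        extra = [e for e in pipeline.get("exporters", []) if e not in approved_exporters]
        if extra:
            violations[name] = extra
    return violations


approved = {"otlphttp/own_observability"}
print(pipelines_using_unapproved_exporters(collector_config, approved))  # {}
```

An empty result means every pipeline exports only to destinations you have approved. Any pipeline that still lists another exporter appears in the result with the offending exporter names.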


Query Execution

Direction: Metadata Plane → Data Plane

This flow enables the Chalk web UI to execute queries interactively against your live Data Plane: the UI sends a request to the Metadata Plane, which acts as an API client, forwards the request to your Data Plane, and returns the results to the UI.

What’s included: Query inputs, feature outputs, and execution plan metadata. This flow can transmit customer feature values (including PII).

Disabling: This flow requires a VPC Endpoint (VPCE / PrivateLink) between the Metadata Plane and Data Plane. Removing that connection disables it entirely, ensuring customer data never transits to the Metadata Plane.

What you lose by disabling this flow

Disabling query execution connectivity is not a simple on/off toggle—it removes a significant portion of Chalk’s product capabilities:

  • Planning and backfill engines — aggregate backfill planning, historical feature computation scheduling, and incremental update strategies require the Metadata Plane to be able to query the Data Plane
  • Web UI query testing — interactive query debugging and feature value inspection
  • Real-time monitoring dashboards — feature freshness views and query plan visualization
  • Data quality tooling — automated health checks, feature assertions, and resolver performance analysis

What is never affected

Regardless of how you configure Metadata-to-Data-Plane connectivity, production online query traffic is not affected. When using Named Queries, your API clients talk directly to the Data Plane and never route through the Metadata Plane. OAuth token exchange still occurs via the Metadata Plane, but no feature values transit it.


EKS API Access

Direction: Metadata Plane → Data Plane

The Metadata Plane needs access to your Data Plane’s Kubernetes API server to manage the lifecycle of your Chalk deployment.

What it’s used for:

  • Deploying updated container images (built by Argo Image Builder)
  • Scaling Kubernetes workloads (query servers, stream workers, batch workers)
  • Executing rolling updates with zero downtime
  • Monitoring pod health and readiness states

What’s NOT included: No customer feature data is exposed via the Kubernetes API—only infrastructure metadata (pod status, deployment state, etc.).

Disabling: This flow is required. Without it, Chalk cannot deploy code changes, scale resources, or perform health monitoring.

EKS API access patterns

There are two main options for how the Metadata Plane connects to the Data Plane Kubernetes API:

Option A: Public EKS API with IP whitelisting (recommended)

The EKS API server endpoint is publicly accessible, but access is restricted to the Chalk Metadata Plane’s IP ranges by an IP whitelist. All traffic is encrypted in transit (TLS), and AWS IAM authentication is required for all API calls.

Benefits: simpler setup, zero-downtime deployments guaranteed, no dependency on VPC Endpoint infrastructure.
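
The whitelist check in Option A amounts to testing whether a source address falls inside a set of CIDR blocks. A minimal sketch with Python's standard `ipaddress` module follows. The CIDR blocks are placeholders taken from the IPv4 documentation ranges; the real Metadata Plane ranges are not given on this page.

```python
import ipaddress

# Placeholder CIDR blocks from the IPv4 documentation ranges (RFC 5737).
# These are NOT the real Metadata Plane ranges.
ALLOWED_CIDRS = ["203.0.113.0/24", "198.51.100.16/28"]


def is_allowed(source_ip, allowed_cidrs):
    """Return True if source_ip falls inside any whitelisted CIDR block."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


print(is_allowed("203.0.113.7", ALLOWED_CIDRS))    # True
print(is_allowed("198.51.100.32", ALLOWED_CIDRS))  # False: just outside the /28
```

The whitelist is only the network-level filter. As noted above, TLS and AWS IAM authentication still apply to every call that passes it.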

Option B: Fully private EKS API

The EKS API server is only accessible from within the VPC. If the Metadata Plane and Data Plane are in separate VPCs, this requires VPC peering or Transit Gateway.

Drawbacks: operationally complex, and zero-downtime deployments cannot be guaranteed. EKS API endpoints don’t have stable IP addresses, so a change in endpoint IP can break connectivity until network rules are manually updated (estimated recovery: 15+ minutes).

This option is typically only warranted when regulatory requirements mandate a fully private control plane.
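
If you do run Option B, the IP instability described above can be detected early by comparing the addresses the endpoint currently resolves to against the addresses your network rules were written for. The sketch below is one way to do that with the standard library. The function names and the sample addresses are hypothetical, and `resolve_ipv4` must run from a host that can resolve the private endpoint.

```python
import socket


def resolve_ipv4(hostname, port=443):
    """Resolve a hostname to its current set of IPv4 addresses."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is (address, port).
    return {info[4][0] for info in infos}


def detect_drift(current_ips, allowed_ips):
    """Compare resolved endpoint IPs with the IPs the network rules permit."""
    current, allowed = set(current_ips), set(allowed_ips)
    return {
        "unreachable": sorted(current - allowed),  # in use, not yet permitted
        "stale": sorted(allowed - current),        # permitted, no longer in use
    }


# Hypothetical addresses for illustration.
report = detect_drift({"10.0.1.5", "10.0.2.9"}, {"10.0.1.5", "10.0.3.4"})
print(report)  # {'unreachable': ['10.0.2.9'], 'stale': ['10.0.3.4']}
```

A non-empty `unreachable` list is the condition that precedes the connectivity break: the endpoint has moved to an address the current rules do not cover.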


Container Images

Direction: Metadata Plane → Data Plane

When you deploy a new version of your Chalk project, the Metadata Plane’s Argo Image Builder builds Docker container images and pushes them to your ECR (AWS) or Artifact Registry (GCP) repository. The Data Plane then pulls these images when deploying.

What’s included: Docker container images for Chalk services. No customer feature data.

Disabling: This flow is required for deployments.


Configuration Options

Full connectivity (full product functionality)

Establish a VPC Endpoint between the Metadata Plane and Data Plane, and rely on Chalk’s RBAC system for access control. This gives you full product functionality: planning engines, web UI query testing, real-time dashboards, and data quality tooling.

In a Customer Cloud deployment or Air-Gapped deployment where both planes run within your cloud boundary, this is the recommended configuration. Your data never leaves your infrastructure, and Chalk RBAC provides granular user-level access control over what can be queried via the UI.

Restricted connectivity (maximum data isolation)

Do not establish a VPCE connection from the Metadata Plane to the Data Plane. This ensures customer data can never transit the Metadata Plane under any circumstances.

Trade-offs:

  • Loss of planning engines for backfills and historical computation
  • No web UI query testing or feature value inspection
  • Limited operational dashboards
  • Teams need alternative tooling for feature validation and debugging

This configuration is appropriate for organizations with strict data residency requirements where no customer data—even via authorized queries—can touch infrastructure outside a defined boundary.
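
The trade-off between the two configurations can be summarized as a function of a single setting: whether the VPCE connection exists. The sketch below encodes the capabilities this page lists as depending on that connection. The names are illustrative and do not correspond to a Chalk configuration API.

```python
# Capabilities this page lists as requiring Metadata-to-Data-Plane query connectivity.
VPCE_DEPENDENT = [
    "planning and backfill engines",
    "web UI query testing",
    "real-time monitoring dashboards",
    "data quality tooling",
]

# Traffic this page describes as unaffected by that connection.
ALWAYS_AVAILABLE = [
    "production online queries (API client to Data Plane)",
]


def available_capabilities(vpce_enabled):
    """List what remains available for a given connectivity choice."""
    return ALWAYS_AVAILABLE + (VPCE_DEPENDENT if vpce_enabled else [])


def customer_data_can_transit_metadata_plane(vpce_enabled):
    """Feature values can reach the Metadata Plane only over the VPCE connection."""
    return vpce_enabled


print(available_capabilities(vpce_enabled=False))
# ['production online queries (API client to Data Plane)']
```

With the connection removed, production query serving remains and the tooling in `VPCE_DEPENDENT` must be replaced by alternatives inside your own boundary.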