# Local SSDs for spilling and scan caching (AWS)
source: https://docs.chalk.ai/docs/local-ssd-storage

## AWS EKS guide for using local NVMe storage to speed up offline-query spilling and Iceberg scan caching

### Overview

Two Velox features benefit from fast local NVMe SSDs (LSSDs) attached to your
async offline-query workers:

- Spilling writes per-query intermediate state to disk when a query exceeds
its memory limit, letting large offline queries complete instead of
crashing with out-of-memory errors.
- Table-scan SSD cache keeps a process-wide on-disk cache of reusable scan
ranges from external table sources. The cache survives query completion and
engine restarts, so repeated reads of the same partitions skip the round trip
to object storage.

Both features can share a single LSSD-backed mount on the node. This page walks
through the end-to-end setup.

Scope: this guide covers AWS EKS clusters using Karpenter. The
infrastructure steps (EC2NodeClass, NodePool, instance families) are
AWS-specific. The Chalk-side configuration (resource group, Job Queue Consumer,
environment variables, client routing) applies regardless of cloud, but the
Karpenter-specific UI fields and shell commands on this page will not apply
verbatim to GCP GKE or Azure AKS deployments. For GKE local-SSD guidance, see
the short note in Kubernetes Resources Overview
or contact Chalk support.

This setup applies to async offline queries (`run_asynchronously=True`),
which run on the job queue. Synchronous offline queries and
online queries don't go through the job queue and aren't affected by the
configuration on this page.

### When this is useful

You'll see the biggest impact from LSSD-backed workers when:

- Async offline queries OOM or take many minutes longer than expected because
they're spilling to slow remote EBS.
- Your offline store is backed by Iceberg and queries repeatedly read the
same partitions or backfill date ranges — the scan cache turns repeat reads
into local disk hits.
- You want spill-heavy and cache-heavy workloads isolated from latency-sensitive
online queries on a dedicated nodepool.

If your offline store is BigQuery, Snowflake, Redshift, or Databricks, the
scan cache won't help: those backends execute SQL on the warehouse and
results come back through warehouse drivers, not through Velox's scan path.
Spilling still helps when those queries exceed their memory limit, but the
scan-cache section below applies only to the Iceberg path.

### Setup overview

- Verify the LSSD EC2NodeClass exists in your cluster.
- Create a dedicated NodePool from the Chalk dashboard.
- Add a resource group with a Job Queue Consumer that targets the new NodePool.
- Configure resource requests and environment variables.
- Route async offline queries to the new resource group from your client code.
- Verify the setup after the first job.

### Step 1: Verify the LSSD EC2NodeClass exists

Karpenter's `EC2NodeClass` is an AWS-only resource, so these steps don't apply
to GKE or AKS clusters. Chalk's standard AWS Terraform automatically provisions
an `EC2NodeClass` named `al2023-offline-lssd` with
`spec.instanceStorePolicy: RAID0`. Check whether yours is present:

```
kubectl get ec2nodeclass al2023-offline-lssd
```

- If the resource exists, continue to Step 2.
- If it returns `NotFound`, your cluster is either on an older infrastructure
setup or you're managing the EKS cluster yourself outside Chalk's Terraform
module. Contact Chalk support to have the `EC2NodeClass` provisioned. It
requires cluster-specific IAM and networking values that vary across
deployments, so it's not safe to apply a generic manifest. Once support
confirms it's been created, re-run the `kubectl get` command above and
continue to Step 2.

`instanceStorePolicy: RAID0` is the critical field on the `EC2NodeClass`: it
makes Karpenter mount the instance's local NVMe array as the node's ephemeral
storage, so the container overlay and any writes to non-volume paths land on
local SSD.
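
To script this check, the sketch below parses the JSON that
`kubectl get ec2nodeclass al2023-offline-lssd -o json` prints and confirms that
`spec.instanceStorePolicy` is `RAID0`. The helper name
`uses_raid0_instance_store` and the trimmed sample manifest are illustrative
assumptions, not part of Chalk's tooling.

```python
import json


def uses_raid0_instance_store(manifest_json: str) -> bool:
    """Return True if an EC2NodeClass manifest sets spec.instanceStorePolicy to RAID0."""
    manifest = json.loads(manifest_json)
    return manifest.get("spec", {}).get("instanceStorePolicy") == "RAID0"


# Hypothetical, heavily trimmed stand-in for the output of:
#   kubectl get ec2nodeclass al2023-offline-lssd -o json
sample = json.dumps({
    "kind": "EC2NodeClass",
    "metadata": {"name": "al2023-offline-lssd"},
    "spec": {"instanceStorePolicy": "RAID0"},
})

print(uses_raid0_instance_store(sample))
```

In practice you would feed the function the real `kubectl` output, for example
by capturing it with `subprocess.run(..., capture_output=True)`.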

### Step 2: Create the NodePool

In the Chalk dashboard, go to Infrastructure → Nodepools and click
+ Add New Nodepool. Use these settings:

| Field                                | Value                                        |
| ------------------------------------ | -------------------------------------------- |
| **Nodepool Name**                    | `offline-lssd` (or similar)                  |
| **EC2NodeClass**                     | `al2023-offline-lssd`                        |
| **Kubernetes Cluster**               | your cluster                                 |
| **CPU Limit**                        | `512` (cap total CPU the pool can provision) |
| **Capacity type**                    | `on-demand`                                  |
| **Instance categories**              | `m`, `c`, `r`                                |
| **Instance generations**             | `> 5`                                        |
| **Instance sizes**                   | not in `[nano, micro, small, medium, large]` |
| **Architecture**                     | `amd64`                                      |
| **Zones**                            | your cluster's availability zones            |
| **Isolate this nodepool**            | ✓ checked                                    |
| **Restrict to Chalk workloads only** | ✓ checked                                    |
| **Nodepool Workload Type**           | `Default` (leave alone)                      |

Because the `al2023-offline-lssd` `EC2NodeClass` sets
`instanceStorePolicy: RAID0`, Karpenter will only provision instance types that
have local NVMe storage, so no extra constraint is required to filter out
non-LSSD families. If no LSSD instance is available in the requested categories
or zones, pods will stay `Pending` rather than fall back to EBS.
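
To see which instance types the three instance filters in the table admit, the
sketch below applies them to instance-type names. It is only an approximation
under stated assumptions: the regular expression is a simplified reading of EC2
naming, the function name is hypothetical, and the local-NVMe restriction
described above is a separate constraint that this sketch does not model.

```python
import re

ALLOWED_CATEGORIES = {"m", "c", "r"}          # Instance categories
MIN_GENERATION_EXCLUSIVE = 5                  # Instance generations: > 5
EXCLUDED_SIZES = {"nano", "micro", "small", "medium", "large"}

# Splits a name such as "r6id.2xlarge" into category "r", generation 6,
# and size "2xlarge". Simplified; unusual family names may not parse.
_INSTANCE_RE = re.compile(
    r"^(?P<category>[a-z]+?)(?P<generation>\d+)[a-z0-9-]*\.(?P<size>[a-z0-9]+)$"
)


def passes_nodepool_filters(instance_type: str) -> bool:
    """Apply the category, generation, and size filters from the table."""
    match = _INSTANCE_RE.match(instance_type)
    if match is None:
        return False
    return (
        match["category"] in ALLOWED_CATEGORIES
        and int(match["generation"]) > MIN_GENERATION_EXCLUSIVE
        and match["size"] not in EXCLUDED_SIZES
    )


for name in ("r6id.2xlarge", "c6id.4xlarge", "m5d.2xlarge", "r6id.large", "t3.medium"):
    print(name, passes_nodepool_filters(name))
```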

Do not set Nodepool Workload Type to `Offline`. The dropdown option adds a
`chalk.ai/workload-type=offline:NoSchedule` taint that no Chalk pod currently
tolerates, which would make the pool repel every workload. Leave it as
`Default`.

The two isolation checkboxes generate the taints that exclude unrelated
workloads:

- `chalk.ai/nodepool=offline-lssd:NoSchedule` (from "Isolate this nodepool")
- `chalk.ai/managed-by=chalk:NoSchedule` (from "Restrict to Chalk workloads only")

Chalk auto-adds matching tolerations to pods that target this pool via the
Resource Configuration form in Step 3.
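
The taint and toleration interplay can be sketched as follows. This is a
simplified model of Kubernetes `NoSchedule` matching (it ignores the empty-key
wildcard toleration and the other taint effects), and the tolerations listed
are an assumption about what Chalk adds, inferred from the two taints above.

```python
from typing import NamedTuple, Optional


class Taint(NamedTuple):
    key: str
    value: str
    effect: str


class Toleration(NamedTuple):
    key: str
    operator: str = "Equal"        # "Equal" or "Exists"
    value: str = ""
    effect: Optional[str] = None   # None tolerates every effect for the key


def parse_taint(text: str) -> Taint:
    """Parse the key=value:effect notation used on this page."""
    key_value, effect = text.rsplit(":", 1)
    key, _, value = key_value.partition("=")
    return Taint(key, value, effect)


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    if toleration.key != taint.key:
        return False
    if toleration.effect is not None and toleration.effect != taint.effect:
        return False
    return toleration.operator == "Exists" or toleration.value == taint.value


def can_schedule(tolerations, taints) -> bool:
    """A pod can land on a node only if every NoSchedule taint is tolerated."""
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in taints
        if taint.effect == "NoSchedule"
    )


pool_taints = [
    parse_taint("chalk.ai/nodepool=offline-lssd:NoSchedule"),
    parse_taint("chalk.ai/managed-by=chalk:NoSchedule"),
]
# Assumed tolerations for a Consumer pod that targets this pool.
consumer_tolerations = [
    Toleration("chalk.ai/nodepool", value="offline-lssd", effect="NoSchedule"),
    Toleration("chalk.ai/managed-by", value="chalk", effect="NoSchedule"),
]
workload_type_taint = parse_taint("chalk.ai/workload-type=offline:NoSchedule")

print(can_schedule(consumer_tolerations, pool_taints))
print(can_schedule(consumer_tolerations, pool_taints + [workload_type_taint]))
```

The second call models the `Offline` workload-type taint: one untolerated
`NoSchedule` taint is enough to keep the pod off the pool.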

### Step 3: Add a resource group with a Job Queue Consumer

Go to Infrastructure → Resource Configuration. At the bottom of the resource
groups tree, click + Add Resource Group. Give it a name like
offline-lssd.

Under the new resource group, add a Job Queue Consumer service. You do
not need to add a separate Job Queue Manager — there is one environment-wide
Manager that polls jobs across all resource groups and spawns the per-group
Consumer Deployments on demand.

On the Job Queue Consumer page:

- Nodepool: select offline-lssd.
- Instance Type: leave as None so Karpenter picks from the pool's
allowed instance types.

### Step 4: Configure resources and environment variables

#### Resource requests

Set requests on the Requests panel. Two starting profiles, pick based on the
size of your typical async offline query:

##### Standard (lands on a 2xlarge LSSD instance)

| Setting           | Value   |
| ----------------- | ------- |
| CPU               | `7`     |
| Memory            | `50Gi`  |
| Ephemeral Storage | `350Gi` |

These requests force Karpenter to pick a 2xlarge LSSD instance (e.g.
`r6id.2xlarge`: 8 vCPU, 64 GiB RAM, ~474 GB NVMe).

##### Heavier (for very large async offline queries)

| Setting           | Value   |
| ----------------- | ------- |
| CPU               | `15`    |
| Memory            | `100Gi` |
| Ephemeral Storage | `600Gi` |

These requests force a 4xlarge LSSD instance (e.g. `r6id.4xlarge`: 16 vCPU,
128 GiB RAM, ~950 GB NVMe).
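
To see why each profile lands on its instance size, the sketch below compares
the requests against the raw capacities quoted above. It is an approximation:
real allocatable capacity is lower than raw capacity because the kubelet and
system daemons reserve a share of each node, and Karpenter weighs more factors
than this. The `smallest_fit` helper is hypothetical.

```python
GIB = 1024**3   # Kubernetes "Gi" quantities are binary
GB = 1000**3    # instance-store sizes are quoted in decimal GB

# Raw capacities quoted on this page, ordered smallest first.
SHAPES = {
    "r6id.2xlarge": {"cpu": 8, "memory": 64 * GIB, "storage": 474 * GB},
    "r6id.4xlarge": {"cpu": 16, "memory": 128 * GIB, "storage": 950 * GB},
}

PROFILES = {
    "standard": {"cpu": 7, "memory": 50 * GIB, "storage": 350 * GIB},
    "heavier": {"cpu": 15, "memory": 100 * GIB, "storage": 600 * GIB},
}


def smallest_fit(request):
    """Return the first (smallest) shape whose raw capacity covers the request."""
    for name, capacity in SHAPES.items():
        if all(request[key] <= capacity[key] for key in request):
            return name
    return None


for profile, request in PROFILES.items():
    print(profile, "->", smallest_fit(request))
```

The CPU request of `7` (rather than `8`) and memory of `50Gi` (rather than
`64Gi`) leave room for that reserved overhead on a 2xlarge node.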

Leave the Limits panel blank so spill writes can use whatever the LSSD
provides without an artificial cap. Set Min Instances to 0 to scale
to zero when idle, and Max Instances to 2 or 3 to cap concurrent
LSSD nodes.

The scan cache is per-pod — each Consumer pod has its own cache on its own
node's LSSD, and libchalk uses an exclusive lock so caches are never shared
across pods. Setting Min Instances to 1 keeps one warm cache alive,
not a pool-wide warm cache. If the workload bursts above one pod, the
additional pods start cold and warm their own caches independently. Only
raise Min Instances above 0 if the workload re-reads the same data
consistently enough that paying for one always-on LSSD instance (~$15/day
for an r6id.2xlarge) is worth it.

#### Environment variables: required for all backends

Add these under Environment Variable Overrides:

| Variable                                         | Value               | Purpose                                                    |
| ------------------------------------------------ | ------------------- | ---------------------------------------------------------- |
| `CHALK_VELOX_SPILL_DIRECTORY`                    | `/chalk-lssd-spill` | Per-query spill scratch space on local NVMe                |
| `CHALK_VELOX_QUERY_DEFAULT_MEMORY_LIMIT_PERCENT` | `75`                | Raise spill threshold — LSSD-dedicated nodes have headroom |

`/chalk-lssd-spill` does not need to be mounted explicitly. With
`instanceStorePolicy: RAID0` in effect, the container's writable overlay sits
on the local NVMe array, so the engine creates the directory at this path and
all writes go to LSSD automatically.

`CHALK_VELOX_QUERY_DEFAULT_MEMORY_LIMIT_PERCENT=75` caps the in-memory working
set that Velox keeps before it spills at 75% of the container's cgroup memory
limit. With Memory set to `50Gi`, that's ~37.5 GiB of in-memory work before
spill kicks in; with `100Gi`, it's 75 GiB.
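
The arithmetic behind those figures, as a small sketch. The function name is
illustrative, and the exact rounding Velox applies internally is not specified
here, so treat the result as an estimate.

```python
GIB = 1024**3


def spill_threshold_bytes(memory_limit_bytes: int, limit_percent: int) -> int:
    """Estimate the in-memory working set allowed before spilling starts."""
    return memory_limit_bytes * limit_percent // 100


for memory_gib in (50, 100):
    threshold = spill_threshold_bytes(memory_gib * GIB, 75)
    print(f"Memory={memory_gib}Gi -> {threshold / GIB:.1f} GiB before spilling")
```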

#### Environment variables: Iceberg only (optional)

If your offline store is backed by Iceberg (or you read Parquet/Delta tables
directly through Velox via static resolvers), also add:

| Variable                                    | Value          | Purpose                                                       |
| ------------------------------------------- | -------------- | ------------------------------------------------------------- |
| `LIBCHALK_VELOX_TABLE_SCAN_SSD_CACHE_BYTES` | `214748364800` | 200 GiB persistent on-disk scan cache, shares the spill mount |

When not otherwise configured, the cache directory defaults to the
`table_scan_cache` subdirectory of `CHALK_VELOX_SPILL_DIRECTORY`, so no extra
path setup is needed.

Skip this variable if your offline store is BigQuery, Snowflake, Redshift, or
Databricks. Those backends execute SQL on the warehouse and never go through
Velox's table-scan operators, so the cache would be initialized but never see
any reads — wasting LSSD capacity that could otherwise hold spill files.

#### Sizing the scan cache

`214748364800` (200 GiB) is a stock starting value, not a universal default.
The right size is roughly the working set of distinct external-table partitions
your async offline queries repeatedly touch:

- Small / focused workloads (e.g. daily backfills over the same date range):
10-50 GiB is usually enough.
- Broad ad-hoc analytics that read many partitions: 100-300 GiB or more.

The cache also has to fit on the LSSD alongside spill scratch. On a
2xlarge LSSD instance (~474 GB usable), a 200 GiB cache (about 215 GB) leaves
roughly 259 GB for spill files and the container overlay, which is comfortable.
On smaller shapes, scale down. If startup logs warn that the cache directory's
available space is below the configured cache size, either shrink this value or
increase the ephemeral-storage request so Karpenter picks a larger LSSD
instance.
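
Because the cache is sized in binary GiB while instance storage is quoted in
decimal GB, the unit conversion matters when budgeting disk. The sketch below
computes the byte value for `LIBCHALK_VELOX_TABLE_SCAN_SSD_CACHE_BYTES` and the
space left for spill; the helper names are hypothetical, and the disk size is
the approximate figure quoted above.

```python
GIB = 1024**3   # binary gibibyte, used for the cache size
GB = 1000**3    # decimal gigabyte, used for instance-store sizes


def cache_bytes(cache_gib: int) -> int:
    """Byte value to put in LIBCHALK_VELOX_TABLE_SCAN_SSD_CACHE_BYTES."""
    return cache_gib * GIB


def remaining_bytes(disk_gb: int, cache_gib: int) -> int:
    """Space left for spill files and the container overlay."""
    return disk_gb * GB - cache_bytes(cache_gib)


left = remaining_bytes(disk_gb=474, cache_gib=200)
print(cache_bytes(200))
print(f"{left / GB:.0f} GB ({left / GIB:.0f} GiB) left after the cache")
```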

### Step 5: Route queries from your client

The default resource group for offline queries is `"default"`. To send a
specific async offline query to the new LSSD-backed resource group, pass
`ResourceRequests(resource_group=...)`:

```
from chalk.client import ChalkClient, ResourceRequests

client = ChalkClient()
client.offline_query(
    input={'user.id': range(1_000_000)},
    output=['user.name'],
    run_asynchronously=True,
    resources=ResourceRequests(
        resource_group="offline-lssd",
    ),
)
```

Only queries that explicitly opt in via `resource_group=` will land on the new
pool. Existing queries continue to use the default resource group and its
existing nodepool, so you can roll out LSSD gradually for the queries that
benefit most.
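
One way to stage that rollout is to choose the resource group per query. The
sketch below routes only large queries to the LSSD pool. The helper, the
row-count heuristic, and the threshold are all hypothetical; pick whatever
signal best predicts spilling in your workload.

```python
LSSD_RESOURCE_GROUP = "offline-lssd"
DEFAULT_RESOURCE_GROUP = "default"


def pick_resource_group(estimated_input_rows: int,
                        lssd_row_threshold: int = 1_000_000) -> str:
    """Send only large queries to the LSSD pool (hypothetical heuristic)."""
    if estimated_input_rows >= lssd_row_threshold:
        return LSSD_RESOURCE_GROUP
    return DEFAULT_RESOURCE_GROUP


# The result would be passed to the client call shown earlier, e.g.
#   resources=ResourceRequests(resource_group=pick_resource_group(n_rows))
print(pick_resource_group(5_000_000))
print(pick_resource_group(10_000))
```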

For scheduled queries, the equivalent kwarg lives
directly on ScheduledQuery:

```
from chalk import ScheduledQuery

# `User` is your feature class, defined elsewhere in your Chalk project.
ScheduledQuery(
    name="weekly-aggregations",
    schedule="0 0 * * 0",
    output=[User.historical_aggregates],
    resource_group="offline-lssd",
)
```

### Step 6: Verify the setup

After running an async offline query against the new resource group, confirm
spilling actually happened by checking the query's performance summary in the
Chalk dashboard for `spill_enabled=true` and a nonzero `spilled_bytes` value.
If neither field appears, the query didn't exceed its memory limit and didn't
need to spill; small queries that fit in memory won't trigger it.
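
If you copy the summary values out of the dashboard, the decision rule above
can be written down as follows. The dictionary shape is an assumption made for
illustration; only the field names `spill_enabled` and `spilled_bytes` come
from this page.

```python
def spill_status(summary: dict) -> str:
    """Classify a performance summary using the rule described above."""
    if "spill_enabled" not in summary and "spilled_bytes" not in summary:
        return "no spill fields: the query fit in memory"
    if summary.get("spill_enabled") and summary.get("spilled_bytes", 0) > 0:
        return "spilled to disk"
    return "spill enabled but nothing spilled"


# Hypothetical summaries, not real dashboard output.
print(spill_status({"spill_enabled": True, "spilled_bytes": 12_884_901_888}))
print(spill_status({}))
```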

### See also

- Offline queries — disk spilling and planner options
for offline queries.
- Job queue — how async offline queries flow through the
job queue and Resource Groups.
- Kubernetes resources — overview of
Karpenter NodePools and EC2NodeClass.




