Metaplanning is an environment-level feature that automatically parallelizes scheduled offline queries by splitting them into multiple shards that run concurrently.

Contact our support team to enable metaplanning in your environments.


How It Works

When metaplanning is enabled for your environment, all scheduled offline queries go through a metaplanning workflow:

Scheduled Query Triggered
Step 1: Metaplan
Has inputs?
Use inputs
OR
No inputs?
Compute shard keys
Step 2: Autoshard
Input Data
100,000 rows
Count & Divide
by 10,000
Job 1
Job 2
...
Job 10
Step 3: Sharded Query
Shard 1
10k rows
Shard 2
10k rows
...
Shard N
10k rows
All shards run simultaneously

Example: A query with 100,000 rows and the default target of 10,000 rows per shard creates 10 parallel shard jobs.

Configuration

Metaplanning is configured at the environment level. Once enabled, all scheduled offline queries in that environment use metaplanning automatically.

The shard size can be controlled via the AUTOSHARDER_TARGET_ROWS_PER_SHARD environment variable (default: 10,000 rows per shard).

Example

ScheduledQuery(
    name="daily_user_scores",
    outputs=[User.id, User.score],
    schedule="0 0 * * *",
)

With metaplanning enabled, this query will:

  1. Compute shard keys to find all User IDs (since no input is specified)
  2. Autoshard the IDs based on the configured target
  3. Execute all shards in parallel

See Also