Chalk supports Snowflake as a SQL source.


Prerequisites

If Chalk has helped you configure a Snowflake Offline Store already, then some of these commands may overlap with work that is already done. Please review your Snowflake configuration for existing objects matching the ones below before proceeding.

For detailed information about Snowflake storage integrations and external stages, refer to:

Permissions

To use Snowflake as a Chalk datasource, the user or role specified in your integration must have the following permissions:

  • USAGE on the warehouse, database, schema, and stage
  • SELECT on all tables and views being queried
  • READ on the stage
  • WRITE on the stage
  • CREATE TEMPORARY TABLE in the schema
  • DROP on temporary tables in the schema

Chalk creates temporary tables during feature computation to push down filters to optimize your query loads. In addition, Chalk uses the stages to optimize data unloading.

RBAC Setup

Run against your Snowflake instance as a user with the ACCOUNTADMIN role (or a role with the privileges to create roles and grant the permissions below) to set up a role with the appropriate permissions:

-- Set variables (example values shown; customize for your environment)
SET ROLE_NAME='CHALK_ROLE';
SET WAREHOUSE_NAME='<warehouse_name>';
SET DB_NAME='<database_name>';
SET SCHEMA_NAME='<schema_name>';
SET USER_NAME='<chalk_user>';

-- Create a role for Chalk (CREATE supports IDENTIFIER($var))
CREATE ROLE IF NOT EXISTS IDENTIFIER($ROLE_NAME);

-- Grant warehouse permissions
GRANT USAGE ON WAREHOUSE <WAREHOUSE_NAME> TO ROLE <ROLE_NAME>;

-- Grant database/schema permissions
GRANT USAGE ON DATABASE <DB_NAME> TO ROLE <ROLE_NAME>;
GRANT USAGE ON SCHEMA <DB_NAME>.<SCHEMA_NAME> TO ROLE <ROLE_NAME>;
GRANT SELECT ON ALL TABLES IN SCHEMA <DB_NAME>.<SCHEMA_NAME> TO ROLE <ROLE_NAME>;

-- Grant temporary table permissions
GRANT CREATE TEMPORARY TABLE ON SCHEMA <DB_NAME>.<SCHEMA_NAME> TO ROLE <ROLE_NAME>;

-- Assign role to user
GRANT ROLE <ROLE_NAME> TO USER <USER_NAME>;

Data Unloading

When Chalk runs a large query against Snowflake, pulling the result set back over the standard query connection can be slow. Instead, Chalk can ask Snowflake to unload the result directly into your cloud storage bucket (S3 or GCS) and then read those files in parallel. This is significantly faster for large feature computations.

To enable this, you set up three things, all of which are located in your cloud account and Snowflake account:

  1. A storage integration — a Snowflake object that holds the cloud identity Snowflake uses to write to your bucket. Using a storage integration with IAM authentication is the recommended approach because it avoids managing long-lived credentials.
  2. An external stage — a named Snowflake object that points at a specific location in your bucket and writes through the storage integration.
  3. A trust relationship on the cloud side (an AWS IAM role or a GCS service-account grant) that allows Snowflake’s identity to write objects into your bucket.

Once these exist, you point Chalk at the unload destination with environment variables (name of the stage and the unloading target, and optionally the storage integration when unloading directly to a cloud URI rather than a named stage) and Chalk will use this configuration to transfer large result sets efficiently, improving performance for large feature computations. A stage is bound to a single storage location and integration, so one stage cannot serve both S3 and GCS. The examples below therefore use provider-specific stage names (CHALK_UNLOAD_STAGE_S3 and CHALK_UNLOAD_STAGE_GCS); if you only use one cloud, name the stage however you like. Chalk reads from a single stage at a time, selected by the CHALK_SNOWFLAKE_UNLOAD_STAGE environment variable.

Setup Unloading with AWS

Unless otherwise noted, run the SQL snippets in this section against your Snowflake instance (as a user with the ACCOUNTADMIN role, since creating a storage integration requires it). The IAM steps are performed in your AWS account.

Create Storage Integration

Run against your Snowflake instance. This creates the storage integration object that Snowflake will use to authenticate to S3:

-- Set variables (customize these for your environment)
SET ROLE_NAME='CHALK_ROLE';
SET INTEGRATION_NAME='s3-integration-chalk-data-bucket';
SET AWS_ROLE_ARN='arn:aws:iam::<aws_account_id>:role/chalk-offline-store-access-role';
SET S3_BUCKET='s3://chalk-data-bucket/';

-- Create storage integration if it doesn't exist
CREATE STORAGE INTEGRATION IF NOT EXISTS IDENTIFIER($INTEGRATION_NAME)
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  STORAGE_AWS_ROLE_ARN = $AWS_ROLE_ARN
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ($S3_BUCKET);

-- Verify integration and get Snowflake credentials
DESCRIBE INTEGRATION IDENTIFIER($INTEGRATION_NAME);

From the DESCRIBE INTEGRATION output, record these values for Step 4:

  • STORAGE_AWS_IAM_USER_ARN - The Snowflake IAM user ARN
  • STORAGE_AWS_EXTERNAL_ID - The external ID for AWS trust policy

Create AWS IAM Role

This role will be assumed by Snowflake to access the S3 bucket.

Follow the Snowflake IAM Role documentation to create a role named chalk-{organization}-snow-s3-access-role.

When setting the trust policy, use a temporary placeholder with your AWS account ID and external ID "0000". You will update this with actual Snowflake credentials in Step 4.

Attach an IAM policy that grants S3 permissions to this role (see Snowflake documentation for policy details).

Create External Stage

Run against your Snowflake instance. The stage is the named object Chalk references when it unloads data; it points at a location inside your S3 bucket and writes through the storage integration created in Step 1:

-- Set variables (customize these for your environment)
SET STAGE_NAME='CHALK_UNLOAD_STAGE_S3';
SET INTEGRATION_NAME='s3-integration-chalk-data-bucket';
SET S3_URL='s3://chalk-data-bucket/unload/';

-- Create external stage using storage integration
CREATE STAGE IF NOT EXISTS IDENTIFIER($STAGE_NAME)
  STORAGE_INTEGRATION = IDENTIFIER($INTEGRATION_NAME)
  URL = $S3_URL;

Update IAM Role Trust Policy

Update the trust policy for the AWS IAM role chalk-{organization}-snow-s3-access-role with the actual Snowflake credentials from Step 1:

  • Replace the Principal’s AWS ARN with the STORAGE_AWS_IAM_USER_ARN value
  • Replace the External ID with the STORAGE_AWS_EXTERNAL_ID value

See the Snowflake trust policy documentation for detailed instructions.

Grant Permissions

Run against your Snowflake instance. This grants the Chalk role access to the storage integration and stage so it can perform the unload):

-- Variables for reference (example values; use the same names as the earlier steps)
SET ROLE_NAME='CHALK_ROLE';
SET INTEGRATION_NAME='s3-integration-chalk-data-bucket';
SET STAGE_NAME='CHALK_UNLOAD_STAGE_S3';

-- Grant usage on storage integration
GRANT USAGE ON INTEGRATION "<INTEGRATION_NAME>" TO ROLE <ROLE_NAME>;

-- Grant stage permissions
GRANT USAGE ON STAGE <STAGE_NAME> TO ROLE <ROLE_NAME>;
GRANT READ, WRITE ON STAGE <STAGE_NAME> TO ROLE <ROLE_NAME>;

Validate

You can optionally validate the integration end-to-end using

SELECT SYSTEM$VALIDATE_STORAGE_INTEGRATION('s3-integration-chalk-data-bucket', 's3://chalk-data-bucket/unload/', 'test.csv', 'WRITE');

After completing these steps, continue to Configure Chalk Environment Variables.


Setup Unloading with GCS

Run the SQL snippets against your Snowflake instance (as a user with the ACCOUNTADMIN role); the IAM steps are performed in your Google Cloud project.

For background, see the Snowflake documentation:

Create Storage Integration

Run against your Snowflake instance. This creates the storage integration object that Snowflake will use to authenticate to GCS:

-- Set variables (customize these for your environment)
SET INTEGRATION_NAME='gcs-integration-chalk-data-bucket';
SET GCS_BUCKET='gcs://chalk-data-bucket/';

-- Create storage integration if it doesn't exist
CREATE STORAGE INTEGRATION IF NOT EXISTS IDENTIFIER($INTEGRATION_NAME)
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'GCS'
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ($GCS_BUCKET);

-- Verify integration and retrieve the Snowflake-managed service account
DESCRIBE INTEGRATION IDENTIFIER($INTEGRATION_NAME);

Note: GCS bucket URIs use the gcs:// scheme in Snowflake SQL (not gs://).

From the DESCRIBE INTEGRATION output, record the STORAGE_GCP_SERVICE_ACCOUNT value — an email like service-account-id@project.iam.gserviceaccount.com. You will grant this service account access to your bucket in the next step.

Grant the Service Account Access to Your GCS Bucket

These steps are performed in your Google Cloud project, using the service account email from Step 1.

  1. In the Google Cloud Console, go to IAM & Admin → Roles and Create Role with the permissions required for unloading (read and write):

    • storage.buckets.get
    • storage.objects.create
    • storage.objects.delete
    • storage.objects.get
    • storage.objects.list

    Alternatively, you can assign the predefined Storage Object Admin (roles/storage.objectAdmin) role instead of a custom role.

  2. Go to Cloud Storage → Buckets and select your bucket.

  3. Open Permissions → Grant access, paste the STORAGE_GCP_SERVICE_ACCOUNT email as the principal, and assign the role from the previous step.

  4. If your bucket is encrypted with Cloud KMS, also grant the service account the Cloud KMS CryptoKey Encryptor/Decryptor role on the relevant key.

Note: If your organization enforces a domain-restricted sharing policy, you may need to update the org policy to allow Snowflake’s service-account domain. See the Snowflake GCS configuration docs for details.

Create External Stage

Run against your Snowflake instance. The stage points at a location inside your GCS bucket and writes through the storage integration created in Step 1:

-- Set variables (customize these for your environment)
SET STAGE_NAME='CHALK_UNLOAD_STAGE_GCS';
SET INTEGRATION_NAME='gcs-integration-chalk-data-bucket';
SET GCS_URL='gcs://chalk-data-bucket/unload/';

-- Create external stage using storage integration
CREATE STAGE IF NOT EXISTS IDENTIFIER($STAGE_NAME)
  STORAGE_INTEGRATION = IDENTIFIER($INTEGRATION_NAME)
  URL = $GCS_URL;

Grant Permissions

Run against your Snowflake instance. This grants the Chalk role access to the storage integration and stage so it can perform the unload:

-- Variables for reference (example values; use the same names as the earlier steps)
SET ROLE_NAME='CHALK_ROLE';
SET INTEGRATION_NAME='gcs-integration-chalk-data-bucket';
SET STAGE_NAME='CHALK_UNLOAD_STAGE_GCS';

-- Grant usage on storage integration
GRANT USAGE ON INTEGRATION <INTEGRATION_NAME> TO ROLE <ROLE_NAME>;

-- Grant stage permissions
GRANT USAGE ON STAGE <STAGE_NAME> TO ROLE <ROLE_NAME>;
GRANT READ, WRITE ON STAGE <STAGE_NAME> TO ROLE <ROLE_NAME>;

Validate

You can optionally validate the integration end-to-end using

SELECT SYSTEM$VALIDATE_STORAGE_INTEGRATION('gcs-integration-chalk-data-bucket', 'gcs://chalk-data-bucket/unload/', 'test.csv', 'WRITE');

Configure Chalk environment

You can configure the Snowflake-specific options using the SnowflakeSource.__init__ args. Alternately, you can configure the source through your dashboard.

Dashboard

Once the storage integration and stage exist (for either AWS or GCS), in Chalk dashboard proceed to Integrations > Data Sources > Add a data source, select Snowflake, add and configure a new Snowflake datasource.

Snowflake Data Dource form in the Chalk dashboard

Data Unloading

If you configure data unloading, you should specify the configured Unload Stage Name and Unload External Location (storage URI).

Configure Data Unloading using environment variables (Legacy)

You can direct Chalk which stage to unload to for all Snowflake integrations in the environment. In the Chalk dashboard, go to Settings > Variables and add:

  • CHALK_SNOWFLAKE_UNLOAD_STAGE is the Snowflake stage object you created, e.g. @MY_DB.PUBLIC.CHALK_UNLOAD_STAGE_S3 or @MY_DB.PUBLIC.CHALK_UNLOAD_STAGE_GCS. Point this at the stage for the cloud you want Chalk to unload to.
  • CHALK_SNOWFLAKE_UNLOAD_URI is the cloud storage URI the stage points to, e.g. s3://chalk-data-bucket/unload/ or gcs://chalk-data-bucket/unload/. It must match the URL you set on that stage.
  • CHALK_SNOWFLAKE_STORAGE_INTEGRATION (optional) is the storage integration name. This is only used when CHALK_SNOWFLAKE_UNLOAD_STAGE is set to a cloud URI (e.g. s3://... / gcs://...) directly rather than a named stage.

SDK Setup

If you configured your Snowflake integration in the dashboard, define your SnowflakeSource data sources in Python referencing the configured names:

from chalk.sql import SnowflakeSource

risk = SnowflakeSource(name="RISK")
marketing = SnowflakeSource(name="MARKETING")

Then reference them in SQL file resolvers using the name parameter. For example, to query from the RISK source:

-- type: online
-- resolves: User
-- source: RISK
SELECT id, credit_score FROM users

And to query from the MARKETING source:

-- type: online
-- resolves: User
-- source: MARKETING
SELECT id, email, campaign_status FROM users
Named integrations inject environment variables with the standard names prefixed by the integration name. For example, if your integration is called RISK, then the variable SNOWSQL_SCHEMA will be injected as RISK_SNOWSQL_SCHEMA.