Snowflake - Chalk

Chalk supports Snowflake as a SQL source.

Prerequisites

If Chalk has helped you configure a Snowflake Offline Store already, then some of these commands may overlap with work that is already done. Please review your Snowflake configuration for existing objects matching the ones below before proceeding.

For detailed information about Snowflake storage integrations and external stages, refer to:

CREATE STORAGE INTEGRATION - Configure IAM-based authentication for S3 access
CREATE STAGE - Create external stages using storage integrations

Permissions

To use Snowflake as a Chalk datasource, the user or role specified in your integration must have the following permissions:

USAGE on the warehouse, database, schema, and stage
SELECT on all tables and views being queried
READ on the stage
WRITE on the stage
CREATE TEMPORARY TABLE in the schema
DROP on temporary tables in the schema

Chalk creates temporary tables during feature computation to push down filters to optimize your query loads. In addition, Chalk uses the stages to optimize data unloading.

RBAC Setup

Run against your Snowflake instance as a user with the ACCOUNTADMIN role (or a role with the privileges to create roles and grant the permissions below) to set up a role with the appropriate permissions:

-- Set variables (example values shown; customize for your environment)
SET ROLE_NAME='CHALK_ROLE';
SET WAREHOUSE_NAME='<warehouse_name>';
SET DB_NAME='<database_name>';
SET SCHEMA_NAME='<schema_name>';
SET USER_NAME='<chalk_user>';

-- Create a role for Chalk (CREATE supports IDENTIFIER($var))
CREATE ROLE IF NOT EXISTS IDENTIFIER($ROLE_NAME);

-- Grant warehouse permissions
GRANT USAGE ON WAREHOUSE <WAREHOUSE_NAME> TO ROLE <ROLE_NAME>;

-- Grant database/schema permissions
GRANT USAGE ON DATABASE <DB_NAME> TO ROLE <ROLE_NAME>;
GRANT USAGE ON SCHEMA <DB_NAME>.<SCHEMA_NAME> TO ROLE <ROLE_NAME>;
GRANT SELECT ON ALL TABLES IN SCHEMA <DB_NAME>.<SCHEMA_NAME> TO ROLE <ROLE_NAME>;

-- Grant temporary table permissions
GRANT CREATE TEMPORARY TABLE ON SCHEMA <DB_NAME>.<SCHEMA_NAME> TO ROLE <ROLE_NAME>;

-- Assign role to user
GRANT ROLE <ROLE_NAME> TO USER <USER_NAME>;

Data Unloading

When Chalk runs a large query against Snowflake, pulling the result set back over the standard query connection can be slow. Instead, Chalk can ask Snowflake to unload the result directly into your cloud storage bucket (S3 or GCS) and then read those files in parallel. This is significantly faster for large feature computations.

To enable this, you set up three things, all of which are located in your cloud account and Snowflake account:

A storage integration — a Snowflake object that holds the cloud identity Snowflake uses to write to your bucket. Using a storage integration with IAM authentication is the recommended approach because it avoids managing long-lived credentials.
An external stage — a named Snowflake object that points at a specific location in your bucket and writes through the storage integration.
A trust relationship on the cloud side (an AWS IAM role or a GCS service-account grant) that allows Snowflake’s identity to write objects into your bucket.

Once these exist, configure the unload destination from the Data unload tab on the Snowflake data source page. Set the Unload target, then provide the matching Location, Integration name, and Data unload location values. Chalk will use this configuration to transfer large result sets efficiently, improving performance for large feature computations. A stage is bound to a single storage location and integration, so one stage cannot serve both S3 and GCS. The examples below therefore use provider-specific stage names (CHALK_UNLOAD_STAGE_S3 and CHALK_UNLOAD_STAGE_GCS); if you only use one cloud, name the stage however you like. Choose the stage that matches the cloud storage location you want the Snowflake data source to unload to.

Setup Unloading with AWS

Unless otherwise noted, run the SQL snippets in this section against your Snowflake instance (as a user with the ACCOUNTADMIN role, since creating a storage integration requires it). The IAM steps are performed in your AWS account.

Step 1: Create AWS IAM Role

This role will be assumed by Snowflake to access the S3 bucket.

Follow the Snowflake IAM Role documentation to create a role named chalk-{organization}-snow-s3-access-role.

When setting the trust policy, use a temporary placeholder with your AWS account ID and external ID "0000". You will update this with actual Snowflake credentials in Step 3, once Snowflake’s identity is known.

Attach an IAM policy that grants S3 permissions to this role, scoped to the bucket/prefix you’ll unload to:

Note: s3:ListBucket must target the bucket ARN (arn:aws:s3:::chalk-data-bucket), not the object/prefix ARN (arn:aws:s3:::chalk-data-bucket/unload/*). Scoping it to the object ARN produces anAccessDenied error on s3:ListBucket even though the other actions succeed.

Step 2: Create Storage Integration

Run against your Snowflake instance. This creates the storage integration object that Snowflake will use to authenticate to S3, referencing the real ARN of the role you just created:

-- Set variables (customize these for your environment)
SET ROLE_NAME='CHALK_ROLE';
SET INTEGRATION_NAME='S3_INTEGRATION_CHALK_DATA_BUCKET';
SET AWS_ROLE_ARN='arn:aws:iam::<aws_account_id>:role/chalk-offline-store-access-role';
SET S3_BUCKET='s3://chalk-data-bucket/';

-- Create storage integration if it doesn't exist
CREATE STORAGE INTEGRATION IF NOT EXISTS IDENTIFIER($INTEGRATION_NAME)
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  STORAGE_AWS_ROLE_ARN = $AWS_ROLE_ARN
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ($S3_BUCKET);

-- Verify integration and get Snowflake credentials
DESCRIBE INTEGRATION IDENTIFIER($INTEGRATION_NAME);

Use underscores rather than hyphens in INTEGRATION_NAME.

From the DESCRIBE INTEGRATION output, record these values for Step 3:

STORAGE_AWS_IAM_USER_ARN - The Snowflake IAM user ARN
STORAGE_AWS_EXTERNAL_ID - The external ID for AWS trust policy

Step 3: Update IAM Role Trust Policy

Update the trust policy for the AWS IAM role you created in Step 1 (chalk-{organization}-snow-s3-access-role) with the actual Snowflake credentials from Step 2:

Replace the Principal’s AWS ARN with the STORAGE_AWS_IAM_USER_ARN value
Replace the External ID with the STORAGE_AWS_EXTERNAL_ID value

See the Snowflake trust policy documentation for detailed instructions.

Step 4: Create External Stage

Run against your Snowflake instance. The stage is the named object Chalk references when it unloads data; it points at a location inside your S3 bucket and writes through the storage integration created in Step 2.

-- Set variables (customize these for your environment)
SET STAGE_NAME='CHALK_UNLOAD_STAGE_S3';
SET S3_URL='s3://chalk-data-bucket/unload/';

-- Use the integration previously created (not a string)
CREATE STAGE IF NOT EXISTS IDENTIFIER($STAGE_NAME)
  STORAGE_INTEGRATION = <INTEGRATION_NAME>
  URL = $S3_URL;

If your database or schema name contains characters other than letters, digits, or underscores (e.g. my-database), it's a quoted, case-sensitive identifier. Reference it with quotes preserved wherever you use a fully-qualified name, e.g. `"my-database".PUBLIC.CHALK_UNLOAD_STAGE_S3`. Also make sure your session has a current database/schema set (`USE DATABASE` / `USE SCHEMA`) before running stage operations, or use a fully-qualified stage name.

Step 5: Grant Permissions

Run against your Snowflake instance. This grants the Chalk role access to the storage integration and stage so it can perform the unload:

-- Variables for reference (example values; use the same names as the earlier steps)
SET ROLE_NAME='CHALK_ROLE';
SET INTEGRATION_NAME='S3_INTEGRATION_CHALK_DATA_BUCKET';
SET STAGE_NAME='CHALK_UNLOAD_STAGE_S3';

-- Grant usage on storage integration
GRANT USAGE ON INTEGRATION <INTEGRATION_NAME> TO ROLE <ROLE_NAME>;

-- Grant stage permissions
GRANT USAGE ON STAGE <STAGE_NAME> TO ROLE <ROLE_NAME>;
GRANT READ, WRITE ON STAGE <STAGE_NAME> TO ROLE <ROLE_NAME>;

Step 6: Validate

You can optionally validate the integration end-to-end using

SELECT SYSTEM$VALIDATE_STORAGE_INTEGRATION('S3_INTEGRATION_CHALK_DATA_BUCKET', 's3://chalk-data-bucket/unload/', 'test.csv', 'WRITE');

After completing these steps, continue to Configure Chalk Configuration Variables.

Setup Unloading with GCS

Run the SQL snippets against your Snowflake instance (as a user with the ACCOUNTADMIN role); the IAM steps are performed in your Google Cloud project.

For background, see the Snowflake documentation:

Create Storage Integration

Run against your Snowflake instance. This creates the storage integration object that Snowflake will use to authenticate to GCS:

-- Set variables (customize these for your environment)
SET INTEGRATION_NAME='GCS_INTEGRATION_CHALK_DATA_BUCKET';
SET GCS_BUCKET='gcs://chalk-data-bucket/';

-- Create storage integration if it doesn't exist
CREATE STORAGE INTEGRATION IF NOT EXISTS IDENTIFIER($INTEGRATION_NAME)
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'GCS'
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ($GCS_BUCKET);

-- Verify integration and retrieve the Snowflake-managed service account
DESCRIBE INTEGRATION IDENTIFIER($INTEGRATION_NAME);

Use underscores rather than hyphens in INTEGRATION_NAME.

GCS bucket URIs use the gcs:// scheme in Snowflake SQL (not gs://).

From the DESCRIBE INTEGRATION output, record the STORAGE_GCP_SERVICE_ACCOUNT value — an email like service-account-id@project.iam.gserviceaccount.com. You will grant this service account access to your bucket in the next step.

Grant the Service Account Access to Your GCS Bucket

These steps are performed in your Google Cloud project, using the service account email from Step 1.

In the Google Cloud Console, go to IAM & Admin → Roles and Create Role with the permissions required for unloading (read and write):
- storage.buckets.get
- storage.objects.create
- storage.objects.delete
- storage.objects.get
- storage.objects.list
Alternatively, you can assign the predefined Storage Object Admin (roles/storage.objectAdmin) role instead of a custom role.
Go to Cloud Storage → Buckets and select your bucket.
Open Permissions → Grant access, paste the STORAGE_GCP_SERVICE_ACCOUNT email as the principal, and assign the role from the previous step.
If your bucket is encrypted with Cloud KMS, also grant the service account the Cloud KMS CryptoKey Encryptor/Decryptor role on the relevant key.

If your organization enforces a domain-restricted sharing policy, you may need to update the org policy to allow Snowflake's service-account domain. See the [Snowflake GCS configuration docs](https://docs.snowflake.com/en/user-guide/data-load-gcs-config) for details.

Create External Stage

Run against your Snowflake instance. The stage points at a location inside your GCS bucket and writes through the storage integration created in Step 1:

-- Set variables (customize these for your environment)
SET STAGE_NAME='CHALK_UNLOAD_STAGE_GCS';
SET INTEGRATION_NAME='GCS_INTEGRATION_CHALK_DATA_BUCKET';
SET GCS_URL='gcs://chalk-data-bucket/unload/';

-- Create external stage using storage integration
CREATE STAGE IF NOT EXISTS IDENTIFIER($STAGE_NAME)
  STORAGE_INTEGRATION = <INTEGRATION_NAME>
  URL = $GCS_URL;

Also make sure your session has a current database/schema set (USE DATABASE / USE SCHEMA) before running stage operations, or use a fully-qualified stage name.

Grant Permissions

Run against your Snowflake instance. This grants the Chalk role access to the storage integration and stage so it can perform the unload:

-- Variables for reference (example values; use the same names as the earlier steps)
SET ROLE_NAME='CHALK_ROLE';
SET INTEGRATION_NAME='GCS_INTEGRATION_CHALK_DATA_BUCKET';
SET STAGE_NAME='CHALK_UNLOAD_STAGE_GCS';

-- Grant usage on storage integration
GRANT USAGE ON INTEGRATION <INTEGRATION_NAME> TO ROLE <ROLE_NAME>;

-- Grant stage permissions
GRANT USAGE ON STAGE <STAGE_NAME> TO ROLE <ROLE_NAME>;
GRANT READ, WRITE ON STAGE <STAGE_NAME> TO ROLE <ROLE_NAME>;

Validate

You can optionally validate the integration end-to-end using

SELECT SYSTEM$VALIDATE_STORAGE_INTEGRATION('GCS_INTEGRATION_CHALK_DATA_BUCKET', 'gcs://chalk-data-bucket/unload/', 'test.csv', 'WRITE');

Configure Chalk environment

You can configure the Snowflake-specific options using the SnowflakeSource.__init__ args. Alternately, you can configure the source through your dashboard.

Dashboard

Once the storage integration and stage exist (for either AWS or GCS), in Chalk dashboard proceed to Integrations > Data Sources > Add a data source, select Snowflake, add and configure a new Snowflake datasource.

Snowflake Data Dource form in the Chalk dashboard

Data Unloading

If you configure data unloading, open the Data unload tab on the Snowflake data source page and set the Unload target. For URL-based unloading, provide the configured Location, Integration name, and Data unload location values.

Configure Data Unloading using engine configuration variables (Legacy)

You can direct Chalk which stage to unload to for all Snowflake integrations in the environment. In the Chalk dashboard, go to Integrations > Config Variables and add:

CHALK_SNOWFLAKE_UNLOAD_STAGE is the Snowflake stage object you created, e.g. @MY_DB.PUBLIC.CHALK_UNLOAD_STAGE_S3 or @MY_DB.PUBLIC.CHALK_UNLOAD_STAGE_GCS. Point this at the stage for the cloud you want Chalk to unload to.
CHALK_SNOWFLAKE_UNLOAD_URI is the cloud storage URI the stage points to, e.g. s3://chalk-data-bucket/unload/ or gcs://chalk-data-bucket/unload/. It must match the URL you set on that stage.
CHALK_SNOWFLAKE_STORAGE_INTEGRATION (optional) is the storage integration name. This is only used when CHALK_SNOWFLAKE_UNLOAD_STAGE is set to a cloud URI (e.g. s3://... / gcs://...) directly rather than a named stage.

SDK Setup

If you configured your Snowflake integration in the dashboard, define your SnowflakeSource data sources in Python referencing the configured names:

from chalk.sql import SnowflakeSource

risk = SnowflakeSource(name="RISK")
marketing = SnowflakeSource(name="MARKETING")

Then reference them in SQL file resolvers using the name parameter. For example, to query from the RISK source:

-- type: online
-- resolves: User
-- source: RISK
SELECT id, credit_score FROM users

And to query from the MARKETING source:

-- type: online
-- resolves: User
-- source: MARKETING
SELECT id, email, campaign_status FROM users

Named integrations inject environment variables with the standard names prefixed by the integration name. For example, if your integration is called RISK, then the variable SNOWSQL_SCHEMA will be injected as RISK_SNOWSQL_SCHEMA.

​Prerequisites

​Permissions

​RBAC Setup

​Data Unloading

​Setup Unloading with AWS

​Step 1: Create AWS IAM Role

​Step 2: Create Storage Integration

​Step 3: Update IAM Role Trust Policy

​Step 4: Create External Stage

​Step 5: Grant Permissions

​Step 6: Validate

​Setup Unloading with GCS

​Create Storage Integration

​Grant the Service Account Access to Your GCS Bucket

​Create External Stage

​Grant Permissions

​Validate

​Configure Chalk environment

​Dashboard

​Data Unloading

​Configure Data Unloading using engine configuration variables (Legacy)

​SDK Setup

On this page

Prerequisites

Permissions

RBAC Setup

Data Unloading

Setup Unloading with AWS

Step 1: Create AWS IAM Role

Step 2: Create Storage Integration

Step 3: Update IAM Role Trust Policy

Step 4: Create External Stage

Step 5: Grant Permissions

Step 6: Validate

Setup Unloading with GCS

Create Storage Integration

Grant the Service Account Access to Your GCS Bucket

Create External Stage

Grant Permissions

Validate

Configure Chalk environment

Dashboard

Data Unloading

Configure Data Unloading using engine configuration variables (Legacy)

SDK Setup