# Snowflake
source: https://docs.chalk.ai/docs/snowflake

## Integrate with Snowflake.

Chalk supports Snowflake as a SQL source.

### Prerequisites

If Chalk has helped you configure a Snowflake Offline Store already, then some of these commands may overlap with work that is already done. Please review your Snowflake configuration for existing objects matching the ones below before proceeding.

For detailed information about Snowflake storage integrations and external stages, refer to:

- CREATE STORAGE INTEGRATION - Configure IAM-based authentication for S3 access
- CREATE STAGE - Create external stages using storage integrations

### Permissions

To use Snowflake as a Chalk datasource, the user or role specified in your integration must have the following permissions:

- USAGE on the warehouse, database, schema, and stage
- SELECT on all tables and views being queried
- READ on the stage
- WRITE on the stage
- CREATE TEMPORARY TABLE in the schema
- DROP on temporary tables in the schema

Chalk creates temporary tables during feature computation to push down filters to optimize your query loads. In addition,
Chalk uses the stages to optimize data unloading.

### RBAC Setup

Run against your Snowflake instance as a user with the ACCOUNTADMIN role (or a
role with the privileges to create roles and grant the permissions below)
to set up a role with the appropriate permissions:

```
-- Set variables (example values shown; customize for your environment)
SET ROLE_NAME='CHALK_ROLE';
SET WAREHOUSE_NAME='<warehouse_name>';
SET DB_NAME='<database_name>';
SET SCHEMA_NAME='<schema_name>';
SET USER_NAME='<chalk_user>';

-- Create a role for Chalk (CREATE supports IDENTIFIER($var))
CREATE ROLE IF NOT EXISTS IDENTIFIER($ROLE_NAME);

-- Grant warehouse permissions
GRANT USAGE ON WAREHOUSE <WAREHOUSE_NAME> TO ROLE <ROLE_NAME>;

-- Grant database/schema permissions
GRANT USAGE ON DATABASE <DB_NAME> TO ROLE <ROLE_NAME>;
GRANT USAGE ON SCHEMA <DB_NAME>.<SCHEMA_NAME> TO ROLE <ROLE_NAME>;
GRANT SELECT ON ALL TABLES IN SCHEMA <DB_NAME>.<SCHEMA_NAME> TO ROLE <ROLE_NAME>;

-- Grant temporary table permissions
GRANT CREATE TEMPORARY TABLE ON SCHEMA <DB_NAME>.<SCHEMA_NAME> TO ROLE <ROLE_NAME>;

-- Assign role to user
GRANT ROLE <ROLE_NAME> TO USER <USER_NAME>;
```

### Data Unloading

When Chalk runs a large query against Snowflake, pulling the result set back over the
standard query connection can be slow. Instead, Chalk can ask Snowflake to
unload the
result directly into your cloud storage bucket (S3 or GCS) and then read those files in
parallel. This is significantly faster for large feature computations.

To enable this, you set up three things, all of which are located in your cloud account and
Snowflake account:

- A storage integration — a Snowflake object that holds the cloud identity Snowflake
uses to write to your bucket. Using a storage integration with IAM authentication is the
recommended approach because it avoids managing long-lived credentials.
- An external stage — a named Snowflake object that points at a specific location in
your bucket and writes through the storage integration.
- A trust relationship on the cloud side (an AWS IAM role or a GCS service-account
grant) that allows Snowflake's identity to write objects into your bucket.

Once these exist, you point Chalk at the unload destination with environment variables
(name of the stage and the unloading target, and optionally the storage integration when
unloading directly to a cloud URI rather than a named stage) and Chalk will use this
configuration to transfer large result sets efficiently, improving performance for large feature computations.
A stage is bound to a single storage location and integration, so one stage cannot serve
both S3 and GCS. The examples below therefore use provider-specific stage names
(CHALK_UNLOAD_STAGE_S3 and CHALK_UNLOAD_STAGE_GCS); if you only use one cloud, name the
stage however you like. Chalk reads from a single stage at a time, selected by the
CHALK_SNOWFLAKE_UNLOAD_STAGE environment variable.

### Setup Unloading with AWS

Unless otherwise noted, run the SQL snippets in this section against your Snowflake
instance (as a user with the ACCOUNTADMIN role, since creating a storage integration
requires it). The IAM steps are performed in your AWS account.

### Create Storage Integration

Run against your Snowflake instance. This creates the storage integration object that
Snowflake will use to authenticate to S3:

```
-- Set variables (customize these for your environment)
SET ROLE_NAME='CHALK_ROLE';
SET INTEGRATION_NAME='s3-integration-chalk-data-bucket';
SET AWS_ROLE_ARN='arn:aws:iam::<aws_account_id>:role/chalk-offline-store-access-role';
SET S3_BUCKET='s3://chalk-data-bucket/';

-- Create storage integration if it doesn't exist
CREATE STORAGE INTEGRATION IF NOT EXISTS IDENTIFIER($INTEGRATION_NAME)
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  STORAGE_AWS_ROLE_ARN = $AWS_ROLE_ARN
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ($S3_BUCKET);

-- Verify integration and get Snowflake credentials
DESCRIBE INTEGRATION IDENTIFIER($INTEGRATION_NAME);
```

From the DESCRIBE INTEGRATION output, record these values for Step 4:

- STORAGE_AWS_IAM_USER_ARN - The Snowflake IAM user ARN
- STORAGE_AWS_EXTERNAL_ID - The external ID for AWS trust policy

### Create AWS IAM Role

This role will be assumed by Snowflake to access the S3 bucket.

Follow the Snowflake IAM Role documentation
to create a role named chalk-{organization}-snow-s3-access-role.

When setting the trust policy, use a temporary placeholder with your AWS account ID and external ID "0000". You will update this with actual Snowflake credentials in Step 4.

Attach an IAM policy that grants S3 permissions to this role (see Snowflake documentation for policy details).

### Create External Stage

Run against your Snowflake instance. The stage is the named object Chalk references when
it unloads data; it points at a location inside your S3 bucket and writes through the
storage integration created in Step 1:

```
-- Set variables (customize these for your environment)
SET STAGE_NAME='CHALK_UNLOAD_STAGE_S3';
SET INTEGRATION_NAME='s3-integration-chalk-data-bucket';
SET S3_URL='s3://chalk-data-bucket/unload/';

-- Create external stage using storage integration
CREATE STAGE IF NOT EXISTS IDENTIFIER($STAGE_NAME)
  STORAGE_INTEGRATION = IDENTIFIER($INTEGRATION_NAME)
  URL = $S3_URL;
```

### Update IAM Role Trust Policy

Update the trust policy for the AWS IAM role chalk-{organization}-snow-s3-access-role with the actual Snowflake credentials from Step 1:

- Replace the Principal's AWS ARN with the STORAGE_AWS_IAM_USER_ARN value
- Replace the External ID with the STORAGE_AWS_EXTERNAL_ID value

See the Snowflake trust policy documentation for detailed instructions.

### Grant Permissions

Run against your Snowflake instance. This grants the Chalk role access to the storage
integration and stage so it can perform the unload):

```
-- Variables for reference (example values; use the same names as the earlier steps)
SET ROLE_NAME='CHALK_ROLE';
SET INTEGRATION_NAME='s3-integration-chalk-data-bucket';
SET STAGE_NAME='CHALK_UNLOAD_STAGE_S3';

-- Grant usage on storage integration
GRANT USAGE ON INTEGRATION "<INTEGRATION_NAME>" TO ROLE <ROLE_NAME>;

-- Grant stage permissions
GRANT USAGE ON STAGE <STAGE_NAME> TO ROLE <ROLE_NAME>;
GRANT READ, WRITE ON STAGE <STAGE_NAME> TO ROLE <ROLE_NAME>;
```

### Validate

You can optionally validate the integration end-to-end using

```
SELECT SYSTEM$VALIDATE_STORAGE_INTEGRATION('s3-integration-chalk-data-bucket', 's3://chalk-data-bucket/unload/', 'test.csv', 'WRITE');
```

After completing these steps, continue to
Configure Chalk Environment Variables.

### Setup Unloading with GCS

Run the SQL snippets against your Snowflake instance (as a user with the ACCOUNTADMIN
role); the IAM steps are performed in your Google Cloud project.

For background, see the Snowflake documentation:

- Unloading into Google Cloud Storage
- Configuring a Snowflake storage integration for GCS

### Create Storage Integration

Run against your Snowflake instance. This creates the storage integration object that
Snowflake will use to authenticate to GCS:

```
-- Set variables (customize these for your environment)
SET INTEGRATION_NAME='gcs-integration-chalk-data-bucket';
SET GCS_BUCKET='gcs://chalk-data-bucket/';

-- Create storage integration if it doesn't exist
CREATE STORAGE INTEGRATION IF NOT EXISTS IDENTIFIER($INTEGRATION_NAME)
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'GCS'
  ENABLED = TRUE
  STORAGE_ALLOWED_LOCATIONS = ($GCS_BUCKET);

-- Verify integration and retrieve the Snowflake-managed service account
DESCRIBE INTEGRATION IDENTIFIER($INTEGRATION_NAME);
```

From the DESCRIBE INTEGRATION output, record the STORAGE_GCP_SERVICE_ACCOUNT value — an
email like service-account-id@project.iam.gserviceaccount.com. You will grant this
service account access to your bucket in the next step.

### Grant the Service Account Access to Your GCS Bucket

These steps are performed in your Google Cloud project, using the service account email
from Step 1.

- In the Google Cloud Console, go to IAM & Admin → Roles and Create Role with the
permissions required for unloading (read and write):storage.buckets.getstorage.objects.createstorage.objects.deletestorage.objects.getstorage.objects.listAlternatively, you can assign the predefined Storage Object Admin
(roles/storage.objectAdmin) role instead of a custom role.
- Go to Cloud Storage → Buckets and select your bucket.
- Open Permissions → Grant access, paste the STORAGE_GCP_SERVICE_ACCOUNT email as the
principal, and assign the role from the previous step.
- If your bucket is encrypted with Cloud KMS, also grant the service account the
Cloud KMS CryptoKey Encryptor/Decryptor role on the relevant key.

### Create External Stage

Run against your Snowflake instance. The stage points at a location inside your GCS
bucket and writes through the storage integration created in Step 1:

```
-- Set variables (customize these for your environment)
SET STAGE_NAME='CHALK_UNLOAD_STAGE_GCS';
SET INTEGRATION_NAME='gcs-integration-chalk-data-bucket';
SET GCS_URL='gcs://chalk-data-bucket/unload/';

-- Create external stage using storage integration
CREATE STAGE IF NOT EXISTS IDENTIFIER($STAGE_NAME)
  STORAGE_INTEGRATION = IDENTIFIER($INTEGRATION_NAME)
  URL = $GCS_URL;
```

### Grant Permissions

Run against your Snowflake instance. This grants the Chalk role access to the storage
integration and stage so it can perform the unload:

```
-- Variables for reference (example values; use the same names as the earlier steps)
SET ROLE_NAME='CHALK_ROLE';
SET INTEGRATION_NAME='gcs-integration-chalk-data-bucket';
SET STAGE_NAME='CHALK_UNLOAD_STAGE_GCS';

-- Grant usage on storage integration
GRANT USAGE ON INTEGRATION <INTEGRATION_NAME> TO ROLE <ROLE_NAME>;

-- Grant stage permissions
GRANT USAGE ON STAGE <STAGE_NAME> TO ROLE <ROLE_NAME>;
GRANT READ, WRITE ON STAGE <STAGE_NAME> TO ROLE <ROLE_NAME>;
```

### Validate

You can optionally validate the integration end-to-end using

```
SELECT SYSTEM$VALIDATE_STORAGE_INTEGRATION('gcs-integration-chalk-data-bucket', 'gcs://chalk-data-bucket/unload/', 'test.csv', 'WRITE');
```

### Configure Chalk environment

You can configure the Snowflake-specific options using the SnowflakeSource.__init__ args.
Alternately, you can configure the source through your dashboard.

### Dashboard

Once the storage integration and stage exist (for either AWS or GCS),
in Chalk dashboard proceed to Integrations > Data Sources > Add a data source, select Snowflake,
add and configure a new Snowflake datasource.

Snowflake Data Dource form in the Chalk dashboard

### Data Unloading

If you configure data unloading, you should specify the configured
Unload Stage Name and Unload External Location (storage URI).

### Configure Data Unloading using environment variables (Legacy)

You can direct Chalk which stage to unload to for all Snowflake integrations in the environment.
In the Chalk dashboard, go to Settings > Variables and add:

- CHALK_SNOWFLAKE_UNLOAD_STAGE is the Snowflake stage object you created, e.g. @MY_DB.PUBLIC.CHALK_UNLOAD_STAGE_S3 or @MY_DB.PUBLIC.CHALK_UNLOAD_STAGE_GCS. Point this at the stage for the cloud you want Chalk to unload to.
- CHALK_SNOWFLAKE_UNLOAD_URI is the cloud storage URI the stage points to, e.g. s3://chalk-data-bucket/unload/ or gcs://chalk-data-bucket/unload/. It must match the URL you set on that stage.
- CHALK_SNOWFLAKE_STORAGE_INTEGRATION (optional) is the storage integration name. This is only used when CHALK_SNOWFLAKE_UNLOAD_STAGE is set to a cloud URI (e.g. s3://... / gcs://...) directly rather than a named stage.

### SDK Setup

If you configured your Snowflake integration in the dashboard,
define your SnowflakeSource data sources in Python referencing the configured names:

```
from chalk.sql import SnowflakeSource

risk = SnowflakeSource(name="RISK")
marketing = SnowflakeSource(name="MARKETING")
```

Then reference them in SQL file resolvers using the name parameter. For example, to query from the RISK source:

```
-- type: online
-- resolves: User
-- source: RISK
SELECT id, credit_score FROM users
```

And to query from the MARKETING source:

```
-- type: online
-- resolves: User
-- source: MARKETING
SELECT id, email, campaign_status FROM users
```