# How do I use Chalk?
source: https://docs.chalk.ai/docs/setup-guide

## Customizing your Chalk solution.

Chalk is a feature store that enables data engineers and data scientists on production machine learning teams to
collaborate efficiently and effectively.

In this tutorial, we will guide you step by step through customizing your Chalk solution to help you achieve all of
your data goals.

### Creating your Chalk project

- If you don't already have a GitHub repository where you will store your Chalk code, create one. Then, pick a
local directory in which to work on Chalk code, and clone the GitHub repository there. Then cd inside of the
directory.
- If you haven't already, install Python. You can do this through Homebrew as follows:

```
# Install Homebrew
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Use Homebrew to install python3.11
$ brew install python@3.11
```

- Install the Chalk command line tool. The Chalk CLI allows you to create, update, and
manage your feature pipelines directly from your terminal

```
curl -s -L https://api.chalk.ai/install.sh | sh
```

- Create a Python virtual environment within your repository root directory and activate it

```
$ python3.11 -m venv .venv
$ source .venv/bin/activate
```

You can run source .venv/bin/activate to activate the virtual environment, and deactivate to deactivate.

- Log in by typing chalk login. If you're using a dedicated environment, make sure you use the
--api-host flag. Type y when prompted and log in through the browser.
- Run chalk init to initialize your project files. This will initialize a directory structure with an
empty src directory and three files: chalk.yaml, README.md, and requirements.txt.

```
root_directory/
├── src/
├── chalk.yaml
├── README.md
└── requirements.txt
```

The first file, chalk.yaml, stores configuration information about your project. You can edit this so it
contains the following

```
project: {YOUR_PROJECT_NAME}
environments:
  default:
    requirements: requirements.txt
    runtime: python312
```

The second file, README.md, contains some basic commands you can use with the Chalk CLI.
The third file, requirements.txt should look something like this:

```
requests
chalkpy[runtime]
```

This contains your project dependencies.
You can add a file .chalkignore which will act just like a .gitignore file, and exclude the specified files from
deploys when you run chalk apply (more on this later).

- Within your virtual environment, install the project requirements.

```
$ pip3 install -r requirements.txt
```

Now, you should have a virtual environment with all project dependencies installed and a basic project structure upon
which we will build in the next step.

### Configure data sources

Having set up a basic project directory, next we'll want to configure the data sources from which we will load data to
compute our features. We can do so in the Chalk dashboard.

- In the same directory from before, log in or sign up with Chalk directly from the command line. The
chalk login command will open your browser and create an
API token for your local development, as well as redirect you to your dashboard. If you are not
redirected you can also find the dashboard at https://chalk.ai/projects
or run the chalk dashboard command.
You will automatically see all the environments in which the email you used to log in has been
provisioned as a user.
- Within the dashboard, navigate to Data Sources in the sidebar and add all data sources that you will
be working with here. After you have saved a data source, you can use the Test Data Source button
in the upper right hand corner of the Data Source configuration view to verify that your connection is
valid.
- Within the working directory, we'll add a datasources.py file under our src folder to reference the
data sources that we've added in the dashboard.

```
root_directory/
├── src/
│  ├── __init__.py
│  └── datasources.py
├── chalk.yaml
├── README.md
└── requirements.txt
```

Say we added a PostgreSQL data source, then our datasources.py might look something like this:

```
pg = PostgreSQLSource(name='PG')
```

For more details on setting up data sources, see here.

### Define feature classes and resolvers

Next, we'll define our feature classes and resolvers. Each feature class is a Python class of features, and each resolver
tells Chalk how to compute the values for different features. Each feature that we write should correspond to a
resolver output.

We recommend starting with a minimal feature class and building up iteratively to easily test your code along the way.
After writing some feature classes, resolvers, and tests, we would expect to see a directory structure like this:

```
root_directory/
├── src/
│  ├── resolvers/
│  │  ├── .../
│  │  ├── __init__.py
│  │  └── pipelines.py
│  ├── __init__.py
│  ├── datasources.py
│  └── feature_sets.py
├── tests/
│  └── ...
├── .chalkignore
├── chalk.yaml
├── README.md
└── requirements.txt
```

You can read more in our docs about the different kinds of features and the
different kinds of resolvers that you can write. If you would like
guidance on how to structure your feature classes and resolvers, please reach out in your support channel!

### Deploy and query

Now, you can deploy the features and resolvers that you wrote! You can deploy to production by
using the chalk apply command. During development, we recommend that you use
chalk apply --branch {BRANCH_NAME} to deploy to the branch server, which allows
multiple people to work concurrently in one environment, and also enables more performant deploys.

Once you have deployed your code, then you can query your features directly from the command line using the
chalk query command, or by calling one of our Chalk Clients in code.
Chalk has a Python client, Go client,
Java client, and a Typescript client.

This is the primary workflow for iterating on features and resolvers! Write, deploy, and query to verify whether the
feature values that you receive are the values you expect. Once you have finalized your feature class and resolver
definitions, the final step is frequently orchestration.

### Orchestration

Having verified that your feature and resolver definitions are correct, the next step is to determine how
you want to use the corresponding feature values within your larger machine learning platform. Some users trigger
resolvers and run queries from within other orchestrated pipelines, such as Airflow.
Some users define cron schedulesfor their resolvers,
and set staleness values on different features to
ensure that the data they query falls within their requirements for freshness. The data world is your oyster!

But, as always, if you would like guidance on how to configure that oyster, please reach out in your support channel!

### Further resources

For a detailed tutorial on how to build a fraud model using Chalk, see here.





