Chalk maintains several client libraries (gRPC) and a REST API for fetching feature values.


Library support

Chalk maintains client libraries in several major languages for fetching online feature values. If you need a language that we don't yet support, let us know! We also offer a REST API if you'd like to build your own client.

> chalk query \
      --in user.id=1 \
      --out user.identity.is_voip_phone \
      --out user.fraud_score \
      --staleness user.account_balance=10m \
      --environment staging \
      --tag live


REST API

Chalk supports a REST API for querying online features and exposes this endpoint in several API clients. When you execute an online query, resolvers run to produce the requested data. Online queries prioritize running online resolvers over offline resolvers when both could compute a feature.

The following endpoint can also be hit with the Python ChalkClient by using its query method. For information on how to authenticate the ChalkClient, check out the section on authentication. Read more about the parameters to this method here.


Request

POST
https://api.chalk.ai/v1/query/online
Attributes
inputs (map[string, JSON])
Input features and values provided at the time of the request. For example, primary key-value pairs often designate the subset of data returned. Feature inputs are referenced by fully qualified path. Has-many features are input as lists, and struct features are input as JSON.

An example of passing a user with two credit cards as input:

{user.id: '1', user.cards: [Card(id='xyz'), Card(id='abc')]}
outputs (string[])
Outputs are the features that you'd like to compute from the inputs.
staleness (map[string, duration])
Maximum staleness overrides for any output features or intermediate features. See query caching for more information.
context (QueryContext?)
The context object controls the environment and tags under which a request should execute resolvers:
QueryContext
environment (string?)
The environment in which to run the resolvers. Like resolvers, API tokens can be scoped to an environment. If no environment is specified in the query, but the token supports only a single environment, then that environment will be taken as the scope for executing the request.
tags (string[]?)
The tags used to scope the resolvers.
preview_deployment_id (string?)
The preview deployment ID. See Preview Deployments for more information.
query_name (string?)
The query name. See NamedQuery for more details.
branch (string?)
If specified, routes to the relevant branch. See Branches for more information.

More information on parameters is available here.
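As a sketch of the request shape described above, the snippet below assembles the attributes into a JSON body with Python's standard library. The feature names, staleness override, and context values are placeholders; in practice, the ChalkClient constructs and sends this payload for you.

```python
import json

# Hypothetical request body for POST https://api.chalk.ai/v1/query/online.
# Feature names and the staleness override are illustrative placeholders.
body = {
    "inputs": {"user.id": "1"},
    "outputs": ["user.identity.is_voip_phone", "user.fraud_score"],
    "staleness": {"user.account_balance": "10m"},
    "context": {"environment": "staging", "tags": ["live"]},
}

payload = json.dumps(body)
```

This mirrors the CLI example at the top of the page: inputs, outputs, a staleness override, and an environment/tag scope.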


Response

Attributes
data (FeatureResult[])
The output features and any query metadata.
FeatureResult
field (string)
The name of the feature requested, e.g., user.identity.is_voip_phone.
value (typeof(field)?)
The value of the requested feature. If an error was encountered in resolving this feature, this field will be empty.
error (ChalkError?)
The error code encountered in resolving this feature. If no error occurred, this field is empty.
meta (FeatureResolutionMeta?)
Metadata pertaining to the feature, including the resolver run and whether the result was a cache hit.
errors (ChalkError[]?)
Errors encountered while running the resolvers. Each element in the list is a ChalkError. If no errors were encountered, this field is empty.
meta (QueryMeta?)
Metadata related to the query. Returned if include_meta or explain is set to True.
QueryMeta
execution_duration_s (float)
The time, expressed in seconds, that Chalk spent executing this query.
deployment_id (string?)
The ID of the deployment that served this query.
environment_id (string?)
The ID of the environment that served this query.
environment_name (string?)
The short name of the environment that served this query. For example: "dev" or "prod".
query_id (string?)
A unique ID generated and persisted by Chalk for this query. All computed features, metrics, and logs are associated with this ID. Your system can store this ID for audit and debugging workflows.
query_timestamp (datetime?)
At the start of query execution, Chalk computes 'datetime.now()'. This value is used to timestamp computed features.
query_hash (string?)
Deterministic hash of the 'structure' of the query. Queries that have the same input/output features will typically have the same hash; changes may be observed over time as we adjust implementation details.
explain_output (string?)
An unstructured string containing diagnostic information about the query execution. Only included if explain is set to True.
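To illustrate the response shape, the sketch below walks a hand-written example payload and separates successful feature values from per-feature errors. The error object's fields (code, message) here are illustrative assumptions, not a guaranteed schema.

```python
# Hand-written example of the online-query response shape described above.
response = {
    "data": [
        {"field": "user.fraud_score", "value": 0.42, "error": None},
        {"field": "user.identity.is_voip_phone", "value": None,
         "error": {"code": "RESOLVER_FAILED", "message": "upstream timeout"}},
    ],
    "errors": [],
}

# Split per-feature results: a feature either carries a value or an error.
values = {}
failures = {}
for result in response["data"]:
    if result.get("error"):
        failures[result["field"]] = result["error"]["message"]
    else:
        values[result["field"]] = result["value"]
```

Checking both the top-level errors list and each FeatureResult's error field covers the two places failures can surface.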

Query Explanation

Chalk offers tooling for debugging queries that don't behave as expected. The first step is always to check whether the response contains any errors. Often, the error message points directly to the failure.

For more complicated queries, you can send the query with explain=True. This returns a representation of the query plan in the meta attribute of the response, which you can use to verify which resolvers and operators ran during execution. Beware: this results in slower execution times.

Some queries that involve multiple operations might need additional tracking. Users can supply store_plan_stages=True to store intermediate outputs at all operations of the query. This will dramatically slow things down, so use wisely! These results are visible in the dashboard under the “Queries” page.
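As a sketch of the diagnostics described above, the helper below toggles the two flags on a request body. The field placement is illustrative; the ChalkClient exposes these as the explain= and store_plan_stages= keyword arguments to query().

```python
def with_diagnostics(body, explain=False, store_plan_stages=False):
    """Return a copy of the request body with diagnostic flags set."""
    diagnostic = dict(body)
    if explain:
        # Slower: returns a representation of the query plan in meta.
        diagnostic["explain"] = True
    if store_plan_stages:
        # Much slower: persists intermediate outputs for the dashboard.
        diagnostic["store_plan_stages"] = True
    return diagnostic

debug_body = with_diagnostics(
    {"inputs": {"user.id": "1"}, "outputs": ["user.fraud_score"]},
    explain=True,
)
```

Because both flags slow execution, it is worth gating them behind a debug path rather than enabling them on every production query.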

For more information, read the ChalkClient docs here.


Online Query Bulk

Compute feature values for many rows of inputs using online resolvers. This endpoint is similar to the online query endpoint, but takes in lists of inputs and produces one output per row of inputs. This is appropriate when you want to fetch the same set of features for many different input primary keys.

The following endpoint can be accessed with the Python ChalkClient by using its query_bulk method.

Request

POST
https://api.chalk.ai/v1/query/feather

The request body should be a binary payload containing:

  1. A magic string identifier
  2. Serialized query metadata
  3. Feature data in Apache Feather format

When using the ChalkClient, this serialization is handled automatically. For direct HTTP usage, the structure is:

  • Request inputs are provided as mappings of feature names to lists of values
  • Each list should have the same length, representing multiple rows of data
Attributes
inputs (map[string, JSON[]])
Input features and lists of values. Each key is a feature name (e.g., `"user.id"`) and each value is a list of values for that feature. All lists must have the same length, where each element represents one row. Has-many features are input as lists within each row element, and struct features (has-one) are input as JSON objects within each row element.

Example with simple, has-one, and has-many features:

{
  "user.id": [1, 2],
  "user.name": ["Alice", "Bob"],
  "user.profile": [
    {"age": 30, "city": "NYC"},
    {"age": 25, "city": "SF"}
  ],
  "user.cards": [
    [{"id": "card1"}, {"id": "card2"}],
    [{"id": "card3"}]
  ]
}
outputs (string[])
The features that you'd like to compute from the inputs. Same as online query.
now (string[]?)
List of timestamps (ISO format) for each row. If provided, the list must match the length of the input value lists. Each timestamp represents the query time for the corresponding row.
staleness (map[string, duration])
Maximum staleness overrides for any output features or intermediate features. See query caching for more information.
context (QueryContext?)
The context object controls the environment and tags under which requests execute:
QueryContext
environment (string?)
The environment in which to run the resolvers.
tags (string[]?)
The tags used to scope the resolvers.
required_resolver_tags (string[]?)
If specified, all required_resolver_tags must be present on a resolver for it to be eligible to execute.
query_name (string?)
The semantic name for the query, e.g., `"loan_application_model"`. See NamedQuery.
query_name_version (string?)
The version of the named query to execute.
correlation_id (string?)
A globally unique ID for the query, used in logs and web interfaces.
branch_id (string?)
If specified, routes to the relevant branch.
preview_deployment_id (string?)
If specified, routes to the relevant preview deployment.
explain (bool?)
If true, returns query execution plan in the response metadata. Makes the query slower.
store_plan_stages (bool?)
If true, stores intermediate outputs at all query plan stages. Dramatically impacts performance.
meta (map[string, string]?)
Arbitrary key-value pairs to associate with the query.
query_context (map[string, JSON] | string?)
An immutable context accessible from Python resolvers. See ChalkContext.
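The columnar input shape above can be validated in a few lines before serialization. In this sketch, the feature names and timestamps are placeholders; it checks that all value lists share one length and shows how row-oriented records can be pivoted into the columnar form.

```python
# Columnar inputs for a bulk query: one list per feature, one element per row.
inputs = {
    "user.id": [1, 2],
    "user.name": ["Alice", "Bob"],
}
now = ["2024-01-01T00:00:00Z", "2024-01-02T00:00:00Z"]  # optional per-row query times

lengths = {len(values) for values in inputs.values()}
assert len(lengths) == 1, "all input lists must have the same length"
n_rows = lengths.pop()
assert len(now) == n_rows, "now must match the number of input rows"

# Row-oriented data can be pivoted into this columnar shape:
rows = [{"user.id": 3, "user.name": "Carol"}]
columnar = {key: [row[key] for row in rows] for key in rows[0]}
```

The same length check applies to the optional now list, since each timestamp corresponds to one input row.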

Response

The response is a binary payload in Apache Feather format containing the results.

Attributes
results (BulkOnlineQueryResult[])
A list of query results, where each result contains:
BulkOnlineQueryResult
scalars_df (DataFrame?)
A DataFrame containing the scalar feature values for all rows. Each row corresponds to an input row. Columns are feature names, and values are the computed feature values.
groups_dfs (map[string, DataFrame]?)
A map of feature names to DataFrames for has-many features.
errors (ChalkError[]?)
Errors encountered while running the resolvers.
meta (QueryMeta?)
Metadata about query execution including execution duration, deployment ID, query ID, etc. See the online query response documentation for QueryMeta details.

Offline Query

Submit an offline query to compute feature values from the offline store or by running offline/online resolvers. Offline queries are typically used for generating training datasets and run asynchronously.

The following endpoint can be accessed with the Python ChalkClient by using its offline_query method. See the offline query documentation for more information.

Request

POST
https://api.chalk.ai/v1/query/run

The request body is a binary payload containing:

  1. Serialized query plan (protobuf)
  2. Query metadata (JSON)
  3. Input data (Apache Feather format)

When using the ChalkClient, this serialization is handled automatically.

Attributes
input (map[string, JSON[]] | DataFrame | URI)
The features for which there are known values. Can be:

  • A mapping of feature names to lists of values (similar to the bulk query format)
  • A DataFrame with input data
  • A URI pointing to input data in cloud storage

When using a mapping, has-many features are input as lists within each row, and struct features (has-one) are input as JSON objects within each row. See the bulk query input format above for examples.

input_times (datetime[] | datetime?)
Timestamps for point-in-time correctness. If a list, must match the length of input rows. See temporal consistency.
output (string[])
The features to compute or sample. If a feature was never computed for a sample, its value will be null.
required_output (string[]?)
Features that must exist in each row. Rows where a required output was never stored will be skipped.
recompute_features (bool | string[]?)
Controls whether resolvers run to compute features:

  • If `true`, all output features are recomputed by resolvers
  • If `false`, all output features are sampled from the offline store
  • If a list, features in the list are recomputed, others are sampled

environment (string?)
The environment in which to run resolvers.
dataset_name (string?)
A unique name for the dataset. If provided, the dataset will be saved and can be retrieved later.
max_samples (int?)
Maximum number of samples to include in the result. If not specified, all samples are returned.
lower_bound (datetime | duration | string?)
Only query data observed after this timestamp. Accepts ISO 8601 format strings.
upper_bound (datetime | duration | string?)
Only query data observed before this timestamp. Accepts ISO 8601 format strings.
tags (string[]?)
The tags used to scope the resolvers.
branch_id (string?)
If specified, routes to the relevant branch.
correlation_id (string?)
A globally unique ID for the query, used in logs and web interfaces.
query_name (string?)
The name of the query. If provided, creates a named query or fills in missing parameters from a preexisting execution.
run_asynchronously (bool?)
If true, runs the query in separate Kubernetes pods. Useful for large datasets and long-running jobs.
store_online (bool?)
If true, stores the query output in the online store.
store_offline (bool?)
If true, stores the query output in the offline store.
num_shards (int?)
If specified, splits the input across this many shards for parallel processing.
resources (ResourceRequests?)
Override resource requests (CPU, memory) for the offline query job.
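The three-way recompute_features semantics described above can be captured in a small helper. This is a sketch of the documented behavior, not Chalk's implementation; the feature names are placeholders.

```python
def resolution_mode(feature, recompute_features):
    """Decide whether a feature is recomputed by resolvers or sampled
    from the offline store, per the recompute_features parameter."""
    if recompute_features is True:
        return "recompute"   # all outputs recomputed by resolvers
    if recompute_features is False:
        return "sample"      # all outputs sampled from the offline store
    # Otherwise recompute_features is a list: only listed features recompute.
    return "recompute" if feature in recompute_features else "sample"

outputs = ["user.fraud_score", "user.name"]
modes = {f: resolution_mode(f, ["user.fraud_score"]) for f in outputs}
```

The list form is useful when one derived feature must be freshly computed while the rest of the training rows come straight from the offline store.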

Response

The response contains information about the submitted offline query job.

Attributes
job_id (string)
A unique identifier for the offline query job. Use this to check the job status.
dataset_id (string?)
If a dataset_name was provided, this is the ID of the created dataset.

Check Offline Query Status

Check the status of an offline query job. Offline queries run asynchronously, so you need to poll this endpoint to determine when the job is complete and the results are ready.

Request

POST
https://api.chalk.ai/v4/offline_query/status
Attributes
job_id (string?)
The job ID returned from the offline query request (also called revision_id).
dataset_name (string?)
The name of the dataset, if one was provided in the offline query request.
dataset_id (string?)
The ID of the dataset.
ignore_errors (bool?)
If true, returns results even if some errors occurred during execution. Default is false.
skip_failed_shards (bool?)
If true, skips failed shards and returns results from successful shards only. Default is false.

You must provide at least one of: job_id, dataset_name, or dataset_id.

Response

Attributes
is_finished (bool)
Whether the offline query job has completed. Poll this endpoint until this field is true.
urls (string[])
A list of short-lived, authenticated URLs to download the query results. These URLs point to data files in cloud storage (S3 or GCS) containing the feature values in a columnar format. Only populated when `is_finished` is true.
errors (ChalkError[]?)
Errors encountered during query execution, if any.
version (int)
Version number representing the format of the data. The client uses this to properly decode and load the query results into DataFrames. Current version is 1.

Once is_finished is true, you can download the data from the provided URLs and load it into a DataFrame. The ChalkClient’s Dataset object handles this automatically.
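The polling flow above can be sketched as a small loop. Here check_status stands in for an HTTP POST to /v4/offline_query/status and is injected as a callable, so the loop itself is easy to test; the ChalkClient's Dataset object performs the equivalent work for you.

```python
import time

def wait_for_offline_query(check_status, job_id, interval_s=2.0, timeout_s=600.0):
    """Poll the status endpoint until is_finished is true, then return
    the short-lived download URLs for the result files."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = check_status({"job_id": job_id})
        if status.get("errors"):
            raise RuntimeError(f"offline query failed: {status['errors']}")
        if status["is_finished"]:
            return status["urls"]
        time.sleep(interval_s)
    raise TimeoutError(f"offline query {job_id} did not finish in {timeout_s}s")
```

Passing ignore_errors or skip_failed_shards in the status request body (as described above) changes what counts as a failure; this sketch treats any reported error as fatal.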