Chalk maintains client libraries in several major languages
for fetching online feature values. If you need support
for a language we don't cover, let us know!
We also expose a REST API if you'd like to build your
own client.
Chalk supports a REST API for querying online features
and exposes this endpoint in several API clients.
When you execute an online query, resolvers
run to produce the requested data.
If both an online and an offline resolver could compute a feature,
online queries prioritize the online resolver.
The following endpoint can also be reached with the Python ChalkClient by using its query method.
For information on how to authenticate the ChalkClient, check out the section on
authentication.
Read more about the parameters to this method here.
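For direct HTTP usage, a minimal sketch of the request, using only the standard library. The endpoint path, feature names, and token are illustrative placeholders, not confirmed values:

```python
import json
import urllib.request

# Hypothetical direct call to the online query REST endpoint.
# Replace the URL, token, and feature names with your own values.
body = json.dumps({
    "inputs": {"user.id": 1},
    "outputs": ["user.fico_score"],
}).encode("utf-8")

req = urllib.request.Request(
    "https://api.chalk.ai/v1/query/online",  # illustrative endpoint path
    data=body,
    headers={
        "Authorization": "Bearer <token>",
        "Content-Type": "application/json",
    },
)
# response = urllib.request.urlopen(req)  # requires a valid token
```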
Input features and values are provided at the time of request. For example, primary key-value pairs often designate the subset of data returned. Feature inputs are provided by fully qualified path. Has-many features are input as lists, and struct features are input as JSON.
An example of passing a user with two credit cards as input:
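A sketch of what such an input might look like, assuming a `user.id` primary key and a has-many `user.credit_cards` feature (the feature names and card fields are illustrative):

```python
# Hypothetical input: one user identified by primary key, with a
# has-many credit_cards feature passed as a list of JSON objects.
inputs = {
    "user.id": 1,
    "user.credit_cards": [
        {"id": "card-1", "limit": 5000},
        {"id": "card-2", "limit": 10000},
    ],
}
```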
Maximum staleness overrides for any output features or intermediate features. See query caching for more information.
context QueryContext?
The context object controls the environment and tags under which a request should execute resolvers:
QueryContext
environment string?
The environment in which to run the resolvers. Like resolvers, API tokens can be scoped to an environment. If no environment is specified in the query, but the token supports only a single environment, then that environment will be taken as the scope for executing the request.
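A sketch of a request body that scopes execution with the context object. The environment and tag values are illustrative:

```python
# Hypothetical request body using the context object to pin the
# environment and tags under which resolvers execute.
request_body = {
    "inputs": {"user.id": 1},
    "outputs": ["user.fico_score"],
    "context": {
        "environment": "prod",  # must be an environment the token can access
        "tags": ["api"],        # used to select tagged resolvers
    },
}
```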
Errors encountered while running the resolvers. Each element in the list is a ChalkError. If no errors were encountered, this field is empty.
meta QueryMeta?
Metadata related to the query. Returned if include_meta or explain is set to True.
QueryMeta
execution_duration_s float
The time, expressed in seconds, that Chalk spent executing this query.
deployment_id string?
The id of the deployment that served this query.
environment_id string?
The id of the environment that served this query.
environment_name string?
The short name of the environment that served this query. For example: "dev" or "prod".
query_id string?
A unique ID generated and persisted by Chalk for this query. All computed features, metrics, and logs are associated with this ID. Your system can store this ID for audit and debugging workflows.
query_timestamp datetime?
At the start of query execution, Chalk computes 'datetime.now()'. This value is used to timestamp computed features.
query_hash string?
Deterministic hash of the 'structure' of the query. Queries that have the same input/output features will typically have the same hash; changes may be observed over time as we adjust implementation details.
explain_output string?
An unstructured string containing diagnostic information about the query execution. Only included if explain is set to True.
Chalk offers support for debugging queries that don't work as expected.
The first step is always to check whether the response contains any errors.
Often, the error message will point directly to the failure.
For more complicated queries, you can send the query with explain=True.
This returns a representation of the query plan in the meta attribute of the response.
You can use this information to verify which resolvers and operators ran during execution.
Beware: this will result in slower execution times.
Some queries that involve multiple operations might need additional tracking.
You can supply store_plan_stages=True to store the intermediate outputs at every operation of the query.
This will dramatically slow execution, so use it sparingly!
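A sketch of a query payload that enables both debugging options described above (the feature names are illustrative):

```python
# Hypothetical query parameters for debugging. explain returns the query
# plan in meta; store_plan_stages persists intermediate outputs.
debug_params = {
    "inputs": {"user.id": 1},
    "outputs": ["user.fico_score"],
    "include_meta": True,
    "explain": True,            # include the query plan in meta (slower)
    "store_plan_stages": True,  # store intermediate outputs (much slower)
}
```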
These results are visible in the dashboard under the “Queries” page.
For more information, read the ChalkClient docs here.
Compute feature values for many rows of inputs using online resolvers. This endpoint is similar to the
online query endpoint, but takes in lists of inputs and produces one output per row of inputs.
This is appropriate when you want to fetch the same set of features for many different input primary keys.
The following endpoint can be accessed with the Python ChalkClient by using its query_bulk method.
The request body should be a binary payload containing:
A magic string identifier
Serialized query metadata
Feature data in Apache Feather format
When using the ChalkClient, this serialization is handled automatically. For direct HTTP usage, the structure is:
Request inputs are provided as mappings of feature names to lists of values
Each list should have the same length, representing multiple rows of data
Attributes
inputs map[string, JSON[]]
Input features and **lists** of values. Each key is a feature name (e.g., `"user.id"`) and each value is a list of values for that feature. All lists must have the same length, where each element represents one row. Has-many features are input as lists within each row element, and struct features (has-one) are input as JSON objects within each row element.
Example with simple, has-one, and has-many features:
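A hypothetical sketch of such an input, with two rows and illustrative feature names:

```python
# Illustrative bulk inputs: two rows. Each key maps to a list whose
# i-th element is the value for row i.
inputs = {
    "user.id": [1, 2],                        # simple scalar feature
    "user.account": [                         # has-one (struct) feature
        {"balance": 100.0},
        {"balance": 250.0},
    ],
    "user.credit_cards": [                    # has-many feature
        [{"limit": 5000}, {"limit": 10000}],  # row 1: two cards
        [{"limit": 3000}],                    # row 2: one card
    ],
}
```

Note that every list has the same length (one element per row), as the attribute description requires.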
The features that you'd like to compute from the inputs. Same as online query.
now string[]?
List of timestamps (ISO format) for each row. If provided, the list must match the length of the input value lists. Each timestamp represents the query time for the corresponding row.
The response is a binary payload in Apache Feather format containing the results.
Attributes
results BulkOnlineQueryResult[]
A list of query results, where each result contains:
BulkOnlineQueryResult
scalars_df DataFrame?
A DataFrame containing the scalar feature values for all rows. Each row corresponds to an input row. Columns are feature names, and values are the computed feature values.
groups_dfs map[string, DataFrame]?
A map of feature names to DataFrames for has-many features.
Metadata about query execution including execution duration, deployment ID, query ID, etc. See the online query response documentation for QueryMeta details.
Submit an offline query to compute feature values from the offline store or by running offline/online resolvers.
Offline queries are typically used for generating training datasets and run asynchronously.
The following endpoint can be accessed with the Python ChalkClient by using its offline_query method.
See the offline query documentation for more information.
When using the ChalkClient, this serialization is handled automatically.
Attributes
input map[string, JSON[]] | DataFrame | URI
The features for which there are known values. Can be:
- A mapping of feature names to lists of values (similar to the bulk query format)
- A DataFrame with input data
- A URI pointing to input data in cloud storage
When using a mapping, has-many features are input as lists within each row, and struct features (has-one) are input as JSON objects within each row.
See the bulk query input format above for examples.
input_times datetime[] | datetime?
Timestamps for point-in-time correctness. If a list, must match the length of input rows. See temporal consistency.
output string[]
The features to compute or sample. If a feature was never computed for a sample, its value will be null.
required_output string[]?
Features that must exist in each row. Rows where a required output was never stored will be skipped.
recompute_features bool | string[]?
Controls whether resolvers run to compute features:
- If `true`, all output features are recomputed by resolvers
- If `false`, all output features are sampled from the offline store
- If a list, features in the list are recomputed, others are sampled
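The three-way semantics above can be sketched as a small helper. This function is purely illustrative and is not part of the Chalk API:

```python
from typing import List, Union

def resolve_mode(feature: str, recompute_features: Union[bool, List[str]]) -> str:
    """Return how a feature would be obtained under recompute_features."""
    if recompute_features is True:
        return "recompute"  # all outputs recomputed by resolvers
    if recompute_features is False:
        return "sample"     # all outputs sampled from the offline store
    # A list: only listed features are recomputed; others are sampled.
    return "recompute" if feature in recompute_features else "sample"
```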
Check the status of an offline query job. Offline queries run asynchronously, so you need to poll
this endpoint to determine when the job is complete and the results are ready.
Whether the offline query job has completed. Poll this endpoint until this field is true.
urls string[]
A list of short-lived, authenticated URLs to download the query results. These URLs point to data files in cloud storage (S3 or GCS) containing the feature values in a columnar format. Only populated when `is_finished` is true.
Errors encountered during query execution, if any.
version int
Version number representing the format of the data. The client uses this to properly decode and load the query results into DataFrames. Current version is 1.
Once is_finished is true, you can download the data from the provided URLs and load it into a DataFrame.
The ChalkClient’s Dataset object handles this automatically.
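If you are not using the Dataset object, the polling loop can be sketched as follows. The `poll_status` callable and the status-body shape are assumptions based on the fields documented above, not a confirmed client API:

```python
import time

def wait_for_offline_query(poll_status, interval_s=0.5, timeout_s=300.0):
    """Poll a status callable until the offline query finishes (sketch).

    `poll_status` is a hypothetical callable returning the decoded status
    body, e.g. {"is_finished": bool, "urls": [...], "errors": [...]}.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = poll_status()
        if status.get("errors"):
            raise RuntimeError(f"offline query failed: {status['errors']}")
        if status.get("is_finished"):
            # Results are ready: return the short-lived download URLs.
            return status.get("urls", [])
        time.sleep(interval_s)
    raise TimeoutError("offline query did not finish before timeout")
```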