Has Many

Has-many relationships link a feature to many instances of another feature.

The recommended way to specify the join for a has-many relationship is implicitly, through the type annotation on the foreign-key feature. In the example below, a User is linked to potentially many Transfers.

from chalk.features import features, DataFrame

@features
class Transfer:
    id: str
    # note, the annotation must be a string reference because User is
    # defined after Transfer.
    user_id: "User.id"
    amount: float

@features
class User:
    id: str
    transfers: DataFrame[Transfer]

Explicit Join

The following example, which explicitly sets the join, is equivalent to the above:

from chalk.features import features, has_many, DataFrame

@features
class Transfer:
    id: str
    user_id: str
    amount: float

@features
class User:
    id: str
    transfers: DataFrame[Transfer] = has_many(lambda: Transfer.user_id == User.id)
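
To make the join condition concrete, here is a minimal plain-Python sketch of the relationship that `Transfer.user_id == User.id` describes. It does not use Chalk and says nothing about how Chalk evaluates joins internally; `join_has_many` and the sample data are hypothetical names introduced only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Transfer:
    id: str
    user_id: str
    amount: float


@dataclass
class User:
    id: str
    transfers: list = field(default_factory=list)


def join_has_many(users, transfers):
    """Attach to each user the transfers whose user_id equals the user's id."""
    by_user = defaultdict(list)
    for t in transfers:
        by_user[t.user_id].append(t)
    for u in users:
        # Users without any matching transfer get an empty list.
        u.transfers = by_user.get(u.id, [])
    return users


# Hypothetical sample data
users = join_has_many(
    [User(id="u1"), User(id="u2")],
    [
        Transfer(id="t1", user_id="u1", amount=10.0),
        Transfer(id="t2", user_id="u1", amount=5.0),
        Transfer(id="t3", user_id="u3", amount=7.0),  # no matching user
    ],
)
```

In this sketch, a user with no matching transfers ends up with an empty collection, and a transfer whose `user_id` matches no user is simply not attached to anyone.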

Aggregations on References

Having established a has-many relationship, you can now reference the transfers for a user through the user namespace. The has_many feature returns a chalk.DataFrame, which supports many helpful aggregation operations:

# Number of transfers made by a user
User.transfers.count()

# Total amount of transfers made by the user
User.transfers[Transfer.amount].sum()

# Total amount of the user's transfers that were returned
# (assumes Transfer also defines a `status: str` feature)
User.transfers[
    Transfer.status == "returned",
    Transfer.amount
].sum()
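
As a rough mental model, each aggregation above corresponds to a simple computation over the rows that belong to one user. The sketch below shows those counterparts in plain Python. It does not use Chalk, and the `status` field and sample values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    id: str
    amount: float
    status: str  # assumed field, mirroring Transfer.status in the filter above


# Hypothetical transfers belonging to a single user
transfers = [
    Transfer(id="t1", amount=10.0, status="settled"),
    Transfer(id="t2", amount=5.0, status="returned"),
    Transfer(id="t3", amount=2.5, status="returned"),
]

# Counterpart of User.transfers.count()
transfer_count = len(transfers)

# Counterpart of User.transfers[Transfer.amount].sum()
total_amount = sum(t.amount for t in transfers)

# Counterpart of the filtered sum: keep only returned transfers, then sum amounts
returned_amount = sum(t.amount for t in transfers if t.status == "returned")
```

The filter and the projection in the bracket expression play the roles of the `if` clause and the `t.amount` expression in the last line.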

Back-references

One-to-many

In the reverse direction, a one-to-many relationship is expressed as a has_one relation: following the example above, a user has many transfers, but each transfer has a single user. You don't have to set the join explicitly a second time. Instead, the join condition is assumed to be symmetric and is copied over. To complete the one-to-many relationship from the example, add a User feature to the Transfer class:

@features
class Transfer:
  ...
  user_id: str
  amount: float
  user: "User"

@features
class User:
  ...
  uid: Transfer.user_id
  transfers: DataFrame[Transfer]

Here, you need to put quotes around `User` so that it acts as a forward reference, because User is defined after Transfer.
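
The symmetry of the join condition can be illustrated with a small plain-Python sketch: the same equality between `user_id` and `id` answers both "which user owns this transfer?" and "which transfers does this user own?". This does not use Chalk; `user_for` and `transfers_for` are hypothetical helpers, and the data is made up.

```python
from dataclasses import dataclass


@dataclass
class User:
    id: str


@dataclass
class Transfer:
    id: str
    user_id: str
    amount: float


def user_for(transfer, users_by_id):
    """Has-one direction: the single user whose id equals transfer.user_id."""
    return users_by_id.get(transfer.user_id)


def transfers_for(user, transfers):
    """Has-many direction: every transfer whose user_id equals user.id."""
    return [t for t in transfers if t.user_id == user.id]


# Hypothetical sample data
users_by_id = {"u1": User(id="u1")}
transfers = [
    Transfer(id="t1", user_id="u1", amount=10.0),
    Transfer(id="t2", user_id="u1", amount=5.0),
]

owner = user_for(transfers[0], users_by_id)
owned = transfers_for(owner, transfers)
```

Both helpers compare the same two fields, which is why the join only needs to be stated once.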

Many-to-many

The recommended way to define a many-to-many relationship is through a joining feature class. For instance, to define a many-to-many relationship between Actors and Movies, you could write the following feature classes:

from chalk.features import features, DataFrame

@features
class Actor:
  id: int
  appearances: "DataFrame[MovieRole]"
  full_name: str

  # this will be used to demonstrate one of the ways the joining feature can be populated
  movie_ids: list[int]

@features
class Movie:
  id: int
  title: str

@features
class MovieRole:
  id: str
  actor_id: Actor.id
  movie_id: Movie.id
  movie: Movie

Here, you need to put quotes around `DataFrame[MovieRole]` so that it acts as a forward reference, because MovieRole is defined after Actor.

This joining feature class can be populated by a SQL file resolver:

-- resolves: MovieRole
-- source: PG
SELECT id, actor_id, movie_id FROM movie_roles;

Alternatively, it can be populated by a Python resolver that returns a DataFrame (namespaced to one of the joined feature sets):

from chalk import online

@online
def get_actor_in_movie(a_id: Actor.id, movie_ids: Actor.movie_ids) -> Actor.appearances:
  return DataFrame([
    MovieRole(id=f"{a_id}_{m_id}", actor_id=a_id, movie_id=m_id)
    for m_id in movie_ids
  ])

The joining feature class lets you:

  • query for movie features from the Actor namespace, and
  • use movie features in downstream Actor resolvers.

For example, to get the titles for all the movies that an actor has appeared in, you can run the following query:

$ chalk query --in actor.id=1 --out actor.appearances.movie.title
Results

 Name                           Hit?  Value
───────────────────────────────────────────────────────────────────────────────
 actor.appearances.movie.title        ["The Bad Sleep Well","High and Low",...]
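
The path `actor.appearances.movie.title` walks from an actor, through the joining rows, to each movie. The following plain-Python sketch mirrors that traversal under simplified assumptions. It does not use Chalk; `build_roles` and `titles_for_actor` are hypothetical helpers, and the movie ids are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Movie:
    id: int
    title: str


@dataclass
class MovieRole:
    id: str
    actor_id: int
    movie_id: int


def build_roles(actor_id, movie_ids):
    """One joining row per (actor, movie) pair, keyed like the resolver above."""
    return [
        MovieRole(id=f"{actor_id}_{m_id}", actor_id=actor_id, movie_id=m_id)
        for m_id in movie_ids
    ]


def titles_for_actor(actor_id, roles, movies_by_id):
    """Follow actor -> appearances -> movie -> title."""
    return [
        movies_by_id[r.movie_id].title
        for r in roles
        if r.actor_id == actor_id
    ]


# Hypothetical sample data (the ids are assumptions)
movies_by_id = {
    10: Movie(id=10, title="The Bad Sleep Well"),
    11: Movie(id=11, title="High and Low"),
}
roles = build_roles(actor_id=1, movie_ids=[10, 11])
titles = titles_for_actor(1, roles, movies_by_id)
```

Each joining row carries one key per side, so the same rows could also be traversed from a movie back to its actors.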