Has Many - Chalk

Has-many relationships link a feature to many instances of another feature.

Foreign Keys

The recommended way to specify a join for a has-many relationship is implicitly. In the example below, a User is linked to potentially multiple Transfers.

from chalk.features import features, DataFrame

@features
class Transfer:
    id: str
    # note, the annotation must be a string reference because User is
    # defined after Transfer.
    user_id: "User.id"
    amount: float

@features
class User:
    id: str
    transfers: DataFrame[Transfer]

Explicit Join

The following example, which explicitly sets the join, is equivalent to the above:

from chalk.features import has_many, DataFrame

@features
class Transfer:
    id: str
    user_id: str
    amount: float

@features
class User:
    id: str
    transfers: DataFrame[Transfer] = has_many(lambda: Transfer.user_id == User.id)

Composite Join Keys

You can also specify multiple join keys for a has-many relationship. For example, say hospitals wants to compute aggregations over visit to each of their departments. We could write the following feature classes.

from chalk import _
from chalk.features import features, DataFrame, has_many
import chalk.functions as F
from datetime import datetime


@features
class HospitalVisit:
    id: str = _.user_id + _.hospital + _.department + F.cast(_.date, str)
    composite_key: str = _.hospital + "-" + _.department
    department: str
    hospital: str
    user_id: str
    date: datetime

@features
class HospitalDepartment:
    id: int
    name: str
    hospital_name: str
    composite_key_match: str = _.hospital_name + "-" + _.name
    # multi-feature join
    visits: DataFrame[HospitalVisit] = has_many(
      lambda:
       (HospitalDepartment.hospital_name == HospitalVisit.hospital) &
       (HospitalDepartment.name == HospitalVisit.department)
    )
    # composite key join
    visits_with_composite_key: DataFrame[HospitalVisit] = has_many(
      lambda: HospitalDepartment.composite_key_match == HospitalVisit.composite_key
    )

You can also join in Has-Many relationships for features that have primary composite keys.

from chalk.features import features, DataFrame, has_many

@features
class SoftwareEngineer:
    id: int = _.first_name + " " + _.last_name
    first_name: str
    last_name: str

    manager_id: str

@features
class Manager:
    id: int
    direct_reports: DataFrame[SoftwareEngineer] = has_many(
      lambda: Manager.id == SoftwareEngineer.manager_id
    )

Aggregations on References

Having established a has-many relationship, you can now reference the transfers for a user through the user namespace. The has_many feature returns a chalk.DataFrame, which supports many helpful aggregation operations:

# Number of transfers made by a user
User.transfers.count()

# Total amount of transfers made by the user
User.transfers[Transfer.amount].sum()

# Total amount of the transfers made by the user that were returned
User.transfers[
    Transfer.status == "returned",
    Transfer.amount
].sum()

To compute an aggregation over one or more time windows, see our docs on windowed aggregations.

Back-references

One-to-many

In the reverse direction, a one-to-many relation is defined by a has_one relation (following the above example, a user has many transfers but a transfer has a single user). However, you don’t have to explicitly set the join a second time. Instead, the join condition is assumed to be symmetric and copied over. To complete the one-to-many relationship from our example, add a User to the Transfer class:

@features
class Transfer:
  ...
  user_id: str
  amount: float
  user: "User"

@features
class User:
  ...
  uid: Transfer.user_id
  transfers: DataFrame[Transfer]

Here, you need to use quotes around `User` to use a forward reference.

Many-to-many

The recommended way to define a many-to-many relationship is through a joining feature class. For instance, to define a many-many relationship between Actors and Movies, you could write the following feature classes:

from chalk.features import features, DataFrame

@features
class Actor:
    id: int
    appearances: "DataFrame[MovieRole]"
    full_name: str

    # this will be used to demonstrate one of the
    # ways the joining feature can be populated
    movie_ids: list[int]

@features
class Movie:
    id: int
    title: str

@features
class MovieRole:
    id: str
    actor_id: Actor.id
    movie_id: Movie.id
    movie: Movie

Here you need to use quotes around `DataFrame[MovieRole]` to use a forward reference.

This joining feature class can be populated by a SQL file resolver:

-- resolves: MovieRole
-- source: PG
SELECT id, actor_id, movie_id FROM movie_roles;

Alternatively, by a DataFrame-returning Python resolver (namespaced to one of the joined feature sets):

@online
def get_actor_in_movie(
  a_id: Actor.id,
  movie_ids: Actor.movie_ids,
) -> Actor.appearances:
  return DataFrame([
    MovieRole(
      id=f"{a_id}_{m_id}",
      actor_id=a_id,
      movie_id=m_id
    )
    for m_id in movie_ids
  ])

The joining feature class lets you:

query for movie features from the Actor namespace, and
use movie features in downstream Actor resolvers.

For example, to get the titles for all the movies that an actor has appeared in, you can run the following query:

$ chalk query --in actor.id=1 --out actor.appearances.movie.title
Results

 Name                           Hit?  Value
───────────────────────────────────────────────────────────────────────────────
 actor.appearances.movie.title        ["The Bad Sleep Well","High and Low",...]

​Foreign Keys

​Explicit Join

​Composite Join Keys

​Aggregations on References

​Back-references

​One-to-many

​Many-to-many