Joins and Cardinality

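Relational feature engineering breaks down quickly when join cardinality is not controlled: joining a one-to-many relation without aggregating it first repeats each parent row once per matching child row. GraphReduce makes cardinality explicit at the edge level.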

Edge semantics

Each edge defines:

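  • parent_node and relation_node: the two nodes the edge connects
  • parent_key and relation_key: the join column on each side
  • reduce flag: whether relation rows are aggregated before the join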

Example:

gr.add_entity_edge(
    parent_node=customer_node,
    relation_node=orders_node,
    parent_key="id",
    relation_key="customer_id",
    reduce=True
)
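
Before declaring an edge, it can help to confirm the cardinality between parent_key and relation_key. The sketch below is plain Python, not part of the GraphReduce API; key_cardinality is a hypothetical helper and the key values are made-up sample data.

```python
from collections import Counter

def key_cardinality(parent_keys, relation_keys):
    """Classify the relationship between two key columns (hypothetical helper)."""
    parent_counts = Counter(parent_keys)
    relation_counts = Counter(relation_keys)
    # A side is "unique" when no key value appears more than once on it.
    parent_unique = all(n == 1 for n in parent_counts.values())
    relation_unique = all(n == 1 for n in relation_counts.values())
    if parent_unique and relation_unique:
        return "one-to-one"
    if parent_unique:
        return "one-to-many"
    if relation_unique:
        return "many-to-one"
    return "many-to-many"

customer_ids = [1, 2, 3]              # stands in for customers.id
order_customer_ids = [1, 1, 2, 2, 2]  # stands in for orders.customer_id

print(key_cardinality(customer_ids, order_customer_ids))  # one-to-many
```

A one-to-many result is the usual case for reduce=True; a many-to-many result is a signal to plan the aggregation before joining.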

reduce=True vs reduce=False

  • reduce=True: child rows are aggregated to the parent key before joining. This preserves parent-grain output and avoids row explosion.
  • reduce=False: child rows are joined without prior aggregation. Use this when you intentionally want to preserve child grain.
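
The difference in output grain can be sketched in plain Python. This illustrates the semantics described above, not GraphReduce's implementation; the function names, sample rows, and the ord_ column prefix are assumptions made for the example.

```python
from collections import defaultdict

customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
orders = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 15.0},
    {"customer_id": 2, "amount": 7.5},
]

def join_child_grain(parents, children, parent_key, relation_key):
    """Left-join every child row to its parent (analogous to reduce=False)."""
    by_parent = defaultdict(list)
    for child in children:
        by_parent[child[relation_key]].append(child)
    joined = []
    for parent in parents:
        # Parents without children still produce one row.
        matches = by_parent.get(parent[parent_key]) or [{}]
        for child in matches:
            joined.append({**parent, **child})
    return joined

def reduce_then_join(parents, children, parent_key, relation_key):
    """Aggregate children per key, then join (analogous to reduce=True)."""
    stats = defaultdict(lambda: {"ord_count": 0, "ord_amount_sum": 0.0})
    for child in children:
        agg = stats[child[relation_key]]
        agg["ord_count"] += 1
        agg["ord_amount_sum"] += child["amount"]
    empty = {"ord_count": 0, "ord_amount_sum": 0.0}
    return [{**p, **stats.get(p[parent_key], empty)} for p in parents]

child_grain = join_child_grain(customers, orders, "id", "customer_id")
parent_grain = reduce_then_join(customers, orders, "id", "customer_id")

print(len(child_grain))   # 3: one row per order
print(len(parent_grain))  # 2: one row per customer
```

In the child-grain result the customer with two orders appears twice, so any later aggregate over customer columns would count that customer twice. The reduced result keeps exactly one row per customer.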

See Preserve Child Grain for a concrete reduce=False pattern.

Practical guidance

  • Keep your output grain anchored to GraphReduce.parent_node.
  • Put aggregation logic in do_reduce.
  • Use clear node prefixes so joined columns remain traceable.
  • Treat many-to-many edges carefully; decide whether to aggregate once or in stages.
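
As a rough illustration of aggregating in stages with traceable prefixes, the sketch below rolls a hypothetical order_items table up to order grain and then rolls orders up to customer grain. It is plain Python rather than GraphReduce code; the tables, values, and the item_ and ord_ prefixes are assumptions chosen for the example.

```python
from collections import defaultdict

orders = [
    {"id": 100, "customer_id": 1},
    {"id": 101, "customer_id": 1},
    {"id": 102, "customer_id": 2},
]
order_items = [
    {"order_id": 100, "price": 5.0},
    {"order_id": 100, "price": 7.0},
    {"order_id": 101, "price": 20.0},
    {"order_id": 102, "price": 3.0},
]

# Stage 1: reduce items to order grain. The "item_" prefix records the source node.
item_price_sum = defaultdict(float)
for item in order_items:
    item_price_sum[item["order_id"]] += item["price"]

orders_with_items = [
    {**order, "item_price_sum": item_price_sum.get(order["id"], 0.0)}
    for order in orders
]

# Stage 2: reduce orders to customer grain. The "ord_" prefix is stacked on top,
# so "ord_item_price_sum_max" reads as: max over orders of the per-order item sum.
# The 0.0 starting value assumes prices are non-negative.
ord_features = defaultdict(lambda: {"ord_count": 0, "ord_item_price_sum_max": 0.0})
for order in orders_with_items:
    feats = ord_features[order["customer_id"]]
    feats["ord_count"] += 1
    feats["ord_item_price_sum_max"] = max(
        feats["ord_item_price_sum_max"], order["item_price_sum"]
    )

customer_features = dict(ord_features)
print(customer_features[1])  # {'ord_count': 2, 'ord_item_price_sum_max': 20.0}
```

Staging matters when a feature depends on the intermediate grain. The largest order total per customer cannot be computed by aggregating items directly to the customer in a single pass, whereas a plain total of item prices could be.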