Joins and Cardinality
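Relational feature engineering breaks down quickly when join cardinality is not controlled: joining a one-to-many relation without aggregating first repeats each parent row once per matching child row. GraphReduce makes cardinality explicit at the edge level.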
Edge semantics
Each edge defines:
- parent_node and relation_node
- parent_key and relation_key
- reduce flag
Example:
gr.add_entity_edge(
    parent_node=customer_node,
    relation_node=orders_node,
    parent_key="id",
    relation_key="customer_id",
    reduce=True
)
reduce=True vs reduce=False
- reduce=True: child rows are aggregated to the parent key before joining. This preserves parent-grain output and avoids row explosion.
- reduce=False: child rows are joined without prior aggregation. Use this when you intentionally want to preserve child grain.
See Preserve Child Grain for a concrete reduce=False pattern.
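The difference between the two modes can be illustrated with plain pandas. This is a minimal sketch of the underlying join behaviour, not GraphReduce's implementation: the toy data, the helper names `join_without_reduce` and `join_with_reduce`, and the `ord_` column prefix are all assumptions made for illustration. The key columns mirror the edge example above.

```python
import pandas as pd

# Hypothetical toy data: one customer has three orders, the other has one.
customers = pd.DataFrame({"id": [1, 2], "name": ["Ada", "Bo"]})
orders = pd.DataFrame({
    "customer_id": [1, 1, 1, 2],
    "amount": [10.0, 20.0, 30.0, 5.0],
})

def join_without_reduce(parent, child):
    # Analogue of reduce=False: raw child rows are joined,
    # so the result is at child grain (one row per order).
    return parent.merge(child, left_on="id", right_on="customer_id", how="left")

def join_with_reduce(parent, child):
    # Analogue of reduce=True: aggregate child rows to the parent key first,
    # then join, so the result stays at parent grain (one row per customer).
    agg = (
        child.groupby("customer_id")
        .agg(ord_count=("amount", "count"), ord_amount_sum=("amount", "sum"))
        .reset_index()
    )
    return parent.merge(agg, left_on="id", right_on="customer_id", how="left")

child_grain = join_without_reduce(customers, orders)
parent_grain = join_with_reduce(customers, orders)
print(len(customers), len(child_grain), len(parent_grain))
```

In the unaggregated join the customer with three orders appears three times, so any later parent-level statistic computed on that frame would be weighted by order count. Aggregating first keeps one row per customer.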
Practical guidance
- Keep your output grain anchored to GraphReduce.parent_node.
- Put aggregation logic in do_reduce.
- Use clear node prefixes so joined columns remain traceable.
- Treat many-to-many edges carefully; decide whether to aggregate once or in stages.
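The choice between aggregating once and aggregating in stages changes the resulting features, which a small pandas sketch can make concrete. It does not use GraphReduce itself; the tables, the `item_` and `ord_` prefixes, and the column names are hypothetical and chosen only to show traceable naming across two levels of nesting.

```python
import pandas as pd

# Hypothetical data: customers -> orders -> items (two levels of nesting).
orders = pd.DataFrame({"id": [10, 11, 12], "customer_id": [1, 1, 2]})
items = pd.DataFrame({
    "order_id": [10, 10, 11, 12],
    "price": [5.0, 7.0, 3.0, 4.0],
})

# Stage 1: reduce items to order grain; the prefix records where columns came from.
per_order = (
    items.groupby("order_id")["price"]
    .agg(["count", "sum"])
    .add_prefix("item_price_")
    .reset_index()
)
orders_enriched = orders.merge(per_order, left_on="id", right_on="order_id", how="left")

# Stage 2: reduce enriched orders to customer grain.
staged = (
    orders_enriched.groupby("customer_id")
    .agg(
        ord_count=("id", "count"),
        ord_item_price_sum_mean=("item_price_sum", "mean"),
    )
    .reset_index()
)

# Single pass: join everything, then aggregate once at customer grain.
one_pass = (
    orders.merge(items, left_on="id", right_on="order_id", how="left")
    .groupby("customer_id")["price"]
    .mean()
)

print(staged)
print(one_pass)
```

For customer 1, the staged feature is the mean order total, while the single-pass feature is the mean item price. The two answer different questions, so the aggregation strategy is worth deciding explicitly per edge.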