Operations Pipeline
GraphReduce is built around a predictable per-node operation sequence. Operations such as filters, joins, and group-bys are defined within that sequence.
Node-level operations
For GraphReduceNode subclasses, the main methods are:
- do_filters: row filtering on a single table.
- do_annotate: feature columns derived from a single table.
- do_normalize: normalization/cleanup logic.
- do_reduce: aggregation to a parent key (for example, groupby(...).agg(...)).
- do_labels: label aggregation logic.
- do_post_join_annotate: annotations that depend on joined child data.
- do_post_join_filters: filters that require multi-table context.
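
To make the single-table hooks concrete, here is a minimal sketch of what the first four might do. It is an illustration under stated assumptions, not the library's API: `OrderNode` is a hypothetical stand-in that does not subclass GraphReduceNode, the method signatures are guesses, the column names are invented, and plain lists of dicts stand in for a dataframe.

```python
from collections import defaultdict


class OrderNode:
    """Hypothetical stand-in that mirrors the hook names; NOT a GraphReduceNode."""

    def __init__(self, rows):
        # Copy the rows so the hooks never mutate the caller's data.
        self.rows = [dict(r) for r in rows]

    def do_filters(self):
        # Row filtering on a single table: keep orders with a positive amount.
        self.rows = [r for r in self.rows if r["amount"] > 0]

    def do_annotate(self):
        # Feature column derived from this table alone.
        for r in self.rows:
            r["is_large"] = r["amount"] >= 100

    def do_normalize(self):
        # Cleanup: trim whitespace and lowercase the status text.
        for r in self.rows:
            r["status"] = r["status"].strip().lower()

    def do_reduce(self, reduce_key):
        # Aggregate to one row per parent key, in the spirit of groupby(...).agg(...).
        groups = defaultdict(list)
        for r in self.rows:
            groups[r[reduce_key]].append(r)
        return [
            {
                reduce_key: key,
                "order_count": len(rs),
                "order_total": sum(r["amount"] for r in rs),
                "large_orders": sum(r["is_large"] for r in rs),
            }
            for key, rs in sorted(groups.items())
        ]


# Hypothetical child table: several orders per customer.
rows = [
    {"customer_id": 1, "amount": 120.0, "status": " Paid "},
    {"customer_id": 1, "amount": 30.0, "status": "paid"},
    {"customer_id": 2, "amount": 0.0, "status": "VOID"},
    {"customer_id": 2, "amount": 45.0, "status": "Paid"},
]

node = OrderNode(rows)
node.do_filters()
node.do_annotate()
node.do_normalize()
reduced = node.do_reduce("customer_id")
```

After these calls, `reduced` holds one row per `customer_id`, which is the shape a parent table needs before a join.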
Graph-level flow
At runtime, GraphReduce traverses the relationship graph depth-first and performs:
- Child-table preprocessing (do_filters, do_annotate, do_normalize).
- Child-table reduction (do_reduce) when the edge has reduce=True.
- Join of reduced child features into parent.
- Parent post-join operations (do_post_join_annotate, do_post_join_filters) when required.
- Optional label computation (do_labels) over the configured label period.
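
The ordering above can be sketched as a small recursive walk that only records which step would run when. This is not GraphReduce's implementation: the functions `visit` and `plan`, the `edges` mapping, the table names, and the `label_node` parameter are all hypothetical. How steps interleave across more than two levels is an assumption of this sketch.

```python
def visit(node, edges, log):
    """Depth-first walk: each child is fully handled before it is joined upward."""
    for child, reduce in edges.get(node, []):
        log.append(f"{child}: do_filters, do_annotate, do_normalize")
        # Assumption: grandchildren are joined into the child before it is reduced.
        visit(child, edges, log)
        if reduce:
            log.append(f"{child}: do_reduce")
        log.append(f"join {child} into {node}")
    if edges.get(node):
        # Post-join steps only make sense once at least one child was joined.
        log.append(f"{node}: do_post_join_annotate, do_post_join_filters")


def plan(root, edges, label_node=None):
    """Return the ordered list of steps for a hypothetical relationship graph."""
    log = []
    visit(root, edges, log)
    if label_node is not None:
        log.append(f"{label_node}: do_labels")
    return log


# parent -> [(child, reduce flag)]; names are invented for illustration.
edges = {
    "customers": [("orders", True), ("profiles", False)],
    "orders": [("order_items", True)],
}

steps = plan("customers", edges, label_node="orders")
for step in steps:
    print(step)
```

Note that `profiles` is joined without a reduce step because its edge has the reduce flag set to False, while `order_items` is reduced and joined into `orders` before `orders` itself is reduced.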
Where SQL fits
For SQL backends, SQLNode expresses equivalent operations as composable SQL operations (including GROUP BY aggregation) instead of dataframe-native code.
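
As a rough illustration of the SQL equivalent, the sketch below expresses a filter plus a reduction as a single GROUP BY query. It uses Python's standard sqlite3 module and hand-written SQL, not the SQLNode API; the `orders` table and its columns are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical child table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 120.0), (1, 30.0), (2, 0.0), (2, 45.0)],
)

# The WHERE clause plays the role of a filter step and
# GROUP BY plays the role of the reduce step on the parent key.
sql = """
SELECT customer_id,
       COUNT(*)    AS order_count,
       SUM(amount) AS order_total
FROM orders
WHERE amount > 0
GROUP BY customer_id
ORDER BY customer_id
"""
result = conn.execute(sql).fetchall()
conn.close()
```

Each tuple in `result` is one reduced row per `customer_id`, ready to be joined to a parent table on that key.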
See: