Operations Pipeline

GraphReduce is built around a predictable per-node operation sequence. The steps of that sequence are where filters, joins, and group-bys are defined.

Node-level operations

For GraphReduceNode subclasses, the main methods are:

do_filters: row filtering on a single table.
do_annotate: feature columns derived from a single table.
do_normalize: normalization/cleanup logic.
do_reduce: aggregation to a parent key (for example groupby(...).agg(...)).
do_labels: label aggregation logic.
do_post_join_annotate: annotations that depend on joined child data.
do_post_join_filters: filters that require multi-table context.
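
As a rough illustration of the single-table hooks listed above, here is a minimal sketch. To stay self-contained it uses plain Python lists of dicts and does not import GraphReduce. The OrderNode class, its columns, and the method signatures (in particular the reduce_key argument) are assumptions for illustration, not the library's API. The label and post-join hooks are omitted.

```python
from collections import defaultdict


class OrderNode:
    """Hypothetical stand-in for a GraphReduceNode subclass (stdlib only)."""

    def __init__(self, rows):
        # rows: list of dicts, standing in for the node's table
        self.rows = [dict(r) for r in rows]

    def do_filters(self):
        # Row filtering on a single table: drop orders with no customer.
        self.rows = [r for r in self.rows if r["customer_id"] is not None]

    def do_annotate(self):
        # Feature column derived from this table alone.
        for r in self.rows:
            r["is_large"] = r["amount"] >= 100

    def do_normalize(self):
        # Cleanup: make the channel value consistent.
        for r in self.rows:
            r["channel"] = r["channel"].strip().lower()

    def do_reduce(self, reduce_key):
        # Aggregate to the parent key, similar in spirit to groupby(...).agg(...).
        agg = defaultdict(lambda: {"order_count": 0, "order_total": 0.0})
        for r in self.rows:
            bucket = agg[r[reduce_key]]
            bucket["order_count"] += 1
            bucket["order_total"] += r["amount"]
        return dict(agg)


orders = OrderNode([
    {"customer_id": 1, "amount": 120.0, "channel": " Web "},
    {"customer_id": 1, "amount": 30.0, "channel": "STORE"},
    {"customer_id": None, "amount": 50.0, "channel": "web"},
    {"customer_id": 2, "amount": 80.0, "channel": "Web"},
])
orders.do_filters()
orders.do_annotate()
orders.do_normalize()
features = orders.do_reduce("customer_id")
print(features)
```

The result is one row of aggregated features per customer_id, which is the shape a parent table can join against.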

Graph-level flow

At runtime, GraphReduce traverses the relationship graph depth-first and performs these steps in order:

Child-table preprocessing (do_filters, do_annotate, do_normalize).
Child-table reduction (do_reduce) when the edge has reduce=True.
Join of reduced child features into parent.
Parent post-join operations (do_post_join_annotate, do_post_join_filters) when required.
Optional label computation (do_labels) over the configured label period.
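
The ordering above can be traced with a small stand-alone sketch. It does not call GraphReduce; it only logs the hook names in the order a depth-first walk would reach them. The run_pipeline function, the edges structure, and the table names are hypothetical. The sketch also assumes that each table is preprocessed when it is first visited and that post-join hooks run only on tables that have children. Label computation is left out.

```python
PRE_HOOKS = ("do_filters", "do_annotate", "do_normalize")
POST_HOOKS = ("do_post_join_annotate", "do_post_join_filters")


def run_pipeline(node, edges, log):
    """Walk the graph depth-first and log each step.

    node:  table name (str)
    edges: dict mapping parent name -> list of (child name, reduce flag)
    log:   list that receives one entry per step
    """
    # Single-table preprocessing for this node.
    for hook in PRE_HOOKS:
        log.append(f"{node}.{hook}")

    children = edges.get(node, [])
    for child, reduce in children:
        # Finish the child's whole subtree before touching the parent.
        run_pipeline(child, edges, log)
        if reduce:
            # Aggregate the child to the parent key.
            log.append(f"{child}.do_reduce")
        log.append(f"join {child} -> {node}")

    # Post-join hooks only make sense once child data has been joined.
    if children:
        for hook in POST_HOOKS:
            log.append(f"{node}.{hook}")


edges = {
    "customers": [("orders", True)],
    "orders": [("order_items", True)],
}
log = []
run_pipeline("customers", edges, log)
print("\n".join(log))
```

In this trace, order_items is reduced and joined into orders before orders is itself reduced and joined into customers, which is the bottom-up behavior that a depth-first traversal gives.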

Where SQL fits

For SQL backends, SQLNode expresses equivalent operations as composable SQL operations (including GROUP BY aggregation) instead of dataframe-native code.
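
To make the idea of composable SQL operations concrete, the following sketch builds a query in stages with Python's built-in sqlite3 module. It does not use SQLNode or any GraphReduce API. The table, columns, and the way stages are nested as subqueries are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 120.0), (2, 1, 30.0), (3, 2, 80.0), (4, 2, -5.0)],
)

# Each stage wraps the previous one as a subquery, so the steps compose.
# Filter stage (plays the role of do_filters): keep positive amounts.
filtered = "SELECT * FROM orders WHERE amount > 0"

# Annotate stage (plays the role of do_annotate): add a derived column.
annotated = f"SELECT *, amount >= 100 AS is_large FROM ({filtered})"

# Reduce stage (plays the role of do_reduce): GROUP BY the parent key.
reduced = (
    "SELECT customer_id, "
    "COUNT(*) AS order_count, "
    "SUM(amount) AS order_total, "
    "SUM(is_large) AS large_orders "
    f"FROM ({annotated}) "
    "GROUP BY customer_id ORDER BY customer_id"
)

rows = conn.execute(reduced).fetchall()
print(rows)
conn.close()
```

The final statement is a single query that the database executes, so no intermediate rows need to be materialized in Python.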

See: