Operations Pipeline

GraphReduce is built around a predictable per-node operation sequence. Operations such as filters, joins, and group-bys are defined as steps within this pipeline.

Node-level operations

For GraphReduceNode subclasses, the main methods are:

  • do_filters: row filtering on a single table.
  • do_annotate: feature columns derived from a single table.
  • do_normalize: normalization/cleanup logic.
  • do_reduce: aggregation to a parent key (for example groupby(...).agg(...)).
  • do_labels: label aggregation logic.
  • do_post_join_annotate: annotations that depend on joined child data.
  • do_post_join_filters: filters that require multi-table context.
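
As a rough illustration of the single-table methods, here is a minimal pandas sketch. It does not import GraphReduce: OrderNodeSketch is a hypothetical stand-in that only borrows the method names from the list, and the column names, the reduce_key argument, and the sample data are assumptions made for the example. The real GraphReduceNode signatures may differ.

```python
import pandas as pd


class OrderNodeSketch:
    """Hypothetical stand-in for a GraphReduceNode subclass (not the real base class)."""

    def __init__(self, df: pd.DataFrame):
        self.df = df

    def do_filters(self) -> None:
        # Row filtering on a single table: drop cancelled orders.
        self.df = self.df[self.df["status"] != "cancelled"]

    def do_annotate(self) -> None:
        # Feature column derived from this table alone.
        self.df = self.df.assign(is_large=self.df["amount"] > 100)

    def do_normalize(self) -> None:
        # Cleanup: replace missing amounts with zero.
        self.df = self.df.assign(amount=self.df["amount"].fillna(0.0))

    def do_reduce(self, reduce_key: str) -> pd.DataFrame:
        # Aggregate to the parent key: one output row per customer.
        return (
            self.df.groupby(reduce_key)
            .agg(order_count=("order_id", "count"), amount_sum=("amount", "sum"))
            .reset_index()
        )


# Assumed sample data for the sketch.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3, 4],
        "customer_id": [10, 10, 20, 20],
        "amount": [50.0, 150.0, None, 30.0],
        "status": ["paid", "paid", "paid", "cancelled"],
    }
)

node = OrderNodeSketch(orders)
node.do_filters()
node.do_annotate()
node.do_normalize()
reduced = node.do_reduce("customer_id")
```

In this sketch, reduced holds one row per customer_id, which is the shape a parent table would join against.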

Graph-level flow

At runtime, GraphReduce traverses the relationship graph depth-first and performs the following steps:

  1. Child-table preprocessing (do_filters, do_annotate, do_normalize).
  2. Child-table reduction (do_reduce) when the edge has reduce=True.
  3. Join of reduced child features into parent.
  4. Parent post-join operations (do_post_join_annotate, do_post_join_filters) when required.
  5. Optional label computation (do_labels) over the configured label period.
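
The ordering of steps 1 to 4 can be sketched with a toy traversal that records which hook runs when, without touching any data. SketchNode and process are hypothetical names, not GraphReduce APIs, and the exact interleaving (for example, whether a parent is preprocessed before or after its children) is an assumption of this sketch. Label computation (step 5) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class SketchNode:
    """Hypothetical node that logs hook calls instead of transforming data."""

    name: str
    # Each entry is a (child, should_reduce) pair describing one edge.
    children: list = field(default_factory=list)

    def run(self, hook: str, log: list) -> None:
        log.append(f"{self.name}.{hook}")


def process(node: SketchNode, log: list) -> None:
    # Step 1: single-table preprocessing for this node.
    for hook in ("do_filters", "do_annotate", "do_normalize"):
        node.run(hook, log)

    for child, should_reduce in node.children:
        # Depth-first: finish the child's whole subtree before joining it.
        process(child, log)
        # Step 2: reduce the child to the parent key when the edge asks for it.
        if should_reduce:
            child.run("do_reduce", log)
        # Step 3: join the (reduced) child into this node.
        log.append(f"join {child.name} into {node.name}")

    # Step 4: post-join hooks only make sense once child data is attached.
    if node.children:
        node.run("do_post_join_annotate", log)
        node.run("do_post_join_filters", log)


# Assumed three-level graph: customers <- orders <- order_items.
order_items = SketchNode("order_items")
orders_node = SketchNode("orders", [(order_items, True)])
customers = SketchNode("customers", [(orders_node, True)])

log: list = []
process(customers, log)
for entry in log:
    print(entry)
```

The printed log shows order_items being reduced and joined into orders before orders itself is reduced and joined into customers, which is the depth-first behaviour described above.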

Where SQL fits

For SQL backends, SQLNode expresses the equivalent steps as composable SQL operations (including GROUP BY aggregation) rather than dataframe-native code.
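
To make "composable SQL operations" concrete, the sketch below assembles a filter and a GROUP BY aggregation from separate parts and runs the result on an in-memory SQLite database. build_reduce_sql is a hypothetical helper written for this example, not part of SQLNode, and the table and column names are assumptions.

```python
import sqlite3


def build_reduce_sql(table: str, key: str, filters: list, aggregates: dict) -> str:
    """Hypothetical helper: compose a filtered GROUP BY query from parts.

    String interpolation is used for brevity; do not pass untrusted input.
    """
    select_list = ", ".join(
        [key] + [f"{expr} AS {alias}" for alias, expr in aggregates.items()]
    )
    where = f" WHERE {' AND '.join(filters)}" if filters else ""
    return f"SELECT {select_list} FROM {table}{where} GROUP BY {key} ORDER BY {key}"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 10, 50.0, "paid"),
        (2, 10, 150.0, "paid"),
        (3, 20, 30.0, "cancelled"),
        (4, 20, 20.0, "paid"),
    ],
)

# The filter plays the role of do_filters; the aggregates play the role of do_reduce.
sql = build_reduce_sql(
    table="orders",
    key="customer_id",
    filters=["status <> 'cancelled'"],
    aggregates={"order_count": "COUNT(*)", "amount_sum": "SUM(amount)"},
)
rows = conn.execute(sql).fetchall()
conn.close()
```

Here the filter and the aggregates are kept as separate inputs so that each pipeline step contributes its own fragment to the final query.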

See: