Time Windows and Leakage Safety
GraphReduce is designed for point-in-time-correct feature generation.
Core timing controls
Graph and node parameters define what historical data is visible at transform time:
- cut_date: reference time for feature/label splitting.
- compute_period_val + compute_period_unit: lookback window for feature computation.
- label_period_val + label_period_unit: forward window for label generation.
- date_key on nodes: required for time-based filtering on that node.
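To make the relationship between these parameters concrete, here is a minimal sketch of how a cut date and two period settings translate into window boundaries. `TimeWindowConfig`, its methods, and the unit strings are hypothetical stand-ins written for illustration. They are not GraphReduce's classes, and only the parameter names are taken from the list above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed unit names for this sketch only; GraphReduce's own unit type may differ.
_MINUTES_PER_UNIT = {"minute": 1, "hour": 60, "day": 1440}


@dataclass(frozen=True)
class TimeWindowConfig:
    """Hypothetical container mirroring the timing parameter names."""
    cut_date: datetime
    compute_period_val: int
    compute_period_unit: str
    label_period_val: int
    label_period_unit: str

    def compute_period_minutes(self) -> int:
        # Lookback length expressed in minutes.
        return self.compute_period_val * _MINUTES_PER_UNIT[self.compute_period_unit]

    def label_period_minutes(self) -> int:
        # Forward label horizon expressed in minutes.
        return self.label_period_val * _MINUTES_PER_UNIT[self.label_period_unit]

    def feature_window(self):
        # Features look backward from cut_date.
        start = self.cut_date - timedelta(minutes=self.compute_period_minutes())
        return start, self.cut_date

    def label_window(self):
        # Labels look forward from cut_date.
        end = self.cut_date + timedelta(minutes=self.label_period_minutes())
        return self.cut_date, end


config = TimeWindowConfig(
    cut_date=datetime(2024, 1, 1),
    compute_period_val=365, compute_period_unit="day",
    label_period_val=30, label_period_unit="day",
)
feature_start, feature_end = config.feature_window()
label_start, label_end = config.label_window()
```

With these values the feature window runs from 2023-01-01 to the cut date, and the label window runs from the cut date to 2024-01-31. The two windows meet at cut_date and never overlap.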
Base-class definitions (window boundaries)
In GraphReduceNode, the base methods enforce these windows:
- prep_for_features: date_key < cut_date and date_key > cut_date - compute_period_minutes()
- prep_for_labels: date_key > cut_date and date_key < cut_date + label_period_minutes()
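The two conditions above can be illustrated with plain Python over a list of records. This is a sketch of the same strict inequalities, not the GraphReduceNode implementation: the function names `filter_for_features` and `filter_for_labels`, the `ts` column, and the sample rows are all invented for the example.

```python
from datetime import datetime, timedelta


def filter_for_features(rows, date_key, cut_date, compute_period):
    # Keep rows strictly inside (cut_date - compute_period, cut_date).
    start = cut_date - compute_period
    return [r for r in rows if start < r[date_key] < cut_date]


def filter_for_labels(rows, date_key, cut_date, label_period):
    # Keep rows strictly inside (cut_date, cut_date + label_period).
    end = cut_date + label_period
    return [r for r in rows if cut_date < r[date_key] < end]


cut_date = datetime(2024, 1, 1)
events = [
    {"id": 1, "ts": datetime(2023, 12, 15)},  # inside the lookback window
    {"id": 2, "ts": datetime(2024, 1, 1)},    # exactly at cut_date
    {"id": 3, "ts": datetime(2024, 1, 10)},   # inside the label horizon
    {"id": 4, "ts": datetime(2022, 6, 1)},    # older than the lookback window
]

feature_rows = filter_for_features(events, "ts", cut_date, timedelta(days=365))
label_rows = filter_for_labels(events, "ts", cut_date, timedelta(days=30))
feature_ids = [r["id"] for r in feature_rows]
label_ids = [r["id"] for r in label_rows]
```

Because both comparisons are strict, a row stamped exactly at cut_date (id 2 here) lands in neither set, so no record can contribute to both a feature and a label.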
Why this matters
Without strict time windows, training features can include future information and inflate offline metrics. GraphReduce pushes shared time configuration through the graph so reductions and joins remain leakage-safe.
Practical checklist
- Set a valid date_key on all time-varying nodes.
- Confirm feature windows end at or before cut_date.
- Compute labels only in the label horizon after cut_date.
- Validate output grain and time boundaries in a small sample before full runs.
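The last checklist item can be approximated with two small checks on a sample: one for rows that fall outside the expected window, one for duplicate keys at the output grain. The helpers `rows_outside_window` and `duplicate_keys`, the `customer_id` key, and the sample data are assumptions made for this sketch and are not part of GraphReduce.

```python
from collections import Counter
from datetime import datetime, timedelta


def rows_outside_window(rows, date_key, window_start, window_end):
    # Return rows whose timestamp is not strictly inside the window.
    return [r for r in rows if not (window_start < r[date_key] < window_end)]


def duplicate_keys(rows, grain_key):
    # Return key values that appear more than once at the expected grain.
    counts = Counter(r[grain_key] for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


cut_date = datetime(2024, 1, 1)

# Sample of rows that fed feature computation (hypothetical data).
feature_inputs = [
    {"customer_id": 1, "ts": datetime(2023, 11, 3)},
    {"customer_id": 2, "ts": datetime(2024, 1, 5)},  # after cut_date: leakage
]
# Sample of the reduced output, expected to hold one row per customer.
output_sample = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 2}]

leaky = rows_outside_window(
    feature_inputs, "ts", cut_date - timedelta(days=365), cut_date
)
dupes = duplicate_keys(output_sample, "customer_id")
```

An empty result from both helpers is the condition to look for before a full run. In this sample, `leaky` flags the row dated after cut_date and `dupes` flags the repeated customer.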
See Tutorial: temporal setup for a step-by-step example.