# RelBench Performance Comparison
This table compares the performance of each solution on the RelBench tasks currently covered in these docs.
## Best-Per-Task Summary
- GraphReduce + ML: 6 wins
- Kumo (fine tuned): 4 wins
- RDL: 1 win (tie)
- Data Scientist: 0 wins
- Kumo (in context): 0 wins
Notes:

- Ties count as a win for each tied solution.
- AUCROC: higher is better. MAE: lower is better.
| Problem | Data Scientist | RDL | Kumo (in context) | Kumo (fine tuned) | GraphReduce + ML | Metric |
|---|---|---|---|---|---|---|
| rel-stack-user-engagement | 90.30 | 90.20 | 87.09 | 90.70 | 89.41 | AUCROC |
| rel-stack-user-badges | 86.20 | 89.86 | 80.00 | 89.86 | 84.51 | AUCROC |
| rel-trial-study-outcome | 72.00 | 68.60 | 70.79 | 71.16 | 93.10 | AUCROC |
| rel-amazon-user-churn | 67.60 | 70.42 | 67.29 | 70.46 | 71.80 | AUCROC |
| rel-amazon-item-churn | 81.80 | 82.81 | 79.93 | 82.83 | 81.00 | AUCROC |
| rel-hm-user-churn | 69.00 | 69.88 | 67.71 | 71.23 | 77.00 | AUCROC |
| rel-stack-post-votes | 0.068 | 0.065 | 0.065 | 0.065 | 0.061 | MAE |
| rel-hm-item-sales | 0.036 | 0.056 | 0.040 | 0.034 | 0.043 | MAE |
| rel-amazon-user-ltv | 13.92 | 14.31 | 16.16 | 14.22 | 6.57 | MAE |
| rel-amazon-item-ltv | 41.12 | 50.05 | 55.25 | 48.67 | 18.28 | MAE |
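The win counts above can be recomputed directly from the table. The sketch below is a minimal illustration (not part of RelBench): scores are transcribed from the table in column order, and each task carries a flag for whether higher is better (AUCROC) or lower is better (MAE), with every solution tied on the best value receiving a win.

```python
from collections import Counter

SOLUTIONS = ["Data Scientist", "RDL", "Kumo (in context)",
             "Kumo (fine tuned)", "GraphReduce + ML"]

# (task, scores in column order, higher_is_better)
RESULTS = [
    ("rel-stack-user-engagement", [90.3, 90.2, 87.09, 90.7, 89.41], True),
    ("rel-stack-user-badges",     [86.2, 89.86, 80.0, 89.86, 84.51], True),
    ("rel-trial-study-outcome",   [72.0, 68.6, 70.79, 71.16, 93.1],  True),
    ("rel-amazon-user-churn",     [67.6, 70.42, 67.29, 70.46, 71.8], True),
    ("rel-amazon-item-churn",     [81.8, 82.81, 79.93, 82.83, 81.0], True),
    ("rel-hm-user-churn",         [69.0, 69.88, 67.71, 71.23, 77.0], True),
    ("rel-stack-post-votes",      [0.068, 0.065, 0.065, 0.065, 0.061], False),
    ("rel-hm-item-sales",         [0.036, 0.056, 0.04, 0.034, 0.043],  False),
    ("rel-amazon-user-ltv",       [13.92, 14.31, 16.16, 14.22, 6.57],  False),
    ("rel-amazon-item-ltv",       [41.12, 50.05, 55.25, 48.67, 18.28], False),
]

def winners(scores, higher_is_better):
    """Return every solution achieving the best score; ties all count."""
    best = max(scores) if higher_is_better else min(scores)
    return [s for s, v in zip(SOLUTIONS, scores) if v == best]

wins = Counter()
for task, scores, higher_is_better in RESULTS:
    for s in winners(scores, higher_is_better):
        wins[s] += 1

for s in SOLUTIONS:
    print(f"{s}: {wins[s]} wins")
```

Running this reproduces the summary above: GraphReduce + ML takes 6 tasks, Kumo (fine tuned) takes 4, and RDL's single win comes from its tie with Kumo (fine tuned) on rel-stack-user-badges.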