# RelBench Performance Comparison
This table compares the performance of each solution on the RelBench tasks currently covered in these docs.
## Best-Per-Task Summary
- GraphReduce + ML: 6 wins
- Kumo (fine tuned): 4 wins
- RDL: 1 win (tie)
- Data Scientist: 0 wins
- Kumo (in context): 0 wins
Notes:

- Ties count as a win for each tied solution.
- AUCROC: higher is better. MAE: lower is better.
| Problem | Data Scientist | RDL | Kumo (in context) | Kumo (fine tuned) | GraphReduce + ML | Metric |
|---|---|---|---|---|---|---|
| rel-stack-user-engagement | 90.30 | 90.20 | 87.09 | 90.70 | 89.41 | AUCROC |
| rel-stack-user-badges | 86.20 | 89.86 | 80.00 | 89.86 | 84.51 | AUCROC |
| rel-trial-study-outcome | 72.00 | 68.60 | 70.79 | 71.16 | 93.10 | AUCROC |
| rel-amazon-user-churn | 67.60 | 70.42 | 67.29 | 70.46 | 71.80 | AUCROC |
| rel-amazon-item-churn | 81.80 | 82.81 | 79.93 | 82.83 | 81.00 | AUCROC |
| rel-hm-user-churn | 69.00 | 69.88 | 67.71 | 71.23 | 77.00 | AUCROC |
| rel-stack-post-votes | 0.068 | 0.065 | 0.065 | 0.065 | 0.061 | MAE |
| rel-hm-item-sales | 0.036 | 0.056 | 0.040 | 0.034 | 0.043 | MAE |
| rel-amazon-user-ltv | 13.92 | 14.31 | 16.16 | 14.22 | 6.57 | MAE |
| rel-amazon-item-ltv | 41.12 | 50.05 | 55.25 | 48.67 | 18.28 | MAE |
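The win counts above can be recomputed directly from the table. The sketch below is a minimal illustration (not part of RelBench): scores are transcribed from the table in column order, and each task carries a flag for whether higher is better (AUCROC) or lower is better (MAE), with every solution tied on the best value receiving a win.

```python
from collections import Counter

SOLUTIONS = ["Data Scientist", "RDL", "Kumo (in context)",
             "Kumo (fine tuned)", "GraphReduce + ML"]

# (task, scores in column order, higher_is_better)
RESULTS = [
    ("rel-stack-user-engagement", [90.3, 90.2, 87.09, 90.7, 89.41], True),
    ("rel-stack-user-badges",     [86.2, 89.86, 80.0, 89.86, 84.51], True),
    ("rel-trial-study-outcome",   [72.0, 68.6, 70.79, 71.16, 93.1],  True),
    ("rel-amazon-user-churn",     [67.6, 70.42, 67.29, 70.46, 71.8], True),
    ("rel-amazon-item-churn",     [81.8, 82.81, 79.93, 82.83, 81.0], True),
    ("rel-hm-user-churn",         [69.0, 69.88, 67.71, 71.23, 77.0], True),
    ("rel-stack-post-votes",      [0.068, 0.065, 0.065, 0.065, 0.061], False),
    ("rel-hm-item-sales",         [0.036, 0.056, 0.04, 0.034, 0.043],  False),
    ("rel-amazon-user-ltv",       [13.92, 14.31, 16.16, 14.22, 6.57],  False),
    ("rel-amazon-item-ltv",       [41.12, 50.05, 55.25, 48.67, 18.28], False),
]

def winners(scores, higher_is_better):
    """Return every solution achieving the best score; ties all count."""
    best = max(scores) if higher_is_better else min(scores)
    return [s for s, v in zip(SOLUTIONS, scores) if v == best]

wins = Counter()
for task, scores, higher_is_better in RESULTS:
    for s in winners(scores, higher_is_better):
        wins[s] += 1

for s in SOLUTIONS:
    print(f"{s}: {wins[s]} wins")
```

Running this reproduces the summary above: GraphReduce + ML takes 6 tasks, Kumo (fine tuned) takes 4, and RDL's single win comes from its tie with Kumo (fine tuned) on rel-stack-user-badges.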