feat(ml): train ensemble model and generate benchmark report

Results:
- XGBoost (Optuna 100 trials): AUC=0.7856, Precision@3=0.5783
- LightGBM (Optuna 100 trials): AUC=0.7833, Precision@3=0.5736
- MLP (3 layers 256-128-64): AUC=0.7743, Precision@3=0.5643
- Ensemble (weighted voting): AUC=0.7840, Precision@3=0.5814
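A minimal sketch of the two pieces these results rely on. It assumes "weighted voting" means soft voting, where each model's predicted probabilities are averaged with weights. It also assumes Precision@3 is computed per race as the share of true positives among the three highest-scored runners, then averaged over races. The helper names, the weights, and the label definition are hypothetical; the commit does not state the actual weights or label.

```python
import numpy as np


def weighted_vote(probs_by_model, weights):
    """Soft voting: weighted average of per-model probabilities (hypothetical helper)."""
    p = np.asarray(probs_by_model, dtype=float)  # shape (n_models, n_runners)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise weights so the blend stays a probability
    return w @ p  # shape (n_runners,)


def precision_at_k(scores, labels, race_ids, k=3):
    """Mean over races of: positives among the k top-scored runners / k (hypothetical helper)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    race_ids = np.asarray(race_ids)
    per_race = []
    for race in np.unique(race_ids):
        mask = race_ids == race
        top_k = np.argsort(-scores[mask])[:k]  # indices of the k highest scores in this race
        n = min(k, int(mask.sum()))  # races with fewer than k runners divide by their size
        per_race.append(labels[mask][top_k].sum() / n)
    return float(np.mean(per_race))


# Toy example: two models scoring the four runners of a single race.
blend = weighted_vote([[0.9, 0.2, 0.6, 0.1],
                       [0.8, 0.3, 0.5, 0.2]], weights=[0.6, 0.4])
p_at_3 = precision_at_k(blend, labels=[1, 0, 1, 0], race_ids=[7, 7, 7, 7])
# blend is about [0.86, 0.24, 0.56, 0.14]; two of the top three runners are
# positives, so p_at_3 is 2/3.
```

Dividing by `min(k, n)` is one design choice for short fields. Dividing by `k` regardless is another, and it would penalise races with fewer than three runners.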
Baseline XGBoost: Precision@3=0.5287
Delta (ensemble vs baseline): +0.0527 Precision@3 (+5.3 percentage
points, about +10% relative); DEPLOY threshold (+5%) met
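The delta is an absolute difference in Precision@3 (0.5814 - 0.5287), so +0.0527 corresponds to +5.3 percentage points rather than a 5.3% relative gain. The following sketch shows the check. It assumes the deploy gate compares absolute percentage points against a 5-point minimum; the `deploy_gate` name and the `min_gain_pp` parameter are hypothetical.

```python
def deploy_gate(candidate: float, baseline: float, min_gain_pp: float = 5.0):
    """Return (gain in percentage points, relative gain in %, whether the gate passes)."""
    gain_pp = (candidate - baseline) * 100.0  # absolute gain, in percentage points
    rel_gain_pct = (candidate - baseline) / baseline * 100.0  # gain relative to baseline
    return gain_pp, rel_gain_pct, gain_pp >= min_gain_pp


gain_pp, rel_gain_pct, passes = deploy_gate(candidate=0.5814, baseline=0.5287)
print(f"+{gain_pp:.2f} pp, +{rel_gain_pct:.1f}% relative, deploy={passes}")
# prints: +5.27 pp, +10.0% relative, deploy=True
```

Under either reading of the +5% threshold, the gate passes, because the relative gain is roughly 10%.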
Latency: 35 ms per race, 69 ms per full day (well under the 200 ms limit)
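The following is a minimal sketch of how a per-race latency budget like this can be measured. It assumes a median of wall-clock timings over repeated calls using `time.perf_counter`. The `measure_latency_ms` helper, the warm-up and repeat counts, and the use of the median are all assumptions. `predict_race` is a hypothetical stand-in for the real per-race inference call, which this commit does not show.

```python
import statistics
import time
from typing import Callable

LATENCY_BUDGET_MS = 200.0  # the limit quoted in this commit


def measure_latency_ms(fn: Callable[[], object], repeats: int = 50, warmup: int = 5) -> float:
    """Median wall-clock time of fn() in milliseconds (hypothetical helper)."""
    for _ in range(warmup):  # run a few times first so lazy initialisation is not timed
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def predict_race() -> float:
    """Hypothetical stand-in for scoring one race; replace with the real model call."""
    return sum(i * 0.5 for i in range(10_000))


median_ms = measure_latency_ms(predict_race)
assert median_ms < LATENCY_BUDGET_MS, f"{median_ms:.1f} ms exceeds budget"
```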
SHAP: 31 of 43 features selected; top features: rang_cote,
implied_prob, cote_direct, ratio_cote_field
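One common way to turn SHAP values into a feature subset is to rank features by mean absolute SHAP value and keep the top k. The commit does not say which rule produced 31 of 43, so the top-k cut-off and the helper name below are assumptions. For a tree model, the SHAP matrix would typically come from `shap.TreeExplainer(model).shap_values(X)`. This sketch uses a small hand-made matrix so it runs without the model.

```python
import numpy as np


def select_by_mean_abs_shap(shap_values, feature_names, k):
    """Keep the k features with the largest mean |SHAP| across samples (hypothetical helper)."""
    sv = np.asarray(shap_values, dtype=float)  # shape (n_samples, n_features)
    importance = np.abs(sv).mean(axis=0)  # global importance per feature
    order = np.argsort(-importance)[:k]  # most important first
    return [feature_names[i] for i in order]


# Toy SHAP matrix: 2 samples x 3 features (illustrative values, not from this model).
toy_shap = [[0.50, -0.10, 0.02],
            [-0.30, 0.20, 0.01]]
selected = select_by_mean_abs_shap(toy_shap, ["feat_a", "feat_b", "feat_c"], k=2)
print(selected)  # ['feat_a', 'feat_b']
```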
All 12 regression/latency tests passing.

Co-Authored-By: Paperclip <noreply@paperclip.ing>