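📝 **Question:** Implement `calibration_gap(scores, labels, n_bins=5)`: sort the examples by score, split them into `n_bins` equal-size buckets, and return a list with one value per bucket, `|mean_score - actual_rate|`, rounded to 3 decimals. Here `mean_score` is the bucket's average predicted score and `actual_rate` is the fraction of its labels equal to 1. Test it on the well-calibrated example in the starter.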
📋 Pick the right answer.
💡 **Hint:** Re-read the theory above if unsure.
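
One possible shape for the function is sketched below in plain Python. It is a minimal sketch rather than the reference solution. It assumes that labels are 0/1, that buckets are cut at integer index boundaries (so sizes differ by at most one when the count does not divide evenly), and that empty buckets are skipped. The starter's data is not shown here, so the sample inputs are hypothetical.

```python
def calibration_gap(scores, labels, n_bins=5):
    """Per-bucket |mean_score - actual_rate|, rounded to 3 decimals."""
    # Pair each score with its label and sort by score (ascending, stable).
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    gaps = []
    for b in range(n_bins):
        # Assumption: integer boundaries spread any remainder across buckets.
        lo, hi = b * n // n_bins, (b + 1) * n // n_bins
        bucket = pairs[lo:hi]
        if not bucket:
            continue  # assumption: skip empty buckets when n < n_bins
        mean_score = sum(s for s, _ in bucket) / len(bucket)
        actual_rate = sum(y for _, y in bucket) / len(bucket)
        gaps.append(round(abs(mean_score - actual_rate), 3))
    return gaps


# Hypothetical well-calibrated data: each bucket's mean score equals its positive rate.
print(calibration_gap([0.0, 0.0, 0.5, 0.5, 1.0, 1.0], [0, 0, 0, 1, 1, 1], n_bins=3))
# -> [0.0, 0.0, 0.0]

# Hypothetical miscalibrated data: every score is 0.9, but the bucket rates are 0.0 and 1.0.
print(calibration_gap([0.9, 0.9, 0.9, 0.9], [0, 0, 1, 1], n_bins=2))
# -> [0.9, 0.1]
```

A well-calibrated model should produce gaps near zero in every bucket. A large gap in any one bucket points to the score range where predictions and outcomes diverge.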