The UEFA EURO 2020 prediction winner is ...

The UEFA EURO 2020 tournament is over, and we are ready to announce the winner of the tournament prediction competition. On June 2nd, football enthusiasts and data analysts were invited to submit predictions of the tournament results, and we already know who did best! (In fact, we’ve known for some time, since the outcomes of the last couple of matches did not change who came out on top.) And the prediction winner is

… going to be revealed just below. This post from June 2nd shows the original announcement.

Each contestant was asked to submit a prediction in the form of a 6 x 24 matrix, where the 24 columns represent the countries and the 6 rows represent the possible final ranks after the tournament. Each entry is a number between 0 and 1 giving the probability that a given country ends up at a given rank. Consequently, each column sums to 1, and each row sums to the number of teams that end up at that rank: 1 winner, 1 runner-up, 2 losing semifinalists, 4 losing quarterfinalists, 8 teams knocked out in the round of 16 and 8 teams eliminated in the group stage.
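To make the format concrete, here is a minimal base-R sketch, assuming the rank sizes 1, 1, 2, 4, 8 and 8 listed above, that builds one valid prediction (a flat one, in which every team gets the same rank distribution) and checks the constraints a submission has to satisfy. The helpers `flat_prediction` and `check_prediction` and the placeholder team names are hypothetical and not part of the competition code.

```r
# Number of teams ending at each rank (rank 1 = winner, rank 6 = group exit)
rank_sizes <- c(1, 1, 2, 4, 8, 8)
n_teams <- sum(rank_sizes)  # 24

# Hypothetical helper: every team gets the same column of probabilities,
# proportional to how many teams end at each rank
flat_prediction <- function(rank_sizes, teams) {
  p <- rank_sizes / sum(rank_sizes)
  matrix(rep(p, times = length(teams)), nrow = length(rank_sizes),
         dimnames = list(paste0("rank", seq_along(rank_sizes)), teams))
}

# Hypothetical validity check: entries in [0, 1], correct dimensions,
# columns summing to 1 and rows summing to the rank sizes
check_prediction <- function(pred, rank_sizes, tol = 1e-8) {
  all(pred >= 0 & pred <= 1) &&
    nrow(pred) == length(rank_sizes) &&
    ncol(pred) == sum(rank_sizes) &&
    isTRUE(all.equal(unname(colSums(pred)), rep(1, ncol(pred)), tolerance = tol)) &&
    isTRUE(all.equal(unname(rowSums(pred)), rank_sizes, tolerance = tol))
}

teams <- paste0("team", seq_len(n_teams))  # placeholder team names
pred <- flat_prediction(rank_sizes, teams)
check_prediction(pred, rank_sizes)  # TRUE
```

Dropping a column or scaling the matrix breaks the row or column sums, so `check_prediction` returns `FALSE` for such a matrix.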

The participants

Ten participants submitted predictions, and they did not have to explain how they arrived at them. The predictions could be based on gut feeling, reading tea leaves or complex statistical models.

Figure 1: The contestants’ initial predictions of which team would win the UEFA EURO 2020 tournament. One contestant was absolutely certain that Belgium would win.

And the winner is …

The prediction winner is the participant whose prediction has the lowest Tournament Rank Prediction Score (TRPS), as proposed in "Evaluating one-shot tournament predictions" by Ekstrøm, Van Eetvelde, Ley and Brefeld. The scores are computed with the socceR package on CRAN, and smaller numbers indicate better predictions.
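As I read the definition in the paper, the TRPS compares, for each team and each rank r, the predicted probability of finishing at rank r or better with the observed 0/1 outcome, squares the differences, and averages them over the first R - 1 ranks and all teams (the cumulative sums both equal 1 at the last rank). The base-R sketch below implements that reading; `trps_sketch` is a hypothetical name, and socceR::trps remains the implementation actually used for the competition. The toy example scores a single team column to show how a confident miss is punished.

```r
# Hypothetical base-R sketch of the Tournament Rank Prediction Score (TRPS).
# pred:    n_ranks x n_teams matrix, pred[r, t] = P(team t ends at rank r)
# outcome: observed rank of each team, in the same order as the columns
trps_sketch <- function(pred, outcome) {
  n_ranks <- nrow(pred)
  n_teams <- ncol(pred)
  # Observed ranks as a 0/1 indicator matrix with the same shape as pred
  obs <- matrix(0, nrow = n_ranks, ncol = n_teams)
  obs[cbind(outcome, seq_len(n_teams))] <- 1
  # Cumulative probability of ending at rank r or better, per team (column)
  cum_pred <- apply(pred, 2, cumsum)
  cum_obs <- apply(obs, 2, cumsum)
  # Drop the last rank (both cumulative sums are 1 there), square, average
  sq_diff <- (cum_obs - cum_pred)[-n_ranks, , drop = FALSE]^2
  sum(sq_diff) / ((n_ranks - 1) * n_teams)
}

# Toy illustration: one team that actually went out in the group stage (rank 6)
confident <- matrix(c(1, 0, 0, 0, 0, 0), ncol = 1)    # certain it wins
hedged <- matrix(c(1, 1, 2, 4, 8, 8) / 24, ncol = 1)  # flat rank distribution
trps_sketch(confident, 6)  # 1, the worst possible score
trps_sketch(hedged, 6)     # about 0.118
```

The confident column earns the worst possible per-team score of 1, while the hedged column scores about 0.118 even though it gave the team some chance of winning.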

I’ve assigned names to the entries based on the GitHub uploads. I apologize if the names do not adequately describe the underlying prediction approaches.

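```r
# The object pred_list contains a list of matrix predictions (one per entry)
# Observed final rank of each team, in the same column order as the
# prediction matrices (1 = winner, ..., 6 = eliminated in the group stage)
outcome <- c(6, 1, 5, 4,
             3, 6, 4, 6,
             5, 4, 5, 6,
             2, 5, 6, 4,
             3, 5, 6, 6,
             6, 5, 5, 5)

data.frame(team=colnames(pred_list[[1]]), rank=outcome)
```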
```
              team rank
1           Turkey    6
2            Italy    1
3            Wales    5
4      Switzerland    4
5          Denmark    3
6          Finland    6
7          Belgium    4
8           Russia    6
9      Netherlands    5
10         Ukraine    4
11         Austria    5
12 North Macedonia    6
13         England    2
14         Croatia    5
15        Scotland    6
16  Czech Republic    4
17           Spain    3
18          Sweden    5
19          Poland    6
20        Slovakia    6
21         Hungary    6
22        Portugal    5
23          France    5
24         Germany    5
```
```r
# Compute the TRPS of every entry; smaller is better
result <- sapply(pred_list, function(i) { socceR::trps(i, outcome)})
```
Table 1: Tournament Rank Prediction Score (TRPS) for the 10 entries in the tournament prediction competition. Smaller numbers indicate better predictions.

| Entry | TRPS |
|:--|--:|
| Brandt (ELO) | 0.0942 |
| Current strength | 0.1011 |
| Mads | 0.1025 |
| Random forest | 0.1045 |
| Podlewski | 0.1054 |
| Bookmaker consensus | 0.1066 |
| XGBoost | 0.1089 |
| FK | 0.1319 |
| CD | 0.1478 |
| Simple | 0.1667 |

For comparison, a completely flat prediction, where every team is assigned the same probability distribution over the ranks (the number of teams ending at each rank divided by 24), would yield a TRPS of 0.1399. Note that this completely uninformative prediction performs better than two of the entries: Simple and CD. The Simple entry in particular performed spectacularly badly, with a TRPS markedly larger than that of the flat prediction. Confident predictions that turn out to be wrong are penalized heavily!
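The flat reference score can be reproduced with a short base-R sketch, assuming the flat prediction gives every team the share of teams that end at each rank. The `trps_sketch` helper is a compact, hypothetical re-implementation of the TRPS written so the block runs on its own; socceR::trps is what was used for the actual scoring.

```r
# Observed ranks, in the same team order as the output above
outcome <- c(6, 1, 5, 4, 3, 6, 4, 6, 5, 4, 5, 6,
             2, 5, 6, 4, 3, 5, 6, 6, 6, 5, 5, 5)

# Every team gets the same column: the share of teams ending at each rank
rank_sizes <- as.vector(table(factor(outcome, levels = 1:6)))  # 1 1 2 4 8 8
flat <- matrix(rank_sizes / sum(rank_sizes), nrow = 6, ncol = length(outcome))

# Compact TRPS (hypothetical helper, not the socceR implementation):
# mean squared difference of cumulative observed vs. predicted probabilities
# over ranks 1..R-1 and all teams
trps_sketch <- function(pred, outcome) {
  obs <- matrix(0, nrow(pred), ncol(pred))
  obs[cbind(outcome, seq_along(outcome))] <- 1
  d <- apply(obs, 2, cumsum) - apply(pred, 2, cumsum)
  mean(d[-nrow(pred), , drop = FALSE]^2)
}

round(trps_sketch(flat, outcome), 4)  # 0.1399
```

That the sketch lands on the same 0.1399 quoted above is some reassurance that it matches the scoring used for the competition.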

Congratulations to Lennart Brandt, whose prediction was based on the teams’ Elo ratings and gave the best overall tournament prediction. And thanks to all the participants. I hope to see you all for the World Cup prediction competition next year!