World Cup prediction winners
Predicting the outcome of the different teams in the FIFA World Cup has been of great interest to the general public, and predicting the outcome has also attracted quite some attention in the R community. The World Cup has ended and by now, everyone knows that France managed to take home the trophy that slipped through their fingers when they hosted the UEFA Euro 2016 championship. But who won the more important competition of predicting the outcome?
At the European R Users Meeting I gave a presentation about predicting the results of the 2018 FIFA World Cup. My R code was made available and I encouraged others to use it, make improvements and make their own predictions. The code was picked up and used by several other R users and resulted in several interesting posts such as this blog-post from Mango Solutions or this piece in the Australian The Conversation. It’s always interesting to see how different people address the same problem.
Only one other group took up the competition and provided the full prediction code for the World Cup. Thus, the competitors are the prediction by me (based on odds and the Skellam distribution) and a prediction by Groll, Ley, Schauberger, Van Eetvelde. I’ve also added a prediction based on ELO ratings instead of odds/Skellam for a total of 3 predictions.
Evaluating the prediction winner
When comparing different predictions it is not enough to evaluate only how well the winner of the World Cup was predicted. The predictions of the positions of the other teams should count as well. For that purpose we employ the log-loss scoring rule, which is a proper scoring rule, and it has the nice property that it puts extra penalty to a prediction if the prediction is wrong and you are certain about it.
“It’s better to be somewhat wrong than emphatically wrong. Of course it’s always better to be right”
-Andrew Collier
Log-loss is defined as follows for a country \(l\) that achieves rank \(r\):
\[-\sum_{\text{rank}\; r} \sum_{\text{country}\; l} I(\text{country $l$ achieved rank $r$})\log(\hat{p}_{rl}),\]
where \(\hat{p}_{rl}\) is the predicted probability that country \(l\) achieves rank \(r\), and \(I()\) is an indicator function.
The log-loss scoring rule ensures that it is not only important to predict which country will achieve a given rank - it is also important that you guess the ranks with as high probability as possible. Small values of log-loss (close to zero) means that the prediction is good while larger values correspond to worse predictions.
However, the World Cup only produces partial rankings since there is only an official rank given to the first 4 teams (gold, silver, bronze, and 4. position), while the remaining 28 teams only get a partial ranking. We know which 4 teams that reached the quarter final but did not continue (positions 5-8), who came to the 8th final (position 9-16) et cetera. The solution to this is to collapse the probabilities into 7 categories: 1st place, 2nd place, 3rd, 4th, 5th-8th, 9th-16th, and 17th-32nd, and compute the log-loss based on the probabilities of ending in one of these 7 groups.
The log-loss therefore becomes
\[-\sum_{\text{category}\; k} \sum_{\text{country}\; l} I(\text{country $l$ ended up in category $k$}) \log(\hat{p}_{kl})\]
Another problem with the default log-loss is that all predictions are weighed equally. That makes sense under the prior assumption that all categories are equally important but let’s face it: for a World Cup prediction it really is more important to predict who will come out on top. The log-loss function can be modified by adding a weight, \(w_k\), so we put more emphasis to correct predictions at the top of the tournament.
\[-\sum_{\text{category}\; k} \sum_{\text{country}\; l} w_k \; I(\text{country $l$ ended up in category $k$}) \log(\hat{p}_{kl})\]
One possibility for the weights would be to use \(1, 1, \frac12, \frac12, \frac14, \frac18\) and \(\frac{1}{16}\) for category 1, 2, 3, 4, 5-8, 9-16 and 17-32, respectively. This makes it more important to predict the top of the tournament correctly, while there still is some benefits to correctly predicting the lower ranking countries as well. The weights are chosen such that each category has the same weight (with 3rd and 4th place regarded as one common category).
Getting the predictions
Each prediction is a \(32\times32\) matrix where each columns represents a country and the rows represent the ranks of the teams. The elements of the matrix are the probabilities that a team will obtain the given rank. The predictions are available online as an rda
file that contains three matrices, ekstrom1
, ekstrom2
, and GLSE
so it is not necessary to run all the simulations again to get started:
# Load the three predictions
load(url("http://sandsynligvis.dk/files/fifa2018results.rda"))
head(GLSE)
Egypt Russia Saudi Arabia Uruguay Iran Morocco Portugal Spain
[1,] 0.0009 0.0005 0.0000 0.0117 0.0002 0.0038 0.0244 0.1776
[2,] 0.0020 0.0030 0.0003 0.0312 0.0013 0.0070 0.0481 0.1069
[3,] 0.0036 0.0044 0.0002 0.0342 0.0021 0.0082 0.0507 0.1264
[4,] 0.0118 0.0154 0.0033 0.0590 0.0059 0.0233 0.0740 0.0622
[5,] 0.0309 0.0328 0.0069 0.1951 0.0286 0.0777 0.1535 0.0763
[6,] 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
Australia Denmark France Peru Argentina Croatia Iceland Nigeria Brazil
[1,] 0.0006 0.0167 0.1130 0.0052 0.0715 0.0209 0.0056 0.0008 0.1218
[2,] 0.0018 0.0289 0.0938 0.0160 0.0823 0.0392 0.0129 0.0026 0.0947
[3,] 0.0024 0.0325 0.1040 0.0145 0.0827 0.0405 0.0127 0.0022 0.0764
[4,] 0.0079 0.0431 0.0622 0.0279 0.0627 0.0583 0.0239 0.0071 0.0447
[5,] 0.0094 0.0550 0.1257 0.0303 0.0488 0.0671 0.0435 0.0184 0.0000
[6,] 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.1335
Costa Rica Switzerland Serbia Germany South Korea Mexico Sweden Belgium
[1,] 0.0005 0.0207 0.0046 0.1746 0.0003 0.0036 0.0100 0.1041
[2,] 0.0029 0.0354 0.0115 0.0851 0.0015 0.0080 0.0230 0.1053
[3,] 0.0033 0.0330 0.0099 0.0912 0.0019 0.0087 0.0187 0.0886
[4,] 0.0091 0.0409 0.0186 0.0428 0.0041 0.0183 0.0287 0.0592
[5,] 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
[6,] 0.0162 0.0801 0.0379 0.0501 0.0226 0.0683 0.0849 0.1802
England Panama Tunisia Colombia Japan Poland Senegal
[1,] 0.0730 0.0003 0.0031 0.0214 0.0002 0.0055 0.0029
[2,] 0.0865 0.0006 0.0060 0.0384 0.0017 0.0146 0.0075
[3,] 0.0786 0.0006 0.0057 0.0378 0.0014 0.0155 0.0074
[4,] 0.0695 0.0025 0.0140 0.0485 0.0050 0.0282 0.0179
[5,] 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
[6,] 0.1247 0.0047 0.0149 0.0617 0.0188 0.0607 0.0407
We see that the GLSE prediction gave France an 11.3% chance of winning. The predictions of winning the World Cup are easily compared by plotting the winning probabilites.
DF <- tibble(prob=100*c(GLSE[1,], ekstrom1[1,], ekstrom2[1,]),
country=rep(colnames(GLSE), times=3),
prediction=rep(c("GLSE", "Ekstrom1", "Ekstrom2"), each=32))
library("ggplot2")
library("magrittr")
DF %>%
ggplot(aes(x=reorder(country, -prob), y=prob, col=factor(prediction))) +
geom_point(size=3) +
scale_colour_manual(values = c("black", "#990000", "blue")) +
xlab("Country") + ylab("Prediction probability of winning the 2018 World Cup") + coord_flip()
To compute the log-loss scores we first need to combine the 32 partial ranks into the 7 categories by adding up the probabilities within each category for each country. Some matrix multiplication will do that for us. We set up a matrix that collapses the 32 ranks into the 7 categories (it will be added to a function later).
comb <- rbind(c(1, rep(0, 31)),
c(0, 1, rep(0, 30)),
c(0, 0, 1, rep(0, 29)),
c(0, 0, 0, 1, rep(0, 28)),
c(rep(0, 4), rep(1, 4), rep(0, 24)),
c(rep(0, 8), rep(1, 8), rep(0, 16)),
c(rep(0:1, times=c(16, 16)))
)
comb
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,] 1 0 0 0 0 0 0 0 0 0 0 0 0 0
[2,] 0 1 0 0 0 0 0 0 0 0 0 0 0 0
[3,] 0 0 1 0 0 0 0 0 0 0 0 0 0 0
[4,] 0 0 0 1 0 0 0 0 0 0 0 0 0 0
[5,] 0 0 0 0 1 1 1 1 0 0 0 0 0 0
[6,] 0 0 0 0 0 0 0 0 1 1 1 1 1 1
[7,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26]
[1,] 0 0 0 0 0 0 0 0 0 0 0 0
[2,] 0 0 0 0 0 0 0 0 0 0 0 0
[3,] 0 0 0 0 0 0 0 0 0 0 0 0
[4,] 0 0 0 0 0 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 0 0 0 0 0 0
[6,] 1 1 0 0 0 0 0 0 0 0 0 0
[7,] 0 0 1 1 1 1 1 1 1 1 1 1
[,27] [,28] [,29] [,30] [,31] [,32]
[1,] 0 0 0 0 0 0
[2,] 0 0 0 0 0 0
[3,] 0 0 0 0 0 0
[4,] 0 0 0 0 0 0
[5,] 0 0 0 0 0 0
[6,] 0 0 0 0 0 0
[7,] 1 1 1 1 1 1
comb %*% GLSE
Egypt Russia Saudi Arabia Uruguay Iran Morocco Portugal Spain
[1,] 0.0009 0.0005 0.0000 0.0117 0.0002 0.0038 0.0244 0.1776
[2,] 0.0020 0.0030 0.0003 0.0312 0.0013 0.0070 0.0481 0.1069
[3,] 0.0036 0.0044 0.0002 0.0342 0.0021 0.0082 0.0507 0.1264
[4,] 0.0118 0.0154 0.0033 0.0590 0.0059 0.0233 0.0740 0.0622
[5,] 0.0808 0.0807 0.0221 0.2428 0.0391 0.1122 0.2619 0.2568
[6,] 0.3567 0.4044 0.1445 0.4865 0.0819 0.1544 0.2189 0.1527
[7,] 0.5442 0.4916 0.8296 0.1346 0.8695 0.6911 0.3220 0.1174
Australia Denmark France Peru Argentina Croatia Iceland Nigeria Brazil
[1,] 0.0006 0.0167 0.1130 0.0052 0.0715 0.0209 0.0056 0.0008 0.1218
[2,] 0.0018 0.0289 0.0938 0.0160 0.0823 0.0392 0.0129 0.0026 0.0947
[3,] 0.0024 0.0325 0.1040 0.0145 0.0827 0.0405 0.0127 0.0022 0.0764
[4,] 0.0079 0.0431 0.0622 0.0279 0.0627 0.0583 0.0239 0.0071 0.0447
[5,] 0.0318 0.1355 0.1899 0.0862 0.2061 0.1469 0.0786 0.0286 0.1723
[6,] 0.1226 0.3353 0.2895 0.2387 0.3118 0.3495 0.2268 0.1258 0.3272
[7,] 0.8329 0.4080 0.1476 0.6115 0.1829 0.3447 0.6395 0.8329 0.1629
Costa Rica Switzerland Serbia Germany South Korea Mexico Sweden Belgium
[1,] 0.0005 0.0207 0.0046 0.1746 0.0003 0.0036 0.0100 0.1041
[2,] 0.0029 0.0354 0.0115 0.0851 0.0015 0.0080 0.0230 0.1053
[3,] 0.0033 0.0330 0.0099 0.0912 0.0019 0.0087 0.0187 0.0886
[4,] 0.0091 0.0409 0.0186 0.0428 0.0041 0.0183 0.0287 0.0592
[5,] 0.0483 0.1732 0.0932 0.1897 0.0331 0.1047 0.1370 0.2871
[6,] 0.1476 0.2876 0.2226 0.2829 0.1428 0.2713 0.3180 0.2152
[7,] 0.7883 0.4092 0.6396 0.1337 0.8163 0.5854 0.4646 0.1405
England Panama Tunisia Colombia Japan Poland Senegal
[1,] 0.0730 0.0003 0.0031 0.0214 0.0002 0.0055 0.0029
[2,] 0.0865 0.0006 0.0060 0.0384 0.0017 0.0146 0.0075
[3,] 0.0786 0.0006 0.0057 0.0378 0.0014 0.0155 0.0074
[4,] 0.0695 0.0025 0.0140 0.0485 0.0050 0.0282 0.0179
[5,] 0.2669 0.0202 0.0630 0.1899 0.0321 0.1206 0.0687
[6,] 0.2227 0.0872 0.1401 0.4574 0.1603 0.4201 0.2970
[7,] 0.2028 0.8886 0.7681 0.2066 0.7993 0.3955 0.5986
According to the prediction, Germany had around 40% chance of being in the top 4, and France had a 14.76% chance of not making it out of the group matches.
And the winner is …
Next we need to define a function to compute the log-loss. As input we accept a vector, trueranking
, which provides information on the true final results from the World Cup, and a \(32\times32\) matrix of predictions. The ranks will be condensed as described above.
Also, the log-loss scoring function can go bonkers if a prediction is zero. This could happen if either a prediction method sets the probability of an outcome to zero and the event then turned out to happen, or if the probability is so small that none of the simulations gave rise to that particular outcome. To fix that we tweak the minimum predicted probability to be 1/64 - one half the probability that would be used if each team had equal probability of winning the full tournament. This, however, is not a problem with any of the available predictions.
logloss_score <- function(trueranking,
matrix_prediction,
weights=1/c(1, 1, 2, 2,
rep(4, 4),
rep(8, 8),
rep(16, 16)),
threshold=1/64) {
## Minimal sanity checks
stopifnot(length(trueranking)==32)
stopifnot(NROW(matrix_prediction)==32)
stopifnot(NCOL(matrix_prediction)==32)
# Define the matrix used to condense the rankings to
# 7 categories
comb <- rbind(c(1, rep(0, 31)),
c(0, 1, rep(0, 30)),
c(0, 0, 1, rep(0, 29)),
c(0, 0, 0, 1, rep(0, 28)),
c(rep(0, 4), rep(1, 4), rep(0, 24)),
c(rep(0, 8), rep(1, 8), rep(0, 16)),
c(rep(0:1, times=c(16, 16)))
)
# Compute the condensed ranking probabilities
condensed_prediction_matrix <- comb %*% matrix_prediction
# This vector defines the categories corresponding to the
# order of the rankings given in rank_group
rank_group <- c(1, 2, 3, 4, rep(5, 4), rep(6, 8), rep(7, 16))
# Compute the weighted log loss score
sum(log(pmax(
condensed_prediction_matrix[cbind(rank_group, trueranking)],
threshold))*weights)
}
If all countries have exactly the same skill level then their final positions will have a uniform distribution and the weighted log-loss becomes
# Create matrix with equal probability of all positions for all teams
equalskill <- matrix(rep(1/32, 32*32), 32)
logloss_score(1:32, equalskill)
[1] -14.55609
Hopefully, the contestants are doing better than just guessing that all teams are equally good. Finally, we enter the actual outcome of the 2018 World Cup.
trueresult <- c("France", "Croatia", "Belgium", "England", # 4 best teams
"Russia", "Sweden", "Brazil", "Uruguay", # 5-8
"Colombia", "Switzerland", "Japan", "Mexico", # 9-
"Denmark", "Spain", "Portugal", "Argentina", # -16
"Panama", "Tunisia", "Senegal", "Poland", # 17-
"Costa Rica", "Serbia", "South Korea", "Germany",
"Iceland", "Nigeria", "Australia", "Peru",
"Iran", "Saudi Arabia", "Morocco", "Egypt") # -32
# Information about the teams from the function to create the predictions
# We only use that to get the column indices of the team names
team_data <- tibble(
number = 1:32,
name = c("Egypt","Russia","Saudi Arabia","Uruguay",
"Iran","Morocco","Portugal","Spain",
"Australia","Denmark","France","Peru",
"Argentina","Croatia","Iceland","Nigeria",
"Brazil","Costa Rica","Switzerland","Serbia",
"Germany","South Korea","Mexico","Sweden",
"Belgium","England","Panama","Tunisia",
"Colombia","Japan","Poland","Senegal")
)
# Get the column order of the results
trueranks <- match(trueresult, team_data$name)
# Compute the logloss and see who did the best
logloss_score(trueranks, GLSE)
[1] -11.69494
logloss_score(trueranks, ekstrom1)
[1] -11.72221
logloss_score(trueranks, ekstrom2)
[1] -13.47818
log.loss | |
---|---|
Groll, Ley, Schauberger, VanEetvelde | -11.69 |
Ekstrom (Skellam) | -11.72 |
Ekstrom (ELO) | -13.48 |
Random guessing | -14.56 |
And the winner is the prediction by Groll, Ley, Schauberger, VanEetvelde (although not by much). Well done! Time to prepare the prediction algorithms for the next tournament - and hopefully we can get more people to participate.