A common example of how one might use fitzRoy is creating a simple ELO rating system. These models are common among tippers that are part of The Squiggle and are also becoming common in other team sports. This vignette shows a minimum working example to get you started on creating an ELO model from scratch, using fitzRoy to get data and the elo package to do the modelling.
First we need to grab a few packages. If you don’t have any of these, you’ll need to install them.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(elo)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
library(fitzRoy)
Our first job is to get the relevant data. For the most basic ELO model, we need the results of past matches, including the home and away teams and the score of each match. To make predictions, we also need the upcoming matches. We can get both of these using fitzRoy.
# Get data
results <- fitzRoy::get_match_results()
fixture <- fitzRoy::get_fixture(2019)
head(results)
#> # A tibble: 6 x 16
#> Game Date Round Home.Team Home.Goals Home.Behinds Home.Points
#> <dbl> <date> <chr> <chr> <int> <int> <int>
#> 1 1 1897-05-08 R1 Fitzroy 6 13 49
#> 2 2 1897-05-08 R1 Collingw… 5 11 41
#> 3 3 1897-05-08 R1 Geelong 3 6 24
#> 4 4 1897-05-08 R1 Sydney 3 9 27
#> 5 5 1897-05-15 R2 Sydney 6 4 40
#> 6 6 1897-05-15 R2 Essendon 4 6 30
#> # … with 9 more variables: Away.Team <chr>, Away.Goals <int>,
#> # Away.Behinds <int>, Away.Points <int>, Venue <chr>, Margin <int>,
#> # Season <dbl>, Round.Type <chr>, Round.Number <int>
head(fixture)
#> # A tibble: 6 x 7
#> Date Season Season.Game Round Home.Team Away.Team Venue
#> <dttm> <int> <int> <dbl> <chr> <chr> <chr>
#> 1 2019-03-21 19:25:00 2019 1 1 Carlton Richmond MCG
#> 2 2019-03-22 19:50:00 2019 1 1 Collingwo… Geelong MCG
#> 3 2019-03-23 13:45:00 2019 1 1 Melbourne Port Adel… MCG
#> 4 2019-03-23 16:05:00 2019 1 1 Adelaide Hawthorn Adela…
#> 5 2019-03-23 19:20:00 2019 1 1 Brisbane … West Coast Gabba
#> 6 2019-03-23 19:25:00 2019 1 1 Footscray Sydney Marve…
Before we create our model, we need some data preparation. The elo package needs to know which matches belong to the same round, so we combine Season and Round.Number into a single string (seas_rnd). Combined with the team name, this uniquely identifies each match. We also need a way to tell the model when a new season is starting, so we create a logical field (First.Game) that flags Round 1 games, i.e. each team's first game of the season.
results <- results %>%
  mutate(seas_rnd = paste0(Season, ".", Round.Number),
         First.Game = ifelse(Round.Number == 1, TRUE, FALSE))
head(results)
#> # A tibble: 6 x 18
#> Game Date Round Home.Team Home.Goals Home.Behinds Home.Points
#> <dbl> <date> <chr> <chr> <int> <int> <int>
#> 1 1 1897-05-08 R1 Fitzroy 6 13 49
#> 2 2 1897-05-08 R1 Collingw… 5 11 41
#> 3 3 1897-05-08 R1 Geelong 3 6 24
#> 4 4 1897-05-08 R1 Sydney 3 9 27
#> 5 5 1897-05-15 R2 Sydney 6 4 40
#> 6 6 1897-05-15 R2 Essendon 4 6 30
#> # … with 11 more variables: Away.Team <chr>, Away.Goals <int>,
#> # Away.Behinds <int>, Away.Points <int>, Venue <chr>, Margin <int>,
#> # Season <dbl>, Round.Type <chr>, Round.Number <int>, seas_rnd <chr>,
#> # First.Game <lgl>
For the fixture data, we need to ensure the dates are in the same format as results (note: this should probably be done internally in fitzRoy; see #58). For now, we can do it manually.
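Here is a minimal sketch of that manual step. It assumes dplyr and lubridate, both loaded above, and `prepare_fixture` is a hypothetical helper name. Besides converting the date-times to dates, it keeps only matches after the last result and renames `Round` to `Round.Number`. Those two extra steps are inferred from the prediction output further down (a single remaining match, with a `Round.Number` column), so treat them as assumptions.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: make `fixture` line up with `results` before predicting.
prepare_fixture <- function(fixture, results) {
  fixture %>%
    # get_fixture() returns date-times (<dttm>); as_date() keeps the calendar
    # date in the date-time's own time zone, giving a <date> column like results
    mutate(Date = as_date(Date)) %>%
    # Assumption: only predict matches after the last completed result
    filter(Date > max(results$Date)) %>%
    # Assumption: use the same round column name as results
    rename(Round.Number = Round)
}
```

You would then run `fixture <- prepare_fixture(fixture, results)` before predicting.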
There is a range of parameters that we can tweak and include in an ELO model. Here we set some basic parameters; you can read a bit more on the PlusSixOne blog, which uses a similar method. For further reading, I strongly recommend Matter of Stats or The Arc for great explainers on the types of parameters that could be included.
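The model call below expects three parameters: `HGA`, `carryOver` and `k_val`. The values in this sketch are illustrative. `k_val = 20` and `HGA = 30` are consistent with the output shown later; for example, the final update of 8.84 equals 20 × (1 − 0.558). `carryOver = 0.5` is an assumption.

```r
# Set parameters (illustrative values; tune to taste)
HGA <- 30        # home ground advantage: rating points added to the home side
carryOver <- 0.5 # how strongly ratings regress toward 1500 at rows flagged by First.Game
k_val <- 20      # K: scales the rating change after each match
```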
The original ELO models in chess use values of 0 for a loss, 1 for a win and 0.5 for a draw. Since we are adapting these for AFL and want to use the margin rather than a binary outcome, we need to map our margin to a score between 0 and 1. You can do this in many complex ways, but for now I simply rescale margins linearly from the range -80 to 80 onto 0 to 1. Any margin outside that range is clamped to 0 or 1.
We write this as a function, map_margin_to_outcome, and use it in our elo model.
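This is a minimal sketch of that mapping. The function name matches the model call below. The `marg.max` and `marg.min` argument names are assumptions, with the ±80 defaults described above.

```r
# Map a points margin (home minus away) onto a 0-1 "result" for the ELO model.
# Margins are rescaled linearly from [marg.min, marg.max] onto [0, 1];
# anything beyond that range is clamped to 0 or 1.
map_margin_to_outcome <- function(margin, marg.max = 80, marg.min = -80) {
  norm <- (margin - marg.min) / (marg.max - marg.min)
  pmax(pmin(norm, 1), 0)
}

map_margin_to_outcome(c(0, 40, 80, -100))  # 0.5, 0.75, 1, 0
```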
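Now we are ready to create our ELO ratings! We can use the elo.run function from the elo package for this. I won't explain everything that is going on here (you can read all about it in the package vignette), but in general we provide a formula describing the model, plus some model parameters. The left-hand side of the formula is the margin mapped to a 0 to 1 outcome. The right-hand side gives the home team (adjusted by the home ground advantage), the away team, the grouping of matches into rounds, and the regression of ratings toward 1500 at the games flagged by First.Game. The K value is passed separately.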
# Run ELO
elo.data <- elo.run(
  map_margin_to_outcome(Home.Points - Away.Points) ~
    adjust(Home.Team, HGA) +
    Away.Team +
    group(seas_rnd) +
    regress(First.Game, 1500, carryOver),
  k = k_val,
  data = results
)
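Now that the model has run, we can view our results. The elo package provides several ways to do this.
Firstly, using as.data.frame we can view the predicted probability (p.A) and the actual result (wins.A, the mapped margin) of each game. The table also shows the rating change for the match (update, which is added to the home side's rating and subtracted from the away side's) and each team's rating after the match (elo.A and elo.B). See below for the last few games in the data.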
as.data.frame(elo.data) %>% tail()
#> team.A team.B p.A wins.A update elo.A
#> 15609 Richmond Brisbane Lions 0.5360948 0.79375 5.1531041 1527.667
#> 15610 Geelong West Coast 0.5656360 0.62500 1.1872801 1545.196
#> 15611 GWS Brisbane Lions 0.5395092 0.51875 -0.4151833 1519.332
#> 15612 Richmond Geelong 0.5179394 0.61875 2.0162114 1529.683
#> 15613 GWS Collingwood 0.5326803 0.52500 -0.1536069 1519.178
#> 15614 Richmond GWS 0.5580287 1.00000 8.8394252 1538.523
#> elo.B
#> 15609 1522.236
#> 15610 1526.948
#> 15611 1522.651
#> 15612 1543.180
#> 15613 1526.744
#> 15614 1510.339
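To see what happens in a single row, here is a hand calculation of one ELO update. It uses the standard logistic ELO formula on a 400-point scale. That this is elo.run's default is an assumption, but the numbers match the table above. `elo_update` is a hypothetical helper. The pre-match ratings for the last match (Richmond 1529.683, GWS 1519.178) come from the second-last row of the `as.matrix` output below. Richmond won by more than 80 points, so the mapped outcome is 1 (wins.A).

```r
# Hypothetical helper: one ELO update by hand.
elo_update <- function(elo_home, elo_away, outcome_home, k = 20, hga = 30) {
  # Expected result for the home side, with home ground advantage added
  p_home <- 1 / (1 + 10^((elo_away - (elo_home + hga)) / 400))
  # Rating change: added to the home side, subtracted from the away side
  update <- k * (outcome_home - p_home)
  list(p_home = p_home, update = update,
       elo_home = elo_home + update, elo_away = elo_away - update)
}

res <- elo_update(1529.683, 1519.178, outcome_home = 1)
# res$p_home is about 0.558 and res$update about 8.84,
# giving post-match ratings of about 1538.52 and 1510.34
```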
We can focus specifically on how each team's rating changes over time using as.matrix. Viewing the end of the data also shows that teams that didn't make the finals keep the same rating from row to row, since they aren't playing.
as.matrix(elo.data) %>% tail()
#> Adelaide Brisbane Lions Carlton Collingwood Essendon Fitzroy
#> [2803,] 1502.335 1530.203 1481.946 1525.998 1495.482 1500
#> [2804,] 1499.259 1527.389 1475.897 1525.649 1495.832 1500
#> [2805,] 1499.259 1522.236 1475.897 1526.591 1490.584 1500
#> [2806,] 1499.259 1522.651 1475.897 1526.591 1490.584 1500
#> [2807,] 1499.259 1522.651 1475.897 1526.744 1490.584 1500
#> [2808,] 1499.259 1522.651 1475.897 1526.744 1490.584 1500
#> Footscray Fremantle GWS Geelong Gold Coast Hawthorn
#> [2803,] 1513.308 1483.509 1505.810 1538.903 1421.852 1509.981
#> [2804,] 1516.384 1479.572 1513.269 1544.951 1414.392 1516.133
#> [2805,] 1509.906 1479.572 1519.747 1544.009 1414.392 1516.133
#> [2806,] 1509.906 1479.572 1519.332 1545.196 1414.392 1516.133
#> [2807,] 1509.906 1479.572 1519.178 1543.180 1414.392 1516.133
#> [2808,] 1509.906 1479.572 1510.339 1543.180 1414.392 1516.133
#> Melbourne North Melbourne Port Adelaide Richmond St Kilda Sydney
#> [2803,] 1462.046 1508.088 1503.824 1519.699 1471.874 1496.103
#> [2804,] 1463.576 1506.558 1507.761 1522.514 1467.797 1500.180
#> [2805,] 1463.576 1506.558 1507.761 1527.667 1467.797 1500.180
#> [2806,] 1463.576 1506.558 1507.761 1527.667 1467.797 1500.180
#> [2807,] 1463.576 1506.558 1507.761 1529.683 1467.797 1500.180
#> [2808,] 1463.576 1506.558 1507.761 1538.523 1467.797 1500.180
#> University West Coast
#> [2803,] 1500 1529.041
#> [2804,] 1500 1522.888
#> [2805,] 1500 1528.135
#> [2806,] 1500 1526.948
#> [2807,] 1500 1526.948
#> [2808,] 1500 1526.948
Lastly, we can check the final ELO rating of each team at the end of our data using final.elos.
final.elos(elo.data)
#> Adelaide Brisbane Lions Carlton Collingwood
#> 1499.259 1522.651 1475.897 1526.744
#> Essendon Fitzroy Footscray Fremantle
#> 1490.584 1380.902 1509.906 1479.572
#> GWS Geelong Gold Coast Hawthorn
#> 1510.339 1543.180 1414.392 1516.133
#> Melbourne North Melbourne Port Adelaide Richmond
#> 1463.576 1506.558 1507.761 1538.523
#> St Kilda Sydney University West Coast
#> 1467.797 1500.180 1412.936 1526.948
We could keep tweaking our parameters until we are happy. Ideally we'd split the data into training and test sets and use a cost function, such as log likelihood or mean absolute margin error, to optimise these values. I'll leave that as beyond the scope of this vignette, though, and assume we are happy with these parameters.
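As a starting point for that tuning, here is a sketch of two scoring functions. Both are hypothetical helpers, not part of elo or fitzRoy. Log loss is the negative mean log likelihood, so lower is better. Note that wins.A holds the mapped margin, so for log loss you would build a 1/0/0.5 outcome from the raw scores instead.

```r
# Hypothetical scoring helpers for tuning parameters such as k_val and HGA.
# p:       predicted probability for the home side (e.g. the p.A column)
# outcome: observed home result, 1 = win, 0 = loss, 0.5 = draw
log_loss <- function(p, outcome, eps = 1e-15) {
  p <- pmin(pmax(p, eps), 1 - eps)  # keep log() finite
  -mean(outcome * log(p) + (1 - outcome) * log(1 - p))
}

# Mean absolute error between predicted and actual margins, in points
mean_abs_margin_error <- function(predicted_margin, actual_margin) {
  mean(abs(predicted_margin - actual_margin))
}

# Toy check with made-up numbers
log_loss(c(0.5, 0.5), c(1, 0))               # log(2), about 0.693
mean_abs_margin_error(c(10, -5), c(20, -5))  # 5
```

You could then rerun elo.run over a grid of parameter values on the training seasons and keep whichever values score best on the held-out seasons.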
Now that we've got our ELO model and are happy with our parameters, we can make some predictions! For this, we just need our fixture and the predict function, with our ELO model as an input. The elo package takes care of the rest, returning a probability for the home team in each match.
fixture <- fixture %>%
  mutate(Prob = predict(elo.data, newdata = fixture))
head(fixture)
#> # A tibble: 1 x 8
#> Date Season Season.Game Round.Number Home.Team Away.Team Venue
#> <date> <int> <int> <dbl> <chr> <chr> <chr>
#> 1 2019-09-28 2019 1 27 Richmond GWS MCG
#> # … with 1 more variable: Prob <dbl>
From here, you could turn these probabilities back into a margin through another mapping function. Again, I'll leave that for the reader to decide.
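One simple option, sketched here as an assumption rather than a recommended method, is to treat the probability as if it were a mapped outcome and invert the linear ±80 mapping used earlier. `prob_to_margin` is a hypothetical helper. A win probability is not the same thing as an expected mapped margin, so this is only a rough first pass.

```r
# Hypothetical inverse of the linear margin mapping: rescale a probability
# from [0, 1] back onto [marg.min, marg.max] points.
prob_to_margin <- function(prob, marg.max = 80, marg.min = -80) {
  marg.min + prob * (marg.max - marg.min)
}

prob_to_margin(c(0.5, 0.75, 0.25))  # 0, 40, -40
```

With dplyr, this could be applied as `fixture %>% mutate(Pred.Margin = prob_to_margin(Prob))`.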
Looking forward to seeing all the new models utilising the power of fitzRoy.