Yes, another postgrad who thinks he knows a thing or two about football and statistics come to jump on the analytics moneyball merry-go-round, hoping to one day be in charge of all Liverpool’s transfers – you caught me.
Disclaimer – I don’t reeeeaaallly have an idea what I’m doing, statistically speaking. And on that reassuring note, let’s begin.
Over the last few days I have been trying to ride the crest of the defensive metric wave: the increased attention paid to the side of the game generally considered harder to measure than attack. Many really interesting pieces have been written recently; I won’t link them as you’ve already read them, right? Metrics like passes per defensive action (PPDA), passes per shot against (PPS) and save % are excellent (oh go on then). I, blindly, wanted to combine these metrics with SoTR and an *extremely* crude xG model. Could these 5 base-metrics be combined to create one ‘super-metric’ that scores a team’s skill/ability? FitbaFancyStats did something very similar here. I wanted my own.
Using the 14/15 Premier League as my blind, unwitting guinea pig, I created the 5 metrics (replacing PPDA with danger zone shot %). Soon enough it became clear that the data were all measured on different scales – they needed to be normalised. Feature scaling (a.k.a. min-max scaling, or normalisation) rescales each data point to a range between 0 and 1. Now the worst team in the league for, say, SoTR (Sunderland) scores a big fat zero, and the best (Man City) scores 1. Great. Add these together to create my super-metric and I’m done…? Let’s see:
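For anyone who wants to see the scaling step written down, here is a minimal sketch in Python. The function name `min_max_scale` and the numbers in `toy_sotr` are made up for illustration; they are not the real 14/15 figures, and this is not necessarily how the spreadsheet did it.

```python
def min_max_scale(values):
    """Rescale numbers so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every team identical on this metric: nothing to separate them.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up shot ratios for four hypothetical teams (NOT real 14/15 data).
toy_sotr = [0.40, 0.50, 0.55, 0.60]
scaled = min_max_scale(toy_sotr)

# The worst team lands on 0, the best on 1, everyone else in between.
print(scaled)
```

One thing the sketch assumes is that bigger is always better. For any metric where a lower raw number is the good outcome, the scaled value would need flipping (1 minus the score) before being added to the others.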
A simple correlation of the ‘super-metric’ (SUM) and final league position is a decent start – a highly significant p-value shows…significance? It’s been almost 6 months since I left uni OK?! At least the best teams are near the top and QPR are at the bottom. Stoke, Palace and Tottenham overperformed whilst City and Southampton are underperformers. An adjusted-R2 of 62% is good.
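As a rough sketch of what that correlation check involves: with a single predictor, R2 is just the square of the Pearson correlation, and adjusted R2 shrinks it slightly to account for sample size. The helper names and the toy numbers below are hypothetical, not the real league data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def adjusted_r2(r2, n, p=1):
    """Adjusted R2 for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Made-up SUM scores and finishing positions for five hypothetical teams.
toy_sum = [4.1, 3.6, 2.9, 2.2, 1.0]
toy_position = [1, 2, 3, 4, 5]

r = pearson_r(toy_sum, toy_position)
r2 = r ** 2  # with one predictor, R2 is the squared correlation
adj = adjusted_r2(r2, n=len(toy_sum))
print(r, r2, adj)
```

Note that `r` comes out negative here, because a higher SUM goes with a lower (better) league position; squaring it removes the sign.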
I was pleased with myself initially, smugly grinning to myself over a lunchtime bacon sandwich. Until I remembered that SoTR itself is supposedly highly correlated with final standings. Quick, check it:
FFS. Sure enough, SoTR is highly significant on its own, with an R2 of 71.6% – more than my super-metric! None of the other 4 metrics has such high explanatory power for final league standings. Thus, most of the explanatory power in ‘super-metric’ Mark I is coming from SoTR. Surely it is not representative to give SoTR the same weighting as the 4 other, less powerful metrics.
I decided to play around with weights. This is where my method gets very, I don’t know, crude/simple/amateurish etc. Trial and error was my gameplan, initially attaching random weights in Excel to the normalised SoTR metric. The effect of this is to further punish the teams with low SoTR scores (0*10=0) and to boost the teams with the highest scores (1*10=10). After playing around with different weights, I settled on 6.3. Now my super-metric looked something like this (remember, all metrics have been normalised to range between 0 and 1):
(SoTR*6.3) + Save% + PPS + DZshot% + xG = SUM
Its correlation with the league table looks like this:
You can see the SUM is much higher now for the top teams compared to before, when no weightings were attached (Fig. 1). Still highly significant. However, the R2 has risen to 75.4%. All that has changed is that SoTR has been given more influence. R2 falls if the weight of 6.3 is increased or decreased, thus 6.3 is deemed optimal for maximising R2 – whether this should be my aim or not, I genuinely don’t know.
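The trial and error can also be done as a brute-force sweep: try a grid of SoTR weights, rebuild the SUM for each one, and keep whichever gives the highest R2 against league position. The sketch below assumes that approach; the team values, dictionary keys and function names are all invented for illustration, so it will not reproduce the 6.3.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy ** 2 / (sxx * syy)


def weighted_sum(team, sotr_weight):
    """(SoTR * weight) + Save% + PPS + DZshot% + xG, all pre-normalised to 0-1."""
    return (sotr_weight * team["sotr"] + team["save"] + team["pps"]
            + team["dz"] + team["xg"])


# Made-up normalised metrics for five hypothetical teams, listed 1st to 5th.
toy_teams = [
    {"sotr": 1.0, "save": 0.6, "pps": 0.8, "dz": 0.7, "xg": 1.0},
    {"sotr": 0.8, "save": 1.0, "pps": 0.5, "dz": 1.0, "xg": 0.7},
    {"sotr": 0.5, "save": 0.0, "pps": 1.0, "dz": 0.4, "xg": 0.6},
    {"sotr": 0.3, "save": 0.7, "pps": 0.2, "dz": 0.0, "xg": 0.2},
    {"sotr": 0.0, "save": 0.3, "pps": 0.0, "dz": 0.5, "xg": 0.0},
]
toy_positions = [1, 2, 3, 4, 5]

# Candidate weights 0.1, 0.2, ..., 10.0.
candidates = [i / 10 for i in range(1, 101)]

# Score every candidate weight and keep the (R2, weight) pair with the best R2.
best_r2, best_w = max(
    (r_squared([weighted_sum(t, w) for t in toy_teams], toy_positions), w)
    for w in candidates
)
print(best_w, best_r2)
```

A caveat worth keeping in mind: the weight is being tuned on the same handful of data points it is then scored against, so the resulting R2 will tend to flatter the metric. Checking the chosen weight against a different season would be a fairer test.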
I also experimented with attaching weights to the other variables – the only metric that improved R2 when weighted was save %, and only marginally, from 75.4% to 77.7%. Weighting xG brought no overall model improvement; apparently the hallowed metric doesn’t carry the same explanatory power as SoTR or save %, although I expect this is because of how simply it is calculated in my model.
The maximum R2 I can achieve from all possible combinations of the 5 base-metrics comes from combining just weighted SoTR and save %, giving an R2 of 80.2%. By this measure, that pair describes team ability more accurately than my ‘super-metric’ – the one with all 5 metrics included. Which is more useful – a metric with the highest possible R2 but made up of only 2 metrics, or one that includes all 5 metrics, each genuinely valuable and descriptive in its own right, like those in use at present?
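Trying "all possible combinations" of 5 metrics means 31 non-empty subsets, which is easy to enumerate in code. The sketch below is one way to do it, assuming each candidate metric is simply the sum of the chosen columns; the column names and values are made up, and the SoTR column is taken to be already weighted.

```python
from itertools import combinations


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy ** 2 / (sxx * syy)


def combo_score(columns, combo, positions):
    """R2 between league position and the sum of the chosen metric columns."""
    totals = [sum(columns[name][i] for name in combo)
              for i in range(len(positions))]
    return r_squared(totals, positions)


# Made-up metric columns for five hypothetical teams, listed 1st to 5th.
toy_columns = {
    "weighted_sotr": [6.3, 5.0, 3.2, 1.9, 0.0],
    "save": [0.6, 1.0, 0.0, 0.7, 0.3],
    "pps": [0.8, 0.5, 1.0, 0.2, 0.0],
    "dz": [0.7, 1.0, 0.4, 0.0, 0.5],
    "xg": [1.0, 0.7, 0.6, 0.2, 0.0],
}
toy_positions = [1, 2, 3, 4, 5]

# Every non-empty subset of the metric names, from singles up to all five.
names = list(toy_columns)
all_combos = [c for k in range(1, len(names) + 1)
              for c in combinations(names, k)]

best_combo = max(all_combos,
                 key=lambda c: combo_score(toy_columns, c, toy_positions))
best_score = combo_score(toy_columns, best_combo, toy_positions)
print(len(all_combos), best_combo, best_score)
```

Because the search keeps whichever subset happens to score highest, the winner can never do worse than the full 5-metric SUM on the data it was picked from, which is part of why the highest R2 on its own does not settle the question.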
Like I said, I don’t really know what I’m doing.