Previously I built a very simple expected goals model based on four buckets for shots: six-yard box, penalty area, outside the area, and penalties. This is an improvement on pure shot numbers but still fairly crude. Here I describe my attempts to refine the model.

Note: many people have already tried similar concepts; this is not new. I am just trying to produce my own model to aid analysis of games.

**Improving the model**

One of the constraints on any model is the data. I have obtained more granular data and used it to build two more sophisticated models.

I now have more detailed position data, whether an attempt is a header or not, whether it is a direct free kick, and whether it came from a cross or not.

The two families of model are broadly:

1. Probability based

This is just an extension of my original model but with more buckets.

2. Logistic Regression

Here I have encoded a shot's location through its distance from goal and the angle. I have then used logistic regression to build a parameterised probability model. I separated out headers from other shots and regressed these separately.
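To make the regression idea concrete, here is a minimal sketch of how a shot could be encoded and fitted. It is an illustration, not my actual code: the coordinate convention (`x` metres from the goal line, `y` metres from the centre of the goal), the choice of the angle subtended by the two posts, the function names and the plain gradient-ascent fit are all assumptions. In practice a statistics library would do the fitting, and headers would be fitted as a separate data set.

```python
import math

GOAL_WIDTH = 7.32  # metres between the posts


def shot_features(x, y):
    """Encode a shot location as (distance, angle).

    Assumed convention: x is metres from the goal line, y is metres from
    the centre of the goal. The angle is the one subtended by the two
    posts at the shot location, in radians.
    """
    distance = math.hypot(x, y)
    angle = math.atan2(GOAL_WIDTH * x, x * x + y * y - (GOAL_WIDTH / 2) ** 2)
    return distance, angle


def goal_probability(features, weights):
    """Logistic model: P(goal) = 1 / (1 + exp(-(w0 + w1*distance + w2*angle)))."""
    distance, angle = features
    z = weights[0] + weights[1] * distance + weights[2] * angle
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, outcomes, learning_rate=0.005, epochs=5000):
    """Fit [intercept, w_distance, w_angle] by gradient ascent on the mean log-likelihood."""
    weights = [0.0, 0.0, 0.0]
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for features, scored in zip(rows, outcomes):
            p = goal_probability(features, weights)
            inputs = (1.0, features[0], features[1])
            for j in range(3):
                grad[j] += (scored - p) * inputs[j]
        weights = [w + learning_rate * g / n for w, g in zip(weights, grad)]
    return weights


# Tiny made-up data set: (x, y) locations and whether each shot was scored.
locations = [(3, 0), (5, 0), (8, 0), (10, 0), (12, 0), (16, 0), (20, 0), (25, 0), (30, 0)]
scored = [1, 1, 0, 1, 0, 0, 1, 0, 0]
rows = [shot_features(x, y) for x, y in locations]
weights = fit_logistic(rows, scored)
close_p = goal_probability(shot_features(5, 0), weights)
far_p = goal_probability(shot_features(25, 0), weights)
```

The made-up data only shows the mechanics; with real shots the fitted weights give each attempt its own goal probability, and summing those over a game gives the expected goals figure.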

I used data from shots from the last six years to find the probabilities / parameters.
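For the probability-based family, finding the probabilities amounts to counting goals and shots in each bucket. The sketch below assumes a bucket is any hashable label, for example a hypothetical tuple of (zone, is_header, from_cross); the bucket definitions and function names are illustrative, not the exact ones I used.

```python
from collections import defaultdict


def bucket_probabilities(shots):
    """Estimate P(goal) per bucket as goals / shots.

    `shots` is an iterable of (bucket, scored) pairs, where `bucket` is
    any hashable label and `scored` is True/False or 1/0.
    """
    totals = defaultdict(lambda: [0, 0])  # bucket -> [goals, shots]
    for bucket, scored in shots:
        totals[bucket][0] += int(scored)
        totals[bucket][1] += 1
    return {bucket: goals / count for bucket, (goals, count) in totals.items()}


def expected_goals(shot_buckets, probabilities):
    """Sum the bucket probability of every shot taken in a game."""
    return sum(probabilities[bucket] for bucket in shot_buckets)


# Made-up historic shots: (zone, is_header) buckets and outcomes.
history = [
    (("six_yard", False), 1),
    (("six_yard", False), 0),
    (("penalty_area", True), 0),
    (("penalty_area", True), 0),
    (("penalty_area", True), 1),
    (("penalty_area", True), 0),
]
probs = bucket_probabilities(history)
game_xg = expected_goals([("six_yard", False), ("penalty_area", True)], probs)
```

More buckets give a finer model, but each bucket needs enough historic shots for its goals / shots ratio to be stable.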

**Testing the model**

To find which of the two models best describes a game, I tested how well each correlated with actual goals over a set of games.

I split each team’s games for each season into chunks of six. This gave 720 groups of six games. I then tested the linear correlation between the expected goals numbers for those games and the actual goal numbers.
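A rough sketch of this test follows. It assumes per-game totals are held in plain lists in match order and that any incomplete final chunk is dropped; both are assumptions, as are the function names. Under those assumptions a 38-game season yields six complete chunks per team, and 20 teams over six seasons would give 6 × 20 × 6 = 720 groups.

```python
def chunk(values, size=6):
    """Split per-game values into consecutive groups of `size`, dropping any incomplete tail."""
    return [values[i:i + size] for i in range(0, len(values) - size + 1, size)]


def r_squared(xs, ys):
    """Square of the Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


def chunk_totals(per_game_values, size=6):
    """Total of each six-game chunk, ready to correlate against actual goals."""
    return [sum(group) for group in chunk(per_game_values, size)]


# Made-up numbers for one team's 12 games: expected goals and actual goals.
xg_per_game = [1.2, 0.8, 2.1, 1.5, 0.4, 1.0, 0.9, 1.7, 1.1, 0.6, 2.3, 1.4]
goals_per_game = [1, 0, 3, 2, 0, 1, 1, 2, 0, 1, 3, 1]
xg_chunks = chunk_totals(xg_per_game)
goal_chunks = chunk_totals(goals_per_game)
```

With the chunk totals for every team and season collected into two long lists, `r_squared(all_xg_chunks, all_goal_chunks)` gives the figure to compare across models.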

I also compared actual points with two measures: a ratio of expected goals (xG for / (xG for + xG against)), and an Expected Points measure based on Monte Carlo simulations of the shot probabilities.
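Both measures can be sketched in a few lines. The simulation below assumes each shot is an independent event that is scored with its model probability, and awards 3 points for a win, 1 for a draw and 0 for a loss; the function names, the number of simulations and the fixed seed are illustrative choices.

```python
import random


def xg_ratio(xg_for, xg_against):
    """Share of a game's total expected goals that belongs to the team."""
    return xg_for / (xg_for + xg_against)


def expected_points(shots_for, shots_against, n_sims=10000, seed=0):
    """Monte Carlo estimate of the points a team would expect from a game.

    `shots_for` and `shots_against` are lists of per-shot goal
    probabilities. Each simulation decides every shot independently,
    then awards 3 / 1 / 0 points for a win / draw / loss.
    """
    rng = random.Random(seed)
    total_points = 0
    for _ in range(n_sims):
        goals_for = sum(rng.random() < p for p in shots_for)
        goals_against = sum(rng.random() < p for p in shots_against)
        if goals_for > goals_against:
            total_points += 3
        elif goals_for == goals_against:
            total_points += 1
    return total_points / n_sims


# Made-up game: four shots for, two shots against.
home_shots = [0.35, 0.10, 0.08, 0.05]
away_shots = [0.20, 0.04]
ratio = xg_ratio(sum(home_shots), sum(away_shots))
points = expected_points(home_shots, away_shots)
```

Unlike the ratio, the simulation is sensitive to how the expected goals are spread across shots: one big chance and many small chances with the same total produce different distributions of scorelines.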

Here are the R Squared values – I have included pure shot numbers for comparison:

The regression-based Expected Goals model outperforms the probability-based measure in all categories, although not by much.

Both xG measures perform significantly better than shots.

**Conclusions**

Both versions of Expected Goals give a better description of a match than simple shot numbers. The logistic regression version is marginally better than the pure probability version.

**Further work**

Next, I would like to test out the predictive capabilities of these measures.

There is plenty of scope for improving the model, for example by taking into account score effects or the coordinates of the pass that set up the chance. I would also like to try replacing logistic regression with a neural network model.

Follow me on twitter for statistical tweets – ABPNumbers
