Schedule strength of remaining Premier League fixtures

Now that we are very close to the end of the Premier League season, I thought I’d try to compare the difficulty of each team’s remaining fixtures. To do this I used a very simple method to work out how much easier or harder than average each team’s opponents are. Skip to the graph if you aren’t interested in the working out.

First I calculated the average points taken against each team so far, separated into home and away fixtures.

Then I took each team’s remaining fixtures, summed the corresponding values from the first calculation, and divided by the number of fixtures to get an average per game. Finally I worked out the average points from all fixtures and subtracted this from each team’s average.

So this leaves me with a value that represents how much easier (positive values) or harder (negative values) a team’s fixtures are in points per fixture.
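For anyone who wants to try this themselves, here is a rough Python sketch of the steps above. It is not the code behind the graph. The function names and the input layout (tuples of home team, away team, home goals, away goals) are my own assumptions, and so is the baseline: I have taken "the average points from all fixtures" to mean the average points per team per game across every match played so far.

```python
from collections import defaultdict

def points_for_result(goals_for, goals_against):
    """League points earned for a single result: 3 win, 1 draw, 0 loss."""
    if goals_for > goals_against:
        return 3
    return 1 if goals_for == goals_against else 0

def points_conceded(results):
    """Average points opponents have taken against each team, split by venue.

    results: iterable of (home_team, away_team, home_goals, away_goals).
    Keys are (team, venue), where venue is where that team was playing.
    """
    totals = defaultdict(lambda: [0, 0])  # [points given up, games played]
    for home, away, hg, ag in results:
        totals[(home, "home")][0] += points_for_result(ag, hg)
        totals[(home, "home")][1] += 1
        totals[(away, "away")][0] += points_for_result(hg, ag)
        totals[(away, "away")][1] += 1
    return {key: pts / games for key, (pts, games) in totals.items()}

def league_average(results):
    """Assumed baseline: average points per team per game over all results."""
    total = sum(
        points_for_result(hg, ag) + points_for_result(ag, hg)
        for _, _, hg, ag in results
    )
    return total / (2 * len(results))

def schedule_strength(results, remaining):
    """remaining: {team: [(opponent, opponent_venue), ...]}.

    Positive values mean easier than average fixtures, negative harder.
    """
    conceded = points_conceded(results)
    baseline = league_average(results)
    strength = {}
    for team, fixtures in remaining.items():
        values = [conceded[(opp, venue)] for opp, venue in fixtures]
        strength[team] = sum(values) / len(values) - baseline
    return strength
```

Note that the venue in each remaining fixture is the opponent’s venue, so a team with an away game at Liverpool would list ("Liverpool", "home").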

Anyway, after that rather rambling explanation, here are the results:


The numbers should give Liverpool fans’ title hopes a boost. As it turns out, they have the easiest remaining fixtures of all 20 Premier League teams. City, on the other hand, can look forward to harder than average games. Liverpool’s opponents give up over 0.6 points per game more than City’s.

At the other end of the table both Brighton and Cardiff have a tough set of games remaining, with Chris Hughton’s side facing the second hardest run-in. Burnley will be very pleased they have pulled away in recent weeks, while Southampton look like they should be safe.

As far as the race for third and fourth is concerned, there isn’t much between the contenders. Chelsea have the worst schedule, Tottenham’s is about average, while United and Arsenal have a fairly easy finish to the season.



Clustering Premier League Defensive Pressure

Statistical analysis of the defensive side of football is notoriously difficult. Tackle and interception counts are famously deceptive. One of the best ways of measuring a defence is by the effect it has on the opposing attack.

One fairly simple way of doing this is to look at opponents’ pass completion percentage. The more pressure the defence exerts on the ball, the harder it is for the opposition to pass the ball successfully.

Using Neural Networks to calculate Expected Goals

Previously I created an Expected Goals model based on logistic regression.

I wanted to improve this model. Rather than add new features and work out how to include them in the regression equations, I decided a simpler way would be to use a Machine Learning algorithm to do it for me. So I decided to convert my model to use a Neural Network.

Producing time series from a simple Expected Goals model

Aim: To produce time series from a simple Expected Goals model to help analyse the progress of football matches.

Expected Goals is a derived statistic that estimates the number of goals a team would score on average from its opportunities. It has become so widespread it now features on Match of the Day.

Unlike observed statistics like goals or shots, it depends on a model. These models can be incredibly sophisticated – see Michael Caley’s excellent work.

A couple of years ago I tried creating my own much simpler version. You can read about it here. I was mainly investigating using the data for prediction. As on Match of the Day, expected goals are most commonly used for analysis, i.e. describing what happened after the event.

I wanted to build a slightly modified version for analysing games myself. As far as categories go, out go headers (these are now lumped together with other shots), in come penalties and shots in the 6 yard box. So now the four categories of shot used in the model are:

  • penalties
  • shots in the 6 yard box
  • shots in the rest of the penalty box
  • shots outside the penalty box

Still pretty simple – unfortunately time and a lack of finer grained data prevent me from going much further.
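To make the idea concrete, here is a small Python sketch of a four-category model like the one listed above. It assumes each category’s value is simply its historical conversion rate (goals divided by shots), which is my guess at the simplest way to calibrate such a model. The field names `is_penalty`, `zone` and `is_goal` are hypothetical, not taken from any real data feed.

```python
from collections import Counter

def categorise(shot):
    """Map a shot to one of the four categories.

    shot: dict with hypothetical keys 'is_penalty' (bool) and
    'zone' ('six_yard_box', 'penalty_box' or 'outside_box').
    """
    if shot["is_penalty"]:
        return "penalty"
    return shot["zone"]

def calibrate(historical_shots):
    """Conversion rate per category: goals scored / shots taken."""
    attempts, goals = Counter(), Counter()
    for shot in historical_shots:
        category = categorise(shot)
        attempts[category] += 1
        goals[category] += int(shot["is_goal"])
    return {category: goals[category] / attempts[category] for category in attempts}

def expected_goals(shots, rates):
    """Sum the calibrated rate of every shot a team took."""
    return sum(rates[categorise(shot)] for shot in shots)
```

Headers are not distinguished here, matching the list above, so a header from the six-yard box is valued the same as any other shot from there.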

Expected Goals over time

However, what I have added is the ability to record how cumulative expected goals build up over the course of a match.

Here is one example from last week’s Premier League:


The graph captures the ups and downs of a roller-coaster match quite well. Liverpool were slow to start, then dominated after half-time. They probably should have sealed the win but a late surge from Watford was enough to share the points.
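A cumulative series like this is easy to build once every shot has a minute and an expected goals value attached. The sketch below is only an illustration of the idea, assuming shots arrive as (minute, value) pairs for one team. The function name is my own.

```python
from itertools import accumulate

def cumulative_xg(shots):
    """Running expected goals total for one team.

    shots: list of (minute, xg_value) pairs in any order.
    Returns (minutes, totals), both sorted by minute, ready to plot
    as a step chart.
    """
    ordered = sorted(shots)
    minutes = [minute for minute, _ in ordered]
    totals = list(accumulate(value for _, value in ordered))
    return minutes, totals
```

Calling it once per team and plotting both step lines on the same axes gives the kind of match timeline described here.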

I will try to publish more examples here and on my Spurs blog and use them as a tool in my own analysis of games. I hope to refine the model over time. Maybe put headers back, or split the locations into finer buckets. Unfortunately getting the data is the main barrier.

Note: the model was calibrated from Premier League data from the last 6 seasons from

Follow me on Twitter @ABPSpurs


Improving TSR with more detailed shot data

Total Shot Ratio (TSR) has been demonstrated to be a useful metric for judging a team’s performances and predicting future outcomes. It is both highly repeatable and correlates well with long term results.

Its big advantage is its simplicity, both to calculate and to understand. This also means it is easy to get the data for.

The big negative is that it assumes all shots are equal in value. A hopeful Andros Townsend punt from thirty yards is counted the same as a cultured Harry Kane curler from the penalty spot.
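For reference, TSR is just a team’s shots divided by all the shots in its matches. A minimal Python version (the function name is my own) makes the limitation obvious, since each shot adds exactly one to the count wherever it was taken from:

```python
def total_shot_ratio(shots_for, shots_against):
    """TSR = shots for / (shots for + shots against)."""
    total = shots_for + shots_against
    if total == 0:
        raise ValueError("TSR is undefined when no shots were taken")
    return shots_for / total
```

A team that takes 15 shots and concedes 10 has a TSR of 0.6, whether those 15 were tap-ins or thirty-yard punts.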

Recently, sophisticated Expected Goals models have been created to address this issue. However, these require a significant amount of time and effort to create and rely on the availability of very fine grained data. As the complexity increases, so does the leap of faith the reader must make in accepting the numbers.

Are there some simple ways we can adjust TSR in order to address some of its deficiencies?

How can you adjust schedule strength for home / away games?

In my last post I looked at a metric for comparing teams’ schedules. I used this to calculate the points a hypothetical average team would get from the remaining fixtures of each of the twenty Premier League clubs’ run-ins.

The problem was, I made no allowance for whether the matches were home or away. With 44% of games finishing in a win for the home team, set against 30% for the away side and 26% ending in a draw, location has a big impact on the likely result.
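To put a number on that, the percentages above can be turned into expected points per game for a typical home and away side, using 3 points for a win and 1 for a draw. This is only a quick illustration of the size of the gap, not the adjustment itself.

```python
def expected_points(p_win, p_draw):
    """Expected league points from win and draw probabilities."""
    return 3 * p_win + 1 * p_draw

# Figures quoted above: 44% home wins, 30% away wins, 26% draws.
home_points = expected_points(0.44, 0.26)
away_points = expected_points(0.30, 0.26)
home_advantage = home_points - away_points
```

That works out at 1.58 points per game at home against 1.16 away, a gap of 0.42 points per game.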

How can we revise the numbers to take this into account?