The Problem with xG Philosophy

Elias
12 min read · Jun 13, 2021

On the second of March, I changed my profile picture, my @, my bio etc. and posted a series of tweets parodying the popular (depending on who you associate with) Twitter account ‘The xG Philosophy’. If you follow me on Twitter (now @spagyama once more), you most likely did so in the aftermath of that tweet. I’ve never had any conversation with James Tippett or anyone else who may be involved with the running of the account. I haven’t even read either of his books. So what compelled me to do this?

To be blunt, I feel that the way xG Philosophy is run and the kind of things they choose to tweet represent a perversion of the metric that does more harm than good with regard to its understanding among casual football fans. After initially coming across the account, I found it to be a neat little way of checking xG for matches immediately after they finish (they tweet them very quickly for maximum interactions) and occasionally firing off a tongue-in-cheek tweet about a team/manager/player being bad based on the xG of whatever match it was from. I considered this to be pretty harmless, because I know that single game xG is relatively meaningless and there are no strong conclusions to be drawn from it. I’m unsure that the following accumulated by xG Philosophy is aware of this. James Tippett, apparently qualified to write an entire book about xG, should definitely have an awareness of the limitations of such a small sample size. As referenced in my original imitation tweet, they then had around 140k followers. At the time of writing, The xG Philosophy’s main account sits at 193k followers, and this growth shows no sign of slowing down. They play a key role in transporting expected goals, and by extension football analytics as a whole, into mainstream discourse, and so I feel that the many shortcomings of their account are damaging to the wider perception of xG across all of football.

While I was looking for an explanation of the methodology & sample size used in whatever model it is that xG Philosophy gets their numbers from (and finding very little, but we’ll get to that later), I came across a thread from 2018 on Tippett’s personal account detailing his own explanation of xG.

In this thread, he explains that traditional football stats such as possession, shots & shots on target are ‘descriptive rather than predictive’ and that ‘a more powerful metric is needed to assess footballing performance’. He neglects to mention the fact that shots & shots on target were a precursor to xG which built on those foundations but fine, whatever, that’s not the issue here. Let’s focus on the point that expected goals, as a metric, are intended to be predictive. We want them to help us anticipate what will happen next, not just add more detail to what happened.

So, how does the primary source of content for the xG Philosophy account, tweeting single match xG tallies (sometimes single half or even single shot), show off the predictive qualities of xG? In short, it doesn’t. This is no different to the way mainstream media outlets such as Sky, BT and the BBC have used xG in their football coverage, where it sits right below the traditional stats and is rarely discussed in any detail. How can you hail xG for being a powerful predictive metric, and then run an entire account which reduces it to an accounting of quality & quantity of shots in a game that has already unfolded? With no other context, you’re better off simply looking at shots & shots on target.

Now that I’ve expressed my issue with the main purpose of the account itself, let’s take a deeper look into the different kinds of tweets they put out. As mentioned, the most common of these is the xG scoreline for whichever Premier League match has just finished. The most immediately obvious issue with their xG scorelines is that they do not separate penalties. We’ll come back to this in a moment, but penalties have a very large xG (around 0.76) and creating them is not a repeatable skill. If you’re going to extract any value from single game xG, removing penalties (eg 1.3xG + 1 pen instead of 2.06xG) is an obvious place to start.
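To make the penalty point concrete, here is a minimal sketch of how a single match total could be reported with penalties split out. The shot list is invented for illustration, and the 0.76 penalty value is just the commonly quoted figure mentioned above, not anything taken from xG Philosophy’s model.

```python
PENALTY_XG = 0.76  # commonly quoted penalty value; an assumption, models differ


def split_penalties(shots):
    """Return (non_penalty_xg, penalty_count) for a list of (xg, is_penalty) shots."""
    non_pen = sum(xg for xg, is_pen in shots if not is_pen)
    pens = sum(1 for _, is_pen in shots if is_pen)
    return non_pen, pens


def format_xg(shots):
    """Format a match total as e.g. '1.3xG + 1 pen' instead of '2.06xG'."""
    non_pen, pens = split_penalties(shots)
    label = f"{round(non_pen, 2):g}xG"
    if pens:
        label += f" + {pens} pen" + ("s" if pens > 1 else "")
    return label


# Hypothetical match: open-play shots summing to 1.3xG, plus one penalty.
shots = [(0.5, False), (0.4, False), (0.3, False), (0.1, False), (PENALTY_XG, True)]

print(format_xg(shots))                           # penalties reported separately
print(f"{sum(xg for xg, _ in shots):.2f}xG")      # the lumped-together total
```

The only design choice here is to carry an is_penalty flag on every shot, which most event data already provides in some form.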

Now I’d like to further expand on the inherent worthlessness of single game xG. First of all, it has no value as a predictive metric. Each individual football match is different from all others; you will very rarely have two comparable matches in terms of the players involved, their ability at that time, their fitness levels and the tactical instructions they are being given. The uniqueness of football means that you cannot use one match to predict another. You cannot escape the fact that xG Philosophy’s content reduces expected goals to being nothing more than descriptive. Protest as he might, Tippett is contributing to expected goals having little mainstream usage other than sitting below the aforementioned possession, shots, shots on target etc in the stats column.

So, you might say, maybe this isn’t ideal, but it’s still a good indicator of how much each team created and who the better team was, right? After all, these tweets still give fans some opportunity to look beyond the scoreline to see how the game played out, don’t they? Well, no, not really. Expected goals are best used over a larger sample size. Over a period of, let’s say, 10 matches, unusual and unsustainable things can happen with regard to how many goals a certain team is scoring or conceding. The attackers are getting shots off from good goal scoring positions, but they just can’t seem to find the net. Every other shot they give up seems to fly into the top corner. Over the course of a season, these things tend to even themselves out. There are some outliers of course (yes, Brighton, I’m talking about you) but xG and xGA can help us to know if teams are better than their results, or, conversely, if a side might not be as good as the table suggests. When you bring down the sample size, xG suffers from the same problem that actual goals do: strange things happen. A striker misses the ball when faced with an empty net. A defender slips when clearing a ball, leaving an opposition player one on one with the keeper. As I stated earlier, penalties too are examples of abnormal and inconsistent events which are not truly reflective of the ability of a team to create chances & prevent the opposition from doing so, and xG Philosophy does not exclude penalties from their single match xG totals. A larger sample gives a better indication as to whether the chances a team is creating are repeatable and sustainable. Essentially, the smaller the sample size, the more variance that sample suffers from.
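A rough way to see the sample size effect in numbers: if you treat every shot as an independent coin flip with its xG as the probability of scoring (a simplifying assumption, and the shot profile below is made up), you can work out how far actual goals per match typically drift from xG per match over different numbers of games.

```python
import math


def per_match_std(shot_xgs, n_matches):
    """Std dev of (goals - xG) per match, averaged over n_matches.

    Assumes independent shots and, purely for simplicity, that the team
    produces the same shot profile in every match.
    """
    var_one_match = sum(p * (1 - p) for p in shot_xgs)
    return math.sqrt(var_one_match / n_matches)


# Hypothetical team: 12 shots a game, each worth 0.1xG (1.2xG per match).
shots = [0.1] * 12

for n in (1, 10, 38):
    print(f"{n:>2} matches: typical gap between goals and xG per match "
          f"= {per_match_std(shots, n):.2f}")
```

For this invented team the typical gap in a single match is about one goal, which is almost as large as the 1.2xG figure itself. It shrinks with the square root of the number of matches, which is the whole argument for using a larger sample.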

Variance isn’t the only thing to account for within single game xG. A match will go through different game states; if one team is leading, they are more likely to ‘kill the game’ and sacrifice their own creativity in order to also prevent the opposition from creating chances. For example, a team that is leading might try and hold onto the ball rather than progressing it up the pitch, and will generally take fewer risks. If the goal(s) they scored happened to come from lower xG chances, it may create the false impression that the leading team were simply lucky and do not have the capability to create a large number of chances while in reality, they didn’t really need to on that particular occasion. In addition to this, one team creating 1.8xG might be completely different to another team creating 1.8xG. Team A might have had three shots worth 0.6xG each, while Team B might have had 20 shots each worth 0.09xG. The figure 1.8xG represents the average number of goals that will be scored from these shots. Team A will achieve much more consistent results, while Team B will be far more inconsistent. However, Team A is only taking 3 shots and so 3 goals is their limit, while Team B will have the potential to score more than 3, all the way up to theoretically being able to score from each of their 20 shots.
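The Team A v Team B comparison can be checked directly. The sketch below assumes each shot is an independent event that goes in with probability equal to its xG (which real matches only approximate) and builds the exact distribution of goals for each team.

```python
def goal_distribution(shot_xgs):
    """Exact probability of scoring 0, 1, 2, ... goals, assuming independent shots."""
    dist = [1.0]  # before any shot is taken: 0 goals with certainty
    for p in shot_xgs:
        nxt = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            nxt[goals] += prob * (1 - p)   # this shot is missed
            nxt[goals + 1] += prob * p     # this shot is scored
        dist = nxt
    return dist


def mean_and_variance(dist):
    """Mean and variance of a goal distribution given as a list of probabilities."""
    mean = sum(g * pr for g, pr in enumerate(dist))
    var = sum((g - mean) ** 2 * pr for g, pr in enumerate(dist))
    return mean, var


team_a = [0.6] * 3     # three shots worth 0.6xG each
team_b = [0.09] * 20   # twenty shots worth 0.09xG each

for name, shots in (("Team A", team_a), ("Team B", team_b)):
    dist = goal_distribution(shots)
    mean, var = mean_and_variance(dist)
    print(f"{name}: mean={mean:.2f}  variance={var:.2f}  "
          f"P(0 goals)={dist[0]:.3f}  P(4+ goals)={sum(dist[4:]):.3f}")
```

Both teams average 1.8 goals, but under these assumptions Team B’s variance (about 1.64) is more than double Team A’s (0.72). Team B is more likely to score nothing at all and is also the only one of the two that can score four or more.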

Even if xG Philosophy were to separate penalties and we accept that the only purpose of these tweets is to give some statistical context for a match that’s already occurred, their content is still so limited. A simple scoreline doesn’t even compare to the visualisations many other Twitter accounts such as @JmooreQuakes are putting out.

As is the case with all single game xG based content, this does not provide any predictive value, nor is it trying to. Instead, Jamon incorporates a few helpful visualisations to provide a greater insight into this specific game. We are analysing the past, not predicting the future, and we are given as many tools as possible to analyse the balance of chances in this game. The cumulative xG step chart gives an insight into the periods in the game in which each side was creating a good number of chances: midway through the first half for LA Galaxy, and midway through the second for the San Jose Earthquakes. I explained earlier that two identical xG values can be wildly different beneath the surface, and the goal probability percentages on the right hand side are a perfect visual representation of this. There were a lot of shots in this game (Galaxy 21–18 SJE) so the outcome will be highly variable, and the Earthquakes can count themselves unlucky to not have scored a single one of their shots. In fact, neither team did; the only goal in the game was an own goal from Tanner Beason in the 70th minute. The bar at the top giving percentage likelihoods for each result, accompanied by an xPoints bar, is based on the outcomes of 100,000 computer simulations of the shots in this game. A San Jose win was the most likely outcome but given the small disparity between the two xG values, they are not the overwhelming statistical favourites here. The relatively low % chance of a draw is likely a result of the high shot numbers & xG values each team racked up; as touched on before, higher shot numbers tend to make matches more variable and unpredictable.
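For anyone curious how a result-probability bar like that can be produced, here is a minimal sketch of the general idea of simulating a match from its shots. It is not Jamon’s actual code. The per-shot xG values are invented (only the shot counts of 21 and 18 come from the match), and it assumes each shot is independent.

```python
import random


def simulate_match(home_shots, away_shots, n_sims=100_000, seed=0):
    """Estimate result probabilities and xPoints by replaying the shots n_sims times."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    home_wins = draws = away_wins = 0
    for _ in range(n_sims):
        # Each shot goes in with probability equal to its xG.
        home_goals = sum(rng.random() < p for p in home_shots)
        away_goals = sum(rng.random() < p for p in away_shots)
        if home_goals > away_goals:
            home_wins += 1
        elif home_goals == away_goals:
            draws += 1
        else:
            away_wins += 1
    p_home, p_draw, p_away = (c / n_sims for c in (home_wins, draws, away_wins))
    return {
        "home_win": p_home,
        "draw": p_draw,
        "away_win": p_away,
        "home_xpoints": 3 * p_home + p_draw,  # 3 for a win, 1 for a draw
        "away_xpoints": 3 * p_away + p_draw,
    }


# Invented shot values: 21 Galaxy shots and 18 Earthquakes shots.
galaxy = [0.08] * 21
quakes = [0.11] * 18

# 20,000 runs keeps this quick; the chart in the article used 100,000.
result = simulate_match(galaxy, quakes, n_sims=20_000)
for key, value in result.items():
    print(f"{key}: {value:.3f}")
```

Own goals, like the one that decided this match, come from no shot at all, so a simulation built purely on shots never sees them.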

The xG philosophy account might post much quicker and more consistently than their competition, but there is clearly so much more insight to be found elsewhere.

Moving back on to their tweets, from time to time they will post an expected points graph for the Premier League. Putting my preference for using expected goal difference aside, these graphs can be incredibly misleading because they fail to account for or even mention the fact that different teams may have played different numbers of matches at certain points. This was particularly prominent this season due to fixture cancellations/rescheduling, largely a consequence of the Covid-19 pandemic. It would not be difficult to use expected points per game instead, or at the very least add a disclaimer.

On the 9th February there were two teams on 21 matches played (Aston Villa & Everton), 6 on 22 played and the remaining 12 teams had each played 23, but this is not accounted for.
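Switching to a per-game figure really is a one-line change. A small sketch with made-up team names and xPoints totals (only the 21, 22 and 23 matches played mirror the situation above):

```python
def xpoints_per_game(table):
    """Rank teams by expected points per match played, highest first."""
    rows = [(team, xpts / played) for team, (xpts, played) in table.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical table: team -> (total xPoints, matches played).
table = {
    "Team X": (33.0, 23),
    "Team Y": (31.5, 21),
    "Team Z": (30.0, 22),
}

ranked = xpoints_per_game(table)
for team, ppg in ranked:
    total, played = table[team]
    print(f"{team}: {ppg:.2f} xPts per game ({total} xPts from {played} played)")
```

In this invented table Team X has the most total xPoints, but Team Y comes out on top once its two games in hand are accounted for.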

I’d also like to cover the tweets that I’m going to simply label as clickbait. These include useless nitpicked stats, single player v entire team xG scorelines and single shot xG values. A serious analytical account would not earnestly tweet this:

To quote my own response to this tweet, “there are so many more interesting and actually useful things you can do with analytics than saying ‘Bruno Fernandes has taken 9 penalties this season’ but with extra steps.”

Now, this is where I go from trying my best to be constructive and critical to being plain pissed off. This tweet is nothing more than nonsense designed purely as bait for fans to mock Bruno for his goals predominantly being penalties, or perhaps to mock Palace for being a bad attacking side. There is no way to earnestly criticise these tweets for their lack of insight, because they aren’t even trying to provide any. I would contend that the target demographic of xG Philosophy is casual football fans, not those who have a genuine interest in analytics, and these types of tweets appeal to them in a much more transparent way. Any kind of outlier xG Philosophy can find will be woven into a tweet to bait clicks, especially if it involves a big name player/club. If a player scores a spectacular goal, you can be sure that they’ll tweet out the xG value for a healthy number of likes and retweets.

Shots from a long way out have a very low chance of going in? Wow, thank you for this cutting edge analysis, this account teaches me so much. I could go on about these tweets but I’m sure you have better things to do than read angry incoherent rambling so I’ll move on.

The last point I’d like to make regarding xG Philosophy’s content applies to all of their tweets: there is no public information on the xG model or sample size that they get their numbers from. I raise this issue not because I believe the numbers they post to be inaccurate or farcical, but because you cannot assess whether they are or not. While searching for explanations of the model they use and how it works, the only thing I could find was the odd thread giving general information about how xG deals with certain issues such as consecutive shots occurring in quick succession.

This thread on consecutive shots in quick succession explains that xG deals with this by ‘calculating the probability that the attack DOESN’T result in a goal’, and they give an example of Raheem Sterling having his penalty saved and then immediately scoring the rebound. They have the value of a penalty at 0.78xG (they have also asserted it to be 0.77xG, and the most common value for a penalty is 0.76xG, which casts further doubt over their model) and the value of the rebound shot at 0.9xG. Using these values, you can calculate that the probability of neither of them going in is about 2% (0.22 × 0.10 = 0.022), so Man City should supposedly be attributed 0.98xG in total for these two shots.

We can probably assume that this is an explanation for the model Tippett himself uses. However, it is asserted that this is how xG in general deals with these situations, in spite of the fact that other models, such as the one used by Michael Caley (@Caley_Graphics), would exclude the xG of the rebound altogether.
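To show how much the choice of method matters, here is a small sketch comparing three ways of valuing the Sterling penalty and rebound, using the 0.78 and 0.9 figures from the thread. The function names are mine, and treating ‘exclude the rebound’ as ‘keep only the first shot of the sequence’ is my reading of that approach rather than a description of anyone’s actual model.

```python
import math


def naive_sum(shot_xgs):
    """Add every shot's xG together, ignoring that only one goal is possible."""
    return sum(shot_xgs)


def combined_xg(shot_xgs):
    """1 minus the probability that none of the shots in the sequence goes in."""
    return 1 - math.prod(1 - p for p in shot_xgs)


def first_shot_only(shot_xgs):
    """Keep only the first shot of the sequence and drop the rebound(s)."""
    return shot_xgs[0]


sequence = [0.78, 0.9]  # penalty, then rebound, values as quoted in the thread

print(f"naive sum:       {naive_sum(sequence):.2f}xG")
print(f"combined:        {combined_xg(sequence):.2f}xG")
print(f"first shot only: {first_shot_only(sequence):.2f}xG")
```

That is 1.68, 0.98 and 0.78 for the exact same passage of play, depending purely on a modelling decision.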

You could make an argument as to which methodology is superior here, but the point I’m trying to make is that there are countless issues that all xG models face which they account for in different ways. This is why different xG models will return different results for the same match. It would be helpful to know any limitations this model has that you can take into account while looking at the numbers it produces. I believe that the lack of total transparency here only has the effect of confusing casual fans and may well be covering up for deficiencies that may be present within the model.

At best, James Tippett has a genuine passion for analytics in football and his account functions simply to get the word out by any means necessary, even if this means distorting what xG is actually good for, and also helps him to flog his book. At worst, he is a grifter who cares only about clout and making money, is purposefully trying to hide deficiencies present in his model, and will tweet absolutely anything to get good numbers with no regard for the damage he is causing for the many talented people that have worked incredibly hard to push the boundaries of what stats can show us in football. He has a tendency to mass block all accounts that criticise or deride him in any way, and gives no response to criticism unless it threatens to harm his image and popularity.

Finally, after months of promising to do it, I’ve actually written this thing. It feels good to write up a constructive and critical look at xG Philosophy to explain my dislike of the account and show those who follow them to be wary of the content they are consuming. Thanks for reading my first ever article! I plan to make this a regular thing if there’s enough interest in it. Thank you to Jamon Moore (@JmooreQuakes) for providing the LA Galaxy v San Jose Earthquakes visualisation used in this article; if you want to learn more about the analytics behind scoring goals, check out his Where Goals Come From series.

Last but certainly not least, a huge thank you to Nandy (@nandy_sd) for helping with editing this article and providing me with the visualisations I used to better explain the limitations of single game xG & how two identical numbers can look very different beneath the surface. Give him a follow if you haven’t already.
