Friday, December 9, 2011

Beating The Spread: Statistical Models of NFL Power Rankings and Point Spreads (Part 2)

by Tim Rubin

**Information on this site is collected from outside sources and/or is opinion and is offered "as is" without warranties of accuracy of any kind. Under no circumstances, and under no cause of action or legal theory, shall the owners, creators, associates or employees of this website be liable to you or any other person or entity for any direct, indirect, special, incidental, or consequential damages of any kind whatsoever. This information is not intended to be used for purposes of gambling, illegal or otherwise.**
________________________________________________________________________
In Part 1 of this series I did a general introduction to statistical modeling for team ranking systems.  In Part 2, I will be introducing a simple “Margin-of-Victory” style model for NFL game spreads.  After a brief discussion of the model, I will show the team power ratings that the model learns, as well as its predictions for Week 14’s games.*
* Saying that a "model learns" is something of a misnomer.  What this really means is that we infer the parameters of the model that best fit the data.  Since in this case, the parameters of the model include the "Rating" of each team, it can be helpful to just use shorthand and say that "the model learns the team ratings".
________________________________________________________________________
An Extremely Simple Margin-of-Victory Model for NFL Game Outcomes.

First, let me walk you through a simple Margin-of-Victory model for NFL game outcomes.  We can then look at some things that we can do with this model, such as generate NFL team power rankings, and predict the outcome of upcoming games.

Here, on one line, is the essence of the model:

Points_HomeTeam – Points_AwayTeam = Rating_HomeTeam – Rating_AwayTeam + HomefieldAdvantage + Error
Pretty simple.  On the left-hand side of the equation, we have the game outcome, expressed in terms of point differential.  A positive value on the left-hand side indicates a victory for the home team, while a negative value indicates a loss (i.e. a victory for the away team).  On the right-hand side of the equation, we have the model parameters.  The model parameters are the "Ratings" for both the home and away teams, and the “homefield advantage”.

In plain English, this model states that for any NFL game, the outcome (in terms of margin of victory or defeat for the home team) is equal to the "Rating" of the home-team, minus the "Rating" of the away-team, plus the homefield advantage.

Finally, the "+ Error" that I added on right side of the equation captures the amount of error made by our model’s predictions.  This indicates that the predictions made by our model will vary in their accuracy (and will almost never be exactly correct).  But this does not indicate that it is a bad model.

In any model for NFL point differentials—no matter how good it is—there will always be some error, simply because there is a large amount of randomness that goes into the final outcome of every NFL game.  Football is a high variance sport, plain and simple.  It is a sport in which a game can be decided on a play in which a quarterback somehow eludes a mob of 300 pound men swarming around him, wildly hurls the ball down the field to a mediocre receiver that will never catch another ball in his career, who manages to make one of the greatest catches in NFL history by trapping the ball against his helmet two inches off the ground. NFL games often turn on strange, lucky, and flukey plays.  That is a big part of what makes the NFL entertaining.  But—from the perspective of predicting outcomes—that is something we call variance.  And the NFL has a lot of it.

As a general rule: if a model for NFL games—or more generally, for anything involving human behavior—doesn’t include something equivalent to an error term, then something is wrong with that model (or the person who formulated the model is delusional).

With our model, it is easy to see why there will have to be some error; this will become clear when we think about fitting the model in the next section.

_____________________________________________________________________
Fitting the model (intuitively)

Let’s think about how we would fit our model to the data, and what it even means to “fit” a model…

Note that, in the equation above, there are three model parameters: a “rating” for the home and away team, plus a value for the “home-field” advantage.  If we want to predict the outcome for a single game—between one home team and one away team—then we only need these three parameters.   However, to make this model apply to all games that are played in the NFL, we will actually need a “rating” for all 32 NFL teams, plus the value of the “homefield advantage”.  Once we have learned those parameters, we can then use the model to predict a point-spread for any possible matchup.

Ignoring the “Homefield Advantage” component for now, our model has 32 parameters: One “Rating” for each of the 32 NFL teams.  Given our 32 team ratings, our model makes a prediction for all of the games that have happened so far this season. 

So, suppose we start with a random numerical value assigned to each of our 32 ratings.  We could then go and look at what the predictions are for each of the game outcomes given our current parameter-values (i.e., ratings).

Example: Team Ratings and Game Predictions

To think about what this means, here is a table showing the first ten games played this season and their results (in terms of point differential).  To simplify things, I will make the traditional assumption that homefield advantage is worth 3 points.
                
So we now only need to think of the simpler equation:

Point_Differential = Rating_HomeTeam – Rating_AwayTeam + 3 + Error
For each of these games our model makes a prediction.  Specifically, it predicts that the value in column 3 will be equal to: the rating of the team in column 1 minus the rating of the team in column 2, plus 3.

For example, for the first game, if the Packers rating was set to +12, and the Saints was set to +5, the predicted outcome would be:  12 – 5 + 3 = 10.  That is, the model would predict the Packers to win by (+)10.  And since the true outcome was equal to 8, this would give us an absolute error of 2. 
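To make the arithmetic concrete, here is that worked example as a small sketch (the ratings are the hypothetical values from the example above, not fitted ones):

```python
# Hypothetical ratings from the example above (not fitted values).
rating_packers = 12.0
rating_saints = 5.0
homefield = 3.0   # the traditional 3-point assumption

# Predicted margin for the home team (Packers):
predicted = rating_packers - rating_saints + homefield
actual = 8.0                         # the Packers actually won by 8
abs_error = abs(actual - predicted)  # how far off the prediction was
```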

By doing this for all of the games that have been played in the season, we can figure out how much total error our model makes, given our current team ratings.

Now, the process of “fitting our model” will involve adjusting each team’s Rating.  Imagine that you had a number line, and you had to place each team along that number line (thereby assigning them a rating).  “Fitting” our model would involve adjusting the team ratings up and down that number line until we minimized the sum of all the prediction errors across all of the games.


There are some options in terms of how we measure our prediction errors.  Minimizing the sum of the absolute values of the errors corresponds to a method called “Least Absolute Deviations”.  More typically, people minimize the sum of squared errors (a method called “Ordinary Least Squares”).

I’m not going to delve too deeply into the difference between those two methods in this post (although there is some discussion of the issue here if you are interested).  However, it is worth noting that Ordinary Least Squares conforms to the assumption that errors have a normal distribution.  And it turns out that this is often a pretty reasonable assumption, which I will be using to fit our model this week.
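For those who want to see what the fitting step actually looks like, here is a minimal sketch of an Ordinary Least Squares fit using numpy, on a made-up schedule of five games among four hypothetical teams.  The real fit works the same way, just with 32 teams and a full season of games:

```python
import numpy as np

# Toy schedule of five games among four hypothetical teams (0..3):
# (home team index, away team index, home team's margin of victory)
games = [(0, 1, 7), (1, 2, 3), (2, 3, -4), (3, 0, -10), (0, 2, 6)]

n_teams = 4
X = np.zeros((len(games), n_teams + 1))   # last column = homefield advantage
y = np.zeros(len(games))
for i, (home, away, margin) in enumerate(games):
    X[i, home] = 1.0    # +1 on the home team's rating
    X[i, away] = -1.0   # -1 on the away team's rating
    X[i, -1] = 1.0      # every game includes the homefield advantage
    y[i] = margin

# Ordinary Least Squares: minimize the sum of squared prediction errors.
params, *_ = np.linalg.lstsq(X, y, rcond=None)
ratings, hfa = params[:n_teams], params[-1]
```

One subtlety: adding a constant to every team's rating leaves every prediction unchanged, so ratings are only identified up to an additive constant; the minimum-norm solution returned by lstsq effectively centers them around zero.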

Once we have fit our model (with each team being placed along the number-line in terms of their rating), we have a “power-ranking”, or an ordering of teams from best to worst, according to their rating.

We can now see why our model will have to have at least some error in its predictions.
_____________________________________________________________________
Why we need an Error term

Suppose that Team A were to beat Team B, and Team B were to beat Team C.  Then, for our model to have even a chance at getting zero total error, if Team A were to play Team C, it would have to win. 

In other words, strictly speaking, our model assumes that the “transitive property” holds, i.e., that if A>B, and B>C, then A>C.   (Note: to simplify the point I am making here, I am ignoring the fact that we wouldn’t always need transitivity to hold, due to the effect of “homefield advantage”).

However, this type of situation arises constantly in the NFL, and transitivity simply does not hold.

A recent example: 

- The 49ers beat the Seahawks in Week 1. 
- The Seahawks beat the Ravens in Week 10. 

If winning in the NFL were transitive, then the 49ers would have had to beat the Ravens on Thanksgiving, since the outcomes of those two games suggested that 49ers > Seahawks > Ravens.  Sadly (for me), the 49ers did not beat the Ravens.

Therefore, NFL outcomes are not transitive (and even if you don’t like this particular example, it’s easy to find plenty of others).  In fact, there is a website dedicated to this exact phenomenon for College Football.

To summarize this idea: when we fit our model, we will get a ranking of all teams according to their ratings, e.g.:
Team1 > Team32 > Team6 > Team12 …. > Team20

And unless that ranking can account for all outcomes across the entire season, we will need to have some error to allow for the fact that sometimes teams will lose to teams with worse ratings/rankings than their own.
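A toy illustration of this point, under the simplifying assumption of neutral-field games (no homefield advantage): suppose three hypothetical teams form a cycle, each beating the next by 3 points.  No assignment of ratings can make all three predictions exact:

```python
import numpy as np

# A beats B by 3, B beats C by 3, C beats A by 3 (all neutral field).
# Columns are the ratings of teams A, B, C; each row is one game.
X = np.array([[ 1.0, -1.0,  0.0],
              [ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0]])
y = np.array([3.0, 3.0, 3.0])

ratings, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ ratings   # the per-game prediction errors
```

The best least-squares fit here rates all three teams as equal, leaving a 3-point error on every game: exactly the kind of irreducible error that the model's Error term exists to absorb.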
                                                        
_______________________________________________________________________
Model Power-Ratings and Point-Spreads

I have fit the model (using Ordinary Least Squares) on season data through the end of week 13 (it doesn’t account for the Thursday game played last night).


The table below shows the model parameters, i.e., the 32 team ratings, and the homefield advantage.


As a general rule, it's a good thing to do a "sanity check" after you have fit any model.  One way to do this is simply to look at the relative team ratings in the model, and see if these roughly correspond to your opinion (or the general consensus) about how good each team is.  So, I suggest that you go ahead and check out the table, and see what you think.


As a benchmark for comparison of the model's rankings, you can check out the power rankings given by other websites, e.g., the Football-Outsiders Team Efficiency Ratings (which  are based on a very different system, but nevertheless come up with a roughly similar ranking of all the teams).  Another place worth checking is the Sagarin ratings, in particular because his "Pure Points" model is related  to the model that we are using. ** Update: I compared these ratings to Sagarin's, and they are very similar; for those that are interested, the correlation coefficient, R-Squared > .98  **

Another good sign for the model is that it has set the "homefield advantage" at 2.6, which is  reasonably close to the traditional value of 3 that people consider home-field to be worth in the NFL.  


Remember that, if we think about this table in terms of our model: The “ratings” and “homefield advantage” are simply the 33 parameters of our model.

Power Ratings / Model Parameters
We can use the parameters of the model from the table above to make a prediction for any NFL game.  Using our model, the predicted outcome for a game will be: The home team’s rating, minus the away team’s rating, plus the homefield advantage.  
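As a sketch (using made-up rating values rather than the actual numbers from the table), turning the fitted parameters into a predicted spread looks like this; the 2.6 is the fitted homefield advantage reported above:

```python
# Made-up ratings for illustration; the real values come from the fitted table.
ratings = {"Packers": 11.8, "Raiders": -2.4}
HOMEFIELD = 2.6   # the fitted homefield advantage reported above

def predict_spread(home, away):
    """Predicted home-team margin: home rating minus away rating, plus homefield."""
    return ratings[home] - ratings[away] + HOMEFIELD

spread = predict_spread("Packers", "Raiders")
```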

The second table shows (1) our model’s predicted outcomes for every Week 14 game, (2) the opening and current Vegas spreads, and (3) the average predictions made by a large number of statistical models, from the website thepredictiontracker.com.  You can see that for most games, our model sets a line reasonably close to the Vegas line.  An interesting exception is that for games with very high spreads, our model consistently predicts a larger margin of victory than the Vegas line.  In future posts, we will look at why this is.
Margin-of-Victory Model predictions, Vegas lines, and average predictions from thepredictiontracker.com
____________________________________________________________________
A Personal Disclaimer


Besides any legal ramifications, which one of my lawyer bosses hopefully took care of with that nifty disclaimer at the head of the post, I wanted to put out a genuine, personal disclaimer.


These posts are intended for educational purposes.  And I, in all honesty, do not suggest that you use our model’s predictions for gambling.  I would not trust this model’s predictions at this point, if for no other reason than the fact that our model doesn't take into account many key factors, such as injuries (note that two of the best teams in the league have lost their quarterbacks over the last few weeks, to be replaced by T.J. Yates and Caleb Hanie).

Additionally, this model simply isn’t ready for prime-time yet.  Although it does a reasonable job of estimating this week’s lines (using the Vegas line as a benchmark), there are many things we must do to improve this model.  Furthermore, I would never trust a model before doing proper testing of the model’s performance (which we have not done).
____________________________________________________________________
Looking Ahead

In this post, we covered our simple Margin-of-Victory model fairly superficially. 

In the next post, Part 3, we will look at this model in more detail.  We will also look at how this model allows us easily to go from point-spreads to win-probabilities (or, for the gambling addicts, moneyline odds). 
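As a preview of that conversion: if we assume, as the Ordinary Least Squares fit implicitly does, that the errors are roughly normally distributed with some standard deviation (the 13 points used below is purely illustrative, not a fitted value), then a predicted spread maps to a win probability in one line:

```python
from math import erf, sqrt

def win_probability(predicted_spread, sigma=13.0):
    """P(home margin > 0), assuming normally distributed errors.
    sigma (the standard deviation of game outcomes) is illustrative here."""
    return 0.5 * (1.0 + erf(predicted_spread / (sigma * sqrt(2.0))))

p_favorite = win_probability(3.0)   # a 3-point favorite
p_pickem = win_probability(0.0)     # an even matchup gives exactly 0.5
```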

This stuff is extremely important, in particular because we need to get an estimate of just how much variance there is in our predictions.  And to do that, we will need to look at some of the underlying theory behind the model.

Further down the road, I will be showing how you can fit the model yourself in Excel, and we will start thinking about ways of improving the model.
____________________________________________________________________
One Final Note

If I haven't made it clear already, it's important to point out that I am not trying to claim that I invented this model.  Similar, and even equivalent, models have been described elsewhere.  As mentioned before, the approach I described is similar to Jeff Sagarin's "pure points" model (although he hasn't released the exact details of his model).  The Pro Football Reference Blog discussed an even simpler version of this model back in 2007—appropriately dubbed the simple rating system.   And…buried within the vast depths of Microsoft's online help for Excel 2003 (I kid you not)…is a description of the same basic model that I describe here.  Presumably, if you were to type the question "Can I use Excel to set NFL point spreads?" into Excel 2003, you would be led down this path.

And, since I can't say it any better myself, I'll quote from the author of the Pro-Football-Reference blog:  "it’s not my system. I didn’t invent it. In fact, it’s one of those systems that has been around for so long that no one in particular is credited with having developed it (as far as I know anyway). People were almost certainly using it before I was born. I like the system…because it’s fairly easy to interpret and understand". 

In future posts, we will look at variants, or improvements to the model.  But for now, we will stick to this basic model because (a) it's a pretty solid model, with some history of usage to back it up, (b) it can be implemented in Excel, so anyone can play around with the model on their own, and (c)  its simplicity will be a virtue when we start to apply the model for other purposes (such as understanding win-probabilities, or more complicated bets such as multi-team teasers).
