by Tim Rubin
**Information on this site is collected from outside sources and/or is opinion and is offered "as is" without warranties of accuracy of any kind. Under no circumstances, and under no cause of action or legal theory, shall the owners, creators, associates or employees of this website be liable to you or any other person or entity for any direct, indirect, special, incidental, or consequential damages of any kind whatsoever. This information is not intended to be used for purposes of gambling, illegal or otherwise.**
After we have looked under the hood of this model, we will take a step back in order to look at some of the pros and
cons of the model, as well as ways in which we can improve upon it.
_____________________________________________________________________
In Part 1 of this series I discussed the prevalence of statistical models in rating systems for team strengths in games like Chess and Halo. In Part 2, I introduced a simple "Margin-of-Victory" style model for NFL game spreads, and I showed the power-ratings that this model learned, as well as its predictions for the Week 14 games.

In Parts 3 and 4 (which will be posted today and tomorrow), we will look at the Margin-of-Victory model in a little more detail, so that we can understand how it can be used to predict (in addition to the margin of victory) the probability of different outcomes (e.g., the probability that the home team wins, both straight up and against the spread). We will also look back to see how accurate the model's predictions were for last week's games.
_____________________________________________________________________
The Simple Margin-of-Victory Model (Brief Review)
In part 2 of this series, I introduced the simple Margin-of-Victory model we’ve been using. As a reminder, this is the basic idea of the model:
Points_{HomeTeam} – Points_{AwayTeam} = Rating_{HomeTeam} – Rating_{AwayTeam} + HomefieldAdvantage + Error
When applied to the NFL, this model consists of 33 parameters: a "rating" for each of the 32 teams, and a "Homefield Advantage" which we assume to be equal for all teams and games.

After fitting the model last week, I showed what the model's Week 13 power-ratings were for each team, and the model's predictions for Week 14.^{*1}
Some Nice Features of this Model
At a later time, we will do a more critical analysis of this model, looking at its pros and cons. But for now, I just want to point out a couple of nice properties of this model. In particular, the parameter values for the model (i.e., each team's Rating, and the "Homefield Advantage") have an extremely intuitive, real-world interpretation.

First, this model expresses team ratings on the same scale as the game score. For example, on a neutral field (i.e., with no team having a home-field advantage), a team with a +10 rating would be expected to beat a team with a +5 rating by 5 points. The value of the "Homefield Advantage" in the model (call it h) has a similarly intuitive interpretation: it can be treated as adding h points to the rating of the home team.
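The arithmetic above can be sketched in a few lines of code. Note that the ratings and the 3-point homefield value here are illustrative assumptions for the example, not the model's fitted numbers:

```python
# Predicted margin of victory under the Margin-of-Victory model:
# margin = Rating_home - Rating_away + HomefieldAdvantage.
# The rating values and the homefield value are illustrative only.

def predicted_margin(home_rating, away_rating, homefield_advantage=3.0):
    """Expected (home points - away points) on the home team's field."""
    return home_rating - away_rating + homefield_advantage

# A +10 team hosting a +5 team, with an assumed 3-point homefield edge:
print(predicted_margin(10.0, 5.0))  # 8.0

# The same matchup on a neutral field:
print(predicted_margin(10.0, 5.0, homefield_advantage=0.0))  # 5.0
```

On a neutral field the prediction reduces to the pure rating difference, which is exactly the "beat a +5 team by 5 points" statement above.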
A second nice feature is that we can arbitrarily "shift" all team ratings up or down as we please, since the only thing that matters is the difference between team ratings. For example, we could add +100 to each team's rating, and it wouldn't change our predictions. In fact, as I mentioned last week, the Sagarin "Pure Point" ratings are extremely similar to this model's ratings, except that he "shifts" all of the team ratings so that the average rating is 20. To illustrate this, I've created a plot here, in which I've aligned the Margin-of-Victory model's ratings through Week 13 with Sagarin's Week 13 ratings (by setting the average rating of our model equal to 20). You can see in the plot that the ratings generated by the two models are generally quite similar.
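This shift-invariance is easy to verify directly. Here is a minimal sketch; the team labels, rating values, and homefield value are made up for illustration:

```python
# Shifting every team's rating by the same constant leaves every
# predicted margin unchanged -- only rating *differences* matter.
# Ratings and the homefield value below are illustrative only.

ratings = {"A": 12.2, "B": -3.5, "C": 0.0}
h = 3.0  # assumed homefield advantage

def margin(home, away, r):
    """Predicted home-team margin given a ratings table r."""
    return r[home] - r[away] + h

# A Sagarin-style shift: add the same constant to every rating.
shifted = {team: r + 100 for team, r in ratings.items()}

for home in ratings:
    for away in ratings:
        # Tolerance instead of == to allow for floating-point rounding.
        assert abs(margin(home, away, ratings) - margin(home, away, shifted)) < 1e-9

print("predictions identical after shifting all ratings by +100")
```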
For our Margin-of-Victory model, I chose to set the average rating to 0, since I think this leads to the most easily interpreted set of ratings (by far). Specifically: any team with a positive rating is better than average, and any team with a negative rating is worse than average. The more positive the rating, the better, and vice versa. Even better, our ratings then have a real-world interpretation of their meaning. Specifically, each team's rating corresponds to how we would expect them to perform against a league-average team on a neutral field (i.e., with no team having a home-field advantage). For example, using the power-ratings that were fit last week, the Packers' rating of +12.2 means that they would beat an average team by about 12 points on a neutral field (or by about 15 at home), which seems like a fairly reasonable statement.
Now that you hopefully have a pretty clear
intuition for how to interpret this model, let’s take a closer look at some of
the underlying details. If this sounds
intimidating, don’t worry. I will
literally make no assumptions about your background. If anything is unclear, feel free to leave a
comment and I’ll do my best to either give you an answer directly or clarify the
issue in the post itself.
___________________________________________________________________________
The Margin-Of-Victory Model, In Pictures
Let’s take a step back, and consider one key
thing: uncertainty. In our model, we
express uncertainty using the “error” term on the right side of the equation. But where does this error come from, and what
does it mean?
Now, one way to think about the model is that each team has a true rating corresponding to how good the team is, that each team's performance always corresponds exactly to that true rating, and that any deviation of a game's outcome from what our model predicts, i.e., any deviation from our equation:

Points_{HomeTeam} – Points_{AwayTeam} = Rating_{HomeTeam} – Rating_{AwayTeam} + HomefieldAdvantage

is due to randomness inherent to football games. However, this is probably not the right way to think about things. Although there certainly is a lot of randomness in football games, the notion that each team always plays at the same rating-level seems like a stretch. One good example of why this idea is problematic relates to injuries: in the NFL, key players have to sit out for a series, a game, or multiple games due to injury, all the time. And so each team's starting roster is constantly in flux. It seems highly unlikely that, with all that personnel change going on, the true rating of a team stays constant.
A more realistic way to think of the model is as
follows: each team has a true rating corresponding
to their team strength, but the level at which the team actually performs will fluctuate
from game to game (i.e., some of the time they play worse than their rating
indicates, some of the time they play better, but on average they play at the
level corresponding to their true rating).
Modeling Variations in a Team's Performance
A good way to model the random variation in the performance of each team is using our old friend, the normal distribution. In other words: we assume that each team has some underlying “rating”, but their performance across games varies according to a normal distribution centered about their rating.
The normal distribution can be specified using two parameters: a mean (indicated using the Greek letter mu, μ) and a standard deviation (indicated using the Greek letter sigma, σ). It is often more convenient to talk about a normal distribution in terms of its variance, which is simply the standard deviation squared, or σ^{2}.

Here's what the two parameters of the normal distribution do:

- The Mean (μ): This parameter controls the "location" of the distribution. The mean of a normal distribution is also its median and mode (the peak of its curve). In our model, each team's rating is equal to the mean of the team's distribution.
- The Standard Deviation (σ) / Variance (σ^{2}): This controls the "spread" of the distribution. As the standard deviation or variance of a distribution gets larger, it becomes much more likely to observe numbers that are further from the mean of the distribution.

The shorthand notation that is helpful in describing a normal distribution is "normal(μ, σ^{2})", which denotes a normal distribution with a mean equal to μ and a variance of σ^{2}.

So, for example, a normal(0, .1) is a normal distribution with a mean of 0 and a variance of .1 (which is very small). A normal(0, .1) distribution will mostly generate numbers very close to zero, and almost all of the numbers it generates will fall between -1 and 1. A normal with a mean of 0 and a standard deviation of 100, on the other hand, will generate a huge range of numbers (very few of which will fall between -1 and 1, despite this range containing the "peak" of the distribution).
The image below gives a nice feel for how likely different values are, given the parameters of a normal distribution: about 68% of all numbers generated by a normal distribution will fall within 1 standard deviation (1 sigma) of the mean, about 95% will fall within 2 standard deviations, and nearly all (99.7%) will fall within 3 standard deviations. This is sometimes known as the three-sigma rule (or the 68-95-99.7 rule).

Probabilities of different values for a normal(μ, σ^{2}). Courtesy of Wikipedia.
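If you want to convince yourself of those coverage numbers, a quick simulation will do it. This is just a generic sampling check with the standard library, not part of the model itself:

```python
# Empirical check of the ~68% / ~95% coverage of a normal distribution.
import random

random.seed(0)  # fixed seed so the run is reproducible
mu, sigma = 0.0, 1.0
samples = [random.gauss(mu, sigma) for _ in range(100_000)]

# Fraction of samples within 1 and 2 standard deviations of the mean.
within_1 = sum(abs(x - mu) <= 1 * sigma for x in samples) / len(samples)
within_2 = sum(abs(x - mu) <= 2 * sigma for x in samples) / len(samples)

print(f"within 1 sigma: {within_1:.3f}")  # close to 0.683
print(f"within 2 sigma: {within_2:.3f}")  # close to 0.954
```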
In our model, we have 32 team "ratings", each of which corresponds to a specific team's average performance across games. Since we will model each team's performance across games using a normal distribution, using each team's "rating" to describe that team's average (a.k.a. mean) performance, we will say that the "rating" for team i is μ_{i}.
To write this using the shorthand notation above: we say that for each game that the ith team plays, they "sample" their game-performance from a normal distribution with parameters (μ_{i}, σ^{2}). Don't worry if this isn't totally clear yet. The pictures below will help a lot in terms of understanding this idea.
Bean Machine Prognostication
After the AFL/NFL merger, Sir Galton's infamous Bean Machine Prognosticator simply became too cumbersome for practical use.
To help visualize what it means for each team to "sample" their performance from a normal, it may be helpful to take a look at the Plinko-like machine (referenced in the first part of this series) called the "Bean Machine". The Bean Machine was created by Sir Francis Galton as a way to demonstrate a mechanical approach to approximating samples from a normal distribution:
The Bean Machine
To go further with this visualization, we can think of our model in terms of "Bean Machines", since each team is represented simply as a normal distribution. Each team would have its own unique bean machine, and the full model would be composed of 32 bean machines, each placed at a location corresponding to its team's rating. To simulate a game between two teams, you would drop a bean into each team's machine and compute the difference in the outputs (you could even adjust the location of the machines to account for homefield advantage).
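The two-bean-machine simulation described above can be sketched directly in code. The ratings, the homefield value, and the game-to-game standard deviation σ here are illustrative assumptions, not fitted values:

```python
# Simulating the "two bean machines" picture: each team's game performance
# is a draw from normal(rating, sigma^2), and the home team's machine is
# shifted by the homefield advantage h. All numbers below are illustrative.
import random

random.seed(42)  # reproducible run
home_rating, away_rating = 12.2, 2.0
h = 3.0          # assumed homefield advantage
sigma = 13.0     # assumed game-to-game variability in performance

def simulate_margin():
    home_perf = random.gauss(home_rating + h, sigma)  # home team's bean machine
    away_perf = random.gauss(away_rating, sigma)      # away team's bean machine
    return home_perf - away_perf

margins = [simulate_margin() for _ in range(50_000)]
avg = sum(margins) / len(margins)
print(f"average simulated margin: {avg:.1f}")  # near 12.2 + 3 - 2 = 13.2
```

Averaged over many simulated games, the margin settles near the rating difference plus homefield advantage, which is exactly what the model's equation predicts.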
_____________________________________________________________________
And Now for Something Completely Different
If you are not used to thinking about things in
terms of probability distributions, this stuff can be tough to wrap your mind
around. In the next post, I’ll be giving
further details (and pictures).
But for now, let’s move away from the math, and
talk about evaluating model performance.
_____________________________________________________________________
Assessing Model Performance
In terms of assessing model performance, there are many things we could look at. I am going to stick to the following two questions:
- How accurately did we predict
the margin of victory, compared to the Vegas line?
- How did our model do, if we
were to use it to pick against the spread?
Before we get into looking at results of our model's performance from last week,
here is some general food-for-thought, in terms of using models to beat the spread.
1.) We have no estimate of how reliable our model is, for the purpose of gambling:
Before using any model for real-world prediction, you
need to get an estimate of how well the system can make predictions. This is something that we can do, but it’s
not as simple as it may at first seem: You can’t simply look at how our current
model has performed on past games (because our model has already observed those
outcomes, so it would be cheating). You
actually would need to do something a little more complicated, along the lines
of “cross-validation”.
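To make the "don't test on games the model has already seen" point concrete, here is a rough sketch of a held-out evaluation, using a least-squares fit of the ratings-plus-homefield model on a handful of fabricated game results. The team names and scores are made up for illustration; this is not the blog's actual data or fitting code:

```python
# Sketch of out-of-sample evaluation: fit team ratings and a shared
# homefield-advantage term by least squares on earlier games, then score
# predictions on later games the model has never seen.
import numpy as np

teams = ["GB", "NO", "NE", "DEN"]          # illustrative team labels
idx = {t: i for i, t in enumerate(teams)}
# (home, away, home_points - away_points) -- fabricated results
games = [
    ("GB", "NO", 9), ("NE", "DEN", 7), ("GB", "NE", 4), ("NO", "DEN", 3),
    ("DEN", "GB", -8), ("NO", "NE", -2), ("GB", "DEN", 12), ("NE", "NO", 5),
]

def fit(train):
    """Least-squares fit: one rating per team plus a homefield column."""
    X = np.zeros((len(train), len(teams) + 1))
    y = np.zeros(len(train))
    for row, (home, away, m) in enumerate(train):
        X[row, idx[home]] = 1.0   # +1 for the home team's rating
        X[row, idx[away]] = -1.0  # -1 for the away team's rating
        X[row, -1] = 1.0          # homefield-advantage column
        y[row] = m
    # lstsq returns the minimum-norm solution, which handles the fact that
    # shifting all ratings by a constant leaves predictions unchanged.
    params, *_ = np.linalg.lstsq(X, y, rcond=None)
    return params[:-1], params[-1]  # (ratings, homefield advantage)

def predict(ratings, h, home, away):
    return ratings[idx[home]] - ratings[idx[away]] + h

# "Walk forward": train on the first six games, test on the last two.
ratings, h = fit(games[:6])
for home, away, actual in games[6:]:
    pred = predict(ratings, h, home, away)
    print(f"{home} vs {away}: predicted {pred:+.1f}, actual {actual:+d}")
```

The error on the held-out games, not the error on the training games, is the honest estimate of predictive accuracy; proper cross-validation just repeats this split many times.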
2.) The model is not designed to beat the spread: the model we've been discussing is designed for two things: estimating team strength, and predicting game outcomes with respect to point-differentials. So, really, our model is better thought of as a handicapping system (i.e., a system to set the spreads, not to beat them).
And sure, you could use our model's predictions to choose which team to bet on. In fact, you could do a lot worse (for example, by always picking "public teams"^{*2}), and there are certainly dumber methods for picking teams (though I'm not sure I'd want to challenge the octopus in a picks contest; it kinda feels like picking against Tebow right now).
And there’s a more fundamental point here. Let’s think about what our model’s input is,
what it does with this input, and what it’s output is:
- MODEL INPUT: For each game, the only game data that the model ever sees is the home and away team names and their scoring differential. Our model never sees what the Vegas line is.
- MODEL FITTING: The model takes this data,
and then optimizes the parameters with respect to team-strengths and homefield
advantage. It does not optimize anything with respect to the spreads.
- MODEL OUTPUT: The model output is a
rating for each team, and the value of homefield advantage.
At no point in this process does the model touch, look at, or even recognize the existence of such a thing as a Vegas spread. So, if we wanted to use this model to pick against the spread, the model sure isn't going to do it on its own. Furthermore, it is probably not optimized with respect to picking against the spread (in fact, as discussed in Part 2, it is optimized with respect to minimizing the squared error of its predictions of game outcomes). So just because this is a good model doesn't mean it's a good model for beating the spread.
3.) Ogres are like onions, and “models” are like Rainman. Statistical Modeling is
extremely powerful, and models can do amazing things. Models can do things like accurately predict
the outcome of elections, or predict the mass of elementary particles before they had even been observed (granted,
this second example comes from Physics, and doesn’t really apply or even belong
here. But it’s still pretty cool).
But even if a statistical model is accurate at predicting one thing, it does not necessarily mean that it is a good model to use for your intended purpose. To make a model that is well-suited for your own purposes, you first need to think
about what it is that you want to understand or predict.
You then need to formulate the model in such a way that it provides you with the information that you need. And this is where Rainman comes
in.
OK, so if you aren’t familiar with the movie
Rainman…well, I really have nothing to say to you other than that you should go
see it. It’s fantastic. It’s so good
that you forget all about the fact that Tom Cruise is completely insane (do I really need a link for this?
Probably not, but whatever).
Anyway, think of Tom Cruise's character as "the gambler" (young guy, looking for any edge, even if it involves taking advantage of his autistic brother). And think of Rainman (the aforementioned autistic brother, a savant who can track every card in a 6-deck shoe of blackjack) as "the statistical model".
Now, if the Gambler were to give Rainman no
instructions, and sit him in front of the blackjack table with $10,000, Rainman would have no idea what to do, and would probably lose all of the Gambler's money. That is because Rainman doesn’t know what he’s supposed to do, unless you explicitly
tell him so. So instead, the Gambler
tells Rainman to simply keep track of the tens in the deck, and bet more money
when there are lots of tens in the deck (this is how card counters can get an
edge in blackjack). And by doing this—namely,
by explicitly informing Rainman what information he needs to provide—Rainman
and the Gambler are able to win lots of money in a classic 1980s musical montage.
The point of this little metaphor was this: if you want to use
statistical models to help you win money betting against the spread, you’d
better be making sure that you are not just blindly
using this model’s output to make bets.
For example, our Margin-of-Victory model gives us a totally reasonable
estimate of the outcome in terms of score-differential. Now let’s say it predicts a home-team victory
by 5 points, and the Vegas spread is 2.5. The question is: should we bet on the
home-team?
The answer, for now, is that we simply don’t know. The problem is that the model doesn’t give us
an estimate of the probability that the
home team will win by at least 3 points.
As of now, all we can say for sure is that the model's estimate suggests that this bet would win more often than not (that is, that the home team has a greater than 50% chance of winning by at least 3 points).
But that is not enough information to make money on a bet. To really beat the spread in Vegas (in the sense of turning a profit), you need your picks to win more than about 52.4% of the time, because at standard -110 odds you risk $11 for every $10 of potential profit, so the house's cut means winning half your bets isn't enough (in practice, handicappers often target something like 55% to leave a margin of safety).
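The break-even arithmetic at standard -110 odds (risk $11 to win $10) works out as follows:

```python
# Break-even win rate against the spread at standard -110 odds:
# you risk $11 to win $10, so you need p * 10 >= (1 - p) * 11.
risk, win = 11.0, 10.0
break_even = risk / (risk + win)  # 11 / 21
print(f"break-even win rate: {break_even:.1%}")  # prints 52.4%
```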
Now, it turns out that we can in fact use our model to estimate probabilities of specific
outcomes (such as the probability that a team will win by at least x points, or the probability that a team
will just flat-out win). And in Part 4,
we will be looking at just that.
_____________________________________________________________________
Stay Tuned for Part 4 of this series. In part 4, we will look at (1) how our model
performed last week, (2) some more pretty pictures, and (3) how to compute the
probability of different outcomes using the Margin-of-Victory Model.
_____________________________________________________________________
Footnotes:
^{*1} I don't want to beat a dead horse here, but remember: this model's predictions are predictions about the line. They are not a system for beating the spread, nor do they even (on their own) tell you anything about which bets would give you a positive return on investment. So for now, unless you are in a weekly competition to guess the lines with Bill Simmons on The B.S. Report, I would hold off on using this model to make any bets until we've had a chance to look a little more deeply at it.
^{*2} I got the term "public team" from Chad Millman. I highly recommend checking out some of his articles, or tweets, about this idea (and more generally for his insight into Vegas handicapping and related topics). He's also a fairly frequent guest on Bill Simmons' podcast.