tbyers11 replied with a great link. You may also be interested in an
earlier post of mine.
It seems like I often see things differently than others, so, in case it helps, I'll take the time to attempt an overly wordy answer/viewpoint to your question. I could be horribly wrong, but here's what I think:
First, there is a concept of primary importance which must be discussed. In my understanding,
the given percentages (eg Kentucky has a 99.999% chance of winning this game)
are NOT truly predictions of Team A's chance to win the game. Rather, they are the probability that the particular model accurately predicts which team will win the game. Those are certainly related concepts, and the media and even the model makers sometimes equate the two. But, they are not equivalent. (More on this later)
How are probabilities of winning even calculated?? IDK, but here's a guess based on things I've read.
Here's a simplified path of how I *think* the probabilities are established:
1. Calculate/assign each team a rating/ranking
2. Predict the winner of the game. I suspect that, for most models, this is the same as "Predict that the team with the higher rating/ranking wins the game." However, some models could include additional tricks.
3. Open up your system's modeled or historical data. Look at the data for which the teams had similar ratings/rankings to the teams in the current game. For what percentage of those games did this model accurately predict the winner?
This step is likely decently complex. But, for illustrative purposes, here is a quick graph of one way the modeled/historical data could look. For this basic predictive model, the winner is predicted to be the team with the higher rank and the independent variable is the difference between teams' ranks. Let's say Team A is ranked #6 and Team B is ranked #36 (a difference of 30 units). We look at the compiled data to see that our model is right only about 75% of the time when we say that the higher ranked team will win.
[Attached graph: the model's historical accuracy plotted against the difference in teams' ranks]
4. Tell everyone, "In modeled/historical data similar to Team A's and Team B's rating/ranking, we are correct 75% of the time when predicting that Team A is the winner." Or, "According to our model, Team A has a 75% chance of winning." Or, just "Team A has a 75% chance of winning."
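To make those four steps concrete, here's a minimal sketch in Python of how I imagine the mechanics. Everything in it (the function names, the bin width, the toy history) is invented for illustration; real models are surely far more sophisticated.

```python
from collections import defaultdict

def build_accuracy_table(historical_games, bin_width=10):
    """Steps 1-3: bin past games by the difference in team ranks and
    record how often this model's pick (the higher-ranked team) won."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for rank_a, rank_b, higher_ranked_won in historical_games:
        bin_key = abs(rank_a - rank_b) // bin_width
        totals[bin_key] += 1
        if higher_ranked_won:
            wins[bin_key] += 1
    return {k: wins[k] / totals[k] for k in totals}

def win_probability(rank_a, rank_b, table, bin_width=10):
    """Step 4: report the model's historical accuracy in similar games
    as 'the higher-ranked team's chance of winning.'"""
    return table[abs(rank_a - rank_b) // bin_width]

# Invented history: (rank_a, rank_b, did_the_higher_ranked_team_win?)
history = [(6, 36, True), (5, 40, True), (3, 36, True), (8, 44, False)]
table = build_accuracy_table(history)
print(win_probability(6, 36, table))  # 0.75 -> "a 75% chance of winning"
```

Note that nothing in that lookup is about the two teams themselves; it's purely about how often the model's pick has panned out in similar-looking games.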
How are a model's predictions validated?
This gets more to your question about what the performance standards for a model should be. But, before talking about the tournament, let's consider the full season. How do these predictive models evaluate their "percent chance of winning" performance over the season? From what I can piece together, the evaluation is along the lines of: "For the games the model predicted as having a specific probability, what was the model's actual accuracy in determining the winner?"
For example, we could look at all the games for which we predicted the favored team had a 50-55% "chance of winning." (In the graph above, this is essentially the same as looking at all the games in which the difference in teams' rankings was somewhere between 1 and 13 units, because those are the games in which we gave the favored team a 50-55% chance based on our modeled or historical data.) So, how frequently did our model determine the correct winner? If it was correct somewhere around 50-55% of the time, we would conclude that our prediction is valid.
Another example: in games where we say we can predict the winner with 80-85% certainty, we should expect that we accurately determined the winner 80-85% of the time.
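In code, that season-long validation might look something like this sketch (again, the binning scheme and the toy data are my own inventions):

```python
def calibration_report(predictions):
    """Group games by the model's stated confidence and compare it with
    how often the model's pick actually won.

    predictions: list of (stated_probability, pick_was_correct) pairs.
    """
    bins = {}  # bin lower edge -> [correct_count, total_count]
    for prob, correct in predictions:
        edge = int(prob * 20) / 20  # 5%-wide bins: 0.50, 0.55, 0.60, ...
        entry = bins.setdefault(edge, [0, 0])
        entry[0] += correct
        entry[1] += 1
    for edge in sorted(bins):
        correct_count, total = bins[edge]
        print(f"stated {edge:.0%}-{edge + 0.05:.0%}: "
              f"actually right {correct_count / total:.0%} of {total} games")

# Hypothetical season: near toss-up calls right about half the time,
# confident calls right about 80% of the time -- "validated".
games = [(0.52, True), (0.53, False), (0.54, True), (0.51, False),
         (0.82, True), (0.83, True), (0.84, True), (0.81, False),
         (0.83, True)]
calibration_report(games)
```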
Sooooo, the models are not faulty for only predicting the winner 80-85% of the time in those games. Indeed, the models have done exactly what they predicted!...just not what we
wanted them to do
What should the performance standard be for predicting the winner of the tournament?
Finally, on to your question:
Well,
these models were NOT built to predict the winner of the NCAA tournament. As mentioned above, they are indeed doing what they are intended to do...just not what we want them to do or what is being pushed upon them. The models predict individual games, not the tournament, and not the Champion. The probabilities of the individual games are compounded (multiplied together) to predict who the champion will be. Thus, the test of whether the model does what it is intended to do should not be how frequently the model predicts the Champion, but, rather, how well the model predicts individual games. It does not necessarily matter that 2014 UConn kept winning its way to a Championship despite having, say, a "15% chance" each game. What matters to the model is: in all the games where the model gave the higher ranked team an 85% chance of winning, did the higher ranked team win 85% of the time?
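To put numbers on that compounding (taking my illustrative 15% at face value; UConn's actual per-game odds varied):

```python
p_game = 0.15            # the illustrative per-game chance from above
p_title = p_game ** 6    # a champion has to win six straight games
print(f"{p_title:.4%}")  # ~0.0011% -- roughly 1 in 88,000
```

Even a team given 85% in every game would come out to only about 0.85**6, roughly 38%, to win all six. So no team should ever be "predicted" to win the tournament in any strong sense.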
I *think* the models are doing what they are designed to do and are probably meeting that performance standard - they do not treat the Championship game as distinct from all the other games with similar opponents.
Could a model be designed specifically for the tournament? IDK. Would it have to take into account the specific rounds of the tournament? Maybe not. Maybe the current models work fine for the tournament (ie even in the tournament, they do what they say they are capable of doing) but just need refinement to increase their capability for tournament-type games. For instance, I would think a tournament model would have to be based on how teams play against Top 50 or so opponents, rather than comparing teams based on how well they would do against the NCAA-average opponent. I mean, shouldn't a tournament model attempt to tease apart what separates a #2 team from a #12 team, rather than declaring the game a toss-up? As it is, the model IS correct in that the model is saying, "I can't predict who will win this game," and, sure enough, it does a bad job of predicting such a game, lol.
That's the rub. These aren't really predictions about a team's chances of winning. They are predictions about how well the model can predict who is going to win!!
My severely extreme analogy:
In reality, Duke has a 100% chance of beating East Chapel Hill High School and Kentucky has a 100% chance of beating Jumbo's Allstars (that's our DBR team!)
However, a model (we'll call it 'Mopnek') uses the following criteria to predict winners: there is a 50% likelihood of a team beating an opponent whose name starts with the next letter of the alphabet.
When the games D vs E and K vs J are played out, the winners are (D)uke and (K)entucky.
Mopnek predicted the winner 50% of the time, just as Mopnek said it would!
The fact that the Mopnek prediction was equal to the outcome in the sample is used to validate the system - the system predicts as accurately as it says it predicts.
BUT, that does not mean that a specific team's chances against another team are the same as the chances of the system predicting that game correctly.
The real meaning of that 50% prediction is "In those games, the model has a 50% chance of accurate prediction when choosing its team." It does NOT mean that the team actually has a 50% chance of winning. Put another way, a game predicted by Mopnek as a toss-up does not mean that the game could go either way; it just means that Mopnek doesn't know which way the game will go.
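You can see this effect in a quick simulation. Here's a hedged sketch (the setup is invented): every game has a true 100% favorite, but a Mopnek-style model flips a coin to pick a side and reports "50%". Its picks are right 50% of the time, exactly as advertised, so it validates perfectly while telling you nothing about the teams' real chances.

```python
import random

random.seed(1)  # reproducible illustration

n_games = 10_000
correct_picks = 0
for _ in range(n_games):
    true_winner = "favorite"  # in reality, the favorite always wins
    mopnek_pick = random.choice(["favorite", "underdog"])  # coin-flip model
    if mopnek_pick == true_winner:
        correct_picks += 1

# Mopnek claims 50% accuracy and delivers ~50% accuracy: "validated".
# But the favorite's true chance of winning was 100% all along.
print(f"Mopnek accuracy: {correct_picks / n_games:.1%}")
```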
Does it matter? Are there cases in which a team actually has a good chance to win a game but the models don't know that (ie declare it a toss-up game)?
The misinterpretations in the crazy analogy probably apply to real world scenarios, too. I agree with Wander in saying that Utah was overrated (because I desperately want to use KenPom to tell me who can beat whom). In the Dork Polls thread, I tried to complain that KenPom wasn't good at predicting "who is the best team" the way that I view "best team."
http://forums.dukebasketballreport.c...309#post784309
Before Duke's 2nd win over UNC and Utah's loss to Washington, KenPom had Utah ranked #6 and Duke ranked #8. Yet, here were their average unadjusted efficiency margins versus the top KenPom teams (efficiency margin is offensive efficiency minus defensive efficiency...like, do you score more points than your opponent).
Avg Per Game Efficiency Margin Against Top Teams in KenPom Ratings

| Team | vs Top 10 | vs Top 25 | vs Top 50 | vs Top 100 | vs Top 150 | vs Top 200 | All Games |
| UTAH |   -13.276 |   -11.167 |     1.768 |      9.047 |     15.823 |     18.913 |    25.917 |
| DUKE |    13.369 |    15.211 |    11.975 |     15.876 |     16.058 |     15.709 |    22.523 |
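As a sketch of how columns like those get built (the game-log format and numbers below are placeholders I made up, not real Utah data):

```python
def avg_margin_vs_top(games, cutoff):
    """Average per-game efficiency margin (offensive efficiency minus
    defensive efficiency) against opponents ranked at or above the cutoff."""
    margins = [off_eff - def_eff
               for opp_rank, off_eff, def_eff in games
               if opp_rank <= cutoff]
    return sum(margins) / len(margins)

# Placeholder game log: (opponent_rank, offensive_eff, defensive_eff)
games = [(4, 95.0, 110.0), (18, 100.0, 109.0), (45, 112.0, 108.0)]
for cutoff in (10, 25, 50):
    print(f"vs Top {cutoff}: {avg_margin_vs_top(games, cutoff):+.3f}")
```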
I actually held off on posting that data at the time, in part because I feared Utah would prove me wrong (and 'cause the story was more complicated than this chart, with blowouts, recency effects, etc). Well, it turns out that we *played* Utah and beat them. Now, I look at that chart and am certain who I would pick in a
battle between two Top 10 teams!
The rating of Utah (and Texas) made me consider that, while KenPom may do a good job of rating which teams are good according to certain criteria, it might not do the best job at deciding which top teams will beat other top teams. Most of the time we don't notice this because
1. Good teams, according to many different criteria, tend to win
2. In games between two good teams, the predictions state that the game could go either way. So, the model looks correct when winning or losing.
Actually, in reality, the models ARE correct; we are just misinterpreting them. Sure enough, the models aren't good at predicting the games they say they aren't good at predicting (ie they predict the winner only 55% of the time in games where the model believes it has a 55% chance of predicting the winner).
Again, it does not mean that the team actually had a 55% chance of winning. Notably, Duke was 11-2 vs KenPom Top 25 teams (final ratings). Wisconsin was 10-2. Maybe it's just "luck" that KenPom can't predict their wins. Or, maybe, just maybe, there actually
is an uncaptured something about certain teams that makes them winners.
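A hedged back-of-the-envelope check on the "luck" idea (assuming, purely for illustration, that every one of those 13 games were a true 55% proposition for Duke, which borrows the toss-up figure from above rather than KenPom's actual game-by-game odds):

```python
from math import comb

p, n = 0.55, 13  # assumed per-game chance and number of Top 25 games
prob_11_wins_or_more = sum(comb(n, k) * p**k * (1 - p)**(n - k)
                           for k in range(11, n + 1))
print(f"{prob_11_wins_or_more:.1%}")  # ~2.7% -- going 11-2 would be rare luck
```

If those games really were near toss-ups, records like 11-2 should be rare; seeing them from certain programs is at least consistent with the "uncaptured something" idea.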
Anyway, I'm really,
really sorry for the long post. And, again, I could totally be wrong, but that's how I see the "Chance of winning" topic.