Is the RPI really an inferior selection tool?



Sir Stealth
02-05-2011, 06:22 PM
Al Featherston's article on the main page repeats what seems to be the consensus among college basketball fans: By using the RPI so much, the NCAA selection committee relies on a ranking system that is clearly inferior to other statistical models.

I spend tons of time on kenpom.com and completely agree that it is a more valuable tool for predicting future outcomes than the RPI. That being said, I disagree that the selection committee would be better served by using Pomeroy rankings rather than the RPI. Basketball is still about who has the most points at the end of each game.

Making the NCAA tournament is a reward and an accomplishment in and of itself. Therefore, it seems absolutely proper to use the RPI to determine NCAA selection berths. Who you played and who you beat, no matter how you beat them, are the proper selection criteria. Conversely, while efficiency rankings are great for predicting who will win in the future, they do not account for wins or losses. We should not reward teams for playing well independent of the final score.

One could argue that efficiency models should be used to properly seed the tournament once berths are determined. I believe that the NCAA already takes injuries into account for seeding the tournament, but not for who makes it. The idea that making the tournament is a reward for a team's accomplishments is also strengthened by the inclusion of minor conference champions who clearly are not among the 64/65/68 most likely to win the championship.

So, can we do away with the increasingly accepted notion that the NCAA is wrong to use the RPI as its primary selection criterion? Let's agree to appreciate KenPom for its predictive value but reward teams based only on the number of times they had more points than quality opponents at the end of a game.

uh_no
02-05-2011, 08:57 PM
From what I've gathered on the process, RPI is not as big a factor as most people (aka the media) would like us to think. The 'eye' test and 'overall body of work' seem to be the key phrases for the committee, and while RPI is a fine metric for putting the teams in a preliminary order, the final decision lies with the committee, and they definitely can and do go against RPI when they feel it's correct.

Kedsy
02-05-2011, 09:27 PM
So, can we do away with the increasingly accepted notion that the NCAA is wrong to use the RPI as its primary selection criterion?

No, we can't. Featherston's right. The RPI is clearly inferior. Worse than that, it is easy to "game the system" with the RPI.

Remember the year the MVC got four or five teams into the tournament? All those teams had very high RPI, and the way they did it was to schedule teams with good records from bad conferences. Because the RPI is based solely on winning percentages, these games mostly resulted in Ws for the MVC teams, and because they were playing teams with good records, it really helped their RPI. Meanwhile, teams from power conferences who foolishly played patsies with bad records and pretty tough teams with only OK records (mid-level Big 10 teams, for example), had a poorer RPI when they were clearly superior to the fourth or fifth place team in the MVC.

Coach K has also been a master at gaming the RPI, which might be one reason why we've gotten so many #1 seeds over the years.

Put another way, let's take a quick quiz. According to the RPI, at this moment which team would be more helpful to the "schedule strength" component of the RPI (which makes up 50% of the overall RPI rating): Bethune-Cookman, or Michigan?

The answer, of course, is Bethune-Cookman, whose current record is 13-9 while Michigan's is 13-10. While it's true that Beth-Cook's schedule strength is much worse than Michigan's, that component only makes up 25% of the overall RPI rating. And to be fair, it's also true that playing Michigan will help your overall RPI a little more than Bethune-Cookman, but any rating system that gives a high major any props at all for scheduling Bethune-Cookman is more than a little bit ridiculous.

The RPI is flawed.
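
For anyone who wants to check the quiz arithmetic, here is a minimal sketch in Python. It assumes the basic RPI weighting referred to in this thread (25% own winning percentage, 50% opponents' winning percentage, 25% opponents' opponents' winning percentage) and ignores any refinements such as home/road adjustments. The function names and the two schedule-strength values are hypothetical, chosen only for illustration; the two records are the ones quoted above.

```python
def win_pct(wins, losses):
    """Plain winning percentage."""
    return wins / (wins + losses)


def schedule_contribution(opp_wp, opp_sos):
    """One opponent's pull on the schedule-based 75% of the basic RPI:
    50% from the opponent's own record, 25% from the opponent's schedule."""
    return 0.50 * opp_wp + 0.25 * opp_sos


# Records quoted in the post above.
bethune_cookman_wp = win_pct(13, 9)
michigan_wp = win_pct(13, 10)

# Hypothetical schedule strengths (assumed values, not real data):
# a much weaker schedule for Bethune-Cookman than for Michigan.
bethune_cookman_sos = 0.40
michigan_sos = 0.60

bc_total = schedule_contribution(bethune_cookman_wp, bethune_cookman_sos)
mich_total = schedule_contribution(michigan_wp, michigan_sos)

print(f"50% term sees: B-CU {bethune_cookman_wp:.3f} vs Michigan {michigan_wp:.3f}")
print(f"Combined schedule pull: B-CU {bc_total:.3f} vs Michigan {mich_total:.3f}")
```

Under these assumed numbers, Bethune-Cookman looks better in the 50% term while Michigan still helps slightly more overall, which matches both halves of the point above.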

hurleyfor3
02-05-2011, 10:31 PM
Here's one flaw with the RPI, at least how I interpret every definition I've seen of it.

Imagine two teams, let's call them Duke and unc. Both teams are undefeated. Unc has played five games; Duke has played six. Five of Duke's six opponents are the same as unc's opponents. Duke's sixth opponent has a W-L and RPI SOS below the average of the other five.

Unc will always be ranked above Duke, solely because the extra team Duke has played draws its average down. More broadly, Duke is punished for having more "trials", even though one could more strongly argue that more wins give supporting evidence that the other wins weren't chance. And this completely ignores venue (home/road/neutral), margin of victory or how recently the games were played.

Or put another way, I'm more impressed with a team that has beaten #2, #3, #201 and #202 than a team that has beaten numbers 99, 100, 101 and 102. Talent in college basketball is nonlinearly distributed (there's always a bigger difference between #1 and #20 than between 201 and 220), but I'm not sure the RPI accounts for this. This, I think, is one way teams can game the system.
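
The five-game versus six-game scenario can be sketched with a few lines of Python. This assumes the basic 25/50/25 RPI weighting and treats each opponent's winning percentage and schedule strength as given numbers; all of the values below are hypothetical, and the real calculation has details (such as excluding games against the team being rated) that this sketch leaves out.

```python
from statistics import mean


def basic_rpi(wp, owp, oowp):
    """Basic RPI weighting: 25% WP, 50% OWP, 25% OOWP (simplified)."""
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp


# Hypothetical winning percentages and schedule strengths
# of the five opponents both teams have played.
shared_wp = [0.70, 0.65, 0.60, 0.55, 0.50]
shared_sos = [0.55, 0.55, 0.50, 0.50, 0.50]

# Hypothetical sixth opponent, below the average of the other five.
extra_wp, extra_sos = 0.40, 0.45

# Both teams are undefeated, so their own winning percentage is 1.0.
unc_rpi = basic_rpi(1.0, mean(shared_wp), mean(shared_sos))
duke_rpi = basic_rpi(
    1.0,
    mean(shared_wp + [extra_wp]),
    mean(shared_sos + [extra_sos]),
)

print(f"5-0 team: {unc_rpi:.4f}")
print(f"6-0 team: {duke_rpi:.4f}")
```

With these made-up inputs the 6-0 team comes out lower, because the extra win cannot raise a winning percentage that is already 1.0 while the weaker opponent lowers both schedule averages.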

hurleyfor3
02-05-2011, 11:12 PM
Coach K has also been a master at gaming the RPI, which might be one reason why we've gotten so many #1 seeds over the years.

Interesting point. We don't schedule any really crappy teams (the Longwoods of the world) that drag our RPI SOS down. An upper-echelon team is almost equally likely to beat #150 as #300, so schedule the 150s instead. Add to this that these games are usually in Cameron or at neutral sites, which RPI doesn't care about but makes a loss even less likely.

Wander
02-06-2011, 03:06 AM
Meanwhile, teams from power conferences who foolishly played patsies with bad records and pretty tough teams with only OK records (mid-level Big 10 teams, for example), had a poorer RPI when they were clearly superior to the fourth or fifth place team in the MVC.


Which mid-level power teams specifically were "clearly superior" to the Missouri Valley teams that year? The MVC was fantastic that year (2006), and every team that got in the tournament was IMO completely deserving, as evidenced somewhat by two of them - Bradley and Wichita State - making the Sweet 16. In fact, you could very seriously argue that the MVC was screwed and should have gotten a 5th team in that year (Missouri State, which I believe still holds the record for the team with the best RPI to not make the tournament).

ice-9
02-06-2011, 05:58 AM
Here's one flaw with the RPI, at least how I interpret every definition I've seen of it.

Imagine two teams, let's call them Duke and unc. Both teams are undefeated. Unc has played five games; Duke has played six. Five of Duke's six opponents are the same as unc's opponents. Duke's sixth opponent has a W-L and RPI SOS below the average of the other five.

Unc will always be ranked above Duke, solely because the extra team Duke has played draws its average down. More broadly, Duke is punished for having more "trials", even though one could more strongly argue that more wins give supporting evidence that the other wins weren't chance. And this completely ignores venue (home/road/neutral), margin of victory or how recently the games were played.

Or put another way, I'm more impressed with a team that has beaten #2, #3, #201 and #202 than a team that has beaten numbers 99, 100, 101 and 102. Talent in college basketball is nonlinearly distributed (there's always a bigger difference between #1 and #20 than between 201 and 220), but I'm not sure the RPI accounts for this. This, I think, is one way teams can game the system.

Is this true? Because in the scenario above Duke would have a higher winning percentage than UNC. Then again, a team's winning percentage is only worth 25% of the formula, so perhaps that fails to compensate for the opponent's lower winning percentage (weighted 50%).

Either way, it is kinda ironic how there seems to be a general consensus that RPI is an inferior metric for measuring the "best" teams deserving of an at-large bid, yet it gets the most press and influence (allegedly) on the committee.

To the original poster, it comes down to how you define "best." If by best you mean strictly wins and losses, then a non-margin based ranking works fine. But I think most people will agree that a win "isn't just a win" and that taking margin into account matters in determining which teams are truly "best."

For example, the RPI has Harvard at 50 and USC at 84. KenPom has the two teams flipped, with USC at 50 and Harvard at 87. Who would you put your money on? Going by the RPI formula, it should be Harvard (best wins: Colorado, BC), but I think most of us would bet on USC (best wins: Texas, Washington St., UCLA; lost to Kansas at Kansas by 2).

bob blue devil
02-06-2011, 07:44 AM
the RPI is an affront to statistics and an embarrassment to the NCAA - simply put it defines your strength of schedule by the record of your opponent (which is obviously skewed by who they play) and not by their ranking in the model. if the RPI really thought a team, say Georgetown, was the 5th best team in the country, but Georgetown had a less strong record, 18-5 = ~35th best in the country, b/c they played a tough strength of schedule, hardest in the country, why would you treat other teams who play Georgetown as if they had played the 35th best team in the country? particularly when your model is supposedly saying they are the 5th best team.

to the earlier point of K gaming the RPI - I've admired it for years and wondered if it is intentional. no better way to boost your RPI than by playing the best teams in the weaker conferences under the guise of 'this is representative of an early round ncaa tournament team.'

jeff sagarin's 'elo chess' is superior to RPI and does NOT include margin of victory, so it should satisfy at least that part of your concern. i say superior not meaning i find today's rankings more or less accurate, rather the process of creating the ranks is more statistically sound/intuitive.

i agree incorporating margin of victory is a bit distasteful as it changes a team's motivation from simply winning (which is supposed to be the goal of a game) to blowing others out and minimizing margins of defeat. however, for accuracy of prediction, there is no way to get around incorporating margin of victory - i.e. pomeroy's ratings and sagarin's 'predictor' ratings are going to do a far better job of ranking teams than models like the RPI and sagarin's 'elo chess'.

CDu
02-06-2011, 08:34 AM
Is this true? Because in the scenario above Duke would have a higher winning percentage than UNC. Then again, a team's winning percentage is only worth 25% of the formula, so perhaps that fails to compensate for the opponent's lower winning percentage (weighted 50%).

In both scenarios, the team is undefeated (6-0 and 5-0). So the winning percentages are the same.

ice-9
02-06-2011, 09:28 AM
In both scenarios, the team is undefeated (6-0 and 5-0). So the winning percentages are the same.

Did I really graduate from Duke? LOL!

But then this won't be a real life problem since we can expect no team to go undefeated during the season.

mehmattski
02-06-2011, 09:56 AM
i agree incorporating margin of victory is a bit distasteful as it changes a team's motivation from simply winning (which is supposed to be the goal of a game) to blowing others out and minimizing margins of defeat. however, for accuracy of prediction, there is no way to get around incorporating margin of victory - i.e. pomeroy's ratings and sagarin's 'predictor' ratings are going to do a far better job of ranking teams than models like the RPI and sagarin's 'elo chess'.

QFT. It should be noted that in Pomeroy's system, it's more than simple "margin of victory," because in order to have a better efficiency margin (and therefore higher kenpom rating), a team would have to score more points per possession. Simply beating Longwood by 80 instead of 50 would have no effect on the kenpom rating, because the team would need 20 more possessions to score those points.

Pomeroy made a point on twitter the other day, when media types were talking about how "it's more than RPI." He said, essentially: "If RPI doesn't matter, then why can Bracketologists basically nail the field every year using RPI alone?"
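
A rough Python sketch of the per-possession idea, for illustration only. This is not Pomeroy's actual formula: it uses a raw margin per 100 possessions with no opponent adjustment, and the function name and all game numbers are hypothetical.

```python
import math


def efficiency_margin(points_for, points_against, possessions):
    """Raw point margin per 100 possessions (no opponent adjustment)."""
    return 100.0 * (points_for - points_against) / possessions


# Hypothetical games: a 50-point win in 70 possessions and an
# 80-point win in 112 possessions have the same per-possession margin.
win_by_50 = efficiency_margin(100, 50, 70)
win_by_80_more_possessions = efficiency_margin(160, 80, 112)

# An 80-point win in the same 70 possessions really is more efficient.
win_by_80_same_possessions = efficiency_margin(130, 50, 70)

print(math.isclose(win_by_50, win_by_80_more_possessions))
print(win_by_80_same_possessions > win_by_50)
```

The design point is that a bigger raw margin only moves this kind of number when it comes from scoring more (or allowing less) per possession, not from simply playing more possessions.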

bob blue devil
02-06-2011, 10:13 AM
QFT. It should be noted that in Pomeroy's system, it's more than simple "margin of victory," because in order to have a better efficiency margin (and therefore higher kenpom rating), a team would have to score more points per possession. Simply beating Longwood by 80 instead of 50 would have no effect on the kenpom rating, because the team would need 20 more possessions to score those points.

Pomeroy made a point on twitter the other day, when media types were talking about how "it's more than RPI." He said, essentially: "If RPI doesn't matter, then why can Bracketologists basically nail the field every year using RPI alone?"

good point - i guess more accurately speaking to pomeroy's system (which is my favorite) i should've referred to 'scaled margin of victory' instead of absolute margin of victory.

i guess a better way to make my point focusing on pomeroy's system is that if you are up by 30 with 7 minutes to go (like we were yesterday), your optimal winning strategy on offense probably is to run down the shot clock and see what you can do in the last 10 seconds of it (reducing the number of possessions and further limiting the long odds NC State faced) rather than take the first high efficiency shot that comes your way. this strategy, while increasing your odds of winning, reduces your pomeroy rating. and in this example optimal winning strategy and not running up the score are in harmony, which is a nice additional benefit.

Kedsy
02-06-2011, 11:23 AM
Pomeroy made a point on twitter the other day, when media types were talking about how "it's more than RPI." He said, essentially: "If RPI doesn't matter, then why can Bracketologists basically nail the field every year using RPI alone?"

Pomeroy wasn't saying the RPI is a decent metric, was he? I can't believe he meant that. Or was he downplaying the idea that the committee is supposedly using more than the RPI?

Bracketologists are predicting what the committee will do. If the committee primarily uses the RPI, then to predict their behavior you'd have to focus on the RPI. So is that what Pomeroy was saying?

Kedsy
02-06-2011, 11:35 AM
Which mid-level power teams specifically were "clearly superior" to the Missouri Valley teams that year? The MVC was fantastic that year (2006), and every team that got in the tournament was IMO completely deserving, as evidenced somewhat by two of them - Bradley and Wichita State - making the Sweet 16. In fact, you could very seriously argue that the MVC was screwed and should have gotten a 5th team in that year (Missouri State, which I believe still holds the record for the team with the best RPI to not make the tournament).

This strikes me as a circular argument. If Missouri State gamed the RPI, then why would it matter if they were the team with the best RPI not to make the tournament?

It seems to me another circular argument is justifying selection or seeding based on tournament performance. If a team gets a higher seed than they should, they have a higher chance of winning more games.

darthur
02-06-2011, 11:41 AM
RPI is a downright bad measure. It's simple... but it just doesn't work.

TWENTY-FIVE percent of your score is determined by how you do on the court. That's it. A team that goes 0-30 against good competition will have a vastly superior RPI to a team that goes 30-0 against bad competition. As people said earlier, why does Duke schedule good mid-major teams every year? One reason is it greatly improves our RPI in a way that playing good major-conference teams cannot. We should not have to choose our schedule in order to try to trick a rating system.

Now if you want to rule out margin of victory calculations, that makes sense. But look at something like Sagarin's ELO ratings. That's pure win/loss, and it is based on a sensible formula, instead of the garbage that is RPI.

Wander
02-06-2011, 12:17 PM
If a team gets a higher seed than they should, they have a higher chance of winning more games.

Again, which team got a higher seed than they should have? Are you really prepared to argue that a team that was probably the single last at-large bid of the tournament, got a 13 seed, and finished with a Pomeroy rating of 26th, was overseeded? Or that the champions of one of the best conferences was overseeded with an 11 seed? I just really don't see any credible argument here - the RPI, kenpom numbers*, and the eye test all agree that the Missouri Valley should have gotten 4 - 6 teams in the tournament that year. It's not just from one metric.

I know this is all tangential, but this subject has a special place in my heart, as Wichita State and Bradley made me completely own everyone in the first weekend of all my bracket pools that year. :)

(*full disclosure, these kenpom numbers include postseason games - but except for Creighton at 59, which I think was fairly put in the NIT, they're all far enough away from the border that we can safely say they were still in at-large territory before the tournament)

DukieTiger
02-06-2011, 12:22 PM
Just for reference, here are the Pomeroy ratings of the 2006 MVC.

http://kenpom.com/conf.php?y=2006&c=MVC

Looks like the MVC teams gamed KenPom's system as well

DukieTiger
02-06-2011, 12:33 PM
Pomeroy wasn't saying the RPI is a decent metric, was he? I can't believe he meant that. Or was he downplaying the idea that the committee is supposedly using more than the RPI?

Bracketologists are predicting what the committee will do. If the committee primarily uses the RPI, then to predict their behavior you'd have to focus on the RPI. So is that what Pomeroy was saying?

Pom said what he said in response to some on the selection committee thumping their chests about how many games they must watch to get a good feel for all teams, etc. Pomeroy's point was certainly that the committee leans on the RPI because a sufficient eye test of all teams is obviously subjective but also likely impossible for mere humans over a 5 month season.

Sir Stealth
02-06-2011, 01:00 PM
I agree with criticisms of the RPI's flaws and probably undersold that to open up the discussion topic. My main point was that to me, when evaluating who should make the Tournament, it's not how you play the game, it's whether you win or lose. If you just go by in-game efficiency and leave out who was ahead at the buzzer, then a last second shot in a game decided by one does not make a very big difference in a team's ranking. I know that it shouldn't make a big difference in predicting the next outcome, but for evaluating a team's accomplishments, that shot should make a huge difference.

I don't think that sleight of hand with scheduling should outweigh beating quality opponents either. I agree that the RPI is flawed and don't like the extent that a team can "game" it, as others have pointed out. That being said, the ability to game the system by scheduling soft but not terrible teams is really something that applies more to truly top teams like Duke. A team like Duke might still have a near zero chance of losing to a middle of the pack college basketball team, but for a team on the margin of the NCAA tournament, that game truly is much more loseable than scheduling a team at the very bottom. Home/away should really be factored in though.

Kedsy
02-06-2011, 01:11 PM
(*full disclosure, these kenpom numbers include postseason games - but except for Creighton at 59, which I think was fairly put in the NIT, they're all far enough away from the border that we can safely say they were still in at-large territory before the tournament)


Looks like the MVC teams gamed KenPom's system as well

Including the post-season numbers makes looking at Pomeroy in retrospect a lot less valuable, because post-season success can really change his ratings a lot. Case in point, in 2006, South Carolina is listed in Pomeroy as #15 in the country (after winning the NIT), much higher than any MVC team.

If you can show me the MVC teams all had great Pomeroy ratings before the tournament, then I'll take it all back.

Incidentally, I am not arguing the MVC teams that year were all terrible teams. I thought at the time the league should get at least three bids. I'm saying they gamed the system to make it look like they were better than they were for the purposes of NCAA selection. You don't deny that, do you? I can't offer a link, but I read several interviews at the time in which MVC coaches admitted to the practice.


EDIT: Another example of the deceptiveness of post-tourney Pomeroy ratings: last season, after winning their conference tournament, Butler was ranked #26 in Pomeroy. After the NCAAT, they were ranked #12. Xavier moved from #22 to #14 by making the Sweet 16. NIT champion Dayton moved from #45 to #26. Drop those 2006 MVC teams 8 or 10 spots and some of them start looking pretty borderline.

gw67
02-07-2011, 02:46 PM
I agree with Kedsy that the MVC has gamed the RPI for a while. To some extent, the Devils do as well. They play in tough pre-ACC tourneys but they generally schedule good OOC teams at either CIS or a friendly venue like MSG. They also typically play a bunch of teams in the 80 to 175 range as rated by RPI and again play them at CIS. This is not a putdown. I admire the basketball staff for following this course. The only time it backfires is when the "good" teams you scheduled don't turn out to be so good.

I prefer Sagarin over RPI because it includes won-loss, margin, and home/away. At about this time of year, I also begin to look at Massey's ratings comparison. His table compares several rankings including RPI, Sagarin, Pomeroy, as well as the two polls and a host of others and ranks them according to the calculated mean. As of Sunday he has Duke (6), UNC (16), FSU (40), VT (42), Md (44), Clemson (47) and BC (60). This is about right in my opinion.

http://www.masseyratings.com/cb/compare.htm

gw67

rasputin
02-07-2011, 04:00 PM
Which mid-level power teams specifically were "clearly superior" to the Missouri Valley teams that year? The MVC was fantastic that year (2006), and every team that got in the tournament was IMO completely deserving, as evidenced somewhat by two of them - Bradley and Wichita State - making the Sweet 16. In fact, you could very seriously argue that the MVC was screwed and should have gotten a 5th team in that year (Missouri State, which I believe still holds the record for the team with the best RPI to not make the tournament).

2006 was the year of the famous Billy Packer mid-major whining episode, was it not?

I got to see Wichita State's performances that year both in the MVC tournament (called Arch Madness here in St. Louis) and in the NCAA tournament in Greensboro where, IIRC, they took out #2 seed Tennessee. They were a really good basketball team, by any metric you pick.

Reilly
02-07-2011, 04:03 PM
2006 was the year of the famous Billy Packer mid-major whining episode, was it not? .....

I think you could pick any year and be correct. In 2004, he was saying St. Joe's in no way deserved to be #1 (the same St. Joe's that barely lost in the Elite 8) ... when Phil Martelli said at a pep rally that "Billy Packer can kiss my .....