Summer Fun: Problems in Using RSCI to Predict Performance

**sagegrouse** · 07-25-2015, 06:24 PM

It’s the summer doldrums, so Sage Grouse, as usual, is gonna stir the DBR pot. FWIW, my lek was a success back in March and April, and I have successfully carried out my chick-raising duties by totally ignoring them and the various hens and thereby avoiding the Golden Eagles looking for meals. Moreover (wonder why), there seems to be no interest in a late-season lek.

My subject is RSCI. We have had various folks on DBR who beat other posters about the head and shoulders with RSCI data on recruits. (No, not really, but some of the recipients of comments probably think so.) (RSCI is the composite of the recruiting services' rankings at the end of a player's HS career.) No one is saying that “RSCI is destiny,” but some have come close. RSCI is an interesting measure, but not nearly as useful as its advocates maintain.

So, what we have from the RSCI advocates is a one-variable model (RSCI) used to predict performance of rapidly maturing youth, generally two-to-four years later, and as measured by a variety of performance measures – playing time, scoring, rebounds, etc. Respectable social scientists would chortle at the strong advocacy of such a naïve model of complex phenomena. Because I spent a couple of decades pounding data as an “unrespectable” social scientist, I would offer a few observations:

1. There really isn’t very much data to justify strong conclusions. RSCI began with the Corey Maggette class entering in September 1998. Fifty-six freshmen or transfers have enrolled at Duke since then, but – of course – there is no meaningful career data yet on the last couple of classes. Moreover, there were a bunch of one-and-dones and transfers. Only 32 players since 1998 have entered Duke and played three years. This means that, even in the best case, we are hardly able to model the vast variety of potential future Duke recruits.

2. Even within the limited data, there are numerous counter-examples, which advocates seem to overlook. Seth Curry became second-team All-ACC and was probably not in the top 200 among HS seniors. It wasn’t that he couldn’t play hoops – he was a skinny kid that no major college would have on the team. Gee, maybe rating people at age 18 isn’t the total answer. Oh, and Miles (#81), Tyler Thornton (NR), Lee Melchionni (NR?), and Dave McClure (#71) all played a lot and had significant impacts on Duke. In fact, three of the above, who were so deficient in RSCI, are in the NBA today. Then, on the other side, Casey (#16) and Josh H. (#32) didn’t play to their ratings.

3. Most researchers would use more than one independent variable in estimating performance, such as, say, position or a time trend to catch changes in the game. It makes no sense to have a single-variable model. Moreover, on the dependent variable side of the equation, there is no reason to believe that the various performance measures (scoring, rebounding, court time) behave the same with respect to any set of independent variables.

4. Don’t get mesmerized by the top performers. Data like “basketball ability” usually follow a bell-shaped Normal (Gaussian) or a right-skewed Log-Normal distribution, so the most talented players sit in a thin upper tail. This means that the difference in ability between player #1 and player #5 is usually a lot bigger than the difference in ability between player #20 and player #25. While one can correct for this by using the proper estimating function, I would probably ignore the top five players – they are "sure things," and putting them in the analysis does not lead to useful insights.
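Item 4's claim about gaps at the top can be checked with a quick simulation. This is a minimal sketch under stated assumptions, not Sage's analysis: it draws "ability" for 2,000 hypothetical prospects from either a symmetric Normal or a right-skewed Log-Normal distribution (parameters chosen arbitrarily), ranks each simulated class, and averages the gap between #1 and #5 and between #20 and #25 over 100 simulated classes.

```python
import random


def mean_gaps(draw, population=2000, trials=100, seed=2015):
    """Average ability gap between ranks #1 and #5 and between ranks #20 and #25,
    over `trials` simulated classes whose abilities come from draw(rng).
    Population size, trial count, and seed are arbitrary assumptions."""
    rng = random.Random(seed)
    top_gap = mid_gap = 0.0
    for _ in range(trials):
        # One simulated class, sorted from best to worst.
        ability = sorted((draw(rng) for _ in range(population)), reverse=True)
        top_gap += ability[0] - ability[4]    # rank #1 minus rank #5
        mid_gap += ability[19] - ability[24]  # rank #20 minus rank #25
    return top_gap / trials, mid_gap / trials


normal = mean_gaps(lambda rng: rng.gauss(0.0, 1.0))
lognormal = mean_gaps(lambda rng: rng.lognormvariate(0.0, 0.5))
print(f"Normal:     #1-#5 gap {normal[0]:.2f}, #20-#25 gap {normal[1]:.2f}")
print(f"Log-Normal: #1-#5 gap {lognormal[0]:.2f}, #20-#25 gap {lognormal[1]:.2f}")
```

In both cases the #1-to-#5 gap comes out several times larger than the #20-to-#25 gap, because the extreme ranks sit in the sparse tail of the distribution; the skewed Log-Normal stretches that tail further.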

Anyway, just a few observations to kill time in the summer.

Kindly,
Sage

**bob blue devil** · 07-25-2015, 06:47 PM

fun topic. thanks!

my naive interpretation of your point is that prediction models come with standard errors and you are concerned that people are underestimating this point, particularly given the complex nature of the data being used and the conclusions being reached? yeah, i agree, vrank the tank is a long shot to be a big contributor, but i'd give him at least a 10% chance. oh, wait, you were talking generally... sorry!

**MarkD83** · 07-25-2015, 07:23 PM

Thank you, Sage, for an analysis of the faults in the RSCI rankings. This reminds me of a cautionary tale I heard about doing social science studies (or any studies with too few data points).

A researcher reported the following results: 33% of the respondents agreed with the issue; 33% of the respondents disagreed with the issue and the last guy refused to answer.

(By the way, this applies to the 247 recruiting predictions sometimes.)

**Kedsy** · 07-25-2015, 09:08 PM

Originally Posted by sagegrouse

So, what we have from the RSCI advocates is a one-variable model (RSCI) used to predict performance of rapidly maturing youth, generally two-to-four years later, and as measured by a variety of performance measures – playing time, scoring, rebounds, etc.

I guess you could call it a "one variable model," but it depends on how you look at it. Seems to me, it's a summary of many variables considered by many observers. It's just that the many variables have been summarized into one number per player.

Also, if you insist on looking at it as a one-variable model, that variable could probably be fairly described as "how good the player is," which to me sounds like it should be a pretty good predictor of player performance.

Finally, unless someone has come up with a better, more predictive model, I'd say we might as well use the best one we have.

**duke09hms** · 07-25-2015, 11:12 PM

When it comes to a single best tool for evaluating talent, RSCI is it. But just as a meta-analysis is only as good as its individual studies, RSCI is only as good as its inputs, and I'd expect its predictive power to weaken greatly when the inputs for an individual recruit are problematic. If the typical recruit in the RSCI class is a 4-year HS player who has played in the USA and AAU their whole life, I would be very surprised if their collegiate career was not accurately modeled by RSCI. For example, Tyler Thornton: his heavy court time was a direct result of extenuating circumstances (a lack of PG options until Quinn's later years), and he was hardly that productive as a player.

However, for recruits with limited data, such as those who grew up internationally and/or didn't play much AAU and so didn't get the exposure, I'd expect the confidence interval around their college performance to widen considerably. I think I read somewhere that both Curry brothers largely eschewed AAU play, which is why they flew so far under the radar, meaning RSCI is and was a poor predictor for them.

-Vrankovic may be similar, since he only came to Florida for the last 2 years of HS, and where he sits in RSCI now may not follow the trend of his RSCI neighbors. Hopefully, he's on a steeper upward trajectory than RSCI projects for a 3-star recruit, and given the limited time he's been playing HS/AAU, there is a very wide confidence interval on his Duke contribution. Plus, he's probably Duke 2020, so he has plenty of time to make his mark.

-Less extreme examples may be players who reclassify to enter college early. Since most scouting and scrutiny ramps up from sophomore through senior year, there would be only 2 years of data to analyze for reclassifiers, and their RSCI rating may be less predictive than if they had graduated when expected.

-Justin Robinson, on the other hand, has been in American basketball and playing HS/AAU, so I'd expect his collegiate productivity to track alongside his RSCI trend - likely very little game time at Duke.

What makes basketball recruiting even harder to predict is that small samples are so much more susceptible to one-off events (injuries, transfers, tragedies). Duke on average brings in only 3-4 players a year, so a Duke commit's game time can largely be explained by how successful we were recruiting in the years before and after him, and perhaps not so much by his intrinsic RSCI rating. Given how we've been killing it on the recruiting trail recently, I think the RSCI bar is even higher now for a Duke player to get significant game time. But again, it'd be VERY difficult to prove, because that time period has been so short and troubled by small-sample issues.

P.S. I don't know why we're talking about Vrankovic so much though. That conversation could honestly wait at least 2 maybe 3 years. I'm much more curious about the Derryck Thornton PG question. Love that he came in with a defensive ballhawk reputation, but will he have enough floor generalship ability to run our offense? I have no idea.

**sagegrouse** · 07-26-2015, 12:20 AM

Originally Posted by Kedsy

I guess you could call it a "one variable model," but it depends on how you look at it. Seems to me, it's a summary of many variables considered by many observers. It's just that the many variables have been summarized into one number per player.

Also, if you insist on looking at it as a one-variable model, that variable could probably be fairly described as "how good the player is," which to me sounds like it should be a pretty good predictor of player performance.

Finally, unless someone has come up with a better, more predictive model, I'd say we might as well use the best one we have.

Keds: Even though RSCI is the compilation of many opinions, it's still a scalar value. Most studies seeking to predict or explain behavior or performance use many variables (vectors), such as through multivariate regression analysis. It may be the "best one we have," but there is no reason to use just one.
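Sage's contrast between a scalar predictor and a multi-variable regression can be illustrated with invented numbers. The following is a minimal sketch, not anyone's actual model: it assumes hypothetical player-seasons in which minutes depend on both recruiting rank and year in the program (echoing the point that K tends to play veterans), fits ordinary least squares once with rank alone and once with rank plus year, and compares R². The data-generating formula, coefficients, and sample size are all made up; plain Python is used only to keep the sketch self-contained.

```python
import random


def ols(X, y):
    """Ordinary least squares: solve the normal equations (X'X) b = X'y by
    Gauss-Jordan elimination. Each row of X starts with 1 for the intercept."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * t for row, t in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def r_squared(X, y, b):
    """Share of the variance in y explained by coefficients b."""
    mean_y = sum(y) / len(y)
    fitted = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: 40 player-seasons; minutes depend on rank AND year + noise.
rng = random.Random(7)
rows, minutes = [], []
for _ in range(40):
    rank, year = rng.randint(1, 100), rng.randint(1, 4)
    rows.append((rank, year))
    minutes.append(30 - 0.15 * rank + 3 * year + rng.gauss(0, 4))

X1 = [[1, rank] for rank, _ in rows]           # rank only (one variable)
X2 = [[1, rank, year] for rank, year in rows]  # rank plus year in program
r2_single = r_squared(X1, minutes, ols(X1, minutes))
r2_multi = r_squared(X2, minutes, ols(X2, minutes))
print(f"R^2, rank only: {r2_single:.2f}; rank + year: {r2_multi:.2f}")
```

Because the two models are nested, the larger one can never have a lower in-sample R², so with roughly three dozen real players any improvement would need an out-of-sample check before anyone leaned on it.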

And thanks for not pointing out my error: my list of over-performers included two NBA players, not three.

Sage

**Olympic Fan** · 07-26-2015, 12:57 AM

I see the RSCI much like Pomeroy's "predictions," which he frames as probabilities: Duke playing Elon in Durham gives Duke a 99 percent victory probability ... Duke playing No. 2 Wisconsin at Wisconsin gives Duke a 30 percent victory probability. That doesn't mean Duke can't win, but his numbers are usually fairly reasonable.

When I look at the RSCI I see similar probabilities.

A top-5 prospect probably has a 90-95 percent chance of being an impact player... maybe a 75 percent chance of being an impact guy as a freshman.

The probabilities go down the farther down the list you go. A kid who is not in the top 100 may have something like a 5 percent chance to be an impact player and a 1 percent chance of doing it as a freshman.

Pomeroy knows there are too many variables to make precise predictions. The RSCI has plenty of variables too: some kids mature later, foreign kids gum up the rankings, kids who reclassify mess things up. Just look at how screwed up Jamal Murray's ranking is; he's a Canadian who reclassified at the last minute. Overall, he shows up at No. 71 in the RSCI. The two services that rank him have him at No. 10 and No. 12, but four of the other RSCI services don't have him ranked at all. (For the record, I think he's a top-5 guy, maybe No. 1 in the class.)
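The Murray case shows how a consensus index can bury a player whom only some services rank. The sketch below is a minimal illustration, assuming a simple points scheme (a service's #1 earns 100 points, its #100 earns 1, and an unranked player earns 0); that scheme is an assumption, not a confirmed description of how RSCI is computed. The services, players, and weights are hypothetical.

```python
def consensus(rankings, weights=None, list_size=100):
    """Combine several services' rankings into one consensus order.

    rankings maps service -> {player: rank}. ASSUMED scoring scheme (a sketch,
    not necessarily RSCI's exact formula): rank r on a service's list earns
    list_size + 1 - r points, times that service's weight (default 1.0);
    a player absent from a service's list earns nothing from it.
    """
    weights = weights or {}
    points = {}
    for service, ranks in rankings.items():
        w = weights.get(service, 1.0)
        for player, rank in ranks.items():
            points[player] = points.get(player, 0.0) + w * (list_size + 1 - rank)
    # Highest point total first.
    return sorted(points.items(), key=lambda item: item[1], reverse=True)


# Hypothetical field mirroring the example above: a late reclassifier ranked
# #10 and #12 by two of six services and unranked by the other four, versus a
# prospect every service ranks #45. Service and player names are made up.
rankings = {f"service_{i}": {"steady_prospect": 45} for i in range(1, 7)}
rankings["service_1"]["reclassifier"] = 10
rankings["service_2"]["reclassifier"] = 12

equal = consensus(rankings)
only_two = consensus(rankings, weights={f"service_{i}": 0.0 for i in range(3, 7)})
print(equal)
print(only_two)
```

Under equal weights the uniformly ranked #45 prospect finishes ahead of the reclassifier (336 points to 180). Setting the other services' weights to zero approximates relying only on the services one trusts most, and the order flips.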

I think it's a good rough predictor of performance, but if you think every top-10 guy is going to be a star and no sub-100 guys are going to be stars, then you have a lot more respect for the recruiting gurus than I do (remember Clark Francis and Ivan Renko?). As others have pointed out, stars such as Stephen Curry, Seth Curry, and Frank Kaminsky were not top-100 RSCI guys. Go back before the RSCI, and guys like Tim Duncan and Tom Gugliotta were not close to top-100 guys coming out of high school.

**itshoopsbabee** · 07-26-2015, 01:04 AM

Originally Posted by bob blue devil

fun topic. thanks!

my naive interpretation of your point is that prediction models come with standard errors and you are concerned that people are underestimating this point, particularly given the complex nature of the data being used and the conclusions being reached? yeah, i agree, vrank the tank is a long shot to be a big contributor, but i'd give him at least a 10% chance. oh, wait, you were talking generally... sorry!

Is that for his career or just his senior season?

**Des Esseintes** · 07-26-2015, 02:35 AM

Originally Posted by Kedsy

Finally, unless someone has come up with a better, more predictive model, I'd say we might as well use the best one we have.

Nah, let's just assume all the exceptions are the rule. That's WAY more rigorous and scientific and stuff.

**ice-9** · 07-26-2015, 04:20 AM

Originally Posted by duke09hms

P.S. I don't know why we're talking about Vrankovic so much though. That conversation could honestly wait at least 2 maybe 3 years. I'm much more curious about the Derryck Thornton PG question. Love that he came in with a defensive ballhawk reputation, but will he have enough floor generalship ability to run our offense? I have no idea.

At the risk of throwing this off-topic -- agreed that Thornton's floor leadership is critical, and a question mark. One mitigating factor is that he won't be our only potential ballhandler. As the coaching staff have said, Grayson and Kennard also have the ability to handle. Several possibilities:

1. Thornton is just as good overall as Tyus. We don't need anyone else aside from when Thornton is on the bench.
2. Thornton isn't ready at all, and fortunately someone else steps up.
3. Thornton isn't ready at all, but neither is Grayson or Kennard. Floor leadership becomes the team's flaw.
4. Thornton starts slow but matures by the end of the season.

My prediction? The last scenario. Thornton will struggle in the beginning and have inconsistent performances. There'll be games when Grayson will do PG duty. But by the end of the season, Thornton will have matured and floor leadership, while maybe not a strength, won't be a liability. He will be our primary ballhandler and our wings can focus on slicing and dicing the opposition.

The next most likely scenario, in my entirely speculative estimation, is the second, where even if he's not ready, someone else will step up.

Through it all, I expect our defense to carry us and be this team's hallmark.

Even if the third scenario comes to pass, we'll still be a decent team.

And if we get the first scenario... oh boy!

**BD80** · 07-26-2015, 08:28 AM

Originally Posted by ice-9

At the risk of throwing this off-topic -- agreed that Thornton's floor leadership is critical, and a question mark. One mitigating factor is that he won't be our only potential ballhandler. As the coaching staff have said, Grayson and Kennard also have the ability to handle. Several possibilities:

...

Ingram handling the ball, à la Grant Hill.

Whether playing the 3 or 4 (with 2 or 3 perimeter players), he can attack the opposing defense and dish for open shots.

**CDu** · 07-26-2015, 08:55 AM

Originally Posted by Des Esseintes

Nah, let's just assume all the exceptions are the rule. That's WAY more rigorous and scientific and stuff.

There is a difference between saying an exception is possible and saying the exception is the rule. To my knowledge, only one side of this discussion has been speaking in absolutes about players' chances.

I don't expect Vrankovic to ever be an impact player here. But I certainly won't say he has no chance to do so.

**Kedsy** · 07-26-2015, 09:23 AM

Originally Posted by ice-9

Through it all, I expect our defense to carry us and be this team's hallmark.

At the risk of throwing this even further off topic, I entirely agree with this. Derryck has the reputation of being a strong defender, which is why I'm hopeful your 2nd or 3rd scenarios won't come to pass.

In a lame attempt to move this closer to on topic, I will add that I sometimes think the recruiting gurus might shortchange defense when they make their evaluations. This could potentially explain why guys like Tyler Thornton and Dave McClure had such low rankings but were able to contribute more than expected. I think this phenomenon probably happens less frequently with big men than with perimeter players, though, which means it probably wouldn't affect Vrankovic too much.

**BD80** · 07-26-2015, 11:06 AM

Originally Posted by Kedsy

...

Finally, unless someone has come up with a better, more predictive model, I'd say we might as well use the best one we have.

Pedigree my good man, pedigree.

**bob blue devil** · 07-26-2015, 11:22 AM

Originally Posted by sagegrouse

Keds: Even though RSCI is the compilation of many opinions, it's still a scalar value. Most studies seeking to predict or explain behavior or performance use many variables (vectors), such as through multivariate regression analysis. It may be the "best one we have," but there is no reason to use just one.

And thanks for not pointing out my error: my list of over-performers included two NBA players, not three.

Sage

i agree, rsci is imperfect - in fact, i think it is seriously flawed in its construction and we are better off just looking at the rankings of the 1-2 best recruiting services than using a metric that treats all rankings equally.

i'm not fully on board with the criticism of univariate models. fwiw, in the real world people actually use univariate models all the time and to quite good effect. different data, different purposes, different tolerance for mistakes, so on and so forth - not everything needs to be a "study". using rsci to predict basketball performance is a reasonable idea. yes, using more variables and data would likely yield a more precise model, but that doesn't nullify the ability of a single-variable model to provide some useful insight.

**CDu** · 07-26-2015, 11:50 AM

Originally Posted by Kedsy

I guess you could call it a "one variable model," but it depends on how you look at it. Seems to me, it's a summary of many variables considered by many observers. It's just that the many variables have been summarized into one number per player.

Also, if you insist on looking at it as a one-variable model, that variable could probably be fairly described as "how good the player is," which to me sounds like it should be a pretty good predictor of player performance.

Finally, unless someone has come up with a better, more predictive model, I'd say we might as well use the best one we have.

It is definitely a one variable model. The only input is RSCI ranking. That ranking is made up of lots of factors, but it is still a single variable. Just like a model of "tourney champion" is a single variable that happens to be made up of lots of games.

If you break up the RSCI into its various components, it becomes a multivariable model (I think 4 recruiting services are included, so four variable model, or whatever number of services it is). But using RSCI on its own makes it a one-variable model.

I do agree that RSCI is probably the best predictor we have, and is a pretty good one too. But I think Sage's point is that RSCI is far from infallible, and as such there is a great deal of uncertainty around the predicted outcome with RSCI rank, such that making definitive statements is bad practice.

I think it's very fair (in fact, appropriate) to say that a player outside the RSCI top-100 is highly unlikely to make an impact at Duke. I would not rule it out altogether though.

**rocketeli** · 07-26-2015, 12:20 PM

I think it is a bit disingenuous to frame an argument against using the RSCI by restricting it to Duke players. A better evaluation would be to look at its predictive power for all recruits, as all (or all seriously considered HS recruits) are "enrolled in the study" so to speak and receive ratings. I'm sure someone has looked at the national data?

**sagegrouse** · 07-26-2015, 12:55 PM

Originally Posted by rocketeli

I think it is a bit disingenuous to frame an argument against using the RSCI by restricting it to Duke players. A better evaluation would be to look at its predictive power for all recruits, as all (or all seriously considered HS recruits) are "enrolled in the study" so to speak and receive ratings. I'm sure someone has looked at the national data?

It may be wrong, but it is not "disingenuous." I think each program has some unique characteristics, including competition for playing time (which does differ from year to year). Would you think the experience of RSCI #70 players would be the same at Duke, Wake, State, and Clemson? Also, K values experience -- because of his commitment to defense -- and tends to give playing time to veterans of the program. (Yep. Last year was an outlier. We'll see about the future.)

Also, you're using RSCI to predict exactly what? Playing time, points, rebounds? How are you going to estimate that? The casual arguments presented to date seem to have been of the form, "No player at Duke with an RSCI above XX has ever done YY points/rebounds/minutes per game." Which is OK but some posters have used such casual statistics as an "iron rule" of Duke basketball. Not surprisingly, some of us are unimpressed by these assertions when there are only three dozen or fewer players in the population. And there's a lot more...
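Sage's worry about "iron rules" drawn from three dozen or fewer players can be made concrete with a confidence interval for a proportion. This is a minimal sketch, assuming the question is framed as "what share of players in some RSCI band reached some benchmark"; the band, the benchmark, and the counts below are hypothetical, not actual Duke data. It uses the Wilson score interval, one standard choice for small samples.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for the proportion successes / n."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = successes / n
    denom = n + z * z
    center = (successes + z * z / 2) / denom
    half_width = z / denom * sqrt(n * p * (1 - p) + z * z / 4)
    # Clamp tiny floating-point overshoots to the valid [0, 1] range.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical "iron rule": 0 of 12 players in some RSCI band ever reached
# some benchmark (the band and the benchmark are made up).
never_low, never_high = wilson_interval(0, 12)
# Hypothetical: 8 of 32 players reached it.
some_low, some_high = wilson_interval(8, 32)
print(f"0 of 12: {never_low:.2f} to {never_high:.2f}")
print(f"8 of 32: {some_low:.2f} to {some_high:.2f}")
```

Even a clean "never happened" record over 12 players is consistent with a true rate of roughly one in four, and 8 of 32 leaves a plausible range of about 13 to 42 percent.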

**Listen to Quants** · 07-26-2015, 01:15 PM

Originally Posted by Kedsy

I guess you could call it a "one variable model," but it depends on how you look at it. Seems to me, it's a summary of many variables considered by many observers. It's just that the many variables have been summarized into one number per player.

Also, if you insist on looking at it as a one-variable model, that variable could probably be fairly described as "how good the player is," which to me sounds like it should be a pretty good predictor of player performance.

Finally, unless someone has come up with a better, more predictive model, I'd say we might as well use the best one we have.

Originally Posted by sagegrouse

Keds: Even though RSCI is the compilation of many opinions, it's still a scalar value. Most studies seeking to predict or explain behavior or performance use many variables (vectors), such as through multivariate regression analysis. It may be the "best one we have," but there is no reason to use just one.

And thanks for not pointing out my error: my list of over-performers included two NBA players, not three.

Sage

The RSCI is, of course, a compilation of human opinions. Those opinions can be viewed as multivariable nonlinear regressions: highly refined neural-net regressions, in effect. The output of those regressions is a scalar, but the input is multivariate (the 'multi' is the multiple variables each expert weighs, not the many experts).

**Des Esseintes** · 07-26-2015, 01:19 PM

Originally Posted by sagegrouse

It may be wrong, but it is not "disingenuous." I think each program has some unique characteristics, including competition for playing time (which does differ from year to year). Would you think the experience of RSCI #70 players would be the same at Duke, Wake, State, and Clemson? Also, K values experience -- because of his commitment to defense -- and tends to give playing time to veterans of the program. (Yep. Last year was an outlier. We'll see about the future.)

Also, you're using RSCI to predict exactly what? Playing time, points, rebounds? How are you going to estimate that? The casual arguments presented to date seem to have been of the form, "No player at Duke with an RSCI above XX has ever done YY points/rebounds/minutes per game." Which is OK but some posters have used such casual statistics as an "iron rule" of Duke basketball. Not surprisingly, some of us are unimpressed by these assertions when there are only three dozen or fewer players in the population. And there's a lot more...

Sage, do you remember last season when you were arguing to anyone who would listen that K was going to expand his starting rotation based on no evidence whatsoever? The past is not an "iron rule." It does, however, have a superior predictive track record to making things up.
