Summer Fun: Problems in Using RSCI to Predict Performance
It’s the summer doldrums, so Sage Grouse, as usual, is gonna stir the DBR pot. FWIW, my lek was a success back in March and April, and I have successfully carried out my chick-raising duties by totally ignoring them and the various hens and thereby avoiding the Golden Eagles looking for meals. Moreover (wonder why), there seems to be no interest in a late-season lek.
My subject is RSCI, the composite recruit rankings at the end of the HS career. We have had various folks on DBR who beat other posters about the head and shoulders with RSCI data on recruits. (No, not really, but some of the recipients of comments probably think so.) No one is saying that “RSCI is destiny,” but some have come close. RSCI is an interesting measure, but not nearly as useful as its advocates maintain.
So, what we have from the RSCI advocates is a one-variable model (RSCI) used to predict performance of rapidly maturing youth, generally two-to-four years later, and as measured by a variety of performance measures – playing time, scoring, rebounds, etc. Respectable social scientists would chortle at the strong advocacy of such a naïve model of complex phenomena. Because I spent a couple of decades pounding data as an “unrespectable” social scientist, I would offer a few observations:
1. There really isn’t very much data to justify strong conclusions. RSCI began with the Corey Maggette class entering in September 1998. Fifty-six freshmen or transfers have enrolled at Duke since then, but – of course – there is no meaningful career data on the last couple of years. Moreover, there were a bunch of one-and-dones and transfers. Only 32 players since 1998 have entered Duke and played three years. This means that, even in the best case, we are hardly able to model the vast variety of potential future Duke recruits.
2. Even within the limited data, there are numerous counter-examples, which advocates seem to overlook. Seth Curry became second-team All-ACC and was probably not in the top 200 among HS seniors. It wasn’t that he couldn’t play hoops – he was a skinny kid that no major college would have on the team. Gee, maybe rating people at age 18 isn’t the total answer. Oh, and Miles (#81), Tyler Thornton (NR), Lee Melchionni (NR?), and Dave McClure (#71) all played a lot and had significant impacts on Duke. In fact, three of the above, who were so deficient in RSCI, are in the NBA today. Then, on the other side, Casey (#16) and Josh H. (#32) didn’t play to their ratings.
3. Most researchers would use more than one independent variable in estimating performance, such as, say, position or a time trend to catch changes in the game. It makes no sense to have a single-variable model. Moreover, on the dependent variable side of the equation, there is no reason to believe that the various performance measures (scoring, rebounding, court time) behave the same with respect to any set of independent variables.
4. Don’t get mesmerized by the top performers. Data like “basketball ability” usually follow a Normal (Gaussian) or Log-Normal distribution, and the top recruits sit far out in the thin right tail, where players are spread further apart. This means that the difference in ability between player #1 and player #5 is usually a lot bigger than the difference in ability between player #20 and player #25. While one can correct with the use of the proper estimating function, I would probably ignore the top five players – they are "sure things" and putting them in the analysis does not lead to useful insights.
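For anyone who wants rough numbers on point 4, here is a minimal Python sketch. It assumes ability is standard Normal and that the rankings draw on a pool of 5,000 HS seniors. That pool size is a made-up figure for illustration, and the function name expected_ability is mine, not anything RSCI publishes. It uses Blom's approximation for the expected value of a ranked draw from a Normal sample.

```python
from statistics import NormalDist

def expected_ability(rank, pool_size):
    """Approximate expected z-score of the player ranked `rank` (1 = best)."""
    # Blom's approximation to the expected Normal order statistic.
    p = 1.0 - (rank - 0.375) / (pool_size + 0.25)
    return NormalDist().inv_cdf(p)

POOL = 5000  # hypothetical number of ranked HS seniors (an assumption)

# Ability gap, in standard deviations, between two pairs of ranks.
gap_top = expected_ability(1, POOL) - expected_ability(5, POOL)
gap_mid = expected_ability(20, POOL) - expected_ability(25, POOL)

print(f"#1 vs #5:   {gap_top:.2f} standard deviations")
print(f"#20 vs #25: {gap_mid:.2f} standard deviations")
```

Under these assumptions the gap between #1 and #5 comes out several times larger than the gap between #20 and #25, even though the second pair is five ranks apart and the first only four. A different pool size or a Log-Normal assumption would change the numbers but not the direction.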
Anyway, just a few observations to kill time in the summer.
Kindly,
Sage
Sage Grouse
---------------------------------------
'When I got on the bus for my first road game at Duke, I saw that every player was carrying textbooks or laptops. I coached in the SEC for 25 years, and I had never seen that before, not even once.' - David Cutcliffe to Duke alumni in Washington, DC, June 2013