GFactor: Statistical Predictor for Duke NCAA Tournament Performance



PrinceHal9000
03-30-2018, 10:56 AM
I started doing some Duke hoops statistical analysis during last year's NCAA tournament run to see if I could identify a variable that correlated with Duke's postseason performance.
I found one, but I waited to make it public until this year to see if it actually worked. It did.

The factor is called the GFactor. It is calculated from the average assists per game and the assist-to-turnover ratio of Duke's point guard (the guard with the highest GFactor is considered the point guard).
The formula is 2 * (Assists Per Game) * (Assists Per Game / Turnovers Per Game).
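
For anyone who wants to reproduce the numbers, here is a minimal Python sketch of the formula as stated above. The function names and the two stat lines are hypothetical and chosen only for illustration; they are not real player data.

```python
def gfactor(assists_per_game: float, turnovers_per_game: float) -> float:
    """GFactor as described in the post: 2 * APG * (APG / TOPG)."""
    if turnovers_per_game <= 0:
        raise ValueError("turnovers_per_game must be positive")
    assist_to_turnover = assists_per_game / turnovers_per_game
    return 2 * assists_per_game * assist_to_turnover


def team_gfactor(guards: dict) -> tuple:
    """Return (name, score) for the guard with the highest GFactor.

    Per the post, that guard is treated as the team's point guard.
    `guards` maps a name to (assists per game, turnovers per game).
    """
    scores = {name: gfactor(apg, topg) for name, (apg, topg) in guards.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical stat lines, not real player data.
roster = {"Guard A": (4.0, 2.0), "Guard B": (3.0, 1.0)}
print(team_gfactor(roster))  # Guard B: 2 * 3 * (3 / 1) = 18
```

Note that the guard with fewer assists wins here because of the better assist-to-turnover ratio.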

Duke's GFactor values indicate the below:
>30 = National Championship Team
22-26 = Elite 8 Team
10-17 = Sweet 16/Round of 32 Team
<10 = First Round Exit
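
The tiers above can be sketched as a lookup. This is only an illustration: treating the range endpoints as inclusive is an assumption, and the function name is hypothetical. Values that fall between the stated ranges (17 to 22, and 26 to 30) get no prediction, since the post does not assign them one.

```python
from typing import Optional


def predicted_finish(gfactor: float) -> Optional[str]:
    """Map a GFactor to the tier listed in the post, or None if no tier is stated."""
    if gfactor > 30:
        return "National Championship"
    if 22 <= gfactor <= 26:  # inclusive bounds are an assumption
        return "Elite 8"
    if 10 <= gfactor <= 17:  # inclusive bounds are an assumption
        return "Sweet 16/Round of 32"
    if gfactor < 10:
        return "First Round Exit"
    return None  # falls in a range the post leaves undefined


for value in (33.01, 22.4, 12.25, 5.76, 19.0, 28.0):
    print(value, predicted_finish(value) or "no tier stated")
```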

This year, Trevon Duval's 22.4 GFactor predicted an Elite 8 team.

Let's hope that Tre Jones can come close to his brother's 33.01 rating, as that would strongly correlate with a championship-caliber team.

There is really only one significant outlier: 2014's GFactor predicted an Elite 8 quality team, but in reality that team lost to Mercer in the round of 64 (my apologies for bringing up a painful memory).

Results are below, sorted from highest GFactor to lowest:

Season  NCAA Result  Guard           GFactor
2015    Champion     Tyus Jones      33.01
2010    Champion     Jon Scheyer     30.01
2013    Elite 8      Quinn Cook      25.54
2014    Round of 64  Quinn Cook      24.20
2018    Elite 8      Trevon Duval    22.40
2006    Sweet 16     Greg Paulus     16.46
2011    Sweet 16     Nolan Smith     16.26
2008    Round of 32  Greg Paulus     12.80
2016    Sweet 16     Grayson Allen   12.25
2017    Round of 32  Grayson Allen   11.14
2009    Sweet 16     Jon Scheyer     10.45
2007    Round of 64  Greg Paulus      9.32
2012    Round of 64  Seth Curry       5.76

rsvman
03-30-2018, 11:16 AM
Interesting stuff. Thanks for all the hard work.

I wonder whether assists per made basket would also be predictive in a similar fashion? Or whether other factors could be generated that would correlate pretty well... off the top of my head, maybe turnovers versus forced turnovers? Or offensive rebounding percentage? Or any number of other things.

In any case, good work.

PrinceHal9000
03-30-2018, 11:39 AM
I remember looking at a few other factors, but didn't find anything quite as strong as GFactor. I'll keep looking though.

I ran the GFactor analysis on non-Duke national championship teams, and found that the correlation wasn't nearly as strong.
That actually makes me more convinced that there is something unique about the factor as it relates to Duke's system vs. other teams' systems.

Below are the GFactors for NCAA Champs:

Season  Guard             GFactor
2008    Mario Chalmers    19.46
2009    Ty Lawson         45.85
2010    Jon Scheyer       30.01
2011    Kemba Walker      17.61
2012    Marquis Teague    17.07
2013    Peyton Siva       24.07
2014    Shabazz Napier    16.56
2015    Tyus Jones        33.01
2016    Ryan Arcidiacono  25.20
2017    Joel Berry        13.64

Kedsy
03-30-2018, 11:41 AM
This is interesting, but to me it looks like data fitting (for example, why do you need to multiply by 2?). Also, I assume that between 26 and 30 would mean Final Four? What happens in the gap between 17 and 22? And why do you lump Round of 32 and Sweet 16 together? That seems inconsistent with the rest of your analysis.

Since you stopped going back in 2006, I decided to go back from 2005 to 1985, to see if your formula held up:

2005: 10.8; predicted R32 or S16; actual S16 -- yes
2004: 26.4; predicted F4; actual F4 -- yes
2003: 27.7; predicted F4; actual S16 -- no
2002: 26.7; predicted F4; actual S16 -- no
2001: 25.7; predicted E8; actual champ -- no
2000: 20.6; predicted unknown (either S16 or E8); actual S16 -- not sure
1999: 19.2; predicted unknown (either S16 or E8); actual F2 -- no
1998: 23.6; predicted E8; actual E8 -- yes
1997: 32.2; predicted champ; actual R32 -- no
1996: 16.6; predicted R32/S16; actual R64 -- no
1994: if you call Grant Hill a guard, 17.9 (otherwise 10.2); predicted R32 or S16 (I think); actual F2 -- no
1993: 39.8; predicted champ; actual R32 -- no
1992: 33.1; predicted champ; actual champ -- yes
1991: 28.3; predicted F4; actual champ -- no
1990: 26.4; predicted F4; actual F2 -- yes
1989: 26.8; predicted F4; actual F4 -- yes
1988: 22.6; predicted E8; actual F4 -- no
1987: 11.5; predicted R32 or S16; actual S16 -- yes
1986: 30.1; predicted champ; actual F2 -- no
1985: 33.9; predicted champ; actual R32 -- no

Your formula only worked in 7 of 20 seasons (35%), possibly 8 of 20 (40%). It predicted three teams as champions that lost in the round of 32, and one Sweet 16 team that ended up in the championship game.
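
The hit rate can be re-tallied from the season list above. This is a sketch that copies the verdicts as written in the post (it does not re-derive them from the tier boundaries). The 1994 entry uses the 17.9 figure, which assumes Grant Hill counts as a guard.

```python
from collections import Counter

# (season, GFactor, verdict) copied from the list in the post.
# 1995 is absent from that list, so it is absent here too.
backtest = [
    (2005, 10.8, "yes"), (2004, 26.4, "yes"), (2003, 27.7, "no"),
    (2002, 26.7, "no"), (2001, 25.7, "no"), (2000, 20.6, "not sure"),
    (1999, 19.2, "no"), (1998, 23.6, "yes"), (1997, 32.2, "no"),
    (1996, 16.6, "no"), (1994, 17.9, "no"), (1993, 39.8, "no"),
    (1992, 33.1, "yes"), (1991, 28.3, "no"), (1990, 26.4, "yes"),
    (1989, 26.8, "yes"), (1988, 22.6, "no"), (1987, 11.5, "yes"),
    (1986, 30.1, "no"), (1985, 33.9, "no"),
]

tally = Counter(verdict for _, _, verdict in backtest)
seasons = len(backtest)
hits = tally["yes"]
unclear = tally["not sure"]

print(f"clear hits: {hits}/{seasons} = {hits / seasons:.0%}")
print(f"counting the unclear season: {hits + unclear}/{seasons} = {(hits + unclear) / seasons:.0%}")
```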

I conclude it's an interesting idea but really you're just saying we do better when we have a really good point guard, which is both obvious and also not predictive enough to rely on.

Kedsy
03-30-2018, 11:44 AM
I remember looking at a few other factors, but didn't find anything quite as strong as GFactor. I'll keep looking though.

I ran the GFactor analysis on non-Duke national championship teams, and found that correlation wasn't nearly as strong.
That actually makes me more convinced that there is something unique about that factor as it relates to Duke's system vs. other teams' systems.

More likely it's because you data-fitted it to Duke's 2006 to 2018 performance.

Also, why do you call it the "GFactor"?

Ian
03-30-2018, 11:49 AM
I assume G stands for guard play?

While I'm sure having a good point guard with a high A/T ratio predicts better results for the team, I'm highly skeptical that tournament outcomes can be reduced to a single factor like this.

CDu
03-30-2018, 11:51 AM
Doesn't work as well with Duke's 2001 (underpredicts, as our best passer had only a 25 GFactor score) or 2004 (overpredicts, as our PG had a 36 GFactor score).

Also, it was a missed bank shot from underpredicting this year's team anyway.

Basically, it seems like a nice coincidence rather than a clearly predictive measure.

duke23
03-30-2018, 11:53 AM
More likely it's because you data-fitted it to Duke's 2006 to 2018 performance.

Also, why do you call it the "GFactor"?

It's not only data-fitted; it makes some strange choices about who our PGs were. 2017 is Grayson instead of Frank Jackson? 2012 is Seth Curry instead of Austin Rivers? 2011 is Nolan rather than Kyrie? (I get that Kyrie only played 8 games, but he came back for the tournament).

uh_no
03-30-2018, 11:54 AM
This is interesting, but to me it looks a little like data fitting
Your formula only worked in 7 of 20 seasons (35%), possibly 8 of 20 (40%). I conclude it's an interesting idea but really you're just saying we do better when we have a really good point guard, which is both obvious and also not predictive enough to rely on.

I figured this would be the case. Thanks for doing the legwork.

Contrary to popular belief, there is a lot of intuition required when doing statistics. One of the big questions is "do the results make intuitive sense?"

Take KenPom, for instance: it makes sense that scoring one extra point is similar to preventing your opponent from scoring one point. Therefore, you can to some degree predict how well a team does by looking independently at how well it scores points and how well it prevents the other team from scoring.

In this case, there's no intuitive reason why this particular stat should be any more predictive than some other stat...say blocked shots. How can any stat which simply ignores defense be valid?

What's really happening here: we have a single set of results (with an entry for each year), and we have a huge number of potential statistical measures. It is almost certain that ONE of them will correlate nicely with the results. But as the cliche goes, correlation does not equal causation... and just because a stat happened to track in the past does not mean it has any predictive value.

There are several examples of that fact here (one of my favorite sites):

http://www.tylervigen.com/spurious-correlations

Some of them have a confounding variable, but the interesting ones are the ones that happen by "chance," like the Miss America and Nick Cage ones. This stat likely falls into the latter category over the time frame on which your analysis is based.
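
The "many candidate stats, one set of results" point can be illustrated with a small simulation. This is a sketch under stated assumptions: Duke's 2006 to 2018 results from the original post are encoded as tournament wins (my encoding, not the poster's), and the 1000 "stats" are pure random noise with no connection to basketball.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Duke's tournament wins for 2006..2018, read off the table in the first post
# (Round of 64 loss = 0, Round of 32 = 1, Sweet 16 = 2, Elite 8 = 3, Champion = 6).
tournament_wins = [2, 0, 1, 2, 6, 2, 0, 3, 0, 6, 2, 1, 3]

rng = random.Random(0)  # fixed seed so the run is repeatable
candidate_stats = [
    [rng.gauss(0, 1) for _ in tournament_wins] for _ in range(1000)
]

correlations = sorted(abs(pearson(stat, tournament_wins)) for stat in candidate_stats)
typical = correlations[len(correlations) // 2]
best = correlations[-1]
print(f"typical |r| = {typical:.2f}, best of 1000 = {best:.2f}")
```

With only 13 seasons, the best of many meaningless candidates tends to look impressive, which is the selection effect described above.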

MarkD83
03-30-2018, 12:24 PM
If we truly want a predictive tool, we need to start with the factors at play in the system. The outcome being predicted is the NCAA tournament result, which defines the system, so the factors to consider should come from that setting. For example, the fact that Trevon had lots of assists in December games against teams that don't make the NCAA tournament is not a factor in the system. Factors in the system also need to include the influence of the opponent, which is difficult to pull out of Duke's individual or team stats alone. This all leads me back to KenPom's adjusted defensive and offensive stats, but perhaps against NCAA tournament teams only.

proelitedota
03-30-2018, 01:21 PM
Bart Torvik has quality efficiency stats for teams. Our quality efficiency this year against good teams is similar to 2013 and 2011. We don't have many good wins at all this year.

PrinceHal9000
03-30-2018, 01:38 PM
Good feedback.
A few follow up points:

1) It is totally silly to try to predict the outcome of a single-elimination tournament since so much comes down to luck. Nonetheless, it's fun to try.
2) I chose 2006 as the cutoff because it marks the onset of the "One and Done" era. Admittedly, it's sort of arbitrary.
3) GFactor is based on the player on the team with the best GFactor score. For example, Frank Jackson's 1.7 AST to 1.4 TO put him way below Grayson last year.
4) The 2 x Assists multiplier skews the variable to benefit higher-assist players. A 5 AST, 2.5 TO player would have a rating of 20, while a 6 AST, 3 TO player would have a rating of 24. A 2 AST, 1 TO player would be an 8. Again, this is somewhat arbitrary.
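
The three examples in point 4 can be checked with a short sketch (the `multiplier` parameter is hypothetical, added only to compare against a version without the constant). All three players have the same 2.0 assist-to-turnover ratio, so the separation comes from the assists-per-game term. The constant 2 alone rescales every score equally and leaves the ordering unchanged.

```python
def gfactor(apg: float, topg: float, multiplier: float = 2.0) -> float:
    """multiplier * APG * (APG / TOPG); multiplier=2 matches the post."""
    return multiplier * apg * (apg / topg)


# The three stat lines from point 4: (assists per game, turnovers per game).
players = {
    "5 AST / 2.5 TO": (5.0, 2.5),
    "6 AST / 3 TO": (6.0, 3.0),
    "2 AST / 1 TO": (2.0, 1.0),
}

with_two = {name: gfactor(a, t) for name, (a, t) in players.items()}
without_two = {name: gfactor(a, t, multiplier=1.0) for name, (a, t) in players.items()}

ranking_with = sorted(players, key=with_two.get, reverse=True)
ranking_without = sorted(players, key=without_two.get, reverse=True)

print(with_two)                         # ratings of 20, 24 and 8, as in the post
print(ranking_with == ranking_without)  # ordering does not depend on the constant
```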

NSDukeFan
03-30-2018, 03:34 PM
I figured this would be the case. Thanks for doing the legwork.

Contrary to popular belief, there is a lot of intuition required when doing statistics. One of the big questions is "do the results make intuitive sense?"

Take KenPom, for instance: it makes sense that scoring one extra point is similar to preventing your opponent from scoring one point. Therefore, you can to some degree predict how well a team does by looking independently at how well it scores points and how well it prevents the other team from scoring.

In this case, there's no intuitive reason why this particular stat should be any more predictive than some other stat...say blocked shots. How can any stat which simply ignores defense be valid?

What's really happening here: we have a single set of results (with an entry for each year), and we have a huge number of potential statistical measures. It is almost certain that ONE of them will correlate nicely with the results. But as the cliche goes, correlation does not equal causation... and just because a stat happened to track in the past does not mean it has any predictive value.

There are several examples of that fact here (one of my favorite sites):

http://www.tylervigen.com/spurious-correlations

Some of them have a confounding variable, but the interesting ones are the ones that happen by "chance," like the Miss America and Nick Cage ones. This stat likely falls into the latter category over the time frame on which your analysis is based.

It's fun to go through the correlations and imagine the causations. 😀

cato
03-30-2018, 03:44 PM
Doesn't work as well with Duke's 2001 (underpredicts, as our best passer had only a 25 GFactor score) or 2004 (overpredicts, as our PG had a 36 GFactor score).

Also, it was a missed bank shot from underpredicting this year's team anyway.

Basically, it seems like a nice coincidence rather than a clearly predictive measure.

And/or flipped block/charge call.

CDu
03-30-2018, 03:46 PM
And/or flipped block/charge call.

Well, maybe. No guarantee we win with Carter staying in the game. But guaranteed we win if Allen's shot goes in.

cato
03-30-2018, 05:28 PM
Well, maybe. No guarantee we win with Carter staying in the game. But guaranteed we win if Allen's shot goes in.

Certainly. Hence my hedge. My thought was that in a game as close as the Duke/KU game there can be several pivotal plays, only some of which are under the control of the players on the court.