
When the Software Is the Sportswriter



Grey Devil
11-28-2010, 03:23 PM
The New York Times has a story in its Business section today about a company (StatSheet) located in Durham that has created a method “to endow software with the ability to turn game statistics into articles about college basketball games.” I don’t know if any of you have seen the article yet (http://www.nytimes.com/2010/11/28/business/28digi.html?_r=1&ref=technology), but you might want to check it out.

Once you read the NY Times story, you might also want to read the company’s current (computer-generated) summary of the Duke-Oregon game at http://bluedevildaily.com/, look at their statistical comparisons between Duke and each of its upcoming opponents (from Michigan State on…), and explore the site more fully.

The NYT article triggered some interesting thoughts for me.

First, I wondered whether the company was founded, staffed, or led by Duke alums. That seems logical to me given its location and interest in sports. Does anybody on this board know?

Second, even though I’ve been involved with the technology industry for over 30 years (and have lived and worked in Silicon Valley for over 15 of those years), I can honestly say that their computer-generated story about the Oregon game does not compare in quality to what Jim Sumner posted here on the DBR, nor to any story I’ve read by a self-respecting or widely recognized sportswriter -- or even by those who aren’t self-respecting or are only narrowly recognized. ;-) That said, I have been in the tech industry long enough to know that one should never say “that can’t be done with technology.” So I fear for our friends and colleagues in the sportswriting business -- Jim Sumner, Barry Jacobs, Al Featherston, and others (especially Barry, since his specialty seems to be interesting interpretations of the sport’s stats) -- when I read stories such as this one. It might be another example of the disintermediating effect of the tech industry I’ve worked in for so long.

What are your thoughts and reactions to this? (I would especially appreciate hearing from Jim, Barry, or Al, but I can also understand why they might be reluctant to share such thoughts on a public board.)

Grey Devil

P.S. Moderators, if you think this discussion is more appropriately placed in the Off-Topic board, feel free to move it there.

uh_no
11-28-2010, 03:38 PM
While I think this is good commentary about the very formulaic nature of recap writing, what the software cannot do is recap a highlight play or give any sort of subjective analysis. All you will ever see is a description of what can be seen in the statistics: Duke dominated the boards, Duke went on an x-y run from m:ss to m:ss, this is Duke's nth win in a row.

And then you can do all sorts of numerical analysis.....but you will never see a description of Nolan's alley-oop to Plumlee.
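The "x-y run from m:ss to m:ss" line is the kind of sentence that falls straight out of play-by-play data. As a rough sketch (not StatSheet's actual method, which isn't described here), one way to find a team's best run is a maximum-subarray (Kadane) scan over the point differential of each scoring play. The `ScoringPlay` structure, the use of elapsed time rather than clock remaining, and the sample plays are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScoringPlay:
    elapsed: int  # seconds since tip-off (hypothetical format)
    team: str     # team that scored
    points: int   # 1, 2 or 3


def fmt(seconds: int) -> str:
    """Format elapsed game time as m:ss."""
    return f"{seconds // 60}:{seconds % 60:02d}"


def best_run(plays: list[ScoringPlay], team: str) -> tuple[int, int, int, int]:
    """Return (points_for, points_against, start, end) for the stretch of
    consecutive scoring plays in which `team` had its largest margin.

    Kadane's maximum-subarray scan over each play's point differential,
    so a run such as 8-2 may include a basket by the opponent."""
    best, best_margin = (0, 0, 0, 0), 0
    pts_for = pts_against = start = 0
    for play in plays:
        if pts_for - pts_against <= 0:
            # The current stretch no longer helps; start a new one here.
            pts_for = pts_against = 0
            start = play.elapsed
        if play.team == team:
            pts_for += play.points
        else:
            pts_against += play.points
        if pts_for - pts_against > best_margin:
            best_margin = pts_for - pts_against
            best = (pts_for, pts_against, start, play.elapsed)
    return best


# Made-up sample data for illustration only.
SAMPLE = [
    ScoringPlay(30, "Team A", 2),
    ScoringPlay(60, "Team B", 3),
    ScoringPlay(90, "Team A", 3),
    ScoringPlay(120, "Team A", 2),
    ScoringPlay(150, "Team B", 2),
    ScoringPlay(180, "Team A", 3),
    ScoringPlay(210, "Team B", 3),
]

if __name__ == "__main__":
    f, a, s, e = best_run(SAMPLE, "Team A")
    # prints "Team A outscored Team B 8-2 from 1:30 to 3:00"
    print(f"Team A outscored Team B {f}-{a} from {fmt(s)} to {fmt(e)}")
```

Which also illustrates uh_no's point: the output is only as descriptive as the numbers behind it.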

Grey Devil
11-28-2010, 03:48 PM
.....but you will never see a description of Nolan's alley-oop to Plumlee.

Guess I didn't make it clear enough in my original post. Based on my over 30 years in the technology field, I've learned never to say "never."

Since that post I've also explored their site a bit more and found that it's a great resource for all things statistical in college b-ball, including all kinds of data on individual players shown graphically. Check it out. I think they may be on to something in giving the average fan a statistical way to analyze team and individual performance.

Grey Devil

gus
11-28-2010, 04:29 PM
Couple that with the NYT article about the increasing reliance on robots in war, and I begin to fear that some of the dystopian sci-fi stories describe legitimate fears. But who would have expected the Butlerian Jihad to start with sports fans??

riverside6
11-28-2010, 04:47 PM
The owner of statsheet.com is actually a UNC fan.

uh_no
11-28-2010, 04:51 PM
Guess I didn't make it clear enough in my original post. Based on my over 30 years in the technology field, I've learned never to say "never."


Feel free to PM me when someone writes an image-processing scheme to determine when a good basketball play has been made (and writes about it effectively).

Is it possible? Of course.

Is it a primary area of research? No.

SmartDevil
11-28-2010, 04:52 PM
Interesting, in terms of the software's accuracy (or maybe more than that), that this appears in the upper right corner of the UNC home page:

"Season Preview: UNC is very likely to have a better team than last year" (accompanied by a colorful applause-type meter leaning almost all the way to the "better" side)

DukeSean
11-28-2010, 04:55 PM
Is this for people who aren't adept at reading a box score?

SuperTurkey
11-28-2010, 04:56 PM
Feel free to PM me when someone writes an image-processing scheme to determine when a good basketball play has been made (and writes about it effectively).

Is it possible? Of course.

Is it a primary area of research? No.

Why would you need to write an algorithm to do that? The computer just needs to aggregate from available sources that a human declared certain plays noteworthy. The computer doesn't have to do everything, soup to nuts.

For example, it's fairly easy to write an algorithm to read the game clock and scores for each play shown in the publicly available highlight videos linked off of ESPN. Then, the program can collate that with the play-by-play game stats and comment on who scored, who assisted, etc.
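A minimal sketch of the collation step described above, assuming an OCR step (not shown) has already read the period, clock, and score off each highlight clip's scoreboard, and that the frame shows the score after the basket. The `Event` and `Clip` schemas, the 3-second clock tolerance, and the sample rows are hypothetical; no real ESPN feed or API is used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One scoring row from a play-by-play feed (hypothetical schema)."""
    period: int
    clock: str             # time remaining, "m:ss"
    home: int              # score after the play
    away: int
    scorer: str
    assist: Optional[str]
    action: str


@dataclass(frozen=True)
class Clip:
    """What an OCR step (not shown) read off a highlight clip's scoreboard."""
    period: int
    clock: str
    home: int
    away: int


def to_seconds(clock: str) -> int:
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def match(clip: Clip, events: list[Event], tolerance: int = 3) -> Optional[Event]:
    """Return the event with the same period and score whose clock is within
    `tolerance` seconds of the clip's (a clock read from video may be off)."""
    for ev in events:
        if (ev.period, ev.home, ev.away) != (clip.period, clip.home, clip.away):
            continue
        if abs(to_seconds(ev.clock) - to_seconds(clip.clock)) <= tolerance:
            return ev
    return None


def caption(clip: Clip, events: list[Event]) -> Optional[str]:
    """Describe the matched play: who scored and who assisted."""
    ev = match(clip, events)
    if ev is None:
        return None
    if ev.assist:
        return f"{ev.scorer} {ev.action}, assisted by {ev.assist}"
    return f"{ev.scorer} {ev.action}"


# Made-up sample rows for illustration only.
EVENTS = [
    Event(1, "12:41", 14, 9, "Player A", "Player B", "dunks"),
    Event(1, "12:10", 14, 12, "Player C", None, "hits a three"),
]
```

Matching on the score as well as the clock is the design choice that makes a slightly misread clock tolerable: two scoring plays rarely share both.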

Jim3k
11-28-2010, 05:08 PM
They need a spelling algorithm to learn the difference between lead and led.


Joevan Catron has lead the team in scoring 5 times and in rebounding 3 times in 6 games this season.


Mason Plumlee has lead the team in rebounding 5 times in 6 games this season.

Lord Ash
11-28-2010, 05:19 PM
I think this is the type of thing many programmers would love to tackle. I've done my fair share of AI work in the past, and I think it would be WILDLY entertaining to try to tie game events into some sort of predefined narrative and see if it worked... You know, it checks the times between scores and the score of the game and can do something like: "If a team makes up 8-12 points of a deficit in the last X minutes and comes within Y% of the winning score but still loses, then 'Despite a furious comeback, the Blah Blahs came up just short, losing to the Blah Blahs 76-72.'" That sort of thing? I have a sneaking suspicion you could, with some work, do a pretty passable job.

Very interesting stuff. I don't know if computer-written articles are the future, but it is certainly interesting to consider!
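Lord Ash's comeback rule translates almost directly into code. This is a sketch under stated assumptions: the defaults X = 5 minutes and Y = 10% are placeholders (the post leaves them open), "points made up" is measured from the loser's largest deficit inside that window to the final margin, and the team names and deficit samples are invented.

```python
from typing import Optional


def comeback_recap(
    winner: str,
    loser: str,
    winner_score: int,
    loser_score: int,
    deficits: list[tuple[int, int]],
    last_secs: int = 300,       # "the last X minutes"; X = 5 is a placeholder
    min_cut: int = 8,           # "makes up 8-12 points"
    max_cut: int = 12,
    within_pct: float = 10.0,   # "within Y% of the winning score"; Y = 10 is a placeholder
) -> Optional[str]:
    """deficits: (seconds_remaining, loser's deficit) samples over the game.
    Returns a canned comeback sentence if the rule fires, else None."""
    if loser_score >= winner_score:
        return None  # the team didn't lose, so no "came up just short" story
    late = [d for secs_left, d in deficits if secs_left <= last_secs]
    if not late:
        return None
    final_deficit = winner_score - loser_score
    points_made_up = max(late) - final_deficit
    close = loser_score >= winner_score * (1 - within_pct / 100)
    if min_cut <= points_made_up <= max_cut and close:
        return (f"Despite a furious comeback, {loser} came up just short, "
                f"losing to {winner} {winner_score}-{loser_score}.")
    return None


if __name__ == "__main__":
    # Invented deficit samples: Team Y trailed by 14 with 4:50 left, lost by 4.
    samples = [(600, 6), (290, 14), (120, 9), (30, 2), (0, 4)]
    print(comeback_recap("Team X", "Team Y", 76, 72, samples))
```

A real system would need many such rules plus a way to pick among the ones that fire, which is presumably where "with some work" comes in.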

Grey Devil
11-28-2010, 07:15 PM
The owner of statsheet.com is actually a UNC fan.

Yeah, I found that out later, after I had posted my initial comments and read his blog, where many of the banner images were of Carolina uniforms and places like the Dean Dome.....ugh!

Grey Devil

RobbieStats
11-28-2010, 07:19 PM
The owner of statsheet.com is actually a UNC fan.

You going to hold that against me? :-) I keep StatSheet team-agnostic. There are no Duke easter eggs anywhere.

BTW, I should mention that I've provided stats to Julian for the Maple Street magazine for the last two years!

RobbieStats
11-28-2010, 07:22 PM
They need a spelling algorithm to learn the difference between lead and led.

That has been fixed. That's the nice thing about algorithms...they can be fixed and won't make the same mistake again.
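To illustrate why a template bug stays fixed, here is a hypothetical sentence builder (StatSheet's real templates aren't shown in the thread). The verb appears in exactly one place, so correcting "has lead" to "has led" there corrects every generated sentence, including the two Jim3k quoted.

```python
def leader_sentence(player: str, times_led: dict[str, int], games: int) -> str:
    """Build a "has led the team" sentence from per-category counts.

    Hypothetical template: the verb lives in one place, so fixing
    "has lead" -> "has led" here fixes every sentence it produces."""
    if not times_led:
        raise ValueError("need at least one category")

    def plural(n: int, word: str) -> str:
        return f"{n} {word}" if n == 1 else f"{n} {word}s"

    parts = [f"in {cat} {plural(n, 'time')}" for cat, n in times_led.items()]
    if len(parts) == 1:
        joined = parts[0]
    else:
        joined = ", ".join(parts[:-1]) + " and " + parts[-1]
    return f"{player} has led the team {joined} in {plural(games, 'game')} this season."


if __name__ == "__main__":
    print(leader_sentence("Joevan Catron", {"scoring": 5, "rebounding": 3}, 6))
    print(leader_sentence("Mason Plumlee", {"rebounding": 5}, 6))
```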

RobbieStats
11-28-2010, 07:27 PM
While I think this is good commentary about the very formulaic nature of recap writing, what the software cannot do is recap a highlight play or give any sort of subjective analysis. All you will ever see is a description of what can be seen in the statistics: Duke dominated the boards, Duke went on an x-y run from m:ss to m:ss, this is Duke's nth win in a row.

And then you can do all sorts of numerical analysis.....but you will never see a description of Nolan's alley-oop to Plumlee.

People like different things. About half the people I talk to say they look forward to getting just the facts without an "expert" injecting their opinion. The other half say they want the opinion.

The other point I'll make is that we are in the infant stages of this technology. It's going to improve immensely over this season alone, and even more over the next couple of seasons. I look forward to hearing everyone's feedback as we progress. If I can please this crowd I can please anyone :-)

Grey Devil
11-28-2010, 07:35 PM
Is this for people who aren't adept at reading a box score?

Actually, check out this page (http://bluedevildaily.com/duke-basketball/compare_stats/michigan-state), which has a detailed set of statistics on many different aspects of the upcoming Michigan State game. I'm not saying that they truly predict the final result, or that all of the data are on target (for example, the chart of "The Four Factors to Winning" doesn't make clear why those four factors were chosen over any others), but it shows the potential of such a site to let users explore a lot of data (e.g., doing side-by-side comparisons of player data, like Kalin Lucas vs. Kyrie or Nolan, or visually presenting data about what they call "player impact").
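On the "why those four" question: the chart presumably refers to Dean Oliver's Four Factors (shooting, turnovers, offensive rebounding, and getting to the free-throw line), which Oliver argued largely determine who wins. Assuming StatSheet uses the standard definitions (its exact formulas aren't stated on the page), here is a sketch computing them from box-score totals. The `TeamBox` fields and the sample numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class TeamBox:
    fgm: int   # field goals made
    fga: int   # field goals attempted
    fg3m: int  # three-pointers made
    ftm: int   # free throws made
    fta: int   # free throws attempted
    orb: int   # offensive rebounds
    drb: int   # defensive rebounds
    tov: int   # turnovers


def four_factors(team: TeamBox, opp: TeamBox) -> dict[str, float]:
    """Dean Oliver's Four Factors for `team`, using common textbook formulas."""
    # Estimated offensive plays: shots, trips to the line, and turnovers.
    plays = team.fga + 0.44 * team.fta + team.tov
    return {
        # Effective FG%: a made three counts 1.5 times a made two.
        "efg_pct": (team.fgm + 0.5 * team.fg3m) / team.fga,
        # Share of plays that end in a turnover.
        "tov_pct": team.tov / plays,
        # Share of available offensive rebounds the team grabbed.
        "orb_pct": team.orb / (team.orb + opp.drb),
        # Free throws made per field-goal attempt (some sites use FTA / FGA).
        "ft_rate": team.ftm / team.fga,
    }


if __name__ == "__main__":
    # Made-up box scores for illustration only.
    us = TeamBox(fgm=25, fga=50, fg3m=10, ftm=15, fta=20, orb=10, drb=25, tov=12)
    them = TeamBox(fgm=24, fga=55, fg3m=6, ftm=10, fta=14, orb=12, drb=30, tov=15)
    for name, value in four_factors(us, them).items():
        print(f"{name}: {value:.3f}")
```

Because each factor is computed for both teams, a side-by-side chart like the one on the compare page falls out naturally by calling `four_factors(them, us)` as well.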

I, at least, find it very interesting and imagine that many others would too, especially since this kind of data isn't easily available (or easily compared) anywhere else I know of.

And remember, their home page makes clear that the site is in beta.

Grey "geesh, I hate to be promoting a site created by a Tarheel" Devil

(apologies to Jason for using his technique in the sig)

RobbieStats
11-28-2010, 07:47 PM
And remember, their home page makes clear that the site is in beta.

Grey "geesh, I hate to be promoting a site created by a Tarheel" Devil

Thanks Grey. We are just getting started.

If it makes you guys feel any better, our lead investor is a Duke alum and die-hard fan. We have some fun back and forth during board meetings ;-)

Lord Ash
11-28-2010, 08:28 PM
Thanks Grey. We are just getting started.

If it makes you guys feel any better, our lead investor is a Duke alum and die-hard fan. We have some fun back and forth during board meetings ;-)

Ah, very cool to have the actual creator here! Personally I think this sounds like a FANTASTIC project, and could absolutely have some value. Would love to know more about the actual bones of how it works!

Jim3k
11-28-2010, 08:34 PM
Thanks Grey. We are just getting started.

If it makes you guys feel any better, our lead investor is a Duke alum and die-hard fan. We have some fun back and forth during board meetings ;-)

Thanks for the feedback, Robbie. Although we always enjoy tweaking a Heel, we respect what you are trying to do. A fan is a fan is a fan. But those who take it to the next step are statisticians. And we have a number of them here. Good luck with this -- and I'm glad you could find the past tense algorithm. ;)

Plus, it's nice to know that Devils and Heels are working together and playing nice for this project.

killerleft
11-28-2010, 09:41 PM
Thanks Grey. We are just getting started.

If it makes you guys feel any better, our lead investor is a Duke alum and die-hard fan. We have some fun back and forth during board meetings ;-)

I can think of many things I'd rather invest in than lead.:o

Greg_Newton
11-28-2010, 10:05 PM
People like different things. About half the people I talk to say they look forward to getting just the facts without an "expert" injecting their opinion. The other half say they want the opinion.

The other point I'll make is that we are in the infant stages of this technology. It's going to improve immensely over this season alone, and even more over the next couple of seasons. I look forward to hearing everyone's feedback as we progress. If I can please this crowd I can please anyone :-)

Interesting - I'd guess it's going to take a ton of tweaks, but I can imagine it being serviceable in a couple of years.

As the article says, I see this being most useful for low-profile match-ups that would not get a write-up at all if not for the software. Obviously, it's never going to replace the front-page articles by the DBR folks and Sumner, which I rely on for an experienced perspective on how the team played in a larger context, along with micro insights on specific players and plays that could only come from a human who knows basketball very well. However, if fine-tuned, I think it would work for the majority of the ESPN blurbs that pop up when you click on a result on a team's schedule.

For example, earlier I was trying to get a good idea of how good Oregon was, so I was looking at their previous results. They had several very close games against mid-major types for which summaries weren't available. I can see ESPN contracting out your software for games like that - it would be nice to have a summary of what happened and a little context for the numbers.

Good luck!