Probability, Statistics for Sports


fletch


6 hours ago, Keirik said:

I have a master's degree in biochemistry. I’m not bad at numbers, but thank you. I will say, some of us did refer to some of the advanced classes as cal-clueless lol

 

Yes, a Beautiful Mind was good albeit a bit boring.  🙂

 

 

I think people are just joking with you my man. 

 

2 hours ago, Ozzy said:

 

Yeah, that!  😉

Figured that it was just joking, but didn't want to assume.  This whole thread is a wacky tangent, and I don't mind the ribbing.  I do know that there are some folks on BSBH that know a helluva lot more about advanced hockey stats than I do.  Hopefully they will chime in.  Otherwise it's a pretty weird thread.


10 hours ago, fletch said:

 

Figured that it was just joking, but didn't want to assume.  This whole thread is a wacky tangent, and I don't mind the ribbing.  I do know that there are some folks on BSBH that know a helluva lot more about advanced hockey stats than I do.  Hopefully they will chime in.  Otherwise it's a pretty weird thread.

I don't understand your OP but I've seen enough to know you're wrong. 

 

( ❤️ you)


2 hours ago, siddious said:

I’m just joshing you @fletch

 

probably not even a bad idea for a thread just was a ton of info to look at 

Appreciate the feedback.  No one wants to come to BSBH to do 'work', right?  Debate, have a point of view, tease, blow off steam, whatever.  But certainly not work.  And there's nothing worse than a thread with only the original poster.  

 

Sometimes if people introduce a thread, they feel like they 'own' the thread and get to direct the conversation.  User A has a controversial take.  Users B, C, and D disagree.  User A defends the controversial take with ever more ridiculous rationales.  User E pipes in that User A is being ridiculous, but one aspect of the point of view has merit.  Everyone loses interest and moves on.

 

I'm interested in a dialog, or whatever a conversation is among more than 2 people (a melee?).  It makes sense to start with hockey.   But if someone wants to talk about cricket stats, I'd be happy to learn.  Cricket has a passionate cult following.  Best teams seem to be England and places that were once part of the British empire.  

https://www.icc-cricket.com/rankings/mens/team-rankings/test

 

So this is a thread for people to talk about statistics and probability in sports, particularly with respect to hockey.  I certainly don't 'own' the thread, or get to direct the conversation.

 

45 minutes ago, Pete said:

I don't understand your OP but I've seen enough to know you're wrong. 

 

( ❤️ you)

This is always a safe assumption.


Expected goals

https://www.nhl.com/kraken/news/analytics-with-alison-expected-goals/c-327728890

 

'WHAT is Expected Goals?

In the broadest sense, expected goals (xG) is a measure that seeks to address the concern that not all shots are created equal. xG considers a variety of factors and then mathematically assigns a value to each shot attempt that represents the probability of that shot becoming a goal. That value can come in one of two forms: it can be a percentage - which directly represents how likely a goal was to follow; or it can be a straight value which factors in probability. Terms like "expected goals" and xG can feel clunky, so I like to call this measure simply "shot quality."'

 

'In addition to Evolving-Hockey.com, other sites that track xG include NaturalStatTrick.com, MoneyPuck.com, and HockeyViz.com. These sites will calculate how much shot quality does a player's team generate when that specific player is on the ice (Expected Goals For, xGF, higher is better) as well as how much shot quality does an opponent generate when said player is on the ice (Expected Goals Against, xGA, lower is better). So now we can see if a player is helping drive play for his team and/or if a player is limiting quality chances against.'

 

I could not locate any information on how expected goals are calculated by any site.

 

Also from https://www.nhl.com/kraken/news/analytics-with-alison-expected-goals/c-327728890

'There are many public models and each is mathematically unique so understanding what's included in each is important.'

 

If you've ever played yahtzee, you have a working knowledge of probability. You are trying to maximize your score to beat your opponents.

https://toyzschool.com/yahtzee-strategy-tips-and-tricks-ultimate-guide/

https://www.ultraboardgames.com/yahtzee/strategy.php

https://www.researchgate.net/publication/228756223_Optimal_solitaire_Yahtzee_strategies

 

There are 5 six-sided dice.  You get up to 3 rolls per turn before filling in a category on the score sheet.  After each roll you can set aside the dice you want to keep and reroll any of the rest (1 to 5 dice) to try to improve your score.

 

If you have a small straight (1,2,3,4) or (3,4,5,6) and decide to roll 1 die, you have a 1/6 chance of getting the number you need for a large straight (a 5 if you have 1-4, a 2 if you have 3-6).

If you have a small straight (2,3,4,5) and decide to roll 1 die, you have a 2/6 chance of getting a large straight (1 or 6), so you've doubled the odds.
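
To put numbers on the straight example, here is a minimal sketch (my own helper, not from any of the linked strategy guides). It assumes you hold the four straight dice and reroll only the fifth die, and it also shows what happens if you still have two rolls left instead of one.

```python
from fractions import Fraction

def chance_to_hit(needed_faces: int, rolls_left: int) -> Fraction:
    """Probability that at least one of `rolls_left` single-die rolls shows a needed face."""
    miss = Fraction(6 - needed_faces, 6)  # chance one roll misses
    return 1 - miss ** rolls_left

# Holding 1-2-3-4 or 3-4-5-6: only one face completes the large straight.
closed_one_roll = chance_to_hit(1, 1)
closed_two_rolls = chance_to_hit(1, 2)

# Holding 2-3-4-5: either a 1 or a 6 completes it.
open_one_roll = chance_to_hit(2, 1)
open_two_rolls = chance_to_hit(2, 2)

print(closed_one_roll, closed_two_rolls)  # 1/6 11/36
print(open_one_roll, open_two_rolls)      # 1/3 5/9
```

With two rolls left, the open-ended straight goes from about 31% to about 56%, so the gap between the two hands stays large even when you get a second try.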

 

What the hell does yahtzee have to do with hockey?  I don't see Panarin carrying dice.  Are we going to start playing Dungeons and Dragons and having Trouba slay orcs with his staff of fury?


Hockey players are trying to maximize goals for and minimize goals against.  Good hockey plays versus bad plays.  In the defensive zone, don't try that cross-ice pass if there is a risk it will be intercepted in the middle of the ice and lead to a high danger chance.  Keep it around the boards, cycle back around the net, whatever.  On the penalty kill, if you can intercept the D to D pass, you can be off on a short-handed breakaway.  But if the defenseman doesn't make that pass, you are probably out of position on the kill, and more likely to give up the goal.

 

The game is moving too fast to actively think, right?  Muscle memory, training, insight, trusting what you are seeing.

 

Expected goals is an interesting way of modeling team performance, goals for, and goals against.  It uses assumptions based on historical data on the likelihood that a shot will score, summed over the course of a game.  And it's not very satisfying when the Rangers have an expected goals of 4.13, but an actual result of shutout by some hotshot from Moose Jaw playing in his third NHL game.
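
A rough sketch of the "summed over the course of a game" idea, and of why a shutout can still happen. The per-shot probabilities here are made up (35 shots at 0.118 each, chosen only so the total lands on 4.13), and treating shots as independent is a simplifying assumption of mine, not something any of the public models claim.

```python
import math

# Hypothetical game: 35 shot attempts, each given the same made-up goal probability.
shot_probs = [0.118] * 35

def expected_goals(probs):
    """xG for a game is just the sum of the per-shot goal probabilities."""
    return sum(probs)

def shutout_probability(probs):
    """Chance that no shot goes in, assuming shots are independent of each other."""
    return math.prod(1 - p for p in probs)

xg = expected_goals(shot_probs)
p_shutout = shutout_probability(shot_probs)

print(f"xG = {xg:.2f}")
print(f"chance of being shut out anyway = {p_shutout:.1%}")
```

Under those assumptions the shutout chance comes out a little over 1%, so over a long season the hotshot from Moose Jaw is going to steal one now and then.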


44 minutes ago, fletch said:

Expected goals is an interesting way of modeling team performance, goals for, and goals against.  It uses assumptions based on historical data on the likelihood that a shot will score, summed over the course of a game.  And it's not very satisfying when the Rangers have an expected goals of 4.13, but an actual result of shutout by some hotshot from Moose Jaw playing in his third NHL game.

Very much like anything/60, it's a highly subjective stat that doesn't take into account the individual player's skill set. If you have Barclay Goodrow and Artemi Panarin taking the same shot from the same location, there are some areas of the ice where you would expect a goal from one and not the other.

 

It's great for making large sweeping comparisons of players of the same ilk or teams of the same ilk, can't be taken seriously on its own. 


8 hours ago, Pete said:

Very much like anything/60, it's a highly subjective stat that doesn't take into account the individual player's skill set. If you have Barclay Goodrow and Artemi Panarin taking the same shot from the same location, there are some areas of the ice where you would expect a goal from one and not the other.

 

It's great for making large sweeping comparisons of players of the same ilk or teams of the same ilk, can't be taken seriously on its own. 

Agreed that all models are flawed, imperfect representations of reality.  Without seeing the code, I'm waving my hands in the air.  Likely they figure the errors will average out.  Given the amount of historic data (10000 data points or whatever), there is an 80% chance of scoring from Zone A, a 60% chance of scoring from Zone B, etc.  So if you have a normal distribution of the probability of scoring from Zone A across all players, you've got data with error bars, which you can feel better about for team data.
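
Here is a minimal sketch of what "data with error bars" could look like for one zone. The counts are invented, and the interval uses the textbook normal approximation for a binomial proportion, which is my choice for illustration and not necessarily what any xG site does.

```python
import math

def scoring_rate_with_interval(goals: int, shots: int, z: float = 1.96):
    """Return (rate, low, high): goal rate with an approximate 95% interval."""
    rate = goals / shots
    std_err = math.sqrt(rate * (1 - rate) / shots)
    return rate, max(0.0, rate - z * std_err), min(1.0, rate + z * std_err)

# Hypothetical counts from the same zone of the ice.
league = scoring_rate_with_interval(goals=900, shots=10000)  # lots of data
player = scoring_rate_with_interval(goals=9, shots=100)      # one player's sample

for label, (rate, low, high) in (("league", league), ("player", player)):
    print(f"{label}: {rate:.3f} (roughly {low:.3f} to {high:.3f})")
```

Same 9% rate in both cases, but the single-player interval is ten times wider, which is the usual argument for trusting the team-level numbers more.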

 

For individual player data, as you state, it is more problematic.  I don't know how you could correct for player skill set.  You could give an artificial ranking to each player in each zone of the ice.  Kreider an 85 from within 5 feet of the net, Panarin a 60 from the slot, etc.  That brings subjectivity to what is supposed to be objective based on a protocol, admittedly flawed.


On 12/12/2022 at 7:24 AM, fletch said:

Expected goals is an interesting way of modeling team performance, goals for, and goals against.  It uses assumptions based on historical data on the likelihood that a shot will score, summed over the course of a game.  And it's not very satisfying when the Rangers have an expected goals of 4.13, but an actual result of shutout by some hotshot from Moose Jaw playing in his third NHL game.

This is a good thread Fletch. It's good information to have for pretty much all sports nowadays. I used to be much more invested in hockey stats, but I kind of lost interest and the stats "community" really rubbed me the wrong way. Plus I stopped having time to research stats shit all day lol

 

Expected goals answers the main question we always had when Corsi and Fenwick were popular in the early 2010s - "How do you account for shot quality?" Measuring a team's quality of play by raw shot attempts, as Corsi and Fenwick attempt to do, is hardly a good method when not all shot attempts are created equal. Expected goals models try to solve for the quality question by introducing a variety of factors. How things are weighted is really model dependent, but you typically see shot distance, or whether or not the shot was a rebound, as the main metric for determining shot quality.

Most, if not all, of the public models use public NHL data from the game reports feed on the NHL's website. The NHL is notorious for being absolutely horrendous at tracking stats correctly. You'll see shots recorded at the wrong place on the ice, on the wrong side of the ice, credited to the wrong player, etc. This makes it very difficult to point to a public expected goals model as the gold standard. The NHL needs to do much better when it comes to real-time stat tracking.

 

You might see a lot of talk about how teams or private companies (like Valiquette's Clear Sight Analytics) have their own internal stats and that's because they put a lot more manpower into actually tracking events in each game - things like activity before a shot, odd-man rushes, screens, etc. Typically it's manual tracking, but some teams are known to use player tracking technology to get some data they put into their internal models. 

Public models need to improve in a few ways. To me, the most important way is taking actual shooter talent into account. An Auston Matthews shot taken from anywhere on the ice is more "valuable" than a shot taken by pretty much anyone else in the league. To my knowledge, there are not many models that take that into account - so that seems to be the next logical step for public models. Beyond that, the NHL improving their tracking would go a long way. 


16 hours ago, fletch said:

Agreed that all models are flawed, imperfect representations of reality.  Without seeing the code, I'm waving my hands in the air.  Likely they figure the errors will average out.  Given the amount of historic data (10000 data points or whatever), there is an 80% chance of scoring from Zone A, a 60% chance of scoring from Zone B, etc.  So if you have a normal distribution of the probability of scoring from Zone A across all players, you've got data with error bars, which you can feel better about for team data.

 

For individual player data, as you state, it is more problematic.  I don't know how you could correct for player skill set.  You could give an artificial ranking to each player in each zone of the ice.  Kreider an 85 from within 5 feet of the net, Panarin a 60 from the slot, etc.  That brings subjectivity to what is supposed to be objective based on a protocol, admittedly flawed.

Evolving Hockey posted some of their nominal code here:

https://evolving-hockey.com/blog/a-new-expected-goals-model-for-predicting-goals-in-the-nhl/

 

It's an interesting read.


7 hours ago, Morphinity 2.0 said:

Evolving Hockey posted some of their nominal code here:

https://evolving-hockey.com/blog/a-new-expected-goals-model-for-predicting-goals-in-the-nhl/

 

It's an interesting read.

@Morphinity 2.0 @Pete Thanks, it is a really interesting read; it's going to take me some time to digest.  Regarding Pete's post, it appears some models do correct for shooters.  The [here] in the quote below is a link in the original article.

 

'Dawson Sprigings and Asmae Toumi developed an xG model in 2015 [here] that became very popular in the hockey analytics community. Sprigings and Toumi proposed a “shot-multiplier” variable that incorporated “shooting talent” in the model and showed that it was “a better predictor of future scoring than Corsi, Goals”. '

 

Evolving Hockey found other factors more important when developing their eventual Expected Goals (xG) model.  https://evolving-hockey.com/blog/a-new-expected-goals-model-for-predicting-goals-in-the-nhl/

 

Even-Strength

image-1-2-750x536.png


16 hours ago, fletch said:

@Morphinity 2.0 @Pete Thanks, it is a really interesting read; it's going to take me some time to digest.  Regarding Pete's post, it appears some models do correct for shooters.  The [here] in the quote below is a link in the original article.

 

'Dawson Sprigings and Asmae Toumi developed an xG model in 2015 [here] that became very popular in the hockey analytics community. Sprigings and Toumi proposed a “shot-multiplier” variable that incorporated “shooting talent” in the model and showed that it was “a better predictor of future scoring than Corsi, Goals”. '

 

The Evolving hockey found other factors more important, for their eventual model development for Expected Goals (xG).  https://evolving-hockey.com/blog/a-new-expected-goals-model-for-predicting-goals-in-the-nhl/

 

Even-Strength

image-1-2-750x536.png

Yeah I read that part and they seem to agree that shooting talent should be at least considered in an xG model. But they (Evolving Hockey, at least) just haven't figured out a way to incorporate it yet into their models without it breaking the whole thing. It's obviously not as simple as saying "Matthews is a career 15% shooter" and tying a weight to it. But it makes perfect logical sense that I'd rather have Matthews shooting the puck than Trouba shooting the puck from literally anywhere on the ice - and there's a greater chance of Matthews's shot going in than Trouba's.


6 minutes ago, Morphinity 2.0 said:

Yeah I read that part and they seem to agree that shooting talent should be at least considered in an xG model. But they (Evolving Hockey, at least) just haven't figured out a way to incorporate it yet into their models without it breaking the whole thing. It's obviously not as simple as saying "Matthews is a career 15% shooter" and tying a weight to it. But it makes perfect logical sense that I'd rather have Matthews shooting the puck than Trouba shooting the puck from literally anywhere on the ice - and there's a greater chance of Matthews's shot going in than Trouba's.

Bingo. This is the reason why any argument based wholly on these stats can't be taken seriously. Analytics aren't the turkey, they're the yams, cranberry, stuffing, and pumpkin pie. 


1 hour ago, Pete said:

Bingo. This is the reason why any argument based wholly on these stats can't be taken seriously. Analytics aren't the turkey, they're the yams, cranberry, stuffing, and pumpkin pie. 

Yeah. I give more weight to things like CSA's stats or whatever internal team stats are out there than to whatever player cards, report cards, and other stuff based on public models. The public stuff is decent conversation fodder, but people post player cards/stats as if they're the quintessential source for player evaluation. 1) Trying to boil a player down to percentiles is really reductive. 2) These public models are playing with half a deck of cards missing. 


1 hour ago, Morphinity 2.0 said:

Yeah. I give more weight to things like CSA's stats or whatever internal team stats are out there than to whatever player cards, report cards, and other stuff based on public models. The public stuff is decent conversation fodder, but people post player cards/stats as if they're the quintessential source for player evaluation. 1) Trying to boil a player down to percentiles is really reductive. 2) These public models are playing with half a deck of cards missing. 

Totally agree.

 

These models are weighted towards what the model creator thinks is valuable, look at Corsi. Dude just decided that shot attempts were the most valuable stat and built an entire ecosystem off of it.


4 minutes ago, Pete said:

Totally agree.

 

These models are weighted towards what the model creator thinks is valuable, look at Corsi. Dude just decided that shot attempts were the most valuable stat and built an entire ecosystem off of it.

I think the general idea around Corsi does make some sense (that if you have the puck, you tend to shoot it more, so shot attempts roughly equal some analogue for possession), but people took it too far. 
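
For anyone following along, a quick sketch of how the shot-attempt shares are usually tallied. Corsi counts all shot attempts (on goal, missed, blocked) and Fenwick leaves out the blocked ones. The counts below are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class ShotAttempts:
    on_goal: int
    missed: int
    blocked: int  # this team's attempts that the other team blocked

    @property
    def corsi(self) -> int:
        """All shot attempts."""
        return self.on_goal + self.missed + self.blocked

    @property
    def fenwick(self) -> int:
        """Unblocked shot attempts only."""
        return self.on_goal + self.missed

def share(for_count: int, against_count: int) -> float:
    """Percentage of the combined attempts that belong to 'our' team."""
    return 100.0 * for_count / (for_count + against_count)

# Hypothetical 5v5 totals for one game.
team = ShotAttempts(on_goal=30, missed=12, blocked=18)
opponent = ShotAttempts(on_goal=25, missed=10, blocked=5)

cf_pct = share(team.corsi, opponent.corsi)      # 60 of 100 attempts
ff_pct = share(team.fenwick, opponent.fenwick)  # 42 of 77 unblocked attempts
print(f"CF% = {cf_pct:.1f}, FF% = {ff_pct:.1f}")
```

Neither number says anything about where the shots came from, which is the gap xG is trying to fill.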


Expected goals appears to be model specific, with each modeler using a different model.

 

This piece is co-authored between DTMAboutHeart and asmean.

https://hockey-graphs.com/2015/10/01/expected-goals-are-a-better-predictor-of-future-scoring-than-corsi-goals/

 

'The model also takes into consideration shooter talent, which we know varies significantly from player to player. Accounting for shooting talent makes intuitive sense, as we expect that shots attempted by Brad Marchand on average have a higher likelihood of resulting in goals than shots taken by, say, Tanner Glass. To this end, a “Shot Multiplier”*** was developed to approximate a player’s effect on each shot’s probability of resulting in a goal. The Shot Multiplier was determined by following these steps:...'

 

'Regressed Shots...Regressed Goals...Regressed Sh%...Shot Multiplier: was computed by dividing a player’s regressed Sh% (rSh%) by the league average Sh%.'

 

'Finally, each player’s shot was multiplied by their Shot Multiplier. Steps 1) to 4) can be followed along the table below, which uses Steven Stamkos’ 2011-2012 season as an example:'
[table image: Screen Shot 2015-09-30 at 8.38.38 PM]

 

This model also used shot distance, shot angle, shot type, rush shot, rebounds, on/off wing.
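
A loose sketch of the Shot Multiplier idea quoted above. The article's exact regression steps are elided in the quote, so the "phantom shots" regression below is a stand-in of my own (pad every shooter with league-average shots), and `phantom_shots=300`, the league rate, and the player lines are all made-up numbers.

```python
def shot_multiplier(goals: int, shots: int, league_sh_pct: float,
                    phantom_shots: int = 300) -> float:
    """Regressed Sh% divided by league-average Sh%.

    ASSUMPTION: regression to the mean is done by adding `phantom_shots`
    league-average shots; this is not confirmed as the article's method.
    """
    regressed_shots = shots + phantom_shots
    regressed_goals = goals + phantom_shots * league_sh_pct
    regressed_sh_pct = regressed_goals / regressed_shots
    return regressed_sh_pct / league_sh_pct

LEAGUE_SH_PCT = 0.09  # hypothetical league-average shooting percentage

sniper = shot_multiplier(goals=60, shots=300, league_sh_pct=LEAGUE_SH_PCT)
grinder = shot_multiplier(goals=5, shots=100, league_sh_pct=LEAGUE_SH_PCT)

# Apply the multiplier to a shot's base xG (0.10 is an invented value).
base_xg = 0.10
print(f"sniper:  x{sniper:.2f} -> {base_xg * sniper:.3f}")
print(f"grinder: x{grinder:.2f} -> {base_xg * grinder:.3f}")
```

The point of regressing first is that a guy with a hot 100-shot sample gets pulled most of the way back toward 1.0, while a large sample is allowed to move the multiplier further.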

 

'Conclusion and Future Directions

Expected Goals (xG) significantly outperforms score-adjusted Corsi (CF%) and Goals For (GF%) in predicting future goals at the team and player levels. xG is also descriptive, which makes it a superior tool in evaluating a team and player’s past and current offensive performance.'


On 12/13/2022 at 4:11 PM, fletch said:

@Morphinity 2.0 @Pete Thanks, it is a really interesting read; it's going to take me some time to digest.  Regarding Pete's post, it appears some models do correct for shooters.  The [here] in the quote below is a link in the original article.

 

'Dawson Sprigings and Asmae Toumi developed an xG model in 2015 [here] that became very popular in the hockey analytics community. Sprigings and Toumi proposed a “shot-multiplier” variable that incorporated “shooting talent” in the model and showed that it was “a better predictor of future scoring than Corsi, Goals”. '

 

The Evolving hockey found other factors more important, for their eventual model development for Expected Goals (xG).  https://evolving-hockey.com/blog/a-new-expected-goals-model-for-predicting-goals-in-the-nhl/

 

Even-Strength

image-1-2-750x536.png

See above model, different factors included in model.    https://evolving-hockey.com/blog/a-new-expected-goals-model-for-predicting-goals-in-the-nhl/

 

'As is customary, let’s run through some validation numbers and discuss model performance. This is a bit of tricky topic, in our opinion, as the dataset is so large and the way this has been discussed in the past has varied quite a bit. Recently, (and somewhat initially), the ROC Curve and AUC or Area Under Curve metric have been used to evaluate any given xG model’s performance.'
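
Since AUC comes up in that quote, here is a small sketch of what it measures: the chance that a randomly picked goal was given a higher xG than a randomly picked non-goal. This is a plain pairwise version written for readability, with invented shots, and not the code Evolving Hockey used.

```python
def roc_auc(labels, scores):
    """AUC by direct pair counting; ties between a goal and a non-goal count half."""
    goals = [s for y, s in zip(labels, scores) if y == 1]
    non_goals = [s for y, s in zip(labels, scores) if y == 0]
    if not goals or not non_goals:
        raise ValueError("need at least one goal and one non-goal")
    wins = sum(
        1.0 if g > n else 0.5 if g == n else 0.0
        for g in goals
        for n in non_goals
    )
    return wins / (len(goals) * len(non_goals))

# Hypothetical shots: 1 = goal, 0 = no goal, with the model's xG for each.
labels = [0, 0, 1, 0, 1, 0, 0, 1]
scores = [0.02, 0.10, 0.35, 0.05, 0.08, 0.20, 0.03, 0.40]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # 13 of 15 goal/non-goal pairs are ranked correctly
```

An AUC of 0.5 is a coin flip and 1.0 is a perfect ranking; it says nothing about whether the probabilities themselves are well calibrated.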

 

'Both of us wanted to include a shooting talent variable; however, the algorithm did not feel the same. This variable (in each model) was never used in any decision tree that was generated – essentially, it determined that this variable added no value for predicting goals.' (my translation - shooting talent was considered, but shot distance, seconds since last shot, shot angle, etc.  always were better predictors of expected goals.  Therefore, including shooting talent increased the time it took to do a model run without improving model performance.)

 

'Now, to demonstrate a few of the model’s potential weak points, let’s look at some goals that have attributes we don’t have access to (information that is not recorded by the NHL/not included in the RTSS data).... While we’d really like complete passing data, the above statements are mostly conjecture at this point. Both of us are fairly confident that any xG model would benefit somewhat significantly from knowing where a pass came from and when it occurred… and we haven’t even touched on zone entries/exits. A lot of this data is currently being tracked publicly, and this could be incorporated into future versions of xG models. Regardless, we feel the model still performs well given the data available.'

 

From what I've seen, modelers are attempting to optimize the performance of their model.  Given limits on computing power (12-hour model runs!), it becomes logistically impossible to include every factor in a model.  Often a 'training data set' is used to investigate model performance with different parameters included.  Then the optimized model is run on another dataset (production), with the eventual goal of making the final results available.
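
A toy version of the training-set idea, to make it concrete. Everything here is synthetic: the shots, the scoring curve, the 10-foot distance bins, and the 75/25 split are my own assumptions, and the "model" is just a goal rate per bin, nowhere near what the real xG models do.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def make_shot():
    """Synthetic shot: (distance_ft, is_goal). The scoring curve is invented."""
    distance = rng.uniform(0, 60)
    p_goal = 0.30 * math.exp(-distance / 20)
    return distance, int(rng.random() < p_goal)

shots = [make_shot() for _ in range(20000)]
rng.shuffle(shots)
cut = int(0.75 * len(shots))
train, holdout = shots[:cut], shots[cut:]

BIN_FT = 10

def bin_of(distance):
    return min(int(distance // BIN_FT), 5)

# "Training": smoothed goal rate per distance bin, using training shots only.
counts = {}
for distance, goal in train:
    made, total = counts.get(bin_of(distance), (0, 0))
    counts[bin_of(distance)] = (made + goal, total + 1)
bin_rate = {b: (made + 1) / (total + 2) for b, (made, total) in counts.items()}
base_rate = sum(goal for _, goal in train) / len(train)

def log_loss(data, predict):
    """Average negative log-likelihood; lower is better."""
    total = 0.0
    for distance, goal in data:
        p = predict(distance)
        total += -math.log(p if goal else 1 - p)
    return total / len(data)

# Evaluate on shots the model never saw.
model_loss = log_loss(holdout, lambda d: bin_rate[bin_of(d)])
baseline_loss = log_loss(holdout, lambda d: base_rate)
print(f"distance-bin model: {model_loss:.4f}, constant baseline: {baseline_loss:.4f}")
```

The holdout comparison is the part that matters: a factor earns its place in the model only if it helps on data that was not used to fit it.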

 

Models are imperfect tools with flaws, but they are useful for investigating the relative importance of the parameters (factors) that contribute to observed data.  It took some hockey knowledge to identify a shortcoming of the model: the contribution of passing was not included.

 

So there is no substitute for knowing the game (the eye test), but data can support (or contradict) your observations.

Edited by fletch

I thought it would be interesting to look at another sport's statistics, to contrast with hockey.  Baseball was an obvious choice for me.  Baseball has discrete plays - the ball starts in the pitcher's glove.  This is a very different game structure than hockey, which has continuous action (except for stoppages).  Baseball is a statistician's dream, and there is a wealth of statistics available.  I'll provide glossaries and sources for additional information.  I'll provide some basic statistics.  And I'll provide what some baseball statisticians think are the most important descriptors of player performance.  Statistics are generally broken down as batting, pitching, or fielding - similar to skater or goalie.

 

General glossary for abbreviations:

https://www.mlb.com/glossary/standard-stats

 

If you are in a rec league, and want to calculate stats for a player on your team:

https://baseballtips.com/baseball-calculators-tools/

 

Comprehensive list of statistics:

https://baseballtrainingworld.com/baseball-stats-101-a-complete-glossary-of-baseball-statistics/

 

Player focused statistics:

https://www.baseball-reference.com/players/n/nimmobr01.shtml

Brandon Nimmo, New York Mets

WAR, AB, H, HR, BA, R, RBI, SB, OBP, SLG, OPS, OPS+
Edited by fletch

So let's focus on player focused statistics, batting:

https://www.baseball-reference.com/players/n/nimmobr01.shtml

Brandon Nimmo, New York Mets

WAR, AB, H, HR, BA, R, RBI, SB, OBP, SLG, OPS, OPS+

 

https://baseballtrainingworld.com/baseball-stats-101-a-complete-glossary-of-baseball-statistics/

PA = Plate appearance = total number of times a player completes a batting turn

AB = At-bats = Plate appearances - (sacrifices + walks + hit by pitch + catcher's interference)

H = Hits = Ball in play, batter gets to at least first base without an error or fielder's choice

HR = Home Runs = Ball in play, batter makes it safely around all four bases without an error or fielder's choice

BA = Batting average = Hits divided by at bats

R = Runs = Base runner touches home base

RBI = Runs batted in = Batter puts ball in play, run(s) score; when double play or error allows run to score, no RBI

SB = Stolen bases = Base runner reaches next base without batter putting ball in play, an error, or defensive indifference

OBP = On base percentage = (Hits + walks + hit by pitches) divided by (at bats + walks + hit by pitches + sacrifice flies)

SLG = Slugging percentage = (singles + 2 *(doubles) + 3 *(triples) + 4 *(home runs)) divided by At bats

OPS = On-base plus slugging = On base percentage + slugging percentage

OPS+ = On-base plus slugging plus = normalization of OPS across all batters, while correcting for external variables (like ball parks) = (OPS divided by league OPS, adjusted for park factors) * 100
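
For anyone who wants to check the batting arithmetic, here is a minimal Python sketch of BA, OBP, SLG and OPS.  The class name, field names and the sample stat line are all made up for illustration (not a real player).  OBP uses the official denominator (at bats + walks + hit by pitches + sacrifice flies), which differs from plate appearances only by sacrifice bunts and catcher interference.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Counting stats for one batter (hypothetical container, not a real API)."""
    ab: int       # at-bats
    h: int        # hits of all types
    doubles: int
    triples: int
    hr: int       # home runs
    bb: int       # walks
    hbp: int      # hit by pitch
    sf: int       # sacrifice flies

    @property
    def singles(self) -> int:
        # Singles are whatever hits remain after extra-base hits are removed.
        return self.h - self.doubles - self.triples - self.hr

    def batting_average(self) -> float:
        return self.h / self.ab if self.ab else 0.0

    def on_base_percentage(self) -> float:
        # Official denominator: AB + BB + HBP + SF.
        denom = self.ab + self.bb + self.hbp + self.sf
        return (self.h + self.bb + self.hbp) / denom if denom else 0.0

    def slugging(self) -> float:
        total_bases = (self.singles + 2 * self.doubles
                       + 3 * self.triples + 4 * self.hr)
        return total_bases / self.ab if self.ab else 0.0

    def ops(self) -> float:
        return self.on_base_percentage() + self.slugging()


# Made-up season line, for illustration only.
line = BattingLine(ab=500, h=150, doubles=30, triples=5, hr=20,
                   bb=60, hbp=5, sf=5)
print(f"BA  {line.batting_average():.3f}")
print(f"OBP {line.on_base_percentage():.3f}")
print(f"SLG {line.slugging():.3f}")
print(f"OPS {line.ops():.3f}")
```

OPS+ is left out on purpose, since it needs league totals and park factors that are not in a single player's stat line.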

 

https://www.mlb.com/glossary/advanced-stats/wins-above-replacement

WAR=Wins above replacement =  (The number of runs above average a player is worth in his batting, baserunning and fielding + adjustment for position + adjustment for league + the number of runs provided by a replacement-level player) / runs per win
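
The position-player WAR formula above is just a sum of run values divided by runs per win.  A rough Python sketch, where the function name and every input number are hypothetical; the hard part in practice is estimating the run values, which this does not attempt.

```python
def wins_above_replacement(runs_above_average: float,
                           positional_adjustment: float,
                           league_adjustment: float,
                           replacement_runs: float,
                           runs_per_win: float) -> float:
    """Combine run components into wins, following the formula quoted above."""
    if runs_per_win <= 0:
        raise ValueError("runs_per_win must be positive")
    total_runs = (runs_above_average + positional_adjustment
                  + league_adjustment + replacement_runs)
    return total_runs / runs_per_win


# All inputs below are invented for illustration, including runs_per_win.
war = wins_above_replacement(runs_above_average=15.0,
                             positional_adjustment=2.0,
                             league_adjustment=1.0,
                             replacement_runs=20.0,
                             runs_per_win=10.0)
print(f"WAR {war:.1f}")
```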

 

Note again that baseball has discrete plays, making it easier to calculate statistics. 

I'll follow with pitching and fielding stats.


Pitcher statistics

Justin Verlander, New York Mets

https://www.baseball-reference.com/players/v/verlaju01.shtml

WAR, W, L, ERA, G, GS, SV, IP, SO, WHIP

 

https://baseballtrainingworld.com/baseball-stats-101-a-complete-glossary-of-baseball-statistics/

W = Win = Pitcher in game when team takes lead for good; starting pitchers must pitch a minimum of 5 innings

L = Loss = Pitcher charged with the run that gives the opposing team the lead for good

ERA = Earned Run Average = average number of earned runs per nine innings = 9 * (Earned Runs / Innings Pitched)

G = Games Played = appearance in game

GS = Games Started = appearance in game as starting pitcher

SV = Saves = Final relief pitcher for winning team who entered during a save opportunity (enters with a lead of no more than 3 runs and pitches at least 1 inning OR enters with the tying run on base, up to bat, or on deck OR pitches at least 3 innings)

IP = Innings pitched, measured in thirds (an out = 1/3)

SO = Strikeouts = batter is out on three strikes (a shutout, where a pitcher throws an entire game without allowing a run, is abbreviated SHO)

WHIP = Walks and hits per inning pitched = (Walks + hits) divided by innings pitched
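
The one trap with ERA and WHIP is that innings pitched are written in thirds, so "6.1" means 6 and 1/3 innings, not 6.1.  A small Python sketch that counts outs first and then does the arithmetic; the function names are my own and the sample numbers are made up.

```python
def ip_to_outs(ip: str) -> int:
    """Convert box-score innings notation ('6.1' = 6 1/3 innings) to outs."""
    whole, _, frac = ip.partition(".")
    thirds = int(frac) if frac else 0
    if thirds not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip!r}")
    return int(whole) * 3 + thirds


def era(earned_runs: int, outs: int) -> float:
    """Earned runs per nine innings: 9 * ER / IP, with IP = outs / 3."""
    if outs <= 0:
        raise ValueError("pitcher must have recorded at least one out")
    return 27 * earned_runs / outs


def whip(walks: int, hits: int, outs: int) -> float:
    """(Walks + hits) per inning pitched, with IP = outs / 3."""
    if outs <= 0:
        raise ValueError("pitcher must have recorded at least one out")
    return 3 * (walks + hits) / outs


# Made-up season line, for illustration only.
outs = ip_to_outs("180.1")
print(f"ERA  {era(60, outs):.2f}")
print(f"WHIP {whip(40, 150, outs):.3f}")
```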

 

https://www.mlb.com/glossary/advanced-stats/wins-above-replacement

WAR = Wins above replacement

Different WAR computations use either RA9 or FIP. Those numbers are adjusted for league and ballpark. Then, using league averages, it is determined how many wins a pitcher was worth based on those numbers and his innings pitched total.

 

RA9 = Runs allowed per nine innings

FIP = Fielding independent pitching

 

FIP

https://www.omnicalculator.com/sports/fip

'What is FIP - Fielding Independent Pitching?

The FIP baseball statistic tells you what the earned run average of a player would look like over some time, were that pitcher to experience league average results in balls in play and league average timings. And in simpler terms - it measures the effectiveness of a pitcher based solely on events that the pitcher can control: home runs (HRs), walks (BBs), hits by pitch (HBPs) and strikeouts (Ks). FIP allows for a solid indication of how well a pitcher performs as it takes out additional variables, such as the role of the opposing team's defense, or sheer luck.'

 

Similar to expected goals, fielding independent pitching involves modeling.  The basic formula is published, but it relies on a league constant that changes by season and by source, so each modeler will come up with a slightly different value.
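
For reference, the commonly published form of FIP is (13*HR + 3*(BB + HBP) - 2*K) / IP plus a constant that puts league FIP on the same scale as league ERA.  A hedged Python sketch follows; the function names are mine, the 3.10 constant and all stat lines are placeholders rather than real values, and `ip` here is true innings (6 and 1/3 is 6.333, not the box-score "6.1").

```python
def fip_core(hr: int, bb: int, hbp: int, k: int, ip: float) -> float:
    """Weighted pitcher-controlled events per inning, before the constant."""
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip


def fip_constant(lg_era: float, lg_hr: int, lg_bb: int,
                 lg_hbp: int, lg_k: int, lg_ip: float) -> float:
    """Constant chosen so that league-wide FIP equals league ERA."""
    return lg_era - fip_core(lg_hr, lg_bb, lg_hbp, lg_k, lg_ip)


def fip(hr: int, bb: int, hbp: int, k: int, ip: float,
        constant: float) -> float:
    return fip_core(hr, bb, hbp, k, ip) + constant


# Placeholder constant and made-up pitcher line, for illustration only.
print(f"FIP {fip(hr=20, bb=50, hbp=5, k=200, ip=200.0, constant=3.10):.2f}")
```

Because the constant is recomputed from league totals each season, two sites using slightly different league inputs will report slightly different FIP for the same pitcher.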
