
Probability, Statistics for Sports


fletch


1 minute ago, Cash or Czech said:

Fletch, when're you gonna convert this into the best odds of the night??

 

There we go!!!  I can use a little help!!  I stayed away from Seattle last night, thank goodness.  That 3 1/2 was staring me in the face, saying, "GO AHEAD!!  MAKE MY DAY!!"

 

🙂

Link to comment
Share on other sites

28 minutes ago, Morphinity 2.0 said:

Baseball is a treasure trove if you're a stats nerd. It works better there than for any other sport.

I'm not sure I agree. It's actually made the game unwatchable. 

 

Good read here.

https://www.theatlantic.com/newsletters/archive/2022/10/sabermetrics-analytics-ruined-baseball-sports-music-film/671924/

 

 

Edited by Pete
Link to comment
Share on other sites

2 minutes ago, Pete said:

I'm not sure I agree. It's actually made the game unwatchable. 

 

Good read here.

https://www.theatlantic.com/newsletters/archive/2022/10/sabermetrics-analytics-ruined-baseball-sports-music-film/671924/

 

 

 

I mean that's a separate conversation - how the stats/analysis are applied - even though I tend to agree the stats have kind of taken some of the magic out of the game. 

  • Like 1
Link to comment
Share on other sites

3 hours ago, Cash or Czech said:

Fletch, when're you gonna convert this into the best odds of the night??

 

3 hours ago, Ozzy said:

 

There we go!!!  I can use a little help!!  I stayed away from Seattle last night, thank goodness.  That 3 1/2 was staring me in the face, saying, "GO AHEAD!!  MAKE MY DAY!!"

 

🙂

 

I've stayed away from betting on game results, intentionally.  I get worked up enough as it is.  In the NFL, with last-play-of-the-game laterals, it's not uncommon to have a defensive score as time runs out, which sometimes changes a betting win to a loss, or a betting loss to a win.  In reality it just means that one team wins by an additional score, but for the bettor, 'bad beats' (the SVP segment on ESPN SportsCenter) are particularly frustrating.  I don't want to have to root for hockey OT just so I can hit the Over.

 

I like fantasy hockey and football.  I've stayed away from fantasy baseball just because, as seen by the stat discussion, there is potentially a very real advantage to immersing yourself that deeply into stats, and I don't want to be that obsessed.

 

I also like office pools picking NFL games against the spread, picking the NCAA tournament, picking NCAA football bowls, other fun stuff.  Various websites run fun contests for the Premier League, NCAA tournament, etc.  Unfortunately, a lot of the free content on websites is disappearing because of the rise of FanDuel, DraftKings, etc.  People would rather compete for real money than just recreationally make picks against their friends for bragging rights.

 

I've also run 'predict the future' kind of contests with friends for a wide variety of sports including NCAA football and the FA Cup, using spreadsheets or SurveyMonkey.  I only did the FA Cup once because of the number of rounds and the work needed to set up a SurveyMonkey poll for each round (1 point for first round, 2 points for second round, etc).

 

So with that caveat, here is what I understand from the pro gamblers (this applies if you're doing more than just having fun betting a few bucks).

Have a notebook (or Excel spreadsheet) where you track all your bets.  This enables you to look at your weekly or monthly bets and see how much you are up or down.  The pros are aiming to win at least 55-60% of their bets, because the vig (the sportsbook's cut) means that winning half of your bets still loses money.

The pro gamblers know that they are going to have hot streaks and cold streaks.  By tracking their bets, they know if they are making or losing money long-term.  They may stop betting football to focus on baseball, based on the analysis of their betting results.
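
To make the vig and the bet-tracking advice concrete, here is a minimal Python sketch. It assumes standard -110 American odds (risk 110 to win 100), which the post does not state, and the bet log entries are entirely hypothetical. It computes the win rate needed just to break even and totals up a small log.

```python
def break_even_win_rate(american_odds: int) -> float:
    """Fraction of bets you must win to break even at the given American odds."""
    if american_odds < 0:
        risk, win = -american_odds, 100   # e.g. -110: risk 110 to win 100
    else:
        risk, win = 100, american_odds    # e.g. +150: risk 100 to win 150
    return risk / (risk + win)


def profit(stake: float, american_odds: int, won: bool) -> float:
    """Net profit (positive) or loss (negative) on a single bet."""
    if not won:
        return -stake
    if american_odds < 0:
        return stake * 100 / -american_odds
    return stake * american_odds / 100


# Hypothetical bet log: (description, stake, American odds, won?)
bet_log = [
    ("NFL spread", 110.0, -110, True),
    ("NHL over", 110.0, -110, False),
    ("NFL spread", 110.0, -110, True),
]

net = sum(profit(stake, odds, won) for _, stake, odds, won in bet_log)
wins = sum(won for _, _, _, won in bet_log)

print(f"break-even win rate at -110: {break_even_win_rate(-110):.4f}")
print(f"record: {wins}/{len(bet_log)}, net: {net:+.2f}")
```

At -110 the break-even rate works out to 110/210, a bit over 52%, which is why a target above that leaves some margin for cold streaks.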

 

That's my public service announcement.  There's nothing wrong with wanting a little action on a game.  Just keep track of how much you are winning or losing, so your significant other or family doesn't feel like they need to stage an intervention.

  • Like 2
Link to comment
Share on other sites

  • 3 weeks later...

I had to take a break from this thread because of the holidays.  To continue the baseball discussion, I wanted to post some rough rules for what the statistics mean.  In hockey, we've got a general feel for what it means to be a 20 goal/year scorer vs. 30 goals/year vs. 40 goals/year.  Statistics are a tool to quantify performance.   So what does WAR mean?

 

https://thebaseballguide.com/war-in-baseball/

Explaining WAR Values (0 to 8+) for a season.

As a general convention, 0 is taken as replacement level (the baseline a readily available substitute would provide), and players are compared against it. Here are a few conventions that can give an idea:

Less than 0 – A player that can be replaced

0 to 2 – The player can be referred to as the backup player

2.1 to 4.9 – The player that can start most of the games

5 to 7.9 – Overall the best candidate for an All-Star game

8 above – A specific player who can be considered irreplaceable or an MVP.

 

Here is a list of the top 20 baseball players with the highest WAR number for career.

Name (WAR Number)

Barry Bonds (162.8)

Babe Ruth (162.1)

Willie Mays (156.25)

Ty Cobb (151)

Henry Aaron (143)

Tris Speaker (134.3)

Honus Wagner (130.9)

Stan Musial (128.3)

Rogers Hornsby (127.1)

Eddie Collins (124.4)

Ted Williams (122.0)

Alex Rodriguez (117.6)

Lou Gehrig (113.6)

Rickey Henderson (111.1)

Mel Ott (110.8)

Mickey Mantle (110.2)

Frank Robinson (107.2)

Nap Lajoie (106.9)

Mike Schmidt (106.8)

Joe Morgan (100.4)

Link to comment
Share on other sites

For OPS:

https://towardsdatascience.com/stats-for-baseball-fans-the-single-metric-for-offense-is-ops-fc568af5e87b

 

My recommendation is to watch out for players who have extremely high OPS stats. They will be your most exciting hitters, usually. According to the data, an OPS of 0.8 is exceptional; and I use that as my cutoff to assess players today.

 

 


Link to comment
Share on other sites

If you want to get deeper into the weeds for baseball stats, there are lots of sources:

This website provides some links/help for what The Athletic considers important.

https://theathletic.com/255898/2018/02/28/a-sabermetric-primer-understanding-advanced-baseball-metrics/

 

WAR: Wins Above Replacement (pitcher or batter)

<0 WAR: Below replacement level (Kendrys Morales)
0-1 WAR: Scrub (Tim Anderson)
1-2 WAR: Role player (Trevor Story)
2-3 WAR: Solid starter (Jackie Bradley Jr.)
3-4 WAR: Good player (Andrew McCutchen)
4-5 WAR: All-Star (George Springer)
5-6 WAR: Superstar (Nolan Arenado)
6+ WAR: MVP (Giancarlo Stanton)

 

wOBA: Weighted On-Base Average (batter)

 

wRC+: Weighted Runs Created Plus (batter)

180: 80 percent above average (Mike Trout)
160: 60 percent above average (Jose Altuve)
140: 40 percent above average (Paul Goldschmidt)
120: 20 percent above average (Carlos Santana)
100: League average (Ender Inciarte)
90: 10 percent below average (Jordy Mercer)
80: 20 percent below average (Freddy Galvis)

 

OPS+: On-Base Plus Slugging Plus (batter)

BB% and K% (pitcher)

BABIP: Batting Average on Balls In Play (pitcher or batter)

ISO: Isolated Power (batter)

FIP: Fielding Independent Pitching (pitcher)

Link to comment
Share on other sites

What is the probability that the New York Rangers would go from 1994 to 2021 without winning the Stanley Cup?

It seems unlikely that the Rangers would go 28 years without a title.  But can we make a mathematical estimate of how unlikely this event is?

First, some links, then some assumptions.

https://www.varsitytutors.com/hotmath/hotmath_help/topics/multiplication-rule-of-probability

https://en.wikipedia.org/wiki/List_of_New_York_Rangers_seasons

https://www.nhl.com/news/nhl-expansion-history/c-281005106

https://en.wikipedia.org/wiki/2021–22_Seattle_Kraken_season

 

For simplicity, I am going to assume that every NHL team has an equal chance of winning the Cup.  We know that this is not strictly true (this year, the Stars are more likely to win the Stanley Cup than the Blackhawks), so weighting teams by strength is a potential improvement to the probability calculation.  Also, I will assume that each season is an independent event, meaning that results from 2020 have no impact on 2021.   This is true in the narrow sense that wins/playoff performances in 2020 do not count toward wins/playoff performances in 2021.  However, a good team in 2020 is more likely to be good in 2021, so accounting for that carryover is another potential improvement to the probability calculation.

 

For the Rangers in 2022, the probability of them not winning the Cup is 31/32 = 0.96875.   There are 32 teams, and 31 teams will not win the Cup.  Similarly in 2021 there were 32 teams.  So the probability of the Rangers not winning the cup in 2021 and 2022, through the multiplication rule, is

(31/32) * (31/32) = 0.938477

 

With NHL expansion, the number of teams is not constant.  So for the period from 1994 to 2021, we have

(25/26)^4 * (26/27) * (27/28) * (29/30)^16 * (30/31)^5 * (31/32) = 0.379419

 

The probability of the Rangers not winning the Cup from 1994 to 2021 is 37.9419%, given the assumptions listed above.
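
For anyone who would rather check this outside a spreadsheet, here is a short Python sketch of the same multiplication-rule calculation. The league sizes and season counts are copied from the product in the post; the variable and function names are my own, and the equal-chance and independence assumptions are the same ones stated above.

```python
from math import prod

# (teams in the league, number of seasons at that size), as used in the post
team_counts = [(26, 4), (27, 1), (28, 1), (30, 16), (31, 5), (32, 1)]


def prob_no_cup(counts):
    """Probability of never winning, assuming equal chances and independent seasons."""
    return prod(((teams - 1) / teams) ** seasons for teams, seasons in counts)


total_seasons = sum(seasons for _, seasons in team_counts)
p_drought = prob_no_cup(team_counts)

print(f"seasons covered: {total_seasons}")
print(f"P(no Cup in any season)    = {p_drought:.6f}")
print(f"P(at least one Cup)        = {1 - p_drought:.6f}")
```

Changing a league size or season count in `team_counts` is all it takes to rerun the calculation for a different period.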

 

You can do a similar calculation for the Rangers not winning the Cup from 1940-1992.  The probability is much lower than for the period from 1994-2021.  However, this lower probability is not driven mainly by the longer period of time.  Until the NHL expanded to 12 teams in 1967, the league had only 6 or 7 teams during the period from 1940-1966.  That means many years of (5/6) * (5/6) * ..., which drives the probability of the Rangers not winning the Cup much lower during that period. The years from 1967-1992 have a smaller impact on the overall low probability.

 

The probability of the Rangers not winning the Cup from 1940 to 1992 is 0.001649 or 0.16488%, given the assumptions listed above.

 

In conclusion, under these assumptions, it was much less likely for the Rangers to go without a Cup from 1940-1992 than from 1994-2021.

Edited by fletch
Link to comment
Share on other sites

9 minutes ago, Pete said:

This thread is a great reminder that sport is athletes competing on the court/field/rink and we don't just look at spreadsheets to determine outcomes. 

True, regarding players AND coaches.  Full stop.

 

General managers do look at spreadsheets, to determine roster construction.  They do rely more on scouts, coaches, watching games/practices, and evaluations.  But with the salary cap and all the metrics available, it would be foolish for organizations to not use spreadsheets to look at various statistics, and look at the range of AAV values for market comparisons to players on their roster.  Particularly when evaluating contract situations and whether to tender an offer to a current player or a free agent.

https://www.capfriendly.com/teams/rangers

Link to comment
Share on other sites

17 minutes ago, fletch said:

True, regarding players AND coaches.  Full stop.

 

General managers do look at spreadsheets, to determine roster construction.  They do rely more on scouts, coaches, watching games/practices, and evaluations.  But with the salary cap and all the metrics available, it would be foolish for organizations to not use spreadsheets to look at various statistics, and look at the range of AAV values for market comparisons to players on their roster.  Particularly when evaluating contract situations and whether to tender an offer to a current player or a free agent.

https://www.capfriendly.com/teams/rangers

I'm simply talking about the probability of winning a championship in your last post.

 

Yes you would think it would be highly improbable for the Rangers to not win for however many years, but the fact is they had many deep playoff runs where a good bounce here or there or a post vs a goal gives you a different outcome.

 

One would look at those numbers and think the Rangers are a failure. However, they are one of the more successful teams in recent history.

 

When you look at the Avalanche you see a successful team, but 5 short years ago they were among the worst teams in the league.

 

Sometimes the numbers don't tell the real story. 

Link to comment
Share on other sites

3 hours ago, Pete said:

I'm simply talking about the probability of winning a championship in your last post.

 

Yes you would think it would be highly improbable for the Rangers to not win for however many years, but the fact is they had many deep playoff runs where a good bounce here or there or a post vs a goal gives you a different outcome.

 

One would look at those numbers and think the Rangers are a failure. However, they are one of the more successful teams in recent history.

 

When you look at the Avalanche you see a successful team, but 5 short years ago they were among the worst teams in the league.

 

Sometimes the numbers don't tell the real story. 

Right, so I think this feeds into the simplifying assumptions I made, and was up front about.  I did the simplest calculation, assuming that every team in the NHL has an equal chance of winning the Stanley Cup every year, and that this probability does not change over time (it is only affected by the number of teams in the league).  The probability of a given team not winning the Cup is 31/32 = 0.96875 when there are 32 teams.

 

You can argue that for consistent playoff teams, the probability of not winning the Cup is smaller, less than 0.96875, and this would provide a probability correction.  I haven't invested the time to see how that would affect the end probability results.

 

A good bounce here and a post there giving a different outcome is quite true for playoff teams.  People do simulate entire seasons and playoffs by going game-by-game.  There can be in-game corrections based on score, and random elements.  Model simulations are interesting, but still flawed, as the output depends on the data used for the input.

 

All modelers and statisticians are upfront that their models and calculations are simplifications of reality and therefore flawed.  However, they can be used to tweak different parameters and see how the tweaks result in different outcomes.  You can't run reality more than once.  You can run a model or simulation 1000s of times.  By working through models and simulations, you can investigate what factors are most important to what is actually happening on ice.  Then, there is always another year of reality to compare your models/simulations. 

 

So with your conclusion 'Sometimes the numbers don't tell the real story' you are always right, because models/simulations never match reality.  But I disagree that you can't learn anything from statistics, probability, modeling, and simulations.  They are flawed, limited tools, and simplifications of reality.  But if you have a simplified model with only 3 factors that gives you a reasonable approximation of what is happening in the real world, that is interesting.  And one of the strengths of modeling is that you can vary 1,2,3...n factors and see how well each of those models compare to reality.  Which to many, is quite useful to be able to do.

Link to comment
Share on other sites

12 minutes ago, fletch said:

Right, so I think this feeds into the simplifying assumptions I made, and was up front about.  I did the simplest calculation, assuming that every team in the NHL has an equal chance of winning the Stanley Cup every year, and that this probability does not change over time (it is only affected by the number of teams in the league).  The probability of a given team not winning the Cup is 31/32 = 0.96875 when there are 32 teams.

 

You can argue that for consistent playoff teams, the probability of not winning the Cup is smaller, less than 0.96875, and this would provide a probability correction.  I haven't invested the time to see how that would affect the end probability results.

 

A good bounce here and a post there giving a different outcome is quite true for playoff teams.  People do simulate entire seasons and playoffs by going game-by-game.  There can be in-game corrections based on score, and random elements.  Model simulations are interesting, but still flawed, as the output depends on the data used for the input.

 

All modelers and statisticians are upfront that their models and calculations are simplifications of reality and therefore flawed.  However, they can be used to tweak different parameters and see how the tweaks result in different outcomes.  You can't run reality more than once.  You can run a model or simulation 1000s of times.  By working through models and simulations, you can investigate what factors are most important to what is actually happening on ice.  Then, there is always another year of reality to compare your models/simulations. 

 

So with your conclusion 'Sometimes the numbers don't tell the real story' you are always right, because models/simulations never match reality.  But I disagree that you can't learn anything from statistics, probability, modeling, and simulations.  They are flawed, limited tools, and simplifications of reality.  But if you have a simplified model with only 3 factors that gives you a reasonable approximation of what is happening in the real world, that is interesting.  And one of the strengths of modeling is that you can vary 1,2,3...n factors and see how well each of those models compare to reality.  Which to many, is quite useful to be able to do.

Not saying you can't learn anything.

 

Just saying you have to play the game. Models would have predicted Russia beating the USA 99.999999% of the time.

 

Models probably have the Rangers losing to Carolina and Pittsburgh last playoffs.

 

I think even if you weighted probability models for the last decade, more than half of NHL teams' fanbases would be surprised at how they're considering what success vs failure looks like.

  • TroCheckmark 1
Link to comment
Share on other sites

Well, as Pete alluded to, no probability model is going to be that accurate without including many of the human variables that exist in hockey in the first place. You really shouldn’t base a model on inaccurate assumptions such as all teams having an equal chance of winning a Cup. We all know that’s not accurate at all because it doesn’t include roster variations and performances of said roster. We also can’t just assume each season is independent of the others. Experience is a very difficult aspect to calculate, whether it be a team learning how to win as a group or younger guys coming into their own after being given playing time years before in a lost season or two.
 

Models like these work so much better in baseball where events are more isolated and variables limited. It’s a reason why Moneyball worked so well.

 

I’m sure they can be accomplished in hockey as well, but there would need to be a ton of individual or other team statistics included such as seeing how probabilities change if a team scores X amount of goals, allows less than y amount, or has goalie stats greater or less than a certain number. Things like that.

 

  • TroCheckmark 1
Link to comment
Share on other sites

3 minutes ago, SaveByRichter35 said:

Man you guys are just shitting all over Fletch.  @fletch this is all way too much math for me, fuck that.  But I commend you for taking your time to run the numbers and present an interesting idea.  Also thank you for reminding us just how unlucky we all are to be Rangers fans.  At least we all have each other.  

Are we? I think it’s just all part of the conversation. I think it’s been a good conversation. 

Link to comment
Share on other sites

@SaveByRichter35 

Thank you for the kind words!   With the example, I wanted to show how you can use classic probability (i.e. the multiplication rule) to approach a question with clearly stated assumptions.  It took me some time to think about the approach, but only 30-60 minutes to calculate each of my probability results using an Excel spreadsheet.  As previously stated for the period from 1940-1966, an overall low probability of not winning the Cup can be driven by multiplying together multiple years of (relatively) low probability.   With only a single parameter, it is not surprising that its value drives the calculation. That suggests doing a sensitivity analysis: how much are the results affected by varying the probability of not winning a Cup in a single year?  To simplify, the probability of not winning the Cup is kept constant over a 10-year period.

 

For a 6 team league, this is (5/6) * (5/6) * (5/6) * (5/6) * (5/6) * (5/6) * (5/6) *(5/6) * (5/6) * (5/6), or (5/6)^10 in a classic approach.  You can also say that in a larger league, your analysis has limited the Cup winner to 6 possible teams, and eliminated other teams as possibilities to win.  This may be a useful investigation, but you are introducing some observer expertise (or bias, depending on your perspective) on which teams are eliminated from consideration.

For a 32 team league for a 10 year period, (31/32) ^10.

 

(5/6)^10 = 0.161505583

(11/12)^10 = 0.418903888

(17/18)^10 = 0.564630277

(23/24)^10 = 0.653380160

(29/30)^10 = 0.712471394

(31/32)^10 = 0.727976157

(33/34)^10 = 0.741908298

(39/40)^10 = 0.776329621

(45/46)^10 = 0.802688093

(51/52)^10 = 0.823508952

(55/56)^10 = 0.835115655

(59/60)^10 = 0.845293662

 

These values were used to produce a graph, which isn't transferring to BSBH.  You can copy and paste the following to generate a scatterplot in Excel (first column x-axis, second column y-axis).

 

6 0.161505583
12 0.418903888
18 0.564630277
24 0.65338016
30 0.712471394
32 0.727976157
34 0.741908298
40 0.776329621
46 0.802688093
52 0.823508952
56 0.835115655
60 0.845293662
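
If you don't have Excel handy, the same two columns can be regenerated with a few lines of Python. This is only a sketch: the league sizes and the 10-year window come from the post, the function name is my own, and it relies on the same equal-chance, independent-season assumptions.

```python
# League sizes from the post; the drought length is fixed at 10 years
league_sizes = [6, 12, 18, 24, 30, 32, 34, 40, 46, 52, 56, 60]
YEARS = 10


def drought_probability(teams: int, years: int) -> float:
    """Chance one team wins no Cup in `years` seasons with `teams` equal teams."""
    return ((teams - 1) / teams) ** years


table = {teams: drought_probability(teams, YEARS) for teams in league_sizes}

# First column: number of teams (x-axis); second column: probability (y-axis)
for teams, p in table.items():
    print(f"{teams:>2} {p:.9f}")
```

Changing `YEARS` shows how quickly the gap between small and large leagues widens as more seasons are multiplied together.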

 

From the sensitivity analysis, we can see how sensitive the calculation is to the probability of not winning a Cup in a single year: small differences in the single-year probability compound into increasingly divergent results as additional years are multiplied together.

 

The next step I would take would be to allow the probability of not winning a Cup to vary (slightly) each year by introducing an error term. You can use the RANDBETWEEN function in Excel to generate random integers between a specified minimum and maximum. However, I would choose to do the work in R.  This would allow the generation of an error from a normal distribution, Poisson distribution, etc.  This would also allow you (quickly, once the code is correctly written) to simulate 1,000 or 10,000 model runs.

The multiple model runs are necessary because we are no longer using a deterministic model (where a single run is sufficient as the result is the same each time) but a stochastic model (which includes randomness).  The more model runs you perform, the better idea you get about the variation between runs when you compute statistics on them.

The advantage of this approach is that you can vary the probability of not winning the Cup for a single year in different model runs (as I did for the graph), but keep the random error generator the same, and check the results.  You can also keep the probability of not winning the Cup constant, vary your random error term in different model runs, and check the results. This would be a sensitivity analysis of stochastic models.
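
Here is a rough sketch of what such a stochastic model could look like. The post proposes R; this uses Python instead, so treat it as an illustration of the idea rather than the actual implementation. The normally distributed error, its spread (`sigma=0.01`), the clipping to the range 0 to 1, and all names are assumptions made for the example.

```python
import random
import statistics


def simulate_drought(p_no_cup, years, sigma, runs, seed=0):
    """Monte Carlo: multiply yearly 'no Cup' probabilities that carry random error.

    Each year's probability is p_no_cup plus a Gaussian error with standard
    deviation sigma (an assumed error model), clipped to stay within [0, 1].
    """
    rng = random.Random(seed)  # fixed seed so the runs are repeatable
    results = []
    for _ in range(runs):
        prob = 1.0
        for _ in range(years):
            yearly = p_no_cup + rng.gauss(0.0, sigma)
            yearly = min(1.0, max(0.0, yearly))
            prob *= yearly
        results.append(prob)
    return results


baseline = (31 / 32) ** 10  # deterministic 10-year value for a 32-team league
sims = simulate_drought(31 / 32, years=10, sigma=0.01, runs=10_000)

print(f"deterministic: {baseline:.6f}")
print(f"simulated mean: {statistics.mean(sims):.6f}")
print(f"simulated std dev: {statistics.stdev(sims):.6f}")
```

Holding `seed` fixed while changing `p_no_cup`, or holding `p_no_cup` fixed while changing `sigma`, gives the two kinds of sensitivity runs described above.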

 

As Pete and Keirik protest, my original probability calculation is a simplification or caricature of reality, which in their view isn’t useful (they protest from 2 different perspectives).  I will say that the approaches they suggest would take weeks to build.  You would have to do a deep dive on hockey statistics. You would have to carefully think about which statistics you would use to drive your model.  And you would have to be aware of the biases that you had when you generated the model.  For example, can we definitively say that the Colorado Avalanche and the Tampa Bay Lightning were the 2 best teams in the NHL last year?  If so, how much better were they than the other teams in the league?  Were there other teams in the league that had the same (or higher!) probability of winning the Cup?  How do you account for uncertainty (random error)?   And how does your model account for differences in each year?  Are you going to do a deep dive on data for the 10-year period from 2011-2021, and generate different model parameters for each year?   Sounds like a lot of work!  I think, at least in the short term, I will stick to my spreadsheets, and 30-60 minute exercises.  But I do appreciate the discussion of posts in this thread, and contrary perspectives. I learn more from criticism than praise!

Link to comment
Share on other sites
