Fantasy Baseball, Lessons Learned and Applied: Intro and Draft Alignment

1-2-3-4-5-6-7-8-9-10-11-12 years.

Twelve years is a long time.  It represents a big chunk of my life and a bigger chunk of my career.  It also represents the amount of time that I’ve been playing Fantasy Baseball with the same group of people in the same league with (generally) the same rules.

This is the first post in a series, leading up to and through our 13th season, in which I will describe and analyze an approach to Fantasy Baseball that is informed by my work in Digital Analytics, and vice versa.  I hope to cover a lot of topics, from metrics alignment to technology to statistical techniques to maintaining focus across a long campaign.


Why Fantasy Baseball?

Given that my work involves collecting, interpreting, and presenting complex data from multiple sources, it shouldn’t be too surprising (or maybe it should) that one of my favorite hobbies is one that lends itself to being awash in data.  The sport of Baseball is an incredible data generation machine, with hundreds of players compiling statistics across literally thousands of games each season.  In the late 1970s and early 1980s, Bill James and other baseball enthusiasts popularized “Sabermetrics” (a name derived from SABR, the Society for American Baseball Research), a data-driven approach to understanding baseball outcomes that has led to massive changes in how the professional game is viewed by both fans and team management.  The Sabermetrics movement, combined with the internet, has produced an explosion in the data and analysis available to fans of the sport.

My fantasy baseball league, MAK, is a 12-team, 6×6 Rotisserie league, meaning six batting and six pitching categories (see the Wikipedia entry on Rotisserie baseball for a primer).  In each category, teams are ranked against one another: first place earns 12 points, second place earns 11, and so on down to 1 point for last place.  These stats accumulate over the course of the entire season.
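To make the scoring concrete, here is a minimal Python sketch of how one category’s points could be assigned. The function name `roto_points` is hypothetical, and so is the tie rule: it assumes tied teams split the points their positions span, a common Rotisserie convention that the MAK rules above don’t spell out.

```python
from typing import Dict


def roto_points(totals: Dict[str, float], higher_is_better: bool = True) -> Dict[str, float]:
    """Assign Rotisserie points for a single category.

    With N teams, the best team earns N points and the worst earns 1.
    Tied teams share the average of the points their positions span
    (an assumed tie rule; leagues differ).
    """
    n = len(totals)
    # Order teams from best to worst; for ERA or WHIP pass higher_is_better=False.
    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=higher_is_better)
    points: Dict[str, float] = {}
    i = 0
    while i < n:
        # Extend j over every team tied with the team at position i.
        j = i
        while j + 1 < n and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        # Position k (0-based) is worth n - k points; tied teams split them evenly.
        shared = sum(n - k for k in range(i, j + 1)) / (j - i + 1)
        for k in range(i, j + 1):
            points[ordered[k][0]] = shared
        i = j + 1
    return points


# Hypothetical home run totals for a four-team example.
print(roto_points({"A": 210, "B": 185, "C": 185, "D": 160}))
# {'A': 4.0, 'B': 2.5, 'C': 2.5, 'D': 1.0}
```

With 12 teams, the points in each category always sum to 78 (12 + 11 + ... + 1), ties included, so a full 6×6 league hands out 936 points in total.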

The other players in my league all play differently, but they all play to win.  You’d think that my skills with data would allow me to win every year without much effort, but Fantasy Baseball, like Digital Analytics, is hard.  Really hard.  Baseball is a really complex system.  I’ve won only once, in 2012.  The complexity and unpredictability of the sport add to the fun of Fantasy.

I work and play in complex systems, and attempting to understand and “solve” them gives me a great deal of joy.

Let’s Get Aligned

One key to a successful season is a successful draft.  I will spend quite a bit of time here discussing what I call “draft optimization”, which I define as “making the best possible decisions in the draft to maximize your chances at winning the season.”

The first step to draft optimization is to align on objectives.  We are scored on the following categories:
Batting: Batting Average, Walks over Strikeouts, Home Runs, Runs Batted In, Runs Scored, and Stolen Bases.
Pitching: Earned Run Average, WHIP (Walks plus Hits per Inning Pitched), Strikeouts Pitched, Wins minus Losses, Holds, and Saves.

With the inclusion of Walks/Strikeouts, Wins-Losses, and Holds, the MAK league subscribes to a non-standard scoring scheme.
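Several of these categories are derived from raw counting stats (Batting Average, Walks over Strikeouts, ERA, WHIP, Wins minus Losses), and two of them reward lower values (ERA and WHIP), so it helps to write the scoring rules down explicitly before aligning any projections to them. The sketch below is one hypothetical way to encode the MAK categories in Python. The raw-stat keys (`H`, `AB`, `BB_allowed`, and so on) are made-up names, `IP` is assumed to be true decimal innings (180.333, not the box-score notation 180.1), and ratio categories are assumed to be computed from a team’s summed components rather than by averaging individual players’ ratios.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Raw team totals keyed by (hypothetical) stat name.
Stats = Dict[str, float]


@dataclass(frozen=True)
class Category:
    name: str
    compute: Callable[[Stats], float]  # derives the category value from raw totals
    higher_is_better: bool = True      # False for ERA and WHIP


MAK_CATEGORIES = [
    # Batting: six categories
    Category("AVG", lambda s: s["H"] / s["AB"]),
    Category("BB/K", lambda s: s["BB"] / s["SO"]),
    Category("HR", lambda s: s["HR"]),
    Category("RBI", lambda s: s["RBI"]),
    Category("R", lambda s: s["R"]),
    Category("SB", lambda s: s["SB"]),
    # Pitching: six categories; IP assumed to be true decimal innings
    Category("ERA", lambda s: 9 * s["ER"] / s["IP"], higher_is_better=False),
    Category("WHIP", lambda s: (s["BB_allowed"] + s["H_allowed"]) / s["IP"], higher_is_better=False),
    Category("K", lambda s: s["K_pitched"]),
    Category("W-L", lambda s: s["W"] - s["L"]),
    Category("HLD", lambda s: s["HLD"]),
    Category("SV", lambda s: s["SV"]),
]


def category_values(team: Stats) -> Dict[str, float]:
    """Compute all twelve category values from a team's summed raw stats."""
    return {c.name: c.compute(team) for c in MAK_CATEGORIES}
```

Keeping the direction flag next to each formula means nothing downstream has to remember which categories are inverted.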

Fantasy sports, even excluding the “Daily Fantasy” gambling segment, are a big industry.  That industry produces a massive amount of player rankings, projections, predictions, analysis, and advice across a variety of media (web, radio, TV, magazines, mobile apps).  Most other Rotisserie leagues use standard 4×4 or 5×5 scoring systems, which means that most industry-provided content is not aligned to our scoring system.  Moreover, in my experience, the stat projections from these sources are not always accurate: they tend to overrate elite batters, and pitchers of every tier, in an effort to generate excitement for the season (and exposure for their products).

So how do I deal with all of this inaccurate analysis and guidance?  I ignore it and roll my own.

What works for everyone else might not work for you.  Understanding “the rules” of your business and aligning to the objectives of your unique practice will help you develop your people and processes, and select and use the tools, that will drive success.

Valuation (Not Reinventing the Wheel)

Nate Silver, hero to data nerds everywhere, spends an entire chapter in his book “The Signal and the Noise: Why So Many Predictions Fail – but Some Don’t” discussing baseball statistics.  He describes his hobby-turned-obsession of collecting and modeling baseball player statistics and developing a predictive model for future performance.  The chapter’s purpose is to introduce the concept of overfitting: his models were too complex to be accurate and were outperformed by a simple model based on player age.  When I first read the book some years ago, I remember shaking my head and thinking, “Wow, what a crazy waste of time it was for him to try to predict future stats.”

Why was it a waste of time?  There are already lots of people who create player projections.  Some of them are very strongly incentivized to be as accurate as possible.  Not only that, but there are services that aggregate the projections of the best forecasters.

Why would I want an aggregated data source when our fantasy league platform already provides an entire set of projections?  Because I draw on the “magic” in the “Wisdom of the Crowd”.  I’ve found that for any given player’s projected performance statistics, the group will, on average, outperform my league’s single projection source.  In line with the theory (read it for yourself in Surowiecki’s “The Wisdom of Crowds”), I believe this group of forecasters meets the conditions for a wise crowd: they hold a diversity of opinion; they are independent of one another’s opinions; they are “decentralized”, each drawing on their own specialized knowledge and experience in weighing player performance factors; and their opinions are aggregated by a central source, FantasyPros.
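The aggregation step itself is simple. The Python sketch below takes an unweighted mean of each stat across sources; that is an assumption for illustration, not a description of how FantasyPros actually weights its inputs, and the function name `consensus`, the source names, and all the numbers are hypothetical.

```python
from statistics import mean
from typing import Dict, List

# Stat name -> projected value, e.g. {"HR": 30, "SB": 4}
Projection = Dict[str, float]


def consensus(projections: List[Projection]) -> Projection:
    """Unweighted mean of each stat across the sources that project it."""
    stats = {stat for p in projections for stat in p}
    return {stat: mean(p[stat] for p in projections if stat in p) for stat in sorted(stats)}


# Hypothetical projections for one player from three independent sources.
sources = {
    "source_a": {"HR": 30, "SB": 4},
    "source_b": {"HR": 24, "SB": 8},
    "source_c": {"HR": 27, "SB": 6},
}
actual_hr = 26  # hypothetical end-of-season result

crowd = consensus(list(sources.values()))
crowd_error = abs(crowd["HR"] - actual_hr)
avg_individual_error = mean(abs(p["HR"] - actual_hr) for p in sources.values())
# The consensus error can never exceed the average individual error
# (triangle inequality), though it need not beat the single best source.
print(crowd, crowd_error, avg_individual_error)
```

Here the consensus misses by 1 home run while the individual sources miss by 4, 2, and 1, an average of about 2.3. For absolute error, the consensus can never do worse than the average individual source, although it does not have to beat the single best one.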

Keep a critical eye on your data sources.  Vendor-provided information is typically self-serving.

What’s Next

So we now have the background, the key objectives going into my draft, and my data sources defined.  Next up, I’m going to describe performance benchmarking and delve into some technical analysis (with R!).