A view of a baseball during batting practice before the Texas Rangers hosted the Chicago Cubs on Opening Day at Globe Life Park in Arlington on March 28. (Tom Pennington/Getty Images)
A Sense of Doubt blog post #1602 - Baseball's Big Data is Over?
It's always weird when Baseball takes its midsummer break for the All Star Game. In past years, I have been restless and uneasy, missing something vital to my life. I know that statement will get me ridiculed by sport haters and even by those who compare Baseball to watching paint dry. But it's not the sport and its spectacle that I miss, it's the cadence, the rhythms, the routine, the quiet, the lull, the steps, the poetry of Baseball. For me, Baseball is a meditation. It's an ambient experience. It's a comfort, evoking the feelings of nostalgia, family, hearth, home, love, summer, vacation, recreation, and a host of other happy and tranquil thoughts and feelings.
Baseball is serenity.
And so, usually, when I am without it, I feel a bit lost, like someone losing his religion (thank you, REM). I recover well enough in the off-season because there's basketball and football, and we must all have the long winter rest before spring's rebirth. I can wait. I am patient. But usually at the midpoint of the Baseball season with a break of nearly a week, I feel disconnected and a bit disconsolate.
This year is different. Despite some tough times in my work life, I am centered, positive, courageous, and determined. At times, I have even convinced myself that I am calm.
I am okay with the time without Baseball, but I do want it back.
And so, from the department of how material always presents itself, here is an article to share from THE WASHINGTON POST.
I became a Baseball stat head in the mid-1990s when I discovered the Detroit Tigers fan listserv on the Internet. Through the geeky fans of the Tigers who liked to correspond about their favorite team on a daily basis, I was introduced to Bill James and more complicated metrics than those on the back of my Baseball cards. Things like OBP, SLG, OPS, WHIP, and such are widely accepted now, but I am also pleased every time I hear Baseball broadcasters, like Dan Dickerson of the Tigers, mention WAR -- Wins Above Replacement level -- in terms of how we evaluate players and their seasonal performances.
Since then, Baseball data analysis has been championed and popularized by books like Moneyball and by a shift away from the conventional "gut instinct" and "measure of experience" means of making Baseball decisions toward computer-modeled, data-driven, statistical analysis of every pitch, the location of every ball put in play, the swing of every batter, and more.
Since I was a young boy first learning about Baseball, I had been enamored of its rich history, which mirrored the history of America, as well as the depth of the statistics available on current players and those throughout the game's history. My affection for understanding the game at least in part through its numbers only grew when I discovered Bill James and a cadre of follow-up statisticians who analyzed the game.
And so, this article caught my interest. The headline is misleading. Ultimately, Phillips is not arguing that we will give up the use of data in analyzing Baseball, but that how we use data and the questions we ask will change.
Well, duh.
This has already proven to be true when examining Moneyball and how some of its main contentions have changed. In Moneyball, Michael Lewis explained how the Oakland Athletics constructed competitive teams on the cheap because of an understanding of the importance of On Base Percentage over defense, pitching over defense, and a few other factors. In regard to the first, weighing a player's plate patience and rate of reaching base against defensive performance, salary, and other factors helped the A's choose lesser-known players, rookies, and others to fill their roster over high-priced, brand-name free agents whose skills in these lesser-regarded categories were not as pronounced as in other more well-known categories. It became an equation of runs produced versus runs allowed. If a cheap player reaches base 20% more often than an expensive player but makes double the errors, how does this difference affect the number of wins a team has? Furthermore, if a player's defensive liability is further mitigated by pitching, then wins may be even less affected. For instance, suppose the right fielder makes more errors and fails to reach more balls in play than one might like, but reaches base more often than a comparable, expensive Gold Glover. If the pitchers induce more ground balls, then problem solved. There are fewer chances for that player to fail. It's all about avoiding outs for your team and inducing outs for the other team.
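For the stat heads, here is a rough sketch of that "runs produced versus runs allowed" equation. It uses Bill James's Pythagorean expectation, which estimates winning percentage as runs scored squared divided by the sum of runs scored squared and runs allowed squared. That formula is my choice for illustration, not something taken from the article below, and the run totals in the example (a 700/700 team, a cheap player worth 15 extra runs at the plate who gives back 5 in the field) are numbers I made up. The function names are hypothetical too.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Estimate winning percentage from runs scored and allowed.

    Uses Bill James's Pythagorean expectation with the classic
    exponent of 2. Other exponents are sometimes used; 2 is assumed here.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Scale the estimated winning percentage to a full season."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


# Made-up baseline: a team that scores and allows 700 runs.
baseline = expected_wins(700, 700)

# Made-up swap: the cheap player adds 15 runs of offense
# but costs 5 runs on defense.
with_cheap_player = expected_wins(700 + 15, 700 + 5)

print(f"Baseline wins: {baseline:.1f}")
print(f"With cheap player: {with_cheap_player:.1f}")
```

Under those invented numbers the bat more than pays for the glove, which is the Moneyball bet in miniature. Shrink the offensive gain or grow the defensive cost and the answer flips, which is roughly what teams learned once they got better at measuring defense.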
But since Moneyball, Baseball has already shifted its focus as it has learned the value of a defensive shift in maximizing outs against less versatile batters and how run-costing defense, range to make plays, and other newly analyzed factors impact the game significantly and cannot be minimized as much as previously believed.
Heck, some teams have started using high impact relievers to start games to minimize early game run scoring by opponents and use a dominant bullpen to carry a game the same way a dominant starter may.
Okay, enough. This article shares some good history and a short review of some of the central concepts in the use of big data in Baseball, but it hardly calls for abandoning data analysis. Rather, it reminds us that "what's hot" today may not be hot tomorrow.
Thing to know about me: I see Baseball as a religion, and so I capitalize it.
FROM - https://www.washingtonpost.com/outlook/2019/07/09/if-baseball-is-any-indication-big-data-revolution-is-over/?utm_term=.e64aff81da5e
Made by History Perspective
If baseball is any indication, the big data revolution is over
The data revolution has largely disappointed — and we shouldn’t be surprised
The big data revolution may soon be over.
Companies and governments will still continue to collect data, of course, and computing power will continue to grow. But vastly larger sets of data, even if collected more quickly and effectively, won’t answer all our questions or solve our problems as they were once promised to do. This failure shouldn’t surprise us, however, and we can see why by looking at one of the most heralded venues for data analysis: baseball.
Baseball has a long history of technological innovation, promising that the collection of more data, or the right kind of data, would transform the game. But at each moment, the unquenchable thirst for more and better statistics shows how data revolutions actually generate new questions more often than they do solutions, and how those questions, in turn, generate the need for even more data.
In the 1860s, pioneering reporter Henry Chadwick promoted his new scoresheets and scoring system as essential technologies for collecting data. Though teams were already tracking runs and outs, such meager data, he claimed, was insufficient for a true analysis of the game. Only by tracking and systematically recording the “character” of each play — a “good” catch, a “clean” hit, an “earned” run — could there be enough information for accurate judgments to be made.
But Chadwick’s methods proved inadequate because they depended on a faulty system of individual informants tracking scores. In the early years of the 20th century, therefore, National League secretary (and later president) John Heydler created a system of ledgers for each team and player, enabling tracking of both season-wide and career-wide statistics. By centrally managing the collection of data rather than delegating it to newspapers or private interests, he would be able to create “comprehensive” records far superior to existing data operations, one admirer noted.
Heydler and his colleagues promised the new official repository would solve the problems of inconsistent and disjointed statistical records scattered around the country, allowing fans and management to trust the statistics. The quality of an individual player, or the worth of a team, could now reliably be ascertained with a single glance at the tables.
But it was not enough, at least not according to engineer Pete Palmer, who worked in the 1960s and 1970s to create the first database of baseball statistics. By translating the various official and unofficial records into punched cards, and then processing them on then-novel electronic supercomputers, Palmer boasted he could use the computer to be sure the year-end and career totals added up correctly. If there was a discrepancy between a team’s totals and those of its combined players, for example, the computer could reveal the problem.
By using the latest technology to process existing data — one of Palmer’s employers, Systems Development Corporation, was a pioneer of the “database” — Palmer now claimed the revolutionary ability to electronically access statistical data. He wanted to use computers to check the quality of data across dozens of seasons and thousands of players. By thinking in terms of a database rather than an inert list of facts, Palmer emphasized the importance of data recall and organization, and Palmer’s own database would become the core of today’s most definitive source of statistics, Baseball Reference.
But within a decade, these records were already irredeemably insufficient. In an era of free agency and exploding salaries, teams needed more precise information and better capacity to analyze it. Palmer’s database had been built on the official statistical summaries collected by the National and American Leagues, but the leagues had never systematically collected any play-by-play data. Because such information was not publicly available, even basic questions — such as whether one player or strategy succeeded more often in certain situations than another — couldn’t be answered.
In response, Bill James’s Project Scoresheet launched in 1984, combining a new form of the scoresheet with a network of scorers watching every game across the country, and then deploying the latest technology — personal computers — to collect and publish the data. Again a revolution was proclaimed, now in the ability to use data to analyze in-game strategy and individual player contributions to figure out if players justified their salaries, or why one group of players was more successful than another. What contributions, in other words, produced success?
James’s methods and data did produce serious changes in the game, highlighting the importance of on-base percentage and defensive contributions and challenging the inefficient use of relief pitchers. Project Scoresheet itself eventually furnished crucial technology and infrastructure through which the for-profit company STATS became the leading purveyor of daily baseball data for media outlets in the 1990s.
Yet even instantaneous play-by-play information would prove insufficient. In 2014, MLB unveiled Statcast, its own radar- and video-based data collection system, in which the position and movement of every player and ball could be recorded and analyzed. It was baseball’s first taste of truly “big data,” and proponents quickly heralded the fact that the amount of data Statcast provided in its first full season was far larger than the total amount collected throughout the history of the game. Following the so-called moneyball revolution, which emphasized the value of statistical analysis for player acquisition and strategy, Statcast seemed to promise that there was finally enough data to fundamentally answer the game’s remaining questions.
But will it? History says no. Over the past century and a half, data revolutions have helped fans and managers better understand the game, but each was also deemed woefully inadequate within years of debuting.
The same is likely to happen to Statcast — and sooner than we might think. That’s not because some fatal technological flaw will emerge, but rather because of the nature of data itself. Data are, in essence, the things we rely on to make arguments and answer questions. And the questions we ask inevitably change over time.
There’s no doubt Statcast and similar data-collection efforts have changed the game from the days when runs batted in and earned run average dominated our understanding of players. Right now, the latest trends in baseball are defensive shifts, launch angles, exit velocity and “true” outcomes. But soon we’ll ask new questions for which new data will be required. It’s not that we haven’t learned anything, but rather that we’ll never learn everything.
Some of the most important questions today are about integrating qualitative and quantitative data: playing statistics may be useful for valuing current major leaguers, but they’re nearly useless for assessing the future value of amateurs who play against far inferior competition and whose ability may change dramatically over time. Teams have to learn to use data like exit velocity and launch angle to augment the judgments of scouts and make the inexact science of player evaluation a bit more precise.
The next baseball data revolution will also come in part because the data currently being collected won’t just capture the game as it is, but also will change the sport. If teams realize that speedy players are undervalued, for example, or that defensive shifts work, they adjust, and those changes will in turn affect what data is collected and what it means.
Similar stories could be told about the role of data in scores of other areas, from medical decision-making to political campaigns. In each case, developments that seemed to fundamentally transform the field through novel technologies of data collection were soon written off as old news.
When the concept of “big data” first emerged two decades ago, the promise was clear: Computing power had gotten so fast, storage so cheap and statistical tools so powerful that computers could offer up new kinds of analysis that could guide society to a better place. Now, with some historical perspective, we can begin to see how the rise of big data in baseball and beyond merely echoes the excitement surrounding previous developments.
For centuries data-driven reformers have repeatedly claimed some new technology will finally collect enough data to solve our pressing problems, but it hasn’t come to be. The era of big data isn’t over so much as its differences with previous epochs seem far less salient. Big data will always be with us, but arguably it also always has.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
- Bloggery committed by chris tower - 1907.10 - 10:10
- Days ago = 1467 days ago
- New note - On 1807.06, I ceased daily transmission of my Hey Mom feature after three years of daily conversations. I plan to continue Hey Mom posts at least twice per week but will continue to post the days since ("Days Ago") count on my blog each day. The blog entry numbering in the title has changed to reflect total Sense of Doubt posts since I began the blog on 0705.04, which include Hey Mom posts, Daily Bowie posts, and Sense of Doubt posts. Hey Mom posts will still be numbered sequentially. New Hey Mom posts will use the same format as all the other Hey Mom posts; all other posts will feature this format seen here.