“Sometimes the constraints that we live with, and presume are the same for everything, are really only functions of the scale in which we operate.”
A revelatory exploration of the hottest trend in technology and the dramatic impact it will have on the economy, science, and society at large.
Which paint color is most likely to tell you that a used car is in good shape? How can officials identify the most dangerous New York City manholes before they explode? And how did Google searches predict the spread of the H1N1 flu outbreak?
The key to answering these questions, and many more, is big data. “Big data” refers to our burgeoning ability to crunch vast collections of information, analyze it instantly, and draw sometimes profoundly surprising conclusions from it. This emerging science can translate myriad phenomena — from the price of airline tickets to the text of millions of books — into searchable form, and uses our increasing computing power to unearth epiphanies that we never could have seen before. A revolution on par with the Internet or perhaps even the printing press, big data will change the way we think about business, health, politics, education, and innovation in the years to come. It also poses fresh threats, from the inevitable end of privacy as we know it to the prospect of being penalized for things we haven’t even done yet, based on big data’s ability to predict our future behavior.
In this brilliantly clear, often surprising work, two leading experts explain what big data is, how it will change our lives, and what we can do to protect ourselves from its hazards. Big Data is the first big book about the next big thing. [From: Amazon.com]
“The very idea of penalizing based on propensities is nauseating. To accuse a person of some possible future behavior is to negate the very foundation of justice: that one must have done something before we can hold him accountable for it. After all, thinking bad things is not illegal, doing them is. It is a fundamental tenet of our society that individual responsibility is tied to individual choice of action. […] Were perfect predictions possible, they would deny human volition, our ability to live our lives freely. Also, ironically, by depriving us of choice they would exculpate us from any responsibility.”
In “Big Data,” their illuminating and very timely book, Viktor Mayer-Schönberger, a professor of Internet governance and regulation at the Oxford Internet Institute at Oxford University, and Kenneth Cukier, the data editor for The Economist, argue that the nature of surveillance has changed.
“In the spirit of Google or Facebook,” they write, “the new thinking is that people are the sum of their social relationships, online interactions and connections with content. In order to fully investigate an individual, analysts need to look at the widest possible penumbra of data that surrounds the person — not just whom they know, but whom those people know too, and so on.”
Mr. Cukier and Mr. Mayer-Schönberger argue that big data analytics are revolutionizing the way we see and process the world — they even compare its consequences to those of the Gutenberg printing press. And in this volume they give readers a fascinating — and sometimes alarming — survey of big data’s growing effect on just about everything: business, government, science and medicine, privacy and even on the way we think. Notions of causality, they say, will increasingly give way to correlation as we try to make sense of patterns.
Data is growing incredibly fast — by one account, it is more than doubling every two years — and the authors of this book argue that as storage costs plummet and algorithms improve, data-crunching techniques, once available only to spy agencies, research labs and gigantic companies, are becoming increasingly democratized.
Big data has given birth to an array of new companies and has helped existing companies boost customer service and find new synergies. Before a hurricane, Walmart learned, sales of Pop-Tarts increased, along with sales of flashlights, and so stores began stocking boxes of Pop-Tarts next to the hurricane supplies “to make life easier for customers” while boosting sales. UPS, the authors report, has fitted its trucks with sensors and GPS so that it can monitor employees, optimize route itineraries and know when to perform preventive vehicle maintenance.
Baseball teams like Billy Beane’s Oakland A’s (immortalized in Michael Lewis’s best-seller “Moneyball”) have embraced new number-crunching approaches to scouting players with remarkable success. The 2012 Obama campaign used sophisticated data analysis to build a formidable political machine for identifying supporters and getting out the vote. And New York City has used data analytics to find new efficiencies in everything from disaster response, to identifying stores selling bootleg cigarettes, to steering overburdened housing inspectors directly to buildings most in need of their attention. In the years to come, Mr. Mayer-Schönberger and Mr. Cukier contend, big data will increasingly become “part of the solution to pressing global problems like addressing climate change, eradicating disease and fostering good governance and economic development.”
There is, of course, a dark side to big data, and the authors provide an astute analysis of the dangers they foresee. Privacy has become much more difficult to protect, especially with old strategies — “individual notice and consent, opting out and anonymization” — losing effectiveness or becoming completely beside the point.
“The ability to capture personal data is often built deep into the tools we use every day, from Web sites to smartphone apps,” the authors write. And given the myriad ways data can be reused, repurposed and sold to other companies, it’s often impossible for users to give informed consent to “innovative secondary uses” that haven’t even been imagined when the data was first collected.
The second danger Mr. Cukier and Mr. Mayer-Schönberger worry about sounds like a scenario from the sci-fi movie “Minority Report,” in which predictions seem so accurate that people can be arrested for crimes before they are committed. In the real near future, the authors suggest, big data analysis (instead of the clairvoyant Pre-Cogs in that movie) may bring about a situation “in which judgments of culpability are based on individualized predictions of future behavior.”
Already, insurance companies and parole boards use predictive analytics to help tabulate risk, and a growing number of places in the United States, the authors of “Big Data” say, employ “predictive policing,” crunching data “to select what streets, groups and individuals to subject to extra scrutiny, simply because an algorithm pointed to them as more likely to commit crime.”
Last week an NBC report noted that in so-called signature drone strikes “the C.I.A. doesn’t necessarily know who it is killing”: in signature strikes “intelligence officers and drone operators kill suspects based on their patterns of behavior — but without positive identification.”
One problem with relying on predictions based on probabilities of behavior, Mr. Mayer-Schönberger and Mr. Cukier argue, is that it can negate “the very idea of the presumption of innocence.”
“If we hold people responsible for predicted future acts, ones they may never commit,” they write, “we also deny that humans have a capacity for moral choice.”
At the same time, they observe, big data exacerbates “a very old problem: relying on the numbers when they are far more fallible than we think.” They point to escalation of the Vietnam War under Robert S. McNamara (who served as secretary of defense to Presidents John F. Kennedy and Lyndon B. Johnson) as a case study in “data analysis gone awry”: a fierce advocate of statistical analysis, McNamara relied on metrics like the body count to measure the progress of the war, even though it became clear that Vietnam was more a war of wills than of territory or numbers.
More recent failures of data analysis include the Wall Street crash of 2008, which was accelerated by hugely complicated trading schemes based upon mathematical algorithms. In his best-selling 2012 book, “The Signal and the Noise,” the statistician Nate Silver, who writes the FiveThirtyEight blog for The New York Times, pointed to failures in areas like earthquake science, finance and biomedical research, arguing that “prediction in the era of Big Data” has not been “going very well” (despite his own successful forecasts in the fields of politics and baseball).
Also, as the computer scientist and musician Jaron Lanier points out in his brilliant new book, “Who Owns the Future?,” there is a huge difference between “scientific big data, like data about galaxy formation, weather or flu outbreaks,” which with lots of hard work can be gathered and mined, and “big data about people,” which, like all things human, remains protean, contradictory and often unreliable.
To their credit, Mr. Cukier and Mr. Mayer-Schönberger recognize the limitations of numbers. Though their book leaves the reader with a keen appreciation of the tools that big data can provide in helping us “quantify and understand the world,” it also warns us about falling prey to the “dictatorship of data.”
“We must guard against over reliance on data,” they write, “rather than repeat the error of Icarus, who adored his technical power of flight but used it improperly and tumbled into the sea.” [From: Nytimes.com]
“Predictions based on correlations lie at the heart of big data.”
“Big data” — we hear the term all of the time, but what does it really mean? Viktor Mayer-Schönberger and Kenneth Cukier’s 2013 bestseller, “Big Data: A Revolution That Will Transform How We Live, Work, and Think,” attempts to answer this question with a solid overview of the promises, advancements, issues and implications of the big data revolution.
The advent of big data mirrors our technological evolution as a society: for the first time in history, we have the ability to easily and cheaply capture and store massive amounts of data in a way that was simply impossible before. This transition means that we are no longer constrained to statistical methods of sampling or estimation in order to extract meaning from data.
Instead, collecting a complete dataset means that we can now analyze it in its entirety as well. Simply put, analyses can now aim at N=all, the whole population, rather than guessing at that population or hoping that a random sample turns out to be representative. “Big data” means that we can have it all.
As Mayer-Schönberger and Cukier put it, “when we talk about big data, we mean ‘big’ less in absolute than in relative terms: relative to the comprehensive set of data.” Instead of just using bits and pieces of the data, we want to process as much of it as we can, finally seeing the forest despite the trees.
This shift in statistical measurement comes with its own set of problems. The larger a dataset, the more likely it is to contain errors, and the less likely analysts are to have time to carefully clean each and every data point. However, data scientists have found that even massive, error-prone datasets can be more reliable than pristine but tiny samples. In a messy dataset, the authors write, “any particular reading may be incorrect, but the aggregate of many readings will provide a more comprehensive picture.” Essentially, the messy whole can outperform small, exact subsets.
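The intuition that a messy whole can beat a clean sliver can be sketched with a small simulation. This is only an illustration, not something taken from the book: the true value, the noise levels and the sample sizes below are all made-up numbers, and the sketch assumes the errors are random, independent and unbiased. Under that assumption the error of an average shrinks with the square root of the number of readings, so many noisy readings can land closer to the truth than a few precise ones.

```python
import math
import random
import statistics


def standard_error(noise_sd: float, n: int) -> float:
    """Standard error of the mean of n independent, unbiased readings."""
    return noise_sd / math.sqrt(n)


def mean_abs_error(true_value, noise_sd, n, trials, rng):
    """Average distance between the sample mean and the true value,
    measured over repeated simulated datasets."""
    errors = []
    for _ in range(trials):
        readings = [rng.gauss(true_value, noise_sd) for _ in range(n)]
        errors.append(abs(statistics.fmean(readings) - true_value))
    return statistics.fmean(errors)


# All values below are hypothetical, chosen only for illustration.
TRUE_VALUE = 20.0
rng = random.Random(42)  # fixed seed so the run is repeatable

# A tiny, carefully cleaned sample: 10 readings with little noise.
clean_small = mean_abs_error(TRUE_VALUE, noise_sd=0.5, n=10, trials=200, rng=rng)

# A large, messy dataset: 10,000 readings with ten times the noise.
messy_large = mean_abs_error(TRUE_VALUE, noise_sd=5.0, n=10_000, trials=200, rng=rng)

print("theoretical standard error, clean small sample:", standard_error(0.5, 10))
print("theoretical standard error, messy large dataset:", standard_error(5.0, 10_000))
print("simulated average error, clean small sample:", clean_small)
print("simulated average error, messy large dataset:", messy_large)
```

The caveat matters as much as the result: averaging only cancels random error. If the messy dataset is systematically skewed (for example, every sensor reads a little high), adding more readings will not fix it.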
As we make inroads into big data, we also make an important shift from results that focus on causation to results concerned only with correlation. Mayer-Schönberger and Cukier describe it this way:
“If millions of electronic medical records reveal that cancer sufferers who take a certain combination of aspirin and orange juice see their disease go into remission, then the exact cause for the improvement in health may be less important than the fact that they lived. Likewise, if we can save money by knowing the best time to buy a plane ticket without understanding the method behind airfare madness, that’s good enough.” — pg. 14
Nowadays, that needs to be enough — and it often is for e-commerce companies looking for profit, and doctors looking to save lives — but it also represents a radically different approach to problem-solving than many of us are used to. Rather than adhering strictly to the traditional scientific method, big data allows us to work backward, first starting with data collection, then analysis and finally drawing conclusions from whatever patterns may appear.
This shift away from trying to support or disprove a theory reduces the risk that a researcher's favored hypothesis shapes the result, but it does not eliminate bias. It also lends itself to a directionless investigation, with results subject to the interests of the analysts exploring the data. Essentially, the only answers that will be found are the ones a researcher chooses to look for.
With its Kindle e-book readers, for example, Amazon.com has the ability to tabulate which sections of books are most highlighted, where readers tend to stop reading and which themes prompt the most user engagement. But since these answers don’t do anything for its long-term business goals, the data just sits there. A publishing company, however, given this same information, might use it to tweak advertising, author writing styles and marketing campaigns. In this example, both companies have access to the same data, but the ‘answers’ they draw from it may be completely different. In the world of big data, the mindset with which researchers approach a dataset can make all of the difference.
Mayer-Schönberger and Cukier cover data ethics, collection techniques and even a shift in our natural thought processes — just some of the highlights in their book. At only 200 pages, their book is a quick read, filled with well-thought-out and easy-to-digest examples. You don’t need to be a data expert or computer science whiz to gain something from the text. In fact, the structure of the book lends itself to readers looking for a light introduction to the concept of big data.
Whether your questions are about the history of the field or where it’s headed next, Mayer-Schönberger and Cukier’s “Big Data: A Revolution That Will Transform How We Live, Work, and Think” has something for everyone. You’ll likely walk away feeling impressed, informed and most importantly, curious about the immense possibilities that lie before us with the study of big data. [From: Datascience.berkeley.edu]
“the ‘data scientist,’ which combines the skills of the statistician, software programmer, infographics designer, and storyteller.”
Thanks to the internet, social networking, smartphones and credit cards, more data is being collected and stored about us than ever before – a level of surveillance the Stasi could only dream about, say Mayer-Schönberger and Cukier in this informative introduction to the “datafication” of our lives. Big data analysis gives big business a competitive edge (all those Amazon recommendations), but governments have invested heavily in it, too. The risks to privacy and freedom are obvious, but the authors accentuate the positive. Big data has useful applications in medicine, science and “culturomics”. Mayer-Schönberger and Cukier make interesting observations about data-crunching techniques and they also report that analysts have found substantial amounts of “lexical dark matter” (words in books but not in dictionaries). In this brave new world of big data, Google and Amazon are frontrunners – although behind the NSA and GCHQ. The next challenge may be avoiding the “dictatorship of data”. [From: Theguardian.com]
“The amount of stored information grows four times faster than the world economy, while the processing power of computers grows nine times faster. Little wonder that people complain of information overload. Everyone is whiplashed by the changes.”
Plenty of books extol the technical marvels of our information society, but this is an original analysis of the information itself—trillions of searches, calls, clicks, queries and purchases.
Mayer-Schönberger (Internet Governance and Regulation/Oxford Univ.; Delete: The Virtue of Forgetting in the Digital Age, 2009) and Economist data editor Cukier begin with a jolt by pointing out that the Centers for Disease Control and Prevention spends weeks evaluating reports from doctors and clinics before announcing a flu epidemic. In a 2009 study reported in the scientific journal Nature, Google engineers tracked certain Internet searches (“medicine for cough,” “fever”) and detected a rise in flu cases immediately. Formerly, faced with huge numbers, researchers could only examine a select sample: a slow, expensive process that led to errors if the sample wasn’t properly chosen. The Google researchers examined everything—or close to everything: hundreds of millions of searches. This was a breakthrough. “Big data,” the authors’ term for our new ability to manipulate immense amounts of information, reveals not only more, but entirely new knowledge. Who knew that by evaluating her credit card purchases, retailers can calculate the odds that a woman is pregnant? The authors provide an exciting ride without neglecting the risks. Thirty-two surveillance cameras operate within 200 yards of the apartment where George Orwell wrote 1984. Data mining is so efficient that today’s privacy protections are irrelevant. Once enough of your activities, however anonymous, are “datafied,” a computer can identify you.
A fascinating, enthusiastic view of the possibilities of vast computer correlations and the entrepreneurs who are taking advantage of them. [From: Kirkusreviews.com]
About The Authors:
Viktor Mayer-Schönberger is the Professor of Internet Governance and Regulation at Oxford. His research focuses on the role of information in a networked economy. Earlier he spent ten years on the faculty of Harvard’s Kennedy School of Government.
Professor Mayer-Schönberger has published seven books, as well as over a hundred articles (including in Science) and book chapters. His most recent book, the award-winning ‘Delete: The Virtue of Forgetting in the Digital Age’ (Princeton University Press 2009), has received favorable reviews from academic (Nature, Science, New Scientist) and mainstream media (New York Times, Guardian, Le Monde, NPR, BBC, Wired) and has been published in four languages. Ideas proposed in the book have now become official policy, e.g. of the European Union.
A native Austrian, Professor Mayer-Schönberger founded Ikarus Software in 1986, a company focusing on data security, and developed Virus Utilities, which became the best-selling Austrian software product. He was voted Top-5 Software Entrepreneur in Austria in 1991 and Person-of-the-Year for the State of Salzburg in 2000.
He chaired the Rueschlikon Conference on Information Policy, is the cofounder of the SubTech conference series, and served on the ABA/AAAS National Conference of Lawyers and Scientists. He is on the advisory boards of corporations and organizations around the world, including Microsoft and the World Economic Forum. He is a personal adviser to the Austrian Finance Minister on innovation policy.
He holds a number of law degrees, including one from Harvard and an MS(Econ) from the London School of Economics, and while in high school won national awards for his programming and the Physics Olympics of his home state.
In his spare time, he likes to travel, go to the movies, and learn about architecture. [From: Oxford Internet Institute]
Kenneth Cukier is the Data Editor of The Economist, following a decade on the paper as a business and technology writer, and foreign correspondent (most recently as the Tokyo correspondent in 2007-12). Previously, he was the Technology Editor of the Wall Street Journal Asia in Hong Kong, and worked at the International Herald Tribune in Paris. From 2002 to 2004 he was a research fellow at Harvard’s Kennedy School of Government. He’s a member of the Council on Foreign Relations and serves on the board of directors of International Bridges to Justice, a nonprofit organization that promotes legal rights in developing countries. [From: Amazon.com]