Sporty data science

Sarah Kessler’s July 20, 2012 article for Fast Company about big data and the Pew Research Center’s latest survey notes some of the concerns and hopes,

Despite the usefulness of all of this information [big data], however, the idea of collecting more and more from consumers strikes a creepy chord for many survey respondents. Some argued that the benefits of big data would [go to] companies, not individuals.

“The world is too complicated to be usefully encompassed in such an undifferentiated Big Idea,” wrote John Pike, the director of GlobalSecurity.org. “Whose ‘Big Data’ are we talking about? Wall Street, Google, the NSA? I am small, so generally I do not like Big.”

“There is value to be found in this data, value in our newfound publicness,” argued Jeff Jarvis, the author of What Would Google Do? “Google’s founders have urged government regulators not to require them to quickly delete searches because, in their patterns and anomalies, they have found the ability to track the outbreak of the flu before health officials could and they believe that by similarly tracking a pandemic, millions of lives could be saved.”

The July 20, 2012 press release from the Pew Research Center provides more detail about the study,

A new Pew Internet/Elon University survey of 1,021 Internet experts, observers and stakeholders measured current opinions about the potential impact of human and machine analysis of newly emerging large data sets in the years ahead. The survey is an opt-in, online canvassing. Some 53% of those surveyed predicted that the rise of Big Data is likely to be “a huge positive for society in nearly all respects” by the year 2020. Some 39% of survey participants said it is likely to be “a big negative.”

“The analysts who expect we will see a mostly positive future say collection and analysis of Big Data will improve our understanding of ourselves and the world,” said researcher Lee Rainie, director of the Pew Research Center’s Internet & American Life Project. “They predict that the continuing development of real-time data analysis and enhanced pattern recognition could bring revolutionary change to personal life, to the business world and to government.”

As with all technological evolution, the experts also anticipate some negative outcomes. “The experts responding to this survey noted that the people controlling the resources to collect, manage and sort large data sets are generally governments or corporations with their own agendas to meet,” said Janna Anderson, director of Elon’s Imagining the Internet Center and a co-author of the study. “They also say there’s a glut of data and a shortage of human curators with the tools to sort it well, there are too many variables to be considered, the data can be manipulated or misread, and much of it is proprietary and unlikely to be shared.”

Here’s how these stakeholders, critics, and experts responded when asked to choose between two opposing statements (from the news release),

53% agreed with the statement:

Thanks to many changes, including the building of “the Internet of Things,” human and machine analysis of large data sets will improve social, political, and economic intelligence by 2020. The rise of what is known as “Big Data” will facilitate things like “nowcasting” (real-time “forecasting” of events); the development of “inferential software” that assesses data patterns to project outcomes; and the creation of algorithms for advanced correlations that enable new understanding of the world. Overall, the rise of Big Data is a huge positive for society in nearly all respects.

39% agreed with the alternate statement, which posited:

Thanks to many changes, including the building of “the Internet of Things,” human and machine analysis of Big Data will cause more problems than it solves by 2020. The existence of huge data sets for analysis will engender false confidence in our predictive powers and will lead many to make significant and hurtful mistakes. Moreover, analysis of Big Data will be misused by powerful people and institutions with selfish agendas who manipulate findings to make the case for what they want. And the advent of Big Data has a harmful impact because it serves the majority (at times inaccurately) while diminishing the minority and ignoring important outliers. Overall, the rise of Big Data is a big negative for society in nearly all respects.

Note: A total of 8% did not respond.

Kessler also mentions Kaggle, a website that hosts data science competitions or, as they prefer, sporting events. From the About Kaggle page,

Kaggle is an innovative solution for statistical/analytics outsourcing. We are the leading platform for predictive modeling competitions. Companies, governments and researchers present datasets and problems – the world’s best data scientists then compete to produce the best solutions. At the end of a competition, the competition host pays prize money in exchange for the intellectual property behind the winning model.

Interesting, oui? Give your work over for a prize of how much? And to whom? The music company EMI? Facebook? Both companies have had or currently have competitions on Kaggle.

I can understand research scientists taking this route since they don’t usually have the financial wherewithal to pay for crunching giant data sets and, presumably, their work is to benefit the planet rather than a few executives and major stockholders. Maybe it’s worth it? EMI will pay $10,000 for the winning model.