Tag Archives: data science

Data storytelling in libraries

I had no idea that there was such enthusiasm for data storytelling but it seems libraries require a kit for the topic. From an August 30, 2022 University of Illinois School of Information Sciences news release (also on EurekAlert), Note: A link has been removed,

A new project led by Associate Professor Kate McDowell and Assistant Professor Matthew Turk of the School of Information Sciences (iSchool) at the University of Illinois Urbana-Champaign will help libraries tell data stories that connect with their audiences. Their project, “Data Storytelling Toolkit for Librarians,” has received a two-year, $99,330 grant from the Institute of Museum and Library Services (IMLS grant RE-250094-OLS-21), under the Laura Bush 21st Century Librarian Program, which supports innovative research by untenured, tenure-track faculty.

“There are thousands of librarians who are skittish about data but love stories,” explained McDowell, who co-teaches a data storytelling course at the iSchool with Turk. “And there are hundreds of librarians who see data as fundamental, but until those librarians have a language through which to connect with the passions of the thousands who love stories, this movement toward strategic data use in the field of libraries will be stifled, along with the potential collaborative creativity of librarians.”

The data storytelling toolkit will provide a set of easy-to-adapt templates, which librarians can use to move quickly from data to story to storytelling. Librarians will be able to use the toolkit to plug in data they already have and generate data visualization and narrative structure options.

“To give an example, public libraries need to communicate employment impact. In this case, the data story will include who has become employed based on library services, how (journey map showing a visual sequence of steps from job seeking to employment), a structure for the story of an individual’s outcomes, and a strong data visualization strategy for communicating this impact,” said McDowell.

According to the researchers, the toolkit will be clearly defined so that librarians understand the potential for communicating with data but also fully adaptable to each librarian’s setting and to the communication needs inside the organization and with the public. The project will focus on community college and public libraries, with initial collaborators to include Ericson Public Library in Boone, Iowa; Oregon City (OR) Public Library; Moraine Valley Community College in Palos Hills, Illinois; Jackson State Community College in Jackson, Tennessee; and The Urbana Free Library.

McDowell’s storytelling research has involved training collaborations with advancement staff both at the University of Illinois Urbana-Champaign and the University of Illinois system; storytelling consulting work for multiple nonprofits including the 50th anniversary of the statewide Prairie Rivers Network that protects Illinois water; and storytelling lectures for the Consortium of Academic and Research Libraries in Illinois (CARLI). McDowell researches and publishes in the areas of storytelling at work, social justice storytelling, and what library storytelling can teach the information sciences about data storytelling. She holds both an MS and PhD in library and information science from Illinois.

Turk also holds an appointment with the Department of Astronomy in the College of Liberal Arts and Sciences at the University of Illinois. His research focuses on how individuals interact with data and how that data is processed and understood. He is a recipient of the prestigious Gordon and Betty Moore Foundation’s Moore Investigator Award in Data-Driven Discovery. Turk holds a PhD in physics from Stanford University.

I found some earlier information about a data storytelling course taught by the two researchers, from a September 25, 2019 University of Illinois School of Information Sciences news release, which provides some additional insight,

Collecting and understanding data is important, but equally important is the ability to tell meaningful stories based on data. Students in the iSchool’s Data Science Storytelling course (IS 590DST) learn data visualization as well as storytelling techniques, a combination that will prove valuable to their employers as they enter the workforce.

The course instructors, Associate Professor and Interim Associate Dean for Academic Affairs Kate McDowell and Assistant Professor Matthew Turk, introduced Data Science Storytelling in fall 2017. The course combines McDowell’s research interests in storytelling practices and applications and Turk’s research interests in data analysis and visualization.

Students in the course learn storytelling concepts, narrative theories, and performance techniques as well as how to develop stories in a collaborative workshop style. They also work with data visualization toolkits, which involves some knowledge of coding.

Ashley Hetrick (MS ’18) took Data Science Storytelling because she wanted “the skills to be able to tell the right story when the time is right for it.” She appreciated the practical approach, which allowed the students to immediately apply the skills they learned, such as developing a story structure and using a pandas DataFrame to support and build a story. Hetrick is using those skills in her current work as assistant director for research data engagement and education at the University of Illinois.

“I combine tools and methods from data science and analytics with storytelling to make sense of my unit’s data and to help researchers make sense of theirs,” she said. “In my experience, few researchers like data for its own sake. They collect, care for, and analyze data because they’re after what all storytellers are after: meaning. They want to find the signal in all of this noise. And they want others to find it too, perhaps long after their own careers are complete. Each dataset is a story and raw material for stories waiting to be told.”

According to Turk, the students who have enrolled in the course have been outstanding, “always finding ways to tell meaningful stories from data.” He hopes they leave the class with an understanding that stories permeate their lives and that shaping the stories they tell others and about others is a responsibility they carry with them.

“One reason that this course means a lot to me is because it gives students the opportunity to really bring together the different threads of study at the iSchool,” Turk said. “It’s a way to combine across levels of technicality, and it gives students permission to take a holistic approach to how they present data.”

I didn’t put much effort into it but did find three other courses on data storytelling, one at the University of Texas (my favourite), one at the University of Toronto, and one (Data Visualization and Storytelling) at the University of British Columbia. The one at the University of British Columbia is available through the business school; the other two are available through information/library science faculties.

The Storywrangler, tool exploring billions of social media messages, could predict political & financial turmoil

Being able to analyze Twitter messages (tweets) in real time is amazing given what I wrote in this January 16, 2013 posting titled: “Researching tweets (the Twitter kind)” about the US Library of Congress and its attempts to access tweets for scholars,

At least one of the reasons no one has received access to the tweets is that a single search of the archived (2006-2010) tweets alone would take 24 hours, [emphases mine] …

So, bravo to the researchers at the University of Vermont (UVM). A July 16, 2021 news item on ScienceDaily makes the announcement,

For thousands of years, people looked into the night sky with their naked eyes — and told stories about the few visible stars. Then we invented telescopes. In 1840, the philosopher Thomas Carlyle claimed that “the history of the world is but the biography of great men.” Then we started posting on Twitter.

Now scientists have invented an instrument to peer deeply into the billions and billions of posts made on Twitter since 2008 — and have begun to uncover the vast galaxy of stories that they contain.

Caption: UVM scientists have invented a new tool: the Storywrangler. It visualizes the use of billions of words, hashtags and emoji posted on Twitter. In this example from the tool’s online viewer, three global events from 2020 are highlighted: the death of Iranian general Qasem Soleimani; the beginning of the COVID-19 pandemic; and the Black Lives Matter protests following the murder of George Floyd by Minneapolis police. The new research was published in the journal Science Advances. Credit: UVM

A July 15, 2021 UVM news release (also on EurekAlert but published on July 16, 2021) by Joshua Brown, which originated the news item, provides more detail about the work,

“We call it the Storywrangler,” says Thayer Alshaabi, a doctoral student at the University of Vermont who co-led the new research. “It’s like a telescope to look — in real time — at all this data that people share on social media. We hope people will use it themselves, in the same way you might look up at the stars and ask your own questions.”

The new tool can give an unprecedented, minute-by-minute view of popularity, from rising political movements to box office flops; from the staggering success of K-pop to signals of emerging new diseases.

The story of the Storywrangler — a curation and analysis of over 150 billion tweets–and some of its key findings were published on July 16 [2021] in the journal Science Advances.

EXPRESSIONS OF THE MANY

The team of eight scientists who invented Storywrangler — from the University of Vermont, Charles River Analytics, and MassMutual Data Science [emphasis mine]– gather about ten percent of all the tweets made every day, around the globe. For each day, they break these tweets into single bits, as well as pairs and triplets, generating frequencies from more than a trillion words, hashtags, handles, symbols and emoji, like “Super Bowl,” “Black Lives Matter,” “gravitational waves,” “#metoo,” “coronavirus,” and “keto diet.”
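To make the “single bits, as well as pairs and triplets” idea concrete, here is a minimal Python sketch of counting one-, two-, and three-word phrases (n-grams) across a handful of messages. It is my own illustration and not the Storywrangler code: the `ngrams` and `count_ngrams` helpers, the whitespace tokenizer, and the sample messages are all assumptions made for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a space-joined phrase."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(messages, max_n=3):
    """Count 1-grams up to max_n-grams across all messages.

    Assumption: lowercasing and splitting on whitespace stands in for a
    real tokenizer, which would also need to handle emoji, punctuation,
    and languages that do not separate words with spaces.
    """
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for message in messages:
        tokens = message.lower().split()
        for n in counts:
            counts[n].update(ngrams(tokens, n))
    return counts


# Made-up sample messages, for illustration only.
sample = [
    "black lives matter",
    "keto diet tips",
    "black lives matter now",
]

counts = count_ngrams(sample)
print(counts[1].most_common(3))          # most frequent single words
print(counts[2]["lives matter"])         # how often this pair appears
print(counts[3]["black lives matter"])   # how often this triplet appears
```

Dividing each count by the day's total for that phrase length would turn these raw counts into daily frequencies of the kind the release describes, though how the team normalizes its numbers is not something I can confirm from the release.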

“This is the first visualization tool that allows you to look at one-, two-, and three-word phrases, across 150 different languages [emphasis mine], from the inception of Twitter to the present,” says Jane Adams, a co-author on the new study who recently finished a three-year position as a data-visualization artist-in-residence at UVM’s Complex Systems Center.

The online tool, powered by UVM’s supercomputer at the Vermont Advanced Computing Core, provides a powerful lens for viewing and analyzing the rise and fall of words, ideas, and stories each day among people around the world. “It’s important because it shows major discourses as they’re happening,” Adams says. “It’s quantifying collective attention.” Though Twitter does not represent the whole of humanity, it is used by a very large and diverse group of people, which means that it “encodes popularity and spreading,” the scientists write, giving a novel view of discourse not just of famous people, like political figures and celebrities, but also the daily “expressions of the many,” the team notes.

In one striking test of the vast dataset on the Storywrangler, the team showed that it could be used to potentially predict political and financial turmoil. They examined the percent change in the use of the words “rebellion” and “crackdown” in various regions of the world. They found that the rise and fall of these terms was significantly associated with change in a well-established index of geopolitical risk for those same places.
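For anyone wondering what “percent change” and “associated with” might look like in practice, here is a rough Python sketch. It is not the team's method: I am assuming a simple period-over-period percent change and using a Pearson correlation as a stand-in for whatever statistical test the paper applies, and every number below is invented for illustration.

```python
import math


def percent_change(series):
    """Percent change between consecutive values: 100 * (b - a) / a."""
    return [100.0 * (b - a) / a for a, b in zip(series, series[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented numbers, NOT real Storywrangler or geopolitical risk data.
word_frequency = [10.0, 12.0, 9.0, 18.0, 18.0]   # hypothetical uses of a word
risk_index = [100.0, 110.0, 99.0, 148.5, 148.5]  # hypothetical risk index

word_change = percent_change(word_frequency)
risk_change = percent_change(risk_index)
r = pearson(word_change, risk_change)

print(word_change)
print(risk_change)
print(round(r, 3))
```

A correlation close to 1 in toy data like this only shows that the two series move together; it says nothing about whether the word usage predicts the index or merely follows it.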

WHAT’S HAPPENING?

The global story now being written on social media brings billions of voices — commenting and sharing, complaining and attacking — and, in all cases, recording — about world wars, weird cats, political movements, new music, what’s for dinner, deadly diseases, favorite soccer stars, religious hopes and dirty jokes.

“The Storywrangler gives us a data-driven way to index what regular people are talking about in everyday conversations, not just what reporters or authors have chosen; it’s not just the educated or the wealthy or cultural elites,” says applied mathematician Chris Danforth, a professor at the University of Vermont who co-led the creation of the Storywrangler with his colleague Peter Dodds. Together, they run UVM’s Computational Story Lab.

“This is part of the evolution of science,” says Dodds, an expert on complex systems and professor in UVM’s Department of Computer Science. “This tool can enable new approaches in journalism, powerful ways to look at natural language processing, and the development of computational history.”

How much a few powerful people shape the course of events has been debated for centuries. But, certainly, if we knew what every peasant, soldier, shopkeeper, nurse, and teenager was saying during the French Revolution, we’d have a richly different set of stories about the rise and reign of Napoleon. “Here’s the deep question,” says Dodds, “what happened? Like, what actually happened?”

GLOBAL SENSOR

The UVM team, with support from the National Science Foundation [emphasis mine], is using Twitter to demonstrate how chatter on distributed social media can act as a kind of global sensor system — of what happened, how people reacted, and what might come next. But other social media streams, from Reddit to 4chan to Weibo, could, in theory, also be used to feed Storywrangler or similar devices: tracing the reaction to major news events and natural disasters; following the fame and fate of political leaders and sports stars; and opening a view of casual conversation that can provide insights into dynamics ranging from racism to employment, emerging health threats to new memes.

In the new Science Advances study, the team presents a sample from the Storywrangler’s online viewer, with three global events highlighted: the death of Iranian general Qasem Soleimani; the beginning of the COVID-19 pandemic; and the Black Lives Matter protests following the murder of George Floyd by Minneapolis police. The Storywrangler dataset records a sudden spike of tweets and retweets using the term “Soleimani” on January 3, 2020, when the United States assassinated the general; the strong rise of “coronavirus” and the virus emoji over the spring of 2020 as the disease spread; and a burst of use of the hashtag “#BlackLivesMatter” on and after May 25, 2020, the day George Floyd was murdered.

“There’s a hashtag that’s being invented while I’m talking right now,” says UVM’s Chris Danforth. “We didn’t know to look for that yesterday, but it will show up in the data and become part of the story.”

Here’s a link to and a citation for the paper,

Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter by Thayer Alshaabi, Jane L. Adams, Michael V. Arnold, Joshua R. Minot, David R. Dewhurst, Andrew J. Reagan, Christopher M. Danforth and Peter Sheridan Dodds. Science Advances 16 Jul 2021: Vol. 7, no. 29, eabe6534 DOI: 10.1126/sciadv.abe6534

This paper is open access.

A couple of comments

I’m glad to see they are looking at phrases in many different languages, although I do experience some hesitation when I consider the two companies involved in this research with the University of Vermont.

Charles River Analytics and MassMutual Data Science would not have been my first guess for corporate involvement but on re-examining the subhead and noting this: “potentially predict political and financial turmoil”, they make perfect sense. Charles River Analytics provides “Solutions to serve the warfighter …”, i.e., soldiers/the military, and MassMutual is an insurance company with a dedicated ‘data science space’ (from the MassMutual Explore Careers Data Science webpage),

What are some key projects that the Data Science team works on?

Data science works with stakeholders throughout the enterprise to automate or support decision making when outcomes are unknown. We help determine the prospective clients that MassMutual should market to, the risk associated with life insurance applicants, and which bonds MassMutual should invest in. [emphases mine]

Of course. The military and financial services. Delightfully, this research is at least partially (mostly?) funded on the public dime, the US National Science Foundation.

The Quantum Physicist as Causal Detective: an Oct. 7, 2020 event

I love mysteries and am quite interested in the nature of reality (you, too?) and that gives us something in common with a couple of Perimeter Institute for Theoretical Physics (PI; Canada) researchers. From The Quantum Physicist as Causal Detective event page on the insidetheperimeter.ca website (notice received via email),

In their live webcast from Perimeter on October 7 [2020], Robert Spekkens and Elie Wolfe will shed light on the exciting possibilities brought about by applying quantum thinking to the science of cause and effect.

Watch the live webcast on this page on Wednesday, October 7 [2020] at 7 pm ET.

What do data science and the foundations of quantum theory have to do with one another?

A great deal, it turns out. The particular branch of data science known as causal inference focuses on a problem which is central to disciplines ranging from epidemiology to economics: that of disentangling correlation and causation in statistical data.
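Here is a small, purely classical Python sketch of why correlation and causation need disentangling. It is my own illustration and has nothing to do with the quantum techniques Spekkens and Wolfe will discuss: I am assuming a hidden common cause `z` that drives both `x` and `y`, which never influence each other, and the variable names, noise levels, and sample size are all made up.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares line fit on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Assumed causal structure: z causes x, z causes y, x and y are otherwise unrelated.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + 0.5 * rng.gauss(0, 1) for zi in z]
y = [zi + 0.5 * rng.gauss(0, 1) for zi in z]

raw = pearson(x, y)                                    # strong correlation
adjusted = pearson(residuals(x, z), residuals(y, z))   # close to zero

print(round(raw, 2), round(adjusted, 2))
```

The raw correlation between `x` and `y` is strong even though neither causes the other; once the common cause is accounted for, the association all but vanishes. The hard part in real data, of course, is that the common cause is usually not observed.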

Meanwhile, in a slightly different guise, this same problem has been pondered by quantum physicists as part of a continuing effort to make sense of various puzzling quantum phenomena. On top of that, the most celebrated result concerning quantum theory’s meaning for the nature of reality – Bell’s theorem – can be seen in retrospect to be built on the solution to a particularly challenging problem in causal inference.

Recent efforts to elaborate upon these connections have led to an exciting flow of techniques and insights across the disciplinary divide.

Perimeter researchers Robert Spekkens and Elie Wolfe have done pioneering work studying relations of cause and effect through a quantum foundational lens, and can be counted among a small number of physicists worldwide with expertise in this field.

In their joint webcast from Perimeter [at 7 pm ET] on October 7 [2020], Spekkens and Wolfe will explore what is happening at the intersection of these two fields and how thinking like a quantum physicist leads to new ways of sussing out cause and effect from correlation patterns in statistical data.

For those of us on the West Coast, that webcast will be at 4 p.m. on Wednesday, Oct. 7, 2020 and I believe you can watch it here.

Data science guide from Sense about Science

Sense about Science, headquartered in the UK, is in its own words (from its homepage)

Sense about Science is an independent campaigning charity that challenges the misrepresentation of science and evidence in public life. …

According to an October 1, 2019 announcement from Sense about Science (received via email), the organization has published a new guide,

Our director warned yesterday [September 30, 2019] that data science is being given a free pass on quality in too many arenas. From flood predictions to mortgage offers to the prediction of housing needs, we are not asking enough about whether AI solutions and algorithms can bear the weight we want to put on them.

It was the UK launch of our ‘Data Science: a guide for society’ at the Institute of Physics, where we invited representatives from different sectors to take up the challenge of creating a more questioning culture. Tracey Brown said the situation was like medicine 50 years ago: it seems that some people have become too clever to explain and the rest of us are feeling too dumb to ask.

At the end of the event we had a lot of proposals for how to make different communities aware of the guide’s three fundamental questions from the people who attended. There are many hundreds of people among our friends who could do something along these lines:

     * Publicise the guide
     * Incorporate it into your own work
     * Send it to people who are involved in procurement, licensing or reporting or decision making at community, national and international levels
     * Undertake a project with us to equip particular groups such as parliamentary advisers, journalists and small charities.

Would you take a look at the guide here and tell me if there’s something you can do? (alex@senseaboutscience.org)

There are launches planned in other countries over the rest of this year and into 2020. We are drawing up a map of offers to reach different communities. I’ll share all your suggestions with my colleague Errin Riley at the end of this week and we will get back to you quickly.

Before linking you to the guide, here’s a brief description from the Patterns in Data webpage,

In recent years, phrases like ‘big data’, ‘machine learning’, ‘algorithms’ and ‘pattern recognition’ have started slipping into everyday discussion. We’ve worked with researchers and experts to generate an open and informed public discussion on patterns in data across a wide range of projects.

Data Science: A guide for society

According to the headlines, we’re in the middle of a ‘data revolution’: large, detailed datasets and complex algorithms allow us to make predictions on anything from who will win the league to who is likely to commit a crime. Our ability to question the quality of evidence – as the public, journalists, politicians or decision makers – needs to be expanded to meet this. To know the questions to ask and how to press for clarity about the strengths and weaknesses of using analysis from data models to make decisions. This is a guide to having more of those conversations, regardless of how much you don’t know about data science.

Here’s Data Science: A Guide for Society.