
The Storywrangler, a tool exploring billions of social media messages, could predict political & financial turmoil

Being able to analyze Twitter messages (tweets) in real time is amazing given what I wrote in this January 16, 2013 posting titled “Researching tweets (the Twitter kind)” about the US Library of Congress and its attempts to access tweets for scholars,

At least one of the reasons no one has received access to the tweets is that a single search of the archived (2006-2010) tweets alone would take 24 hours, [emphases mine] …

So, bravo to the researchers at the University of Vermont (UVM). A July 16, 2021 news item on ScienceDaily makes the announcement,

For thousands of years, people looked into the night sky with their naked eyes — and told stories about the few visible stars. Then we invented telescopes. In 1840, the philosopher Thomas Carlyle claimed that “the history of the world is but the biography of great men.” Then we started posting on Twitter.

Now scientists have invented an instrument to peer deeply into the billions and billions of posts made on Twitter since 2008 — and have begun to uncover the vast galaxy of stories that they contain.

Caption: UVM scientists have invented a new tool: the Storywrangler. It visualizes the use of billions of words, hashtags and emoji posted on Twitter. In this example from the tool’s online viewer, three global events from 2020 are highlighted: the death of Iranian general Qasem Soleimani; the beginning of the COVID-19 pandemic; and the Black Lives Matter protests following the murder of George Floyd by Minneapolis police. The new research was published in the journal Science Advances. Credit: UVM

A July 15, 2021 UVM news release (also on EurekAlert but published on July 16, 2021) by Joshua Brown, which originated the news item, provides more detail about the work,

“We call it the Storywrangler,” says Thayer Alshaabi, a doctoral student at the University of Vermont who co-led the new research. “It’s like a telescope to look — in real time — at all this data that people share on social media. We hope people will use it themselves, in the same way you might look up at the stars and ask your own questions.”

The new tool can give an unprecedented, minute-by-minute view of popularity, from rising political movements to box office flops; from the staggering success of K-pop to signals of emerging new diseases.

The story of the Storywrangler — a curation and analysis of over 150 billion tweets–and some of its key findings were published on July 16 [2021] in the journal Science Advances.

EXPRESSIONS OF THE MANY

The team of eight scientists who invented Storywrangler — from the University of Vermont, Charles River Analytics, and MassMutual Data Science [emphasis mine]– gather about ten percent of all the tweets made every day, around the globe. For each day, they break these tweets into single bits, as well as pairs and triplets, generating frequencies from more than a trillion words, hashtags, handles, symbols and emoji, like “Super Bowl,” “Black Lives Matter,” “gravitational waves,” “#metoo,” “coronavirus,” and “keto diet.”
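As an aside, here is a minimal sketch of what counting "single bits, as well as pairs and triplets" could look like. This is my own illustration, not the UVM team's code: the function names, the whitespace tokenizing, and the sample tweets are all assumptions made for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def daily_ngram_counts(tweets, max_n=3):
    """Count 1-, 2-, and 3-grams across one day's tweets.

    Splitting on whitespace is a simplifying assumption; real tweets need
    care with punctuation, emoji, URLs, and many different languages.
    """
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for tweet in tweets:
        tokens = tweet.lower().split()
        for n in counts:
            counts[n].update(ngrams(tokens, n))
    return counts


# Hypothetical sample, invented for illustration only.
sample = [
    "Black Lives Matter",
    "black lives matter march today",
    "keto diet day one",
]
counts = daily_ngram_counts(sample)
print(counts[3].most_common(1))
```

Keeping one counter per phrase length makes it easy to turn raw counts into daily frequencies later by dividing by that day's total for the same length.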

“This is the first visualization tool that allows you to look at one-, two-, and three-word phrases, across 150 different languages [emphasis mine], from the inception of Twitter to the present,” says Jane Adams, a co-author on the new study who recently finished a three-year position as a data-visualization artist-in-residence at UVM’s Complex Systems Center.

The online tool, powered by UVM’s supercomputer at the Vermont Advanced Computing Core, provides a powerful lens for viewing and analyzing the rise and fall of words, ideas, and stories each day among people around the world. “It’s important because it shows major discourses as they’re happening,” Adams says. “It’s quantifying collective attention.” Though Twitter does not represent the whole of humanity, it is used by a very large and diverse group of people, which means that it “encodes popularity and spreading,” the scientists write, giving a novel view of discourse not just of famous people, like political figures and celebrities, but also the daily “expressions of the many,” the team notes.

In one striking test of the vast dataset on the Storywrangler, the team showed that it could be used to potentially predict political and financial turmoil. They examined the percent change in the use of the words “rebellion” and “crackdown” in various regions of the world. They found that the rise and fall of these terms was significantly associated with change in a well-established index of geopolitical risk for those same places.
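To make the idea of "percent change" being "associated with" a risk index a little more concrete, here is a rough sketch. The news release does not say which statistical test the team used, so a plain Pearson correlation is swapped in here as a stand-in, and every number below is invented for illustration.

```python
from math import sqrt


def percent_change(series):
    """Percent change between consecutive values of a series."""
    return [100.0 * (b - a) / a for a, b in zip(series, series[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented numbers: monthly relative frequency of a word such as
# "crackdown" and a geopolitical risk index for the same (hypothetical) region.
word_freq = [2.0, 3.0, 2.5, 5.0, 4.0]
risk_index = [100.0, 120.0, 110.0, 160.0, 150.0]

r = pearson(percent_change(word_freq), percent_change(risk_index))
print(f"correlation of percent changes: {r:.2f}")
```

A value of r near 1 would mean the two series tend to rise and fall together; it says nothing on its own about which one leads, which is why "potentially predict" is a carefully hedged phrase.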

WHAT’S HAPPENING?

The global story now being written on social media brings billions of voices — commenting and sharing, complaining and attacking — and, in all cases, recording — about world wars, weird cats, political movements, new music, what’s for dinner, deadly diseases, favorite soccer stars, religious hopes and dirty jokes.

“The Storywrangler gives us a data-driven way to index what regular people are talking about in everyday conversations, not just what reporters or authors have chosen; it’s not just the educated or the wealthy or cultural elites,” says applied mathematician Chris Danforth, a professor at the University of Vermont who co-led the creation of the Storywrangler with his colleague Peter Dodds. Together, they run UVM’s Computational Story Lab.

“This is part of the evolution of science,” says Dodds, an expert on complex systems and professor in UVM’s Department of Computer Science. “This tool can enable new approaches in journalism, powerful ways to look at natural language processing, and the development of computational history.”

How much a few powerful people shape the course of events has been debated for centuries. But, certainly, if we knew what every peasant, soldier, shopkeeper, nurse, and teenager was saying during the French Revolution, we’d have a richly different set of stories about the rise and reign of Napoleon. “Here’s the deep question,” says Dodds, “what happened? Like, what actually happened?”

GLOBAL SENSOR

The UVM team, with support from the National Science Foundation [emphasis mine], is using Twitter to demonstrate how chatter on distributed social media can act as a kind of global sensor system — of what happened, how people reacted, and what might come next. But other social media streams, from Reddit to 4chan to Weibo, could, in theory, also be used to feed Storywrangler or similar devices: tracing the reaction to major news events and natural disasters; following the fame and fate of political leaders and sports stars; and opening a view of casual conversation that can provide insights into dynamics ranging from racism to employment, emerging health threats to new memes.

In the new Science Advances study, the team presents a sample from the Storywrangler’s online viewer, with three global events highlighted: the death of Iranian general Qasem Soleimani; the beginning of the COVID-19 pandemic; and the Black Lives Matter protests following the murder of George Floyd by Minneapolis police. The Storywrangler dataset records a sudden spike of tweets and retweets using the term “Soleimani” on January 3, 2020, when the United States assassinated the general; the strong rise of “coronavirus” and the virus emoji over the spring of 2020 as the disease spread; and a burst of use of the hashtag “#BlackLivesMatter” on and after May 25, 2020, the day George Floyd was murdered.

“There’s a hashtag that’s being invented while I’m talking right now,” says UVM’s Chris Danforth. “We didn’t know to look for that yesterday, but it will show up in the data and become part of the story.”

Here’s a link to and a citation for the paper,

Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter by Thayer Alshaabi, Jane L. Adams, Michael V. Arnold, Joshua R. Minot, David R. Dewhurst, Andrew J. Reagan, Christopher M. Danforth and Peter Sheridan Dodds. Science Advances 16 Jul 2021: Vol. 7, no. 29, eabe6534 DOI: 10.1126/sciadv.abe6534

This paper is open access.

A couple of comments

I’m glad to see they are looking at phrases in many different languages, although I do experience some hesitation when I consider the two companies involved in this research with the University of Vermont.

Charles River Analytics and MassMutual Data Science would not have been my first guess for corporate involvement but on re-examining the subhead and noting this: “potentially predict political and financial turmoil”, they make perfect sense. Charles River Analytics provides “Solutions to serve the warfighter …”, i.e., soldiers/the military, and MassMutual is an insurance company with a dedicated ‘data science space’ (from the MassMutual Explore Careers Data Science webpage),

What are some key projects that the Data Science team works on?

Data science works with stakeholders throughout the enterprise to automate or support decision making when outcomes are unknown. We help determine the prospective clients that MassMutual should market to, the risk associated with life insurance applicants, and which bonds MassMutual should invest in. [emphases mine]

Of course. The military and financial services. Delightfully, this research is at least partially (mostly?) funded on the public dime, the US National Science Foundation.

Scientists, outreach and Twitter research plus some tips from a tweeting scientist

I have two bits today and both concern science and Twitter.

Twitter science research

A doodle by Isabelle Côté to illustrate her recent study on the effectiveness of scientists using Twitter to share their research with the public. Credit: Isabelle Côté

I was quite curious about this research on scientists and their Twitter audiences coming from Simon Fraser University (SFU; Vancouver, Canada). From a July 11, 2018 SFU news release (also on EurekAlert),

Isabelle Côté is an SFU professor of marine ecology and conservation and an active science communicator whose prime social media platform is Twitter.

Côté, who has cultivated more than 5,800 followers since she began tweeting in 2012, recently became curious about who her followers are.

“I wanted to know if my followers are mainly scientists or non-scientists – in other words was I preaching to the choir or singing from the rooftops?” she says.

Côté and collaborator Emily Darling set out to find the answer by analyzing the active Twitter accounts of more than 100 ecology and evolutionary biology faculty members at 85 institutions across 11 countries.

Their methodology included categorizing followers as either “inreach” if they were academics, scientists and conservation agencies and donors; or “outreach” if they were science educators, journalists, the general public, politicians and government agencies.
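For anyone curious how the inreach/outreach split might be tallied, here is a small sketch. The two groupings come straight from the news release; the exact label strings, the function names, and the sample follower list are my own assumptions, and the researchers' actual classification procedure is described in their paper, not here.

```python
# Groupings as described in the news release; the label strings are assumed.
INREACH = {"academic", "scientist", "conservation agency", "donor"}
OUTREACH = {
    "science educator",
    "journalist",
    "general public",
    "politician",
    "government agency",
}


def classify(follower_type):
    """Label one follower type as inreach, outreach, or unclassified."""
    if follower_type in INREACH:
        return "inreach"
    if follower_type in OUTREACH:
        return "outreach"
    return "unclassified"


def outreach_share(follower_types):
    """Fraction of classifiable followers that fall in the outreach group."""
    labels = [classify(t) for t in follower_types]
    known = [label for label in labels if label != "unclassified"]
    if not known:
        return 0.0
    return known.count("outreach") / len(known)


# Hypothetical follower list, invented for illustration only.
followers = ["academic", "journalist", "scientist", "general public", "bot"]
print(outreach_share(followers))
```

Followers that fit neither group are left out of the denominator here; that is a design choice for the sketch, not something the release specifies.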

Côté found that scientists with fewer than 1,000 followers primarily reach other scientists. However, scientists with more than 1,000 followers have more types of followers, including those in the “outreach” category.

Twitter and other forms of social media provide scientists with a potential way to share their research with the general public and, importantly, decision- and policy-makers. Côté says public pressure can be a pathway to drive change at a higher level. However, she notes that while social media is an asset, it is “not likely an effective replacement for the more direct science-to-policy outreach that many scientists are now engaging in, such as testifying in front of special governmental committees, directly contacting decision-makers, etc.”

Further, even with greater diversity and reach of followers, the authors concede there are still no guarantees that Twitter messages will be read or understood. Côté cites evidence that people selectively read what fits with their perception of the world, that changing followers’ minds about deeply held beliefs is challenging.

“While Twitter is emerging as a medium of choice for scientists, studies have shown that less than 40 per cent of academic scientists use the platform,” says Côté.

“There’s clearly a lot of room for scientists to build a social media presence and increase their scientific outreach. Our results provide scientists with clear evidence that social media can be used as a first step to disseminate scientific messages well beyond the ivory tower.”

Here’s a link to and a citation for the paper (my thoughts on the matter are after),

Scientists on Twitter: Preaching to the choir or singing from the rooftops? by Isabelle M. Côté and Emily S. Darling. Facets DOI: https://doi.org/10.1139/facets-2018-0002 Published Online 28 June 2018

This paper is in an open access journal.

Thoughts on the research

Neither of the researchers, Côté and Darling, appears to have any social science training; so where I’d ordinarily laud the researchers for their good work, I have to include extra kudos for taking on a type of research outside their usual domain of expertise.

If this sort of thing interests you and you have the time, I definitely recommend reading the paper (from the paper’s introduction; Note: Links have been removed),

Communication has always been an integral part of the scientific endeavour. In Victorian times, for example, prominent scientists such as Thomas H. Huxley and Louis Agassiz delivered public lectures that were printed, often verbatim, in newspapers and magazines (Weigold 2001), and Charles Darwin wrote his seminal book “On the origin of species” for a popular, non-specialist audience (Desmond and Moore 1991). In modern times, the pace of science communication has become immensely faster, information is conveyed in smaller units, and the modes of delivery are far more numerous. These three trends have culminated in the use of social media by scientists to share their research in accessible and relevant ways to potential audiences beyond their peers. The emphasis on accessibility and relevance aligns with calls for scientists to abandon jargon and to frame and share their science, especially in a “post-truth” world that can emphasize emotion over factual information (Nisbet and Mooney 2007; Bubela et al. 2009; Wilcox 2012; Lubchenco 2017).

The microblogging platform Twitter is emerging as a medium of choice for scientists (Collins et al. 2016), although it is still used by a minority (<40%) of academic faculty (Bart 2009; Noorden 2014). Twitter allows users to post short messages (originally up to 140 characters, increased to 280 characters since November 2017) that can be read by any other user. Users can elect to follow other users whose posts they are interested in, in which case they automatically see their followees’ tweets; conversely, users can be followed by other users, in which case their tweets can be seen by their followers. No permission is needed to follow a user, and reciprocation of following is not mandatory. Tweets can be categorized (with hashtags), repeated (retweeted), and shared via other social media platforms, which can exponentially amplify their spread and can offer links to websites, blogs, or scientific papers (Shiffman 2012).

There are scientific advantages to using digital communication technologies such as Twitter. Scientific users describe it as a means to stay abreast of new scientific literature, grant opportunities, and science policy, to promote their own published papers and exchange ideas, and to participate in conferences they cannot attend in person as “virtual delegates” (Bonetta 2009; Bik and Goldstein 2013; Parsons et al. 2014; Bombaci et al. 2016). Twitter can play a role in most parts of the life cycle of a scientific publication, from making connections with potential collaborators, to collecting data or finding data sources, to dissemination of the finished product (Darling et al. 2013; Choo et al. 2015). There are also some quantifiable benefits for scientists using social media. For example, papers that are tweeted about more often also accumulate more citations (Eysenbach 2011; Thelwall et al. 2013; Peoples et al. 2016), and the volume of tweets in the first week following publication correlates with the likelihood of a paper becoming highly cited (Eysenbach 2011), although such relationships are not always present (e.g., Haustein et al. 2014).

In addition to any academic benefits, scientists might adopt social media, and Twitter in particular, because of the potential to increase the reach of scientific messages and direct engagement with non-scientific audiences (Choo et al. 2015). This potential comes from the fact that Twitter leverages the power of weak ties, defined as low-investment social interactions that are not based on personal relationships (Granovetter 1973). On Twitter, follower–followee relationships are weak: users generally do not personally know the people they follow or the people who follow them, as their interactions are based mainly on message content. Nevertheless, by retweeting and sharing messages, weak ties can act as bridges across social, geographic, or cultural groups and contribute to a wide and rapid spread of information (Zhao et al. 2010; Ugander et al. 2012). The extent to which the messages of tweeting scientists benefit from the power of weak ties is unknown. Does Twitter provide a platform that allows scientists to simply promote their findings to other scientists within the ivory tower (i.e., “inreach”), or are tweeting scientists truly exploiting social media to potentially reach new audiences (“outreach”) (Bik et al. 2015; McClain and Neeley 2015; Fig. 1)?

Fig. 1. Conceptual depiction of inreach and outreach for Twitter communication by academic faculty. Left: If Twitter functions as an inreach tool, tweeting scientists might primarily reach only other scientists and perhaps, over time (arrow), some applied conservation and management science organizations. Right: If Twitter functions as an outreach tool, tweeting scientists might first reach other scientists, but over time (arrow) they will eventually attract members of the media, members of the public who are not scientists, and decision-makers (not necessarily in that order) as followers.

I’m glad to see this work but its use of language is not as precise in some places as it could be. They use the term ‘scientists’ throughout but their sample is made up of scientists identified as ecology and/or evolutionary biology (EEMB) researchers, as they briefly note in their Abstract and in the Methods section. With the constant use of the generic term, scientist, throughout most of the paper and taken in tandem with its use in the title, it’s easy to forget that this was a sample of a very specific population.

That the researchers’ sample of EEMB scientists is made up of those working at universities (academic scientists) is clear and it presents an interesting problem. How much does it matter that these are academic scientists? Both in regard to the research itself and with regard to perceptions about scientists. A sentence stating the question is beyond the scope of their research might have been a good idea.

Impressively, Darling and Côté have reached past the English language community to include other language groups, “We considered as many non-English Twitter profiles as possible by including common translations of languages we were familiar with (i.e., French and Spanish: biologista, professeur, profesora, etc.) in our search strings; …”

I cannot emphasize enough how rare it is to see this attempt to reach out beyond the English language community. Yes!

Getting back to my concern about language, I would have used ‘suspect’ rather than ‘assume’ in this sentence from the paper’s Discussion, “We assume [emphasis mine] that the patterns we have uncovered for a sample of ecologists and evolutionary biologists in faculty positions can apply broadly across other academic disciplines.” I agree it’s quite likely but it’s a hypothesis/supposition and needs to be tested. For example, will this hold true if you examine social scientists (such as economists, linguists, political scientists, psychologists, …) or physicists or mathematicians or …?

Is this evidence of unconscious bias regarding what the researchers term ‘non-scientists’? From the paper’s Discussion (Note: Links have been removed),

Of course, high numbers, diversity, and reach of followers offer no guarantee that messages will be read or understood. There is evidence that people selectively read what fits with their perception of the world (e.g., Sears and Freedman 1967; McPherson et al. 2001; Sunstein 2001; Himelboim et al. 2013). Thus, non-scientists [emphases mine] who follow scientists on Twitter might already be positively inclined to consume scientific information. If this is true, then one could argue that Twitter therefore remains an echo chamber, but it is a much larger one than the usual readership of scientific publications. Moreover, it is difficult to gauge the level of understanding of scientific tweets. The brevity and fragmented nature of science tweets can lead to shallow processing and comprehension of the message (Jiang et al. 2016). One metric of the influence of tweets is the extent to which they are shared (i.e., retweeted). Twitter users retweet posts when they find them interesting (hence the posts were at least read, if not understood) and when they deem the source credible (Metaxas et al. 2015). To our knowledge, there are no data on how often tweets by scientists are reposted by different types of followers. Such information would provide further evidence for an outreach function of Twitter in science communication.

Yes, it’s true that high numbers, etc. do not guarantee your messages will be read or understood and that people do selectively choose what fits their perception of the world. However, that applies equally to scientists and non-scientists despite what the authors appear to be claiming. Also, their use of the term non-scientist is not clear to me. Is this a synonym for ‘general public’ or is it being applied to anyone who may not have an educational background in science but is designated in another category such as policy makers, science communicators, etc. in the research paper?

In any event, ‘policy makers’ absorb a great deal of the researchers’ attention, from the paper’s Discussion (Note: Links have been removed),

Under most theories of change that describe how science ultimately affects evidence-based policies, decision-makers are a crucial group that should be engaged by scientists (Smith et al. 2013). Policy changes can be effected either through direct application of research to policy or, more often, via pressure from public awareness, which can drive or be driven by research (Baron 2010; Phillis et al. 2013). Either pathway requires active engagement by scientists with society (Lubchenco 2017). It is arguably easier than ever for scientists to have access to decision- and policy-makers, as officials at all levels of government are increasingly using social media to connect with the public (e.g., Grant et al. 2010; Kapp et al. 2015). However, we found that decision-makers accounted for only ∼0.3% (n = 191 out of 64 666) of the followers of academic scientists (see also Bombaci et al. 2016 in relation to the audiences of conference tweeting). Moreover, decision-makers begin to follow scientists in greater numbers only once the latter have reached a certain level of “popularity” (i.e., ∼2200 followers; Table 2). The general concern about whether scientific tweets are actually read by followers applies even more strongly to decision-makers, as they are known to use Twitter largely as a broadcasting tool rather than for dialogue (Grant et al. 2010). Thus, social media is not likely an effective replacement for more direct science-to-policy outreach that many scientists are now engaging in, such as testifying in front of special governmental committees, directly contacting decision-makers, etc. However, by actively engaging a large Twitter following of non-scientists, scientists increase the odds of being followed by a decision-maker who might see their messages, as well as the odds of being identified as a potential expert for further contributions.
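A quick check of the arithmetic behind that “∼0.3%” figure, using only the two numbers quoted in the passage above (the variable names are mine):

```python
# Figures quoted in the paper's Discussion.
decision_makers = 191
total_followers = 64_666

share_percent = 100 * decision_makers / total_followers
print(f"{share_percent:.2f}%")  # prints "0.30%"
```

In other words, roughly one follower in every 339 was classified as a decision-maker.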

It may be due to the types of materials I tend to stumble across but science outreach has usually been presented as largely an educational effort with the long-term goal of assuring the public will continue to support science funding. This passage in the research paper suggests more immediate political and career interests.

Should scientists be on Twitter?

This paper might discourage someone whose primary goal is to reach policy makers via this social media platform but the researchers seem to feel there is value in reaching out to a larger audience. While I’m not comfortable with how the researchers have generalized their results to the entire population of scientists, those results are intriguing.

This next bit features a scientist who as it turns out could be described as an EEMB (evolutionary biology and/or ecology) researcher.

How to tweet science

Stephen Heard wrote a July 31, 2018 posting on his Scientist Sees Squirrel blog about his Twitter feed,

At the 2018 conference of the Canadian Society for Ecology and Evolution, I was part of a lunchtime workshop, “The How and Why of Tweeting Science” – along with 5 friends.  Here I’ll share my slides and commentary.  I hope the other presenters will do the same, and I’ll link to them here as they become available.

 

I’ve been active on Twitter for about 4 years, but I’m very far from an expert, so my contribution to #CSEETweetShop was more to raise questions than to answer them.  What does it mean to “tweet to the science community”?  Here I’ll share some thoughts about Twitter audience, content, and voice.  These are, of course, my own (roughly formed) opinions, not some kind of wisdom on stone tablets, so take them with the requisite grain of salt!

Audience

 

Just as we do with blogging, we can draw a distinction between two audiences we might intend to reach via Twitter.  We might use Twitter for outreach, to talk to the general public – we could call this “science-communication tweeting”.  Or we could use Twitter for “inreach”, to talk to other scientists – which is what I’d call “science-community tweeting”.  But: for a couple of reasons, this distinction is not as clear as you might think.  Or at least, your intent to reach one audience or the other may not match the outcome.

There are some data on the topic of scientists’ Twitter audiences.  The data in the slide above come from a recent paper by Isabelle Côté and Emily Darling.  They’re for a sample of 110 faculty members in ecology and evolution, for whom audiences are broken down by their relationship (if any) to science.  The key result: most ecology and evolution faculty on Twitter have audiences dominated by other scientists (light blue), with the general public (dark blue) a significant but more modest chunk. There’s variation, some of which may well relate to the tweeters’ intended audiences – but we can draw two fairly clear conclusions:

  • Nearly all of us tweet mostly to the science community; but
  • Almost none of us tweets only to the science community (or for that matter only to the general public).

The same paper analyzes follower composition as a function of audience size, and these data suggest that one’s audience is likely to change as it builds.  Notice how the dark-blue “general public” line lags behind, then catches, the light-blue “other scientists” line*.  Earlier in your Twitter career, it’s likely that your audience will be even more strongly dominated by the science community – whether or not that’s what you intend.

In short: you probably can’t pick the audience you’re talking to; but you can pick the audience you’re talking for.  Given that, how might you use Twitter to talk for the science community?

I particularly like his constant questions about audience. He discusses other issues, such as content, but he always returns to the audience. Having worked in communication(s) and marketing, I have to applaud his focus on the audience. I can’t tell you how many times we’d answer the question of who our audience was and then never revisit it. (mea culpa) Heard’s insistence on constantly checking in and questioning your assumptions is excellent.

Seeing Côté and Darling’s paper cited in his presentation gives some idea of how closely he follows the thinking about science outreach in his field.

Both Côté and Darling’s academic paper and Heard’s posting make for accessible reading while offering valuable information.

Health technology and the Canadian Broadcasting Corporation’s (CBC) two-tier health system ‘Viewpoint’

There’s a lot of talk and handwringing about Canada’s health care system, which ebbs and flows in almost predictable cycles. Jesse Hirsh in a May 16, 2017 ‘Viewpoints’ segment (an occasional series run as part of the CBC’s [Canadian Broadcasting Corporation] flagship, daily news programme, The National) dared to reframe the discussion as one about technology and ‘those who get it’ [the technologically literate] and ‘those who don’t’, a state Hirsh described as being illiterate, as you can see and hear in the following video.

I don’t know about you but I’m getting tired of being called illiterate when I don’t know something. To be illiterate means you can’t read and write and as it turns out I do both of those things on a daily basis (sometimes even in two languages). Despite my efforts, I’m ignorant about any number of things and those numbers keep increasing day by day. BTW, is there anyone who isn’t having trouble keeping up?

Moving on from my rhetorical question, Hirsh has a point about the tech divide and about the need for discussion. It’s a point that hadn’t occurred to me (although I think he’s taking it in the wrong direction). In fact, this business of a tech divide already exists if you consider that people who live in rural environments and need the latest lifesaving techniques or complex procedures or access to highly specialized experts have to travel to urban centres. I gather that Hirsh feels that this divide isn’t necessarily going to be an urban/rural split so much as an issue of how technically literate you and your doctor are. That’s intriguing but then his argumentation gets muddled. Confusingly, he seems to be suggesting that the key to the split is your access (not your technical literacy) to artificial intelligence (AI) and algorithms (presumably he’s referring to big data and data analytics). I expect access will come down more to money than technological literacy.

For example, money is likely to be a key issue when you consider his big pitch is for access to IBM’s Watson computer. (My Feb. 28, 2011 posting titled: Engineering, entertainment, IBM’s Watson, and product placement focuses largely on Watson, its winning appearances on the US television game show, Jeopardy, and its subsequent adoption into the University of Maryland’s School of Medicine in a project to bring Watson into the examining room with patients.)

Hirsh’s choice of IBM’s Watson is particularly interesting for a number of reasons. (1) Presumably there are companies other than IBM in this sector. Why do they not rate a mention? (2) Given the current situation with IBM and the Canadian federal government’s introduction of the Phoenix payroll system (a PeopleSoft product customized by IBM), which is a failure of monumental proportions (see a Feb. 23, 2017 article by David Reevely for the Ottawa Citizen and a May 25, 2017 article by Jordan Press for the National Post), there may be a little hesitation, if not downright resistance, to a large scale implementation of any IBM product or service, regardless of where the blame lies. (3) Hirsh notes on the home page for his eponymous website,

I’m presently spending time at the IBM Innovation Space in Toronto Canada, investigating the impact of artificial intelligence and cognitive computing on all sectors and industries.

Yes, it would seem he has some sort of relationship with IBM not referenced in his Viewpoints segment on The National. Also, his description of the relationship isn’t especially illuminating but perhaps it’s this? (from the IBM Innovation Space – Toronto Incubator Application webpage),

Our incubator

The IBM Innovation Space is a Toronto-based incubator that provides startups with a collaborative space to innovate and disrupt the market. Our goal is to provide you with the tools needed to take your idea to the next level, introduce you to the right networks and help you acquire new clients. Our unique approach, specifically around client engagement, positions your company for optimal growth and revenue at an accelerated pace.

OUR SERVICES

IBM Bluemix
IBM Global Entrepreneur
Softlayer – an IBM Company
Watson

Startups partnered with the IBM Innovation Space can receive up to $120,000 in IBM credits at no charge for up to 12 months through the Global Entrepreneurship Program (GEP). These credits can be used in our products such our IBM Bluemix developer platform, Softlayer cloud services, and our world-renowned IBM Watson ‘cognitive thinking’ APIs. We provide you with enterprise grade technology to meet your clients’ needs, large or small.

Collaborative workspace in the heart of Downtown Toronto
Mentorship opportunities available with leading experts
Access to large clients to scale your startup quickly and effectively
Weekly programming ranging from guest speakers to collaborative activities
Help with funding and access to local VCs and investors

Final comments

While I have some issues with Hirsh’s presentation, I agree that we should be discussing the issues around increased automation of our health care system. My friend’s husband is a doctor and, according to him, those prescriptions and orders you get when leaving the hospital are not made up by a doctor so much as they are spat out by a computer based on the data that the doctors and nurses have supplied.

GIGO, bias, and de-skilling

Leaving aside the wonders that Hirsh describes, there’s an oldish saying in the computer business, garbage in/garbage out (GIGO). At its simplest, who’s going to catch a mistake? (There are lots of mistakes made in hospitals and other health care settings.)

There are also issues around the quality of research. Are all the research papers included in the data used by the algorithms going to be considered equal? There’s more than one case where a piece of problematic research has been accepted uncritically, even getting through peer review, and subsequently cited many times over. One of the ways to measure impact, i.e., importance, is to track the number of citations. There’s also the matter of where the research is published. A ‘high impact’ journal, such as Nature, Science, or Cell, automatically gives a piece of research a boost.

There are other kinds of bias as well. Increasingly, there’s discussion about algorithms being biased and about how machine learning (AI) can become biased. (See my May 24, 2017 posting: Machine learning programs learn bias, which highlights the issues and cites other FrogHeart posts on that and other related topics.)

These problems are to a large extent already present. Doctors have biases and research can be wrong and it can take a long time before there are corrections. However, the advent of an automated health diagnosis and treatment system is likely to exacerbate the problems. For example, if you don’t agree with your doctor’s diagnosis or treatment, you can seek other opinions. What happens when your diagnosis and treatment have become data? Will the system give you another opinion? Who will you talk to? The doctor who got an answer from ‘Watson’? Is she or he going to debate Watson? Are you?

This leads to another issue and that’s automated systems getting more credit than they deserve. Futurists such as Hirsh tend to underestimate people and overestimate the positive impact that automation will have. Computers, data analytics, and AI systems are tools, not gods. You’ll have as much luck petitioning one of those tools as you would Zeus.

The unasked question is how will your doctor or other health professional gain experience and skills if they never have to practice the basic, boring aspects of health care (asking questions for a history, reading medical journals to keep up with the research, etc.) and leave them to the computers? There had to be a reason for calling it a medical ‘practice’.

There are definitely going to be advantages to these technological innovations but thoughtful adoption of these practices (pun intended) should be our goal.

Who owns your data?

Another issue which is increasingly making itself felt is ownership of data. Jacob Brogan has written a provocative May 23, 2017 piece for slate.com asking that question about the data Ancestry.com gathers for DNA testing (Note: Links have been removed),

AncestryDNA’s pitch to consumers is simple enough. For $99 (US), the company will analyze a sample of your saliva and then send back information about your “ethnic mix.” While that promise may be scientifically dubious, it’s a relatively clear-cut proposal. Some, however, worry that the service might raise significant privacy concerns.

After surveying AncestryDNA’s terms and conditions, consumer protection attorney Joel Winston found a few issues that troubled him. As he noted in a Medium post last week, the agreement asserts that it grants the company “a perpetual, royalty-free, world-wide, transferable license to use your DNA.” (The actual clause is considerably longer.) According to Winston, “With this single contractual provision, customers are granting Ancestry.com the broadest possible rights to own and exploit their genetic information.”

Winston also noted a handful of other issues that further complicate the question of ownership. Since we share much of our DNA with our relatives, he warned, “Even if you’ve never used Ancestry.com, but one of your genetic relatives has, the company may already own identifiable portions of your DNA.” [emphasis mine] Theoretically, that means information about your genetic makeup could make its way into the hands of insurers or other interested parties, whether or not you’ve sent the company your spit. (Maryam Zaringhalam explored some related risks in a recent Slate article.) Further, Winston notes that Ancestry’s customers waive their legal rights, meaning that they cannot sue the company if their information gets used against them in some way.

Over the weekend, Eric Heath, Ancestry’s chief privacy officer, responded to these concerns on the company’s own site. He claims that the transferable license is necessary for the company to provide its customers with the service that they’re paying for: “We need that license in order to move your data through our systems, render it around the globe, and to provide you with the results of our analysis work.” In other words, it allows them to send genetic samples to labs (Ancestry uses outside vendors), store the resulting data on servers, and furnish the company’s customers with the results of the study they’ve requested.

Speaking to me over the phone, Heath suggested that this license was akin to the ones that companies such as YouTube employ when users upload original content. It grants them the right to shift that data around and manipulate it in various ways, but isn’t an assertion of ownership. “We have committed to our users that their DNA data is theirs. They own their DNA,” he said.

I’m glad to see the company’s representatives are open to discussion and, later in the article, you’ll see there’ve already been some changes made. Still, there is no guarantee that the situation won’t again change, for ill this time.

What data do they have and what can they do with it?

It’s not everybody who thinks data collection and data analytics constitute problems. While some people might balk at the thought of their genetic data being traded around and possibly used against them, e.g., while hunting for a job, or turned into a source of revenue, there tends to be a more laissez-faire attitude to other types of data. Andrew MacLeod’s May 24, 2017 article for thetyee.ca highlights political implications and privacy issues (Note: Links have been removed),

After a small Victoria [British Columbia, Canada] company played an outsized role in the Brexit vote, government information and privacy watchdogs in British Columbia and Britain have been consulting each other about the use of social media to target voters based on their personal data.

The U.K.’s information commissioner, Elizabeth Denham [Note: Denham was formerly B.C.’s Information and Privacy Commissioner], announced last week [May 17, 2017] that she is launching an investigation into “the use of data analytics for political purposes.”

The investigation will look at whether political parties or advocacy groups are gathering personal information from Facebook and other social media and using it to target individuals with messages, Denham said.

B.C.’s Office of the Information and Privacy Commissioner confirmed it has been contacted by Denham.

MacLeod’s March 6, 2017 article for thetyee.ca provides more details about the company’s role (Note: Links have been removed),

The “tiny” and “secretive” British Columbia technology company [AggregateIQ; AIQ] that played a key role in the Brexit referendum was until recently listed as the Canadian office of a much larger firm that has 25 years of experience using behavioural research to shape public opinion around the world.

The larger firm, SCL Group, says it has worked to influence election outcomes in 19 countries. Its associated company in the U.S., Cambridge Analytica, has worked on a wide range of campaigns, including Donald Trump’s presidential bid.

In late February [2017], the Telegraph reported that campaign disclosures showed that Vote Leave campaigners had spent £3.5 million — about C$5.75 million [emphasis mine] — with a company called AggregateIQ, run by CEO Zack Massingham in downtown Victoria.

That was more than the Leave side paid any other company or individual during the campaign and about 40 per cent of its spending ahead of the June referendum that saw Britons narrowly vote to exit the European Union.

According to media reports, Aggregate develops advertising to be used on sites including Facebook, Twitter and YouTube, then targets messages to audiences who are likely to be receptive.

The Telegraph story described Victoria as “provincial” and “picturesque” and AggregateIQ as “secretive” and “low-profile.”

Canadian media also expressed surprise at AggregateIQ’s outsized role in the Brexit vote.

The Globe and Mail’s Paul Waldie wrote “It’s quite a coup for Mr. Massingham, who has only been involved in politics for six years and started AggregateIQ in 2013.”

Victoria Times Colonist columnist Jack Knox wrote “If you have never heard of AIQ, join the club.”

The Victoria company, however, appears to be connected to the much larger SCL Group, which describes itself on its website as “the global leader in data-driven communications.”

In the United States it works through related company Cambridge Analytica and has been involved in elections since 2012. Politico reported in 2015 that the firm was working on Ted Cruz’s presidential primary campaign.

And NBC and other media outlets reported that the Trump campaign paid Cambridge Analytica millions to crunch data on 230 million U.S. adults, using information from loyalty cards, club and gym memberships and charity donations [emphasis mine] to predict how an individual might vote and to shape targeted political messages.

That’s quite a chunk of change and I don’t believe that gym memberships, charity donations, etc. were the only sources of information (in the US, there’s voter registration, credit card information, and more) but the list did raise my eyebrows. It would seem we are under surveillance at all times, even in the gym.

In any event, I hope that Hirsh’s call for discussion is successful and that the discussion includes more critical thinking about the implications of Hirsh’s ‘Brave New World’.

#MuseumWeek starts March 23, 2015

For the second year in a row, museums from all over the world are going to be meeting for a week-long programme via the Twitter hashtag #MuseumWeek. A March 20, 2015 news item on phys.org describes the event,

The Louvre, New York’s MoMA, the National Gallery of Australia, the Tokyo National Museum, Shakespeare’s Globe in Britain and more than 1,400 [1800 as of March 19, 2015 Pacific Timezone] other museums around the world are coming to Twitter next week.

From Monday, art institutions in 50 countries will be tweeting under the hashtag #MuseumWeek to publicise their collections and to highlight reactions, the US-based social network said in a statement.

French museum officials backed by Twitter and the French culture ministry are steering the week-long event, which seeks to engage with Twitter users worldwide.

A March 12, 2015 report on the press conference, which can be found on the #MuseumWeek website, noted this,

On Thursday 5th March 2015 at 5 pm, a press conference was held for the launch of the second #MuseumWeek in the Salon des Maréchaux at the Ministry of Culture and Communication. Present were the Minister, Fleur Pellerin, and Dick Costolo, CEO of Twitter.

Benjamin Benita (Universcience), Coordinator of #MuseumWeek 2015, presented its main concepts: the 7 days, 7 themes and 7 hashtags that can be found here. He also spoke about the event’s mode of governance: a steering committee made up of French museum professionals, accompanied by Mar Dixon and backed by Twitter and the French Ministry of Culture and Communication. He reminded all those present of this #MuseumWeek’s dual ambition: to roll out the operation all over the world and attract an even wider public. We are delighted to be able to announce that #MuseumWeek 2015 has already attracted 1,000 participating institutions in 44 countries! The list of participants can be found here.

Innovatory initiatives

Finally, the great innovation of this second #MuseumWeek, new initiatives were presented: a time capsule that will store all the tweets and be kept by the Cité des sciences et de l’industrie, along with a digital work created by the BRIGHT studio and artist Marcin Ignac, based on tweets sent during the operation. This work will be displayed at the Cité de l’architecture & du patrimoine.

There’s a pretty healthy list of Canadian museums and cultural institutions as a March 17, 2015 Global report notes,

Go ahead, tweet a selfie at your favourite museum. It will be encouraged during MuseumWeek, a Twitter event that runs from March 23 to 29.

Inaugurated last year in Europe, the celebration has gone global, with museums around the world planning to tweet about their treasures and inviting visitors to post their own pictures and thoughts on various themes.

More than 55 museums across Canada, large and small, say they will participate, ranging from Science World British Columbia and the Royal Alberta Museum to the Royal Ontario Museum, Montreal Museum of Fine Arts and the Army Museum in Halifax Citadel.

The National Gallery of Canada “jumped at the chance” to get involved, gallery director and CEO Marc Mayer said in a press release. He called the event an opportunity to engage with Canadians “in an authentic way that not only educates but celebrates art.”

All of the national Canadian science museums are represented.

You can keep up-to-date with the latest doings for #MuseumWeek here on this temporary Twitter account. If the temporary feed is anything to go by, this will be a multilingual experience.

Sifting through Twitter with your computer cluster of more than 600 nodes named Olympus—one of the Top 500 fastest supercomputers in the world.

Here are two (seemingly) contradictory pieces of information: (1) the US Library of Congress takes over 24 hours to complete a single search of tweets archived from 2006 to 2010, according to my Jan. 16, 2013 posting, and (2) Court (Courtney) Corley, a data scientist at the US Dept. of Energy’s Pacific Northwest National Laboratory (PNNL), has a system (SALSA; SociAL Sensor Analytics) that analyzes billions of tweets in seconds. It’s a little hard to reconcile these two very different perspectives on accessing data from tweets.

The news from Corley and the PNNL is more recent and, before I speculate further, here’s a bit more about Corley’s work, from the June 6, 2013 PNNL news release (also on EurekAlert),

If you think keeping up with what’s happening via Twitter, Facebook and other social media is like drinking from a fire hose, multiply that by 7 billion – and you’ll have a sense of what Court Corley wakes up to every morning.

Corley, a data scientist at the Department of Energy’s Pacific Northwest National Laboratory, has created a powerful digital system capable of analyzing billions of tweets and other social media messages in just seconds, in an effort to discover patterns and make sense of all the information. His social media analysis tool, dubbed “SALSA” (SociAL Sensor Analytics), combined with extensive know-how – and a fair degree of chutzpah – allows someone like Corley to try to grasp it all.

“The world is equipped with human sensors – more than 7 billion and counting. It’s by far the most extensive sensor network on the planet. What can we learn by paying attention?” Corley said.

Among the payoffs Corley envisions are emergency responders who receive crucial early information about natural disasters such as tornadoes; a tool that public health advocates can use to better protect people’s health; and information about social unrest that could help nations protect their citizens. But finding those jewels amidst the effluent of digital minutia is a challenge.

“The task we all face is separating out the trivia, the useless information we all are blasted with every day, from the really good stuff that helps us live better lives. There’s a lot of noise, but there’s some very valuable information too.”

I was getting a little worried when I saw the bit about separating useless information from the good stuff since that can be a very personal choice. Thankfully, this followed,

One person’s digital trash is another’s digital treasure. For example, people known in social media circles as “Beliebers,” named after entertainer Justin Bieber, covet inconsequential tidbits about Justin Bieber, while “non-Beliebers” send that data straight to the recycle bin.

The amount of data is mind-bending. In social media posted just in the single year ending Aug. 31, 2012, each hour on average witnessed:

  • 30 million comments
  • 25 million search queries
  • 98,000 new tweets
  • 3.8 million blog views
  • 4.5 million event invites
  • 7.1 million photos uploaded
  • 5.5 million status updates
  • The equivalent of 453 years of video watched

Several firms routinely sift posts on LinkedIn, Facebook, Twitter, YouTube and other social media, then analyze the data to see what’s trending. These efforts usually require a great deal of software and a lot of person-hours devoted specifically to using that application. It’s what Corley terms a manual approach.

Corley is out to change that, by creating a systematic, science-based, and automated approach for understanding patterns around events found in social media.

It’s not so simple as scanning tweets. Indeed, if Corley were to sit down and read each of the more than 20 billion entries in his data set from just a two-year period, it would take him more than 3,500 years if he spent just 5 seconds on each entry. If he hired 1 million helpers, it would take more than a day.

But it takes less than 10 seconds when he relies on PNNL’s Institutional Computing resource, drawing on a computer cluster with more than 600 nodes named Olympus, which is among the Top 500 fastest supercomputers in the world.

“We are using the institutional computing horsepower of PNNL to analyze one of the richest data sets ever available to researchers,” Corley said.

At the same time that his team is creating the computing resources to undertake the task, Corley is constructing a theory for how to analyze the data. He and his colleagues are determining baseline activity, culling the data to find routine patterns, and looking for patterns that indicate something out of the ordinary. Data might include how often a topic is the subject of social media, who is putting out the messages, and how often.

Corley notes additional challenges posed by social media. His programs analyze data in more than 60 languages, for instance. And social media users have developed a lexicon of their own and often don’t use traditional language. A post such as “aw my avalanna wristband @Avalanna @justinbieber rip angel pic.twitter.com/yldGVV7GHk” poses a challenge to people and computers alike.

Nevertheless, Corley’s program is accurate much more often than not, catching the spirit of a social media comment accurately more than three out of every four instances, and accurately detecting patterns in social media more than 90 percent of the time.
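
As a rough check on the news release’s read-it-yourself arithmetic, here’s a minimal Python sketch. It assumes exactly 20 billion entries (the release says “more than 20 billion”), 5 seconds per entry, and nonstop reading with no breaks; the function and variable names are mine, not PNNL’s.

```python
# Back-of-the-envelope check of the reading-time figures in the news release.
# Assumptions (mine): exactly 20 billion entries, nonstop reading.

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
SECONDS_PER_HOUR = 60 * 60


def reading_time_seconds(entries, seconds_per_entry, readers=1):
    """Total wall-clock seconds if the entries are split evenly among readers."""
    return entries * seconds_per_entry / readers


ENTRIES = 20_000_000_000  # "more than 20 billion" in the release
SECONDS_PER_ENTRY = 5

# One person reading everything
years_solo = reading_time_seconds(ENTRIES, SECONDS_PER_ENTRY) / SECONDS_PER_YEAR

# One million helpers sharing the load
hours_team = (
    reading_time_seconds(ENTRIES, SECONDS_PER_ENTRY, readers=1_000_000)
    / SECONDS_PER_HOUR
)

print(f"One reader: about {years_solo:,.0f} years")
print(f"One million readers: about {hours_team:.1f} hours")
```

At exactly 20 billion entries the solo figure works out to a little under 3,200 years, so the release’s “more than 3,500 years” presumably reflects a data set somewhat larger than 20 billion entries. The million-helper figure comes to roughly 28 hours, which matches “more than a day”.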

Corley’s educational background may explain the interest in emergency responders and health crises mentioned in the early part of the news release (from Corley’s PNNL webpage),

B.S. Computer Science from University of North Texas; M.S. Computer Science from University of North Texas; Ph.D. Computer Science and Engineering from University of North Texas; M.P.H (expected 2013) Public Health from University of Washington.

The reference to public health and emergency response is further developed, from the news release,

Much of the work so far has been around public health. According to media reports in China, the current H7N9 flu situation in China was highlighted on Sina Weibo, a China-based social media platform, weeks before it was recognized by government officials. And Corley’s work with the social media working group of the International Society for Disease Surveillance focuses on the use of social media for effective public health interventions.

In collaboration with the Infectious Disease Society of America and Immunizations 4 Public Health, he has focused on the early identification of emerging immunization safety concerns.

“If you want to understand the concerns of parents about vaccines, you’re never going to have the time to go out there and read hundreds of thousands, perhaps millions of tweets about those questions or concerns,” Corley said. “By creating a system that can capture trends in just a few minutes, and observe shifts in opinion minute to minute, you can stay in front of the issue, for instance, by letting physicians in certain areas know how to customize the educational materials they provide to parents of young children.”

Corley has looked closely at reaction to the vaccine that protects against HPV, which causes cervical cancer. The first vaccine was approved in 2006, when he was a graduate student, and his doctoral thesis focused on an analysis of social media messages connected to HPV. He found that creators of messages that named a specific drug company were less likely to be positive about the vaccine than others who did not mention any company by name.

Other potential applications include helping emergency responders react more efficiently to disasters like tornadoes, or identifying patterns that might indicate coming social unrest or even something as specific as a riot after a soccer game. More than a dozen college students or recent graduates are working with Corley to look at questions like these and others.

As to why the US Library of Congress requires 24 hours to search one term in their archived tweets and Corley and the PNNL require seconds to sift through two years of tweets, only two possibilities come to my mind. (1) Corley is doing a stripped down version of an archival search so his searches are not comparable to the Library of Congress searches or (2) Corley and the PNNL have far superior technology.

Tweet your nano

Researchers at the University of Wisconsin-Madison have published a study titled, “Tweeting nano: how public discourses about nanotechnology develop in social media environments,” which analyses, for the first time, nanotechnology discourse on Twitter. From the Life Sciences Communication University of Wisconsin-Madison research webpage,

The study, “Tweeting nano: how public discourses about nanotechnology develop in social media environments,” mapped social media traffic about nanotechnology, finding that Twitter traffic expressing opinion about nanotechnology is more likely to originate from states with a federally-funded National Nanotechnology Initiative center or network than states without such centers.

Runge [Kristin K. Runge, doctoral student] and her co-authors used computational linguistic software to analyze a census of all English-language nanotechnology-related tweets expressing opinion posted on Twitter over one calendar year. In addition to mapping tweets by state, the team coded sentiment along two axes: certain vs. uncertain, and optimistic-neutral-pessimistic. They found 55% of nanotechnology-related opinions expressed certainty, 41% expressed pessimistic outlooks and 32% expressed neutral outlooks.

In addition to shedding light on how social media is used in communicating about an emerging technology, this study is believed to be the first published study to use a census of social media messages rather than a sample.

“We likely wouldn’t have captured these results if we had to rely on a sample rather than a complete census,” said Runge. “That would have been unfortunate, because the distinct geographic origins of the tweets and the tendency toward certainty in opinion expression will be useful in helping us understand how key online influencers are shaping the conversation around nanotechnology.”

It’s not obvious from this notice or the title of the study but it is stated clearly in the study that the focus is the world of US nano, not the English language world of nano. After reading the study (very quickly), I can say it’s interesting and, hopefully, will stimulate more work about public opinion that takes social media into account. (I’d love to know how they limited their study to US tweets only and how they determined the region that spawned the tweet.)

The one thing which puzzles me is they don’t mention retweets (RTs) specifically. Did they consider only original tweets? If not, did they take into account the possibility that someone might RT an item that does not reflect their own opinion? I occasionally RT something that doesn’t reflect my opinion when there isn’t sufficient space to include a comment indicating otherwise because I want to promote discussion and that doesn’t necessarily take place on Twitter or in Twitter’s public space. This leads to another question, did the researchers include direct messages in their study? Unfortunately, there’s no mention in the two sections (Discussion and Implications for future research) of the conclusion.

For those who would like to see the research for themselves (Note: The article is behind a paywall),

Tweeting nano: how public discourses about nanotechnology develop in social media environments by Kristin K. Runge, Sara K. Yeo, Michael Cacciatore, Dietram A. Scheufele, Dominique Brossard, Michael Xenos, Ashley Anderson, Doo-hun Choi, Jiyoun Kim, Nan Li, Xuan Liang, Maria Stubbings, and Leona Yi-Fan Su. Journal of Nanoparticle Research; An Interdisciplinary Forum for Nanoscale Science and Technology© Springer 10.1007/s11051-012-1381-8. Published online Jan. 4, 2013

It’s no surprise to see Dietram Scheufele and Dominique Brossard, who are both located at the University of Wisconsin-Madison and publish steadily on the topic of nanotechnology and public opinion, listed as authors.

Researching tweets (the Twitter kind)

The US Library of Congress, in April 2010, made a deal with Twitter (microblogging service where people chat or tweet in 140 characters) to acquire all the tweets from 2006 to April 2010 and starting from April 2010, all subsequent tweets (the deal is mentioned in my April 15, 2010 posting [scroll down about 60% of the way]). Reading between the lines of the Library of Congress’ Jan. 2013 update/white paper, the job has proved even harder than they originally anticipated,

In April, 2010, the Library of Congress and Twitter signed an agreement providing the Library the public tweets from the company’s inception through the date of the agreement, an archive of tweets from 2006 through April, 2010. Additionally, the Library and Twitter agreed that Twitter would provide all public tweets on an ongoing basis under the same terms. The Library’s first objectives were to acquire and preserve the 2006-10 archive; to establish a secure, sustainable process for receiving and preserving a daily, ongoing stream of tweets through the present day; and to create a structure for organizing the entire archive by date. This month, all those objectives will be completed. To date, the Library has an archive of approximately 170 billion tweets.

The Library’s focus now is on confronting and working around the technology challenges to making the archive accessible to researchers and policymakers in a comprehensive, useful way. It is clear that technology to allow for scholarship access to large data sets is lagging behind technology for creating and distributing such data. Even the private sector has not yet implemented cost-effective commercial solutions because of the complexity and resource requirements of such a task. The Library is now pursuing partnerships with the private sector to allow some limited access capability in our reading rooms. These efforts are ongoing and a priority for the Library. (p. 1)

David Bruggeman in his Jan. 15, 2013 posting about this Library of Congress ‘Twitter project’ provides some mind-boggling numbers,

… That [170 billion tweets] represents the archive Twitter had at the time of the agreement (covering 2006-early 2010) and 150 billion tweets in the subsequent months (the Library receives over half a million new tweets each day, and that number continues to rise).  The archive includes the tweets and relevant metadata for a total of nearly 67 terabytes of data.

Gayle Osterberg, Director of Communications for the Library of Congress writes in a Jan. 4, 2013 posting on the Library of Congress blog about ‘tweet’ archive research issues,

Although the Library has been building and stabilizing the archive and has not yet offered researchers access, we have nevertheless received approximately 400 inquiries from researchers all over the world. Some broad topics of interest expressed by researchers run from patterns in the rise of citizen journalism and elected officials’ communications to tracking vaccination rates and predicting stock market activity.

The white paper/update offers a couple of specific examples of requests,

Some examples of the types of requests the Library has received indicate how researchers might use this archive to inform future scholarship:

* A master’s student is interested in understanding the role of citizens in disruptive events. The student is focusing on real-time micro-blogging of terrorist attacks. The questions focus on the timeliness and accuracy of tweets during specified events.

* A post-doctoral researcher is looking at the language used to spread information about charities’ activities and solicitations via social media during and immediately following natural disasters. The questions focus on audience targets and effectiveness. (p. 4)

At least one of the reasons no one has received access to the tweets is that a single search of the archived (2006-2010) tweets alone would take 24 hours,

The Library has assessed existing software and hardware solutions that divide and simultaneously search large data sets to reduce search time, so-called “distributed and parallel computing”. To achieve a significant reduction of search time, however, would require an extensive infrastructure of hundreds if not thousands of servers. This is cost prohibitive and impractical for a public institution.

Some private companies offer access to historic tweets but they are not the free, indexed and searchable access that would be of most value to legislative researchers and scholars.

It is clear that technology to allow for scholarship access to large data sets is not nearly as advanced as the technology for creating and distributing that data. Even the private sector has not yet implemented cost-effective commercial solutions because of the complexity and resource requirements of such a task. (p. 4)

David Bruggeman goes on to suggest that, in an attempt to make the tweets searchable and more easily accessible, all this information could end up behind a paywall (Note: A link has been removed),

I’m reminded of how Ancestry.com managed to get exclusive access to Census records.  While the Bureau benefitted from getting the records digitized, having this taxpayer-paid information controlled by a private company is problematic.

As a Canuck and someone who tweets (@frogheart), I’m not sure how I feel about having my tweets archived by the US Library of Congress in the first place, let alone the possibility I might have to pay for access to my old tweets.

FrogHeart’s 2012, a selective roundup of my international online colleagues, and other bits

This blog will be five years old in April 2013 and, sometime in January or February, the 2000th post will be published.

Statistics-wise, it’s been a tumultuous year for FrogHeart with ups and downs, thankfully ending on an up note. According to my AW stats, I started with 54,920 visits in January (which was a bit of an increase over December 2011). The numbers rose right through to March 2012, when the blog registered 68,360 visits, and then the numbers fell and continued to fall. At the low point, this blog registered 45,972 visits in June 2012; the numbers then rose and fell through to Oct. 2012, when they reached 54,520 visits. November 2012 was better with 66,854 visits and in December 2012 the blog will have received over 75,000 visits. (ETA Jan. 2, 2013: This blog registered 81,0036 in December 2012 and an annual total of 681,055 visits.) Since I have no idea why the numbers fell or why they rose again, I have absolutely no idea what 2013 will bring in terms of statistics (the Webalizer numbers reflect similar trends).

Interestingly and for the first time since I activated the AW statistics package in Feb. 2009, the US ceased to be the primary source for visitors. As of April 2012, the British surged ahead for several months until November 2012 when the US regained the top spot only to lose it to China in December 2012.

Favourite topics according to the top 10 key terms included: nanocrystalline cellulose for Jan. – Oct. 2012, after which the topic fell out of the top 10 for the first time in almost three years; Jackson Pollock and physics also popped up in the top 10 in various months throughout the year; Clipperton Island (a sci/art project) has made intermittent appearances; SPAUN (Semantic Pointer Architecture Unified Network; a project at the University of Waterloo) has made the top 10 in the two months since it was announced; weirdly, frogheart.ca has appeared in the top 10 these last few months; the Lycurgus Cup, nanosilver, and literary tattoos also made appearances in the top 10 in various months throughout the year, while the memristor and Québec nanotechnology made appearances in the fall.

Webalizer tells a similar but not identical story. The numbers started with 83,133 visits in January 2012, rising to a dizzying height of 119,217 in March. These statistics fell too, but July 2012 was another six-figure month with 101,087 visits; then it was down again to five figures until Oct. 2012 with 108,266 visits, followed by 136,161 visits in November 2012. The December 2012 visit numbers appear to be dipping down slightly with 130,198 visits counted to 5:10 am PST, Dec. 31, 2012. (ETA Jan. 2, 2013: In December 2012, 133,351 were tallied with an annual total of 1,660,771 visits.)

Thanks to my international colleagues who inspire and keep me apprised of the latest information on nanotechnology and other emerging technologies:

  • Pasco Phronesis, owned by David Bruggeman, focuses more on science policy and science communication (via popular media) than on emerging technology per se but David provides excellent analysis and a keen eye for the international scene. He kindly dropped by frogheart.ca some months ago to challenge my take on science and censorship in Canada and I have not finished my response. I’ve posted part 1 in the comments but have yet to get to part 2. His latest posting on Dec. 30, 2012 features this title, For Better Science And Technology Policing, Don’t Forget The Archiving.
  • Nanoclast is on the IEEE (Institute of Electrical and Electronics Engineers) website and features Dexter Johnson’s writing on nanotechnology government initiatives, technical breakthroughs, and, occasionally, important personalities within the field. I notice Dexter, who’s always thoughtful and thought-provoking, has cut back to a weekly posting. I encourage you to read his work as he fills in an important gap in a lot of nanotechnology reporting with his intimate understanding of the technology itself.  Dexter’s Dec. 20, 2012 posting (the latest) is titled, Nanoparticle Coated Lens Converts Light into Sound for Precise Non-invasive Surgery.
  • Insight (formerly TNTlog), Tim Harper’s (CEO of Cientifica) blog, features an international perspective (with a strong focus on the UK scene) on emerging technologies and the business of science. His writing style is quite lively (at times, trenchant) and it reflects his long experience with nanotechnology and other emerging technologies. I don’t know how he finds the time and here’s his latest, a Dec. 4, 2012 posting titled, Is Printable Graphene The Key To Widespread Applications?
  • 2020 Science is Dr. Andrew Maynard’s (director of University of Michigan’s Risk Science Center) more or less personal blog. An expert on nanotechnology (he was the Chief Science Adviser for the Project on Emerging Nanotechnologies, located in Washington, DC), Andrew writes extensively about risk, uncertainty, nanotechnology, and the joys of science. Over time his blog has evolved to include the occasional homemade but science-oriented video, courtesy of one of his children. I usually check Andrew’s blog when there’s an online nanotechnology kerfuffle as he usually has the inside scoop. His latest posting on Dec. 23, 2012 features this title, On the benefits of wearing a hat while dancing naked, and other insights into the science of risk.
  • Andrew also produces and manages the Mind the Science Gap blog, which is a project encouraging MA students in the University of Michigan’s Public Health Program to write. Andrew has posted a summary of the last semester’s triumphs titled, Looking back at another semester of Mind The Science Gap.
  • NanoWiki is, strictly speaking, not a blog but the authors provide the best compilation of stories on nanotechnology issues and controversies that I have found yet. Here’s how they describe their work, “NanoWiki tracks the evolution of paradigms and discoveries in nanoscience and nanotechnology field, annotates and disseminates them, giving an overall view and feeds the essential public debate on nanotechnology and its practical applications.” There are also Spanish, Catalan, and mobile versions of NanoWiki. Their latest posting, dated  Dec. 29, 2012, Nanotechnology shows we can innovate without economic growth, features some nanotechnology books.
  • In April 2012, I was contacted by Dorothée Browaeys about a French blog, Le Meilleur Des Nanomondes. Unfortunately, there doesn’t seem to have been much action there since Feb. 2010 but I’m delighted to hear from my European colleagues and hope to hear more from them.

Sadly, there was only one interview here this year but I think they call these things ‘a big get’ as the interview was with Vanessa Clive who manages the nanotechnology portfolio at Industry Canada. I did try to get an interview with Dr. Marie D’Iorio, the new Executive Director of Canada’s National Institute of Nanotechnology (NINT; BTW, the National Research Council has a brand new site and, since the NINT is a National Research Council agency, so does the NINT), and experienced the same success I had with her predecessor, Dr. Nils Petersen.

I attended two conferences this year. The first was the S.NET (Society for the Study of Nanoscience and Emerging Technologies) 2012 meeting in Enschede, Holland, where I presented my work on memristors, artificial brains, and pop culture. The second was the 2012 Canadian Science Policy Conference in Calgary, where I moderated a panel I’d organized on the topic of Canada’s science culture and policy.

There are a few items of note which appeared on the Canadian science scene. ScienceOnlineVancouver emerged in April 2012. From the About page,

ScienceOnlineVancouver is a monthly discussion series exploring how online communication and social media impact current scientific research and how the general public learns about it. ScienceOnlineVancouver is an ongoing discussion about online science, including science communication and available research tools, not a lecture series where scientists talk about their work. Follow the conversation on Twitter at @ScioVan, hashtag is #SoVan.

The concept of these monthly meetings originated in New York with SoNYC @S_O_NYC, brought to life by Lou Woodley (@LouWoodley, Communities Specialist at Nature.com) and John Timmer (@j_timmer, Science Editor at Ars Technica). With the success of that discussion series, participation in Scio2012, and the 2012 annual meeting of the AAAS in Vancouver, Catherine Anderson, Sarah Chow, and Peter Newbury were inspired to bring it closer to home, leading to the beginning of ScienceOnlineVancouver.

ScienceOnlineVancouver is part of the ScienceOnlineNOW community that includes ScienceOnlineBayArea, @sciobayarea and ScienceOnlineSeattle, @scioSEA. Thanks to Brian Glanz of the Open Science Federation and SciFund Challenge and thanks to Science World for a great venue.

I have mentioned the arts/engineering festival coming up in Calgary, Beakerhead, a few times but haven’t had occasion to mention Science Rendezvous before. This festival started in Toronto in 2008 and became a national festival in 2012 (?). Their About page doesn’t describe the genesis of the ‘national’ aspect to this festival as clearly as I would like. They seem to be behind with their planning as there’s no mention of the 2013 festival, which should be coming up in May.

The Twitter (@frogheart) feed continues to grow in both directions (followed and following), albeit slowly. I have to give special props to @carlacap, @cientifica, & @timharper for their mentions, retweets, and more.

As for 2013, there are likely to be some changes here; I haven’t yet decided what changes but I will keep you posted. Have a lovely new year and I wish you all the best in 2013.

Mice crash ScienceOnline Vancouver’s May 2012 event at Science World

The second ScienceOnline Vancouver event, with Eric Michael Johnson and Raul Pacheco-Vega discussing how to use social media effectively, went well. (This May 15, 2012 event was mentioned in my May 14, 2012 posting, which has links to speakers’ blogs and also mentions a few still upcoming science events [May 22 and May 29, 2012].)

I can see the organizers refined their approach: the integration of technology (livestreaming, tweeting, etc.) with a live event was smoother than the last one, and the transition from listening to the speakers to participating in discussion was smoother too.

Both Johnson and Pacheco-Vega highlighted how their use of social media has enhanced professional and personal connections and/or opened up new opportunities. For example, Johnson was asked to do a cover story for Times Higher Education (UK publication) that started with a tweet he wrote about bonobos (a primate found in the Congo only and his field of study for one of his degrees). After years of blogging, Johnson’s efforts were recognized in other ways as well; his blog is now part of the Scientific American blogging network. Also present at the May event, but in the audience, was another local scientist and Scientific American blogger, Dr. Carin Bondar, who has also had opportunities open up as a consequence of social media. (BTW, she’s auditioning to be a TED speaker soon. I’m not sure which of the major TEDs but she has expressed her excitement about this on Twitter [#SoVan].)

Pacheco-Vega focused more heavily on Twitter, Pinterest (consolidates your various social media efforts on a ‘bulletin or pin’ board), and timely.is (software that allows you to schedule your tweets and to analyze the best timing for releasing them during the day) and offered tips and suggestions for other tools. (He maintains two identities online, a professional one and a personal one.) He also offered some insight into the nature of the doubts many scientists have about engaging in social media: lack of time, ‘why bother?’, ‘how does this help me professionally?’, ‘this is going to hurt me professionally’, etc.

There were fewer people (about half the number they had at the April 2012 event), resulting in a crowd of about 30. Happily, they had a liquor licence this time, so libations were available.

As for the mice (or perhaps one very active mouse excited by the liquor licence), I had several sightings. Hopefully, Science World will have addressed the problem before the next ScienceOnline Vancouver event.

It’s going to be interesting to see how this evolves. To this point, I like the direction they’re taking.