Tag Archives: big data

Algorithms in decision-making: a government inquiry in the UK

Yesterday’s (Feb. 28, 2017) posting about the newly launched Cascadia Urban Analytics Cooperative grew too big to include interesting tidbits such as this one from Sense about Science (from a Feb. 28, 2017 announcement received via email),

The House of Commons science and technology select committee announced
today that it will launch an inquiry into the use of algorithms in
decision-making […].

Our campaigns and policy officer Dr Stephanie Mathisen brought this
important and under-scrutinised issue to the committee as part of their
#MyScienceInquiry initiative; so fantastic news that they are taking up
the call.

A Feb. 28, 2017 UK House of Commons Science and Technology Select Committee press release gives more details about the inquiry,

The Science and Technology Committee is launching a new inquiry into the use of algorithms in public and business decision making.

In an increasingly digital world, algorithms are being used to make decisions in a growing range of contexts. From decisions about offering mortgages and credit cards to sifting job applications and sentencing criminals, the impact of algorithms is far reaching.

How an algorithm is formulated, its scope for error or correction, the impact it may have on an individual—and their ability to understand or challenge that decision—are increasingly relevant questions.

This topic was pitched to the Committee by Dr Stephanie Mathisen (Sense about Science) through the Committee’s ‘My Science Inquiry’ open call for inquiry suggestions, and has been chosen as the first subject for the Committee’s attention following that process. It follows the Committee’s recent work on Robotics and AI, and its call for a standing Commission on Artificial Intelligence.

Submit written evidence

The Committee would welcome written submissions by Friday 21 April 2017 on the following points:

  • The extent of current and future use of algorithms in decision-making in Government and public bodies, businesses and others, and the corresponding risks and opportunities;
  • Whether ‘good practice’ in algorithmic decision-making can be identified and spread, including in terms of:
    —  The scope for algorithmic decision-making to eliminate, introduce or amplify biases or discrimination, and how any such bias can be detected and overcome;
    — Whether and how algorithmic decision-making can be conducted in a ‘transparent’ or ‘accountable’ way, and the scope for decisions made by an algorithm to be fully understood and challenged;
    — The implications of increased transparency in terms of copyright and commercial sensitivity, and protection of an individual’s data;
  • Methods for providing regulatory oversight of algorithmic decision-making, such as the rights described in the EU General Data Protection Regulation 2016.

The Committee would welcome views on the issues above, and submissions that illustrate how the issues vary by context through case studies of the use of algorithmic decision-making.

You can submit written evidence through the algorithms in decision-making inquiry page.

I looked at the submission form and while it assumes the submitter is from the UK, there doesn’t seem to be any impediment to citizens of other countries making a submission. Since some personal information is included as part of the submission, there is a note about data protection on the Guidance on giving evidence to a Select Committee of the House of Commons webpage.

Big data in the Cascadia region: a University of British Columbia (Canada) and University of Washington (US state) collaboration

Before moving on to the news and for anyone unfamiliar with the concept of the Cascadia region, it is an informally proposed political region or a bioregion, depending on your perspective. Adding to the lack of clarity, the region generally includes the province of British Columbia in Canada and two US states, Washington and Oregon, but Alaska (another US state) and the Yukon (a Canadian territory) may also be included, as well as parts of California, Wyoming, Idaho, and Montana. (You can read more about the Cascadia bioregion here and the proposed political region here.) While it sounds as if more of the US is part of the ‘Cascadia region’, British Columbia and the Yukon cover considerably more territory than all of the mentioned states combined, if you’re taking a landmass perspective.

Cascadia Urban Analytics Cooperative

There was some big news about the smallest version of the Cascadia region on Thursday, Feb. 23, 2017 when the University of British Columbia (UBC), the University of Washington (state; UW), and Microsoft announced the launch of the Cascadia Urban Analytics Cooperative. From the joint Feb. 23, 2017 news release (read on the UBC website or read on the UW website),

In an expansion of regional cooperation, the University of British Columbia and the University of Washington today announced the establishment of the Cascadia Urban Analytics Cooperative to use data to help cities and communities address challenges from traffic to homelessness. The largest industry-funded research partnership between UBC and the UW, the collaborative will bring faculty, students and community stakeholders together to solve problems, and is made possible thanks to a $1-million gift from Microsoft.

“Thanks to this generous gift from Microsoft, our two universities are poised to help transform the Cascadia region into a technological hub comparable to Silicon Valley and Boston,” said Professor Santa J. Ono, President of the University of British Columbia. “This new partnership transcends borders and strives to unleash our collective brain power, to bring about economic growth that enriches the lives of Canadians and Americans as well as urban communities throughout the world.”

“We have an unprecedented opportunity to use data to help our communities make decisions, and as a result improve people’s lives and well-being. That commitment to the public good is at the core of the mission of our two universities, and we’re grateful to Microsoft for making a community-minded contribution that will spark a range of collaborations,” said UW President Ana Mari Cauce.

Today’s announcement follows last September’s [2016] Emerging Cascadia Innovation Corridor Conference in Vancouver, B.C. The forum brought together regional leaders for the first time to identify concrete opportunities for partnerships in education, transportation, university research, human capital and other areas.

A Boston Consulting Group study unveiled at the conference showed the region between Seattle and Vancouver has “high potential to cultivate an innovation corridor” that competes on an international scale, but only if regional leaders work together. The study says that could be possible through sustained collaboration aided by an educated and skilled workforce, a vibrant network of research universities and a dynamic policy environment.

Microsoft President Brad Smith, who helped convene the conference, said, “We believe that joint research based on data science can help unlock new solutions for some of the most pressing issues in both Vancouver and Seattle. But our goal is bigger than this one-time gift. We hope this investment will serve as a catalyst for broader and more sustainable efforts between these two institutions.”

As part of the Emerging Cascadia conference, British Columbia Premier Christy Clark and Washington Governor Jay Inslee signed a formal agreement that committed the two governments to work closely together to “enhance meaningful and results-driven innovation and collaboration.”  The agreement outlined steps the two governments will take to collaborate in several key areas including research and education.

“Increasingly, tech is not just another standalone sector of the economy, but fully integrated into everything from transportation to social work,” said Premier Clark. “That’s why we’ve invested in B.C.’s thriving tech sector, but committed to working with our neighbours in Washington – and we’re already seeing the results.”

“This data-driven collaboration among some of our smartest and most creative thought-leaders will help us tackle a host of urgent issues,” Gov. Inslee said. “I’m encouraged to see our partnership with British Columbia spurring such interesting cross-border dialogue and excited to see what our students and researchers come up with.”

The Cascadia Urban Analytics Cooperative will revolve around four main programs:

  • The Cascadia Data Science for Social Good (DSSG) Summer Program, which builds on the success of the DSSG program at the UW eScience Institute. The cooperative will coordinate a joint summer program for students across UW and UBC campuses where they work with faculty to create and incubate data-intensive research projects that have concrete benefits for urban communities. One past DSSG project analyzed data from Seattle’s regional transportation system – ORCA – to improve its effectiveness, particularly for low-income transit riders. Another project sought to improve food safety by text mining product reviews to identify unsafe products.
  • Cascadia Data Science for Social Good Scholar Symposium, which will foster innovation and collaboration by bringing together scholars from UBC and the UW involved in projects utilizing technology to advance the social good. The first symposium will be hosted at UW in 2017.
  • Sustained Research Partnerships designed to establish the Pacific Northwest as a center of expertise and activity in urban analytics. The cooperative will support sustained research partnerships between UW and UBC researchers, providing technical expertise, stakeholder engagement and seed funding.
  • Responsible Data Management Systems and Services to ensure data integrity, security and usability. The cooperative will develop new software, systems and services to facilitate data management and analysis, as well as ensure projects adhere to best practices in fairness, accountability and transparency.

At UW, the Cascadia Urban Analytics Collaborative will be overseen by Urbanalytics (urbanalytics.uw.edu), a new research unit in the Information School focused on responsible urban data science. The Collaborative builds on previous investments in data-intensive science through the UW eScience Institute (escience.washington.edu) and investments in urban scholarship through Urban@UW (urban.uw.edu), and also aligns with the UW’s Population Health Initiative (uw.edu/populationhealth) that is addressing the most persistent and emerging challenges in human health, environmental resiliency and social and economic equity. The gift counts toward the UW’s Be Boundless – For Washington, For the World campaign (uw.edu/boundless).

The Collaborative also aligns with the UBC Sustainability Initiative (sustain.ubc.ca) that fosters partnerships beyond traditional boundaries of disciplines, sectors and geographies to address critical issues of our time, as well as the UBC Data Science Institute (dsi.ubc.ca), which aims to advance data science research to address complex problems across domains, including health, science and arts.

Brad Smith, President and Chief Legal Officer of Microsoft, wrote about the joint centre in a Feb. 23, 2017 posting on the Microsoft on the Issues blog,

The cities of Vancouver and Seattle share many strengths: a long history of innovation, world-class universities and a region rich in cultural and ethnic diversity. While both cities have achieved great success on their own, leaders from both sides of the border realize that tighter partnership and collaboration, through the creation of a Cascadia Innovation Corridor, will expand economic opportunity and prosperity well beyond what each community can achieve separately.

Microsoft supports this vision and today is making a $1 million investment in the Cascadia Urban Analytics Cooperative (CUAC), which is a new joint effort by the University of British Columbia (UBC) and the University of Washington (UW).  It will use data to help local cities and communities address challenges from traffic to homelessness and will be the region’s single largest university-based, industry-funded joint research project. While we recognize the crucial role that universities play in building great companies in the Pacific Northwest, whether it be in computing, life sciences, aerospace or interactive entertainment, we also know research, particularly data science, holds the key to solving some of Vancouver and Seattle’s most pressing issues. This grant will advance this work.

An Oct. 21, 2016 article by Hana Golightly for the Ubyssey newspaper provides a little more detail about the province/state agreement mentioned in the joint UBC/UW news release,

An agreement between BC Premier Christy Clark and Washington Governor Jay Inslee means UBC will be collaborating with the University of Washington (UW) more in the future.

At last month’s [Sept. 2016] Cascadia Conference, Clark and Inslee signed a Memorandum of Understanding with the goal of fostering the growth of the technology sector in both regions. Officially referred to as the Cascadia Innovation Corridor, this partnership aims to reduce boundaries across the region — economic and otherwise.

While the memorandum provides broad goals and is not legally binding, it sets a precedent of collaboration between businesses, governments and universities, encouraging projects that span both jurisdictions. Aiming to capitalize on the cultural commonalities of regional centres Seattle and Vancouver, the agreement prioritizes development in life sciences, clean technology, data analytics and high tech.

Metropolitan centres like Seattle and Vancouver have experienced a surge in growth that sees planners envisioning them as the next Silicon Valleys. Premier Clark and Governor Inslee want to strengthen the ability of their jurisdictions to compete in innovation on a global scale. Accordingly, the memorandum encourages the exploration of “opportunities to advance research programs in key areas of innovation and future technologies among the region’s major universities and institutes.”

A few more questions about the Cooperative

I had a few more questions about the Feb. 23, 2017 announcement, which Gail C. Murphy (UBC), PhD, FRSC, Associate Vice President Research pro tem and Professor of Computer Science, and Bill Howe (UW), Associate Professor in the Information School, Adjunct Associate Professor of Computer Science & Engineering, Associate Director and Senior Data Science Fellow at the UW eScience Institute, and Program Director and Faculty Chair of the UW Data Science Masters Degree, have kindly answered (Gail Murphy’s replies are prefaced with [GM] and one indent and Bill Howe’s replies are prefaced with [BH] and two indents),

  • Do you have any projects currently underway? e.g. I see a summer programme is planned. Will there be one in summer 2017? What focus will it have?

[GM] UW and UBC will each be running the Data Science for Social Good program in the summer of 2017. UBC’s announcement of the program is available at: http://dsi.ubc.ca/data-science-social-good-dssg-fellowships

  • Is the $1M from Microsoft going to be given in cash or as ‘in kind goods’ or some combination?

[GM] The $1-million donation is in cash. Microsoft organized the Emerging Cascadia Innovation Corridor Conference in September 2017. It was at the conference that the idea for the partnership was hatched. Through this initiative, UBC and UW will continue to engage with Microsoft to further shared goals in promoting evidence-based innovation to improve life for people in the Cascadia region and beyond.

  • How will the money or goods be disbursed? e.g. Will each institution get 1/2 or is there some sort of joint account?

[GM] The institutions are sharing the funds but will be separately administering the funds they receive.

  • Is data going to be crossing borders? e.g. You mentioned some health care projects. In that case, will data from BC residents be accessed and subject to US rules and regulations? Will BC residents know that their data is being accessed by a 3rd party? What level of consent is required?

[GM] As you point out, there are many issues involved with transferring data across the border. Any projects involving private data will adhere to local laws and ethical frameworks set out by the institutions.

  • Privacy rules vary greatly between the US and Canada. How is that being addressed in this proposed new research?

[No Reply]

  • Will new software and other products be created and who will own them?

[GM] It is too soon for us to comment on whether new software or other products will be created. Any creation of software or other products within the institutions will be governed by institutional policy.

  • Will the research be made freely available?

[GM] UBC researchers must be able to publish the results of research as set out by UBC policy.

[BH] Research output at UW will be made available according to UW policy, but I’ll point out that Microsoft has long been a fantastic partner in advancing our efforts in open and reproducible science, open source software, and open access publishing. 

 UW’s discussion on open access policies is available online.


  • What percentage of public funds will be used to enable this project? Will the province of BC and the state of Washington be splitting the costs evenly?

[GM] It is too soon for us to report on specific percentages. At UBC, we will be looking to partner with appropriate funding agencies to support more research with this donation. Applications to funding agencies will involve review of any proposals as per the rules of the funding agency.

  • Will there be any social science and/or ethics component to this collaboration? The press conference referenced data science only.

[GM] We expect, but cannot yet confirm, that some of the projects will involve collaborations with faculty from a broad range of research areas at UBC.

[BH] We are indeed placing a strong emphasis on the intersection between data science, the social sciences, and data ethics.  As examples of activities in this space around UW:

* The Information School at UW (my home school) is actively recruiting a new faculty candidate in data ethics this year

* The Education Working Group at the eScience Institute has created a new campus-wide Data & Society seminar course.

* The Center for Statistics in the Social Sciences (CSSS), which represents the marriage of data science and the social sciences, has been a long-term partner in our activities.

More specifically for this collaboration, we are collecting requirements for new software that emphasizes responsible data science: properly managing sensitive data, combating algorithmic bias, protecting privacy, and more.

Microsoft has been a key partner in this work through their Civic Technology group, for which the Seattle arm is led by Graham Thompson.

  • What impact do you see the new US federal government’s current concerns over borders and immigrants hav[ing] on this project? e.g. Are people whose origins are in Iran, Syria, Yemen, etc. and who are residents of Canada going to be able to participate?

[GM] Students and others eligible to participate in research projects in Canada will be welcomed into the UBC projects. Our hope is that faculty and students working on the Cascadia Urban Analytics Cooperative will be able to exchange ideas freely and move freely back and forth across the border.

  • How will seed funding for ‘Sustained Research Partnerships’ be disbursed? Will there be a joint committee making these decisions?

[GM] We are in the process of elaborating this part of the program. At UBC, we are already experiencing, enjoying and benefitting from increased interaction with the University of Washington and look forward to elaborating more aspects of the program together as the year unfolds.

I had to make a few formatting changes when transferring the answers from emails to this posting: my numbered questions (1-11) became bulleted points and ‘have’ in what was question 10 was changed to ‘having’. The content for the answers has been untouched.

I’m surprised no one answered the privacy question but perhaps they thought the other answers sufficed. Despite an answer to my question *about the disbursement of funds*, I don’t understand how the universities are sharing the funds but that may just mean I’m having a bad day. Or perhaps the folks at UBC are being overly careful after the scandals rocking the Vancouver campus over the last 18 months to two years (see Sophie Sutcliffe’s Dec. 3, 2015 opinion piece for the Ubyssey for details about the scandals).

Bill Howe’s response about open access (where you can read the journal articles for free) and open source (where you have free access to the software code) was interesting to me as I once worked for a company where the developers complained loud and long about Microsoft’s failure to embrace open source code. Howe’s response is particularly interesting given that Microsoft’s president is also the Chief Legal Officer whose portfolio of responsibilities (I imagine) includes patents.

Matt Day in a Feb. 23, 2017 article for The Seattle Times provides additional perspective (Note: Links have been removed),

Microsoft’s effort to nudge Seattle and Vancouver, B.C., a bit closer together got an endorsement Thursday [Feb. 23, 2017] from the leading university in each city.

The University of Washington and the University of British Columbia announced the establishment of a joint data-science research unit, called the Cascadia Urban Analytics Cooperative, funded by a $1 million grant from Microsoft.

The collaboration will support study of shared urban issues, from health to transit to homelessness, drawing on faculty and student input from both universities.

The partnership has its roots in a September [2016] conference in Vancouver organized by Microsoft’s public affairs and lobbying unit [emphasis mine]. That gathering was aimed at tying business, government and educational institutions in Microsoft’s home region in the Seattle area closer to its Canadian neighbor.

Microsoft last year [2016]* opened an expanded office in downtown Vancouver with space for 750 employees, an outpost partly designed to draw to the Northwest more engineers than the company can get through the U.S. guest worker system [emphasis mine].

There’s nothing wrong with a business offering to contribute to the social good, but it does well to remember that a business’s primary agenda is not the social good. So in this case, it seems that ‘public affairs and lobbying’ is really governmental affairs, and that Microsoft has anticipated, for some time, greater difficulties in getting workers from all sorts of countries across the US border to work in Washington state, making an outpost in British Columbia and closer relations between the constituencies quite advantageous. I wonder what else is on their agenda.

Getting back to UBC and UW, thank you to both Gail Murphy (in particular) and Bill Howe for taking the time to answer my questions. I very much appreciate it as answering 10 questions is a lot of work.

There was one area of interest (cities) that I did not broach with either academic but will mention here.

Cities and their increasing political heft

Clearly Microsoft is focused on urban issues and that would seem to be the ‘flavour du jour’. There’s a May 31, 2016 piece on the TED website by Robert Muggah and Benjamin Barber titled: ‘Why cities rule the world‘ (there are video talks embedded in the piece),

Cities are the 21st century’s dominant form of civilization — and they’re where humanity’s struggle for survival will take place. Robert Muggah and Benjamin Barber spell out the possibilities.

Half the planet’s population lives in cities. They are the world’s engines, generating four-fifths of the global GDP. There are over 2,100 cities with populations of 250,000 people or more, including a growing number of mega-cities and sprawling, networked-city areas — conurbations, they’re called — with at least 10 million residents. As the economist Ed Glaeser puts it, “we are an urban species.”

But what makes cities so incredibly important is not just population or economics stats. Cities are humanity’s most realistic hope for future democracy to thrive, from the grassroots to the global. This makes them a stark contrast to so many of today’s nations, increasingly paralyzed by polarization, corruption and scandal.

In a less hyperbolic vein, Parag Khanna’s April 20, 2016 piece for Quartz describes why he (and others) believe that megacities are where the future lies (Note: A link has been removed),

Cities are mankind’s most enduring and stable mode of social organization, outlasting all empires and nations over which they have presided. Today cities have become the world’s dominant demographic and economic clusters.

As the sociologist Christopher Chase-Dunn has pointed out, it is not population or territorial size that drives world-city status, but economic weight, proximity to zones of growth, political stability, and attractiveness for foreign capital. In other words, connectivity matters more than size. Cities thus deserve more nuanced treatment on our maps than simply as homogeneous black dots.

Within many emerging markets such as Brazil, Turkey, Russia, and Indonesia, the leading commercial hub or financial center accounts for at least one-third or more of national GDP. In the UK, London accounts for almost half Britain’s GDP. And in America, the Boston-New York-Washington corridor and greater Los Angeles together combine for about one-third of America’s GDP.

By 2025, there will be at least 40 such megacities. The population of the greater Mexico City region is larger than that of Australia, as is that of Chongqing, a collection of connected urban enclaves in China spanning an area the size of Austria. Cities that were once hundreds of kilometers apart have now effectively fused into massive urban archipelagos, the largest of which is Japan’s Taiheiyo Belt that encompasses two-thirds of Japan’s population in the Tokyo-Nagoya-Osaka megalopolis.

Great and connected cities, Saskia Sassen argues, belong as much to global networks as to the country of their political geography. Today the world’s top 20 richest cities have forged a super-circuit driven by capital, talent, and services: they are home to more than 75% of the largest companies, which in turn invest in expanding across those cities and adding more to expand the intercity network. Indeed, global cities have forged a league of their own, in many ways as denationalized as Formula One racing teams, drawing talent from around the world and amassing capital to spend on themselves while they compete on the same circuit.

The rise of emerging market megacities as magnets for regional wealth and talent has been the most significant contributor to shifting the world’s focal point of economic activity. McKinsey Global Institute research suggests that from now until 2025, one-third of world growth will come from the key Western capitals and emerging market megacities, one-third from the heavily populous middle-weight cities of emerging markets, and one-third from small cities and rural areas in developing countries.

Khanna’s megacities all exist within one country. If Vancouver and Seattle (and perhaps Portland?) were to become a megacity, it would be one of the few to cross national borders.

Khanna has been mentioned here before in a Jan. 27, 2016 posting about cities and technology and a public engagement exercise with the National Research Council of Canada (scroll down to the subsection titled: Cities rising in importance as political entities).

Muggah/Barber’s and Khanna’s 2016 pieces are well worth reading if you have the time.

For what it’s worth, I’m inclined to agree that cities are increasing, and will continue to increase, in political importance, along with this area of development:

Algorithms and big data

Concerns are being raised about how big data is being utilized so I was happy to see specific initiatives to address ethics issues in Howe’s response. For anyone not familiar with the concerns, here’s an excerpt from Cathy O’Neil’s Oct. 18, 2016 article for Wired magazine,

The age of Big Data has generated new tools and ideas on an enormous scale, with applications spreading from marketing to Wall Street, human resources, college admissions, and insurance. At the same time, Big Data has opened opportunities for a whole new class of professional gamers and manipulators, who take advantage of people using the power of statistics.

I should know. I was one of them.

Information is power, and in the age of corporate surveillance, profiles on every active American consumer means that the system is slanted in favor of those with the data. This data helps build tailor-made profiles that can be used for or against someone in a given situation. Insurance companies, which historically sold car insurance based on driving records, have more recently started using such data-driven profiling methods. A Florida insurance company has been found to charge people with low credit scores and good driving records more than people with high credit scores and a drunk driving conviction. It’s become standard practice for insurance companies to charge people not what they represent as a risk, but what they can get away with. The victims, of course, are those least likely to be able to afford the extra cost, but who need a car to get to work.

Big data profiling techniques are exploding in the world of politics. It’s estimated that over $1 billion will be spent on digital political ads in this election cycle, almost 50 times as much as was spent in 2008; this field is a growing part of the budget for presidential as well as down-ticket races. Political campaigns build scoring systems on potential voters—your likelihood of voting for a given party, your stance on a given issue, and the extent to which you are persuadable on that issue. It’s the ultimate example of asymmetric information, and the politicians can use what they know to manipulate your vote or your donation.
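
To make the idea of a ‘scoring system’ a little more concrete, here is a minimal, hypothetical sketch of the kind of audit a data scientist might run on such a score: comparing selection rates across two groups (the so-called four-fifths rule of thumb). The scores, groups and threshold are invented for illustration; this is not O’Neil’s method, nor any real insurer’s or campaign’s system.

```python
# Minimal, hypothetical sketch of a disparate-impact check on a scoring system.
# The scores, threshold and group labels are invented for illustration only.

def approval_rate(scores, threshold):
    """Fraction of people whose score clears the decision threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

# Toy scores for two groups, as a credit or 'persuadability' model might produce.
group_a = [0.62, 0.71, 0.55, 0.80, 0.68, 0.74, 0.59, 0.66]
group_b = [0.48, 0.52, 0.61, 0.45, 0.57, 0.50, 0.63, 0.44]

THRESHOLD = 0.60  # decisions at or above this score are 'approved'

rate_a = approval_rate(group_a, THRESHOLD)
rate_b = approval_rate(group_b, THRESHOLD)

# Four-fifths rule of thumb: a selection-rate ratio below 0.8 is often treated
# as a red flag for disparate impact (a heuristic, not a legal determination).
ratio = rate_b / rate_a if rate_a else float("nan")
print(f"approval rate A: {rate_a:.2f}, B: {rate_b:.2f}, ratio: {ratio:.2f}")
if ratio < 0.8:
    print("potential disparate impact -- this scoring system deserves scrutiny")
```

The arithmetic for spotting a skewed outcome is trivial once the data are in hand; the harder parts are getting access to the scores in the first place and having the will to audit them.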

I highly recommend reading O’Neil’s article and, if you have the time, her book ‘Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy’.

Finally

I look forward to hearing more about the Cascadia Urban Analytics Cooperative and the Cascadia Innovation Corridor as they develop. This has the potential to be very exciting although I do have some concerns, such as Microsoft and its agendas, both stated and unstated. After all, the Sept. 2016 meeting was convened by Microsoft and its public affairs/lobbying group and the topic was innovation, which is code for business, and, as hinted earlier, business is not synonymous with social good. Having said that, I’m not about to demonize business either. I just think a healthy dose of skepticism is called for. Good things can happen but we need to ensure they do.

Thankfully, my concerns regarding algorithms and big data seem to be shared in some quarters; unfortunately, none of those quarters appear to be located at the University of British Columbia. I hope that’s over-caution with regard to communication rather than a failure to recognize any pitfalls.

ETA Mar. 1, 2017: Interestingly, the UK House of Commons Select Committee on Science and Technology announced an inquiry into the use of algorithms in public and business decision-making on Feb. 28, 2017. As this posting is much too big already, I’ve posted about the UK inquiry separately in a Mar. 1, 2017 posting.

*’2016′ added for clarity on March 24, 2017.

*’disbursement of funds’ added for clarity on Sept. 21, 2017.

Handling massive digital datasets the quantum way

A Jan. 25, 2016 news item on phys.org describes a new approach to analyzing and managing huge datasets,

From gene mapping to space exploration, humanity continues to generate ever-larger sets of data—far more information than people can actually process, manage, or understand.

Machine learning systems can help researchers deal with this ever-growing flood of information. Some of the most powerful of these analytical tools are based on a strange branch of geometry called topology, which deals with properties that stay the same even when something is bent and stretched every which way.

Such topological systems are especially useful for analyzing the connections in complex networks, such as the internal wiring of the brain, the U.S. power grid, or the global interconnections of the Internet. But even with the most powerful modern supercomputers, such problems remain daunting and impractical to solve. Now, a new approach that would use quantum computers to streamline these problems has been developed by researchers at [Massachusetts Institute of Technology] MIT, the University of Waterloo, and the University of Southern California [USC].

A Jan. 25, 2016 MIT news release (*also on EurekAlert*), which originated the news item, describes the theory in more detail,

… Seth Lloyd, the paper’s lead author and the Nam P. Suh Professor of Mechanical Engineering, explains that algebraic topology is key to the new method. This approach, he says, helps to reduce the impact of the inevitable distortions that arise every time someone collects data about the real world.

In a topological description, basic features of the data (How many holes does it have? How are the different parts connected?) are considered the same no matter how much they are stretched, compressed, or distorted. Lloyd explains that it is often these fundamental topological attributes “that are important in trying to reconstruct the underlying patterns in the real world that the data are supposed to represent.”

It doesn’t matter what kind of dataset is being analyzed, he says. The topological approach to looking for connections and holes “works whether it’s an actual physical hole, or the data represents a logical argument and there’s a hole in the argument. This will find both kinds of holes.”
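
For anyone who wants a concrete feel for what a ‘topological feature’ of a dataset is, here is a small classical sketch of my own; it is emphatically not the quantum algorithm described in the paper. It computes the simplest such feature, the number of connected components of a point cloud at a chosen scale, which stays the same however the cloud is stretched or rotated.

```python
# Classical, illustrative sketch: count connected components (the zeroth Betti
# number) of a point cloud at a given scale. This is NOT the quantum algorithm
# in the paper; it just shows the kind of feature such algorithms compute.
from math import dist  # Euclidean distance, available in Python 3.8+

def betti0(points, scale):
    """Number of connected components when points closer than `scale` are linked."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Brute-force pairwise check; fine for a toy example, costly for big clouds.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) < scale:
                union(i, j)

    return len({find(i) for i in range(len(points))})

# Two well-separated clusters: two components at a small scale,
# a single component once the scale is large enough to bridge the gap.
cloud = [(0, 0), (0.1, 0.2), (0.2, 0.1), (5, 5), (5.1, 5.2), (5.2, 4.9)]
print(betti0(cloud, scale=0.5))   # -> 2
print(betti0(cloud, scale=10.0))  # -> 1
```

Counting higher-dimensional holes works on the same principle but over vastly more combinations of points, which is where the classical cost explodes and, the researchers argue, where a quantum computer could help.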

Using conventional computers, that approach is too demanding for all but the simplest situations. Topological analysis “represents a crucial way of getting at the significant features of the data, but it’s computationally very expensive,” Lloyd says. “This is where quantum mechanics kicks in.” The new quantum-based approach, he says, could exponentially speed up such calculations.

Lloyd offers an example to illustrate that potential speedup: If you have a dataset with 300 points, a conventional approach to analyzing all the topological features in that system would require “a computer the size of the universe,” he says. That is, it would take 2^300 (two to the 300th power) processing units — approximately the number of all the particles in the universe. In other words, the problem is simply not solvable in that way.
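
A quick back-of-the-envelope check of that claim, using the commonly cited estimate of roughly 10^80 particles in the observable universe:

```python
# Back-of-the-envelope check of the "computer the size of the universe" remark.
classical_units = 2 ** 300        # one unit per possible subset of 300 data points
particles_in_universe = 10 ** 80  # common order-of-magnitude estimate
print(f"2^300 is about {classical_units:.2e}")  # ~2.04e+90
print(f"which is ~{classical_units / particles_in_universe:.0e} times the particle estimate")
```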

“That’s where our algorithm kicks in,” he says. Solving the same problem with the new system, using a quantum computer, would require just 300 quantum bits — and a device this size may be achieved in the next few years, according to Lloyd.

“Our algorithm shows that you don’t need a big quantum computer to kick some serious topological butt,” he says.

There are many important kinds of huge datasets where the quantum-topological approach could be useful, Lloyd says, for example understanding interconnections in the brain. “By applying topological analysis to datasets gleaned by electroencephalography or functional MRI, you can reveal the complex connectivity and topology of the sequences of firing neurons that underlie our thought processes,” he says.

The same approach could be used for analyzing many other kinds of information. “You could apply it to the world’s economy, or to social networks, or almost any system that involves long-range transport of goods or information,” says Lloyd, who holds a joint appointment as a professor of physics. But the limits of classical computation have prevented such approaches from being applied before.

While this work is theoretical, “experimentalists have already contacted us about trying prototypes,” he says. “You could find the topology of simple structures on a very simple quantum computer. People are trying proof-of-concept experiments.”

Ignacio Cirac, a professor at the Max Planck Institute of Quantum Optics in Munich, Germany, who was not involved in this research, calls it “a very original idea, and I think that it has a great potential.” He adds “I guess that it has to be further developed and adapted to particular problems. In any case, I think that this is top-quality research.”

Here’s a link to and a citation for the paper,

Quantum algorithms for topological and geometric analysis of data by Seth Lloyd, Silvano Garnerone, & Paolo Zanardi. Nature Communications 7, Article number: 10138 doi:10.1038/ncomms10138 Published 25 January 2016

This paper is open access.

ETA Jan. 25, 2016 1245 hours PST,

Shown here are the connections between different regions of the brain in a control subject (left) and a subject under the influence of the psychedelic compound psilocybin (right). This demonstrates a dramatic increase in connectivity, which explains some of the drug’s effects (such as “hearing” colors or “seeing” smells). Such an analysis, involving billions of brain cells, would be too complex for conventional techniques, but could be handled easily by the new quantum approach, the researchers say. Courtesy of the researchers


*’also on EurekAlert’ text and link added Jan. 26, 2016.

Simon Fraser University (Vancouver, Canada) and its president’s (Andrew Petter) dream colloquium: big data

They have a ‘big data’ start to 2016 planned for the President’s (Andrew Petter at Simon Fraser University [SFU] in Vancouver, Canada) Dream Colloquium according to a Jan. 5, 2016 news release,

Big data explained: SFU launches spring 2016 President’s Dream Colloquium

Speaker series tackles history, use and implications of collecting data


Canadians experience and interact with big data on a daily basis. Some interactions are as simple as buying coffee or as complex as filling out the Canadian government’s mandatory long-form census. But while big data may be one of the most important technological and social shifts in the past five years, many experts are still grappling with what to do with the massive amounts of information being gathered every day.


To help understand the implications of collecting, analyzing and using big data, Simon Fraser University is launching the President’s Dream Colloquium on Engaging Big Data on Tuesday, January 5.


“Big data affects all sectors of society from governments to businesses to institutions to everyday people,” says Peter Chow-White, SFU Associate Professor of Communication. “This colloquium brings together people from industry and scholars in computing and social sciences in a dialogue around one of the most important innovations of our time next to the Internet.”


This spring marks the first President’s Dream Colloquium where all faculty and guest lectures will be available to the public. The speaker series will give a historical overview of big data, specific case studies in how big data is used today and discuss what the implications are for this information’s usage in business, health and government in the future.


The series includes notable guest speakers such as managing director of Microsoft Research, Surajit Chaudhuri, and Tableau co-founder Pat Hanrahan.  


“Pat Hanrahan is a leader in a number of sectors and Tableau is a leader in accessing big data through visual analytics,” says Chow-White. “Rather than big data being available to only a small amount of professionals, Tableau makes it easier for everyday people to access and understand it in a visual way.”


The speaker series is free to attend with registration. Lectures will be webcast live and available on the President’s Dream Colloquium website.


FAST FACTS:

  • By 2020, over 1/3 of all data will live in or pass through the cloud.
  • Data production will be 44 times greater in 2020 than it was in 2009.
  • More than 70 percent of the digital universe is generated by individuals. But enterprises have responsibility for the storage, protection and management of 80 percent of that.

(Statistics provided by CSC)


WHO’S SPEAKING AT THE COLLOQUIUM:


The course features lectures from notable guest speakers including:

  • Sasha Issenberg, Author and Journalist
    Tuesday, January 12, 2016
  • Surajit Chaudhuri, Scientist and Managing Director of XCG (Microsoft Research)
    Tuesday, January 19, 2016
  • Pat Hanrahan, Professor at the Stanford Computer Graphics Laboratory, Cofounder and Chief Scientist of Tableau, Founding member of Pixar
    Wednesday, February 3, 2016
  • Sheelagh Carpendale, Professor of Computing Science, University of Calgary, Canada Research Chair in Information Visualization
    Tuesday, February 23, 2016, 3:30pm
  • Colin Hill, CEO of GNS Healthcare
    Tuesday, March 8, 2016
  • Chad Skelton, Award-winning Data Journalist and Consultant
    Tuesday, March 22, 2016

Not to worry, even though the first talk with Sasha Issenberg and Mark Pickup (strangely, Pickup [an SFU professor of political science] is not mentioned in the news release or on the event page) has taken place, a webcast is being posted to the event page here.

I watched the first event live (via a livestream webcast which I accessed by clicking on the link found on the Event’s Speaker’s page) and found it quite interesting, although I’m not sure about asking Issenberg to speak extemporaneously. He rambled and offered more detail about things that don’t matter much to a Canadian audience. I couldn’t tell if part of the problem might lie with the fact that his ‘big data’ book (The Victory Lab: The Secret Science of Winning Campaigns) was published a while back and he’s since published one on medical tourism and is about to publish one on same-sex marriages and the LGBTQ communities in the US. As someone else who moves from topic to topic, I know it’s an effort to ‘go back in time’, to remember the details and to recapture the enthusiasm that made the piece interesting. Also, he has yet to get the latest scoop on big data and politics in the US, as he won’t embark on the 2016 campaign trail until sometime later in January.

So, thanks to Issenberg for managing to dredge up as much as he did. Happily, he did recognize that there are differences between Canada and the US in the type of election data that is gathered and the other data that can be accessed. He provided a capsule version of the data situation in the US where they can identify individuals and predict how they might vote, while Pickup focused on the Canadian scene. As one expects from Canadian political parties and Canadian agencies in general, no one really wants to share how much information they can actually access (yes, that’s true of the Liberals and the NDP [New Democrats] too). By contrast, political parties and strategists in the US quite openly shared information with Issenberg about where and how they get data.

Pickup made some interesting points about data and how more data does not lead to better predictions. There was one study done on psychologists which Pickup replicated with undergraduate political science students. The psychologists and the political science students in the two separate studies were given data and asked to predict behaviour. They were then given more data about the same individuals and asked again to predict behaviour. In all, there were four sessions where the subjects were given successively more data and asked to predict behaviour based on that data. You may have already guessed: prediction accuracy decreased each time more information was added. Conversely, the people making the predictions became more confident as their predictive accuracy declined. A little disconcerting, non?
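
Pickup’s result is about human judges rather than statistical models, but a related effect is easy to demonstrate in code: piling uninformative variables onto a simple prediction rule can make out-of-sample accuracy worse, not better. The simulation below is my own toy illustration, not a replication of either study; the data, the nearest-centroid rule and all the parameters are invented for the purpose.

```python
# Toy illustration (not the studies described above): adding uninformative
# features to a simple prediction rule can hurt out-of-sample accuracy.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, n_noise):
    """One weakly informative feature plus n_noise pure-noise features."""
    y = rng.integers(0, 2, size=n)                # two classes
    informative = y + rng.normal(0, 1.5, size=n)  # weak signal
    noise = rng.normal(0, 1.5, size=(n, n_noise))
    X = np.column_stack([informative, noise]) if n_noise else informative[:, None]
    return X, y

def nearest_centroid_accuracy(n_noise, n_train=40, n_test=400):
    X_tr, y_tr = make_data(n_train, n_noise)
    X_te, y_te = make_data(n_test, n_noise)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in (0, 1)])
    preds = np.argmin(
        np.linalg.norm(X_te[:, None, :] - centroids[None, :, :], axis=2), axis=1
    )
    return (preds == y_te).mean()

for n_noise in (0, 5, 20, 80):
    acc = np.mean([nearest_centroid_accuracy(n_noise) for _ in range(50)])
    print(f"{n_noise:3d} extra noise features -> test accuracy ~ {acc:.2f}")
```

Typically the accuracy drifts down toward coin-flipping as the noise features pile up, even though the model ‘knows’ more about each case, a statistical cousin of the overconfidence Pickup described.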

Pickup made another point noting that it may be easier to use big data to predict voting behaviour in a two-party system such as they have in the US but a multi-party system such as we have in Canada offers more challenges.

So, it was a good beginning and I look forward to more in the coming weeks (President’s Dream Colloquium on Engaging Big Data). Remember if you can’t listen to the live session, just click through to the event’s speaker’s page where they have hopefully posted the webcast.

The next dream colloquium takes place Tuesday, Jan. 19, 2016,

Big Data since 1854

Dr. Surajit Chaudhuri, Scientist and Managing Director of XCG (Microsoft Research)
Stanford University, PhD
Tuesday, January 19, 2016, 3:30–5 pm
IRMACS Theatre, ASB 10900, Burnaby campus [or by webcast]

Enjoy!

Nanotechnology takes the big data dive

Duke University’s (North Carolina, US) Center for the Environmental Implications of NanoTechnology (CEINT) is back in the news. An August 18, 2015 news item on Nanotechnology Now highlights two new projects intended to launch the field of nanoinformatics,

In two new studies, researchers from across the country spearheaded by Duke University faculty have begun to design the framework on which to build the emerging field of nanoinformatics.

An August 18, 2015 Duke University news release on EurekAlert, which originated the news item, describes the notion of nanoinformatics and how Duke is playing a key role in establishing this field,

Nanoinformatics is, as the name implies, the combination of nanoscale research and informatics. It attempts to determine which information is relevant to the field and then develop effective ways to collect, validate, store, share, analyze, model and apply that information — with the ultimate goal of helping scientists gain new insights into human health, the environment and more.

In the first paper, published on August 10, 2015, in the Beilstein Journal of Nanotechnology, researchers begin the conversation of how to standardize the way nanotechnology data are curated.

Because the field is young and yet extremely diverse, data are collected and reported in different ways in different studies, making it difficult to compare apples to apples. Silver nanoparticles in a Florida swamp could behave entirely differently if studied in the Amazon River. And even if two studies are both looking at their effects in humans, slight variations like body temperature, blood pH levels or nanoparticles only a few nanometers larger can give different results. For future studies to combine multiple datasets to explore more complex questions, researchers must agree on what they need to know when curating nanomaterial data.

“We chose curation as the focus of this first paper because there are so many disparate efforts that are all over the road in terms of their missions, and the only thing they all have in common is that somehow they have to enter data into their resources,” said Christine Hendren, a research scientist at Duke and executive director of the Center for the Environmental Implications of NanoTechnology (CEINT). “So we chose that as the kernel of this effort to be as broad as possible in defining a baseline for the nanoinformatics community.”

The paper is the first in a series of six that will explore what people mean — their vocabulary, definitions, assumptions, research environments, etc. — when they talk about gathering data on nanomaterials in digital form. And to get everyone on the same page, the researchers are seeking input from all stakeholders, including those conducting basic research, studying environmental implications, harnessing nanomaterial properties for applications, developing products and writing government regulations.

The daunting task is being undertaken by the Nanomaterial Data Curation Initiative (NDCI), a project of the National Cancer Informatics Nanotechnology Working Group (NCIP NanoWG) led by a diverse team of nanomaterial data stakeholders. If successful, not only will these disparate interests be able to combine their data, the project will highlight what data are missing and help drive the research priorities of the field.

In the second paper, published on July 16, 2015, in Science of The Total Environment, Hendren and her colleagues at CEINT propose a new, standardized way of studying the properties of nanomaterials.

“If we’re going to move the field forward, we have to be able to agree on what measurements are going to be useful, which systems they should be measured in and what data gets reported, so that we can make comparisons,” said Hendren.

The proposed strategy uses functional assays — relatively simple tests carried out in standardized, well-described environments — to measure nanomaterial behavior in actual systems.

For some time, the nanomaterial research community has been trying to use measured nanomaterial properties to predict outcomes. For example, what size and composition of a nanoparticle is most likely to cause cancer? The problem, argues Mark Wiesner, director of CEINT, is that this question is far too complex to answer.

“Environmental researchers use a parameter called biological oxygen demand to predict how much oxygen a body of water needs to support its ecosystem,” explains Wiesner. “What we’re basically trying to do with nanomaterials is the equivalent of trying to predict the oxygen level in a lake by taking an inventory of every living organism, mathematically map all of their living mechanisms and interactions, add up all of the oxygen each would take, and use that number as an estimate. But that’s obviously ridiculous and impossible. So instead, you take a jar of water, shake it up, see how much oxygen is taken and extrapolate that. Our functional assay paper is saying do that for nanomaterials.”

The paper makes suggestions as to what nanomaterials’ “jar of water” should be. It identifies what parameters should be noted when studying a specific environmental system, like digestive fluids or wastewater, so that they can be compared down the road.

It also suggests two meaningful processes for nanoparticles that should be measured by functional assays: attachment efficiency (does it stick to surfaces or not) and dissolution rate (does it release ions).

In describing how a nanoinformatics approach informs the implementation of a functional assay testing strategy, Hendren said “We’re trying to anticipate what we want to ask the data down the road. If we’re banking all of this comparable data while doing our near-term research projects, we should eventually be able to support more mechanistic investigations to make predictions about how untested nanomaterials will behave in a given scenario.”
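
To give a sense of what ‘banking comparable data’ might look like in practice, here is a minimal, hypothetical record structure for one functional-assay measurement. The field names are my own guesses at the kinds of parameters the papers discuss (test medium, environmental conditions, attachment efficiency, dissolution rate); they are not a schema proposed by CEINT or the Nanomaterial Data Curation Initiative.

```python
# Hypothetical sketch of a curation-friendly record for one functional-assay
# measurement. Field names are illustrative, not an NDCI/CEINT standard.
from dataclasses import dataclass, asdict
import json

@dataclass
class FunctionalAssayRecord:
    nanomaterial: str                    # e.g. "Ag nanoparticle, 20 nm, citrate-coated"
    test_medium: str                     # the standardized "jar of water", e.g. wastewater
    temperature_c: float                 # environmental parameters worth recording
    ph: float
    attachment_efficiency: float         # does it stick to surfaces? (fraction, 0-1)
    dissolution_rate_ug_per_l_h: float   # does it release ions? (rate, invented units)
    source_study: str                    # provenance, so datasets can be pooled later

record = FunctionalAssayRecord(
    nanomaterial="Ag nanoparticle, 20 nm, citrate-coated",
    test_medium="simulated wastewater",
    temperature_c=25.0,
    ph=7.2,
    attachment_efficiency=0.35,
    dissolution_rate_ug_per_l_h=1.8,
    source_study="doi:10.xxxx/placeholder",
)

print(json.dumps(asdict(record), indent=2))
```

However the real schema ends up being defined, the point of the curation effort is that every study records the same fields in the same way, so that results gathered in a Florida swamp and in the Amazon River can actually be compared.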

Here are links to and citations for the papers,

The Nanomaterial Data Curation Initiative: A collaborative approach to assessing, evaluating, and advancing the state of the field by Christine Ogilvie Hendren, Christina M. Powers, Mark D. Hoover, and Stacey L. Harper.  Beilstein J. Nanotechnol. 2015, 6, 1752–1762. doi:10.3762/bjnano.6.179 Published 18 Aug 2015

A functional assay-based strategy for nanomaterial risk forecasting by Christine Ogilvie Hendren, Gregory V. Lowry, Jason M. Unrine, and Mark R. Wiesner. Science of The Total Environment Available online 16 July 2015 In Press, Corrected Proof  DOI: 10.1016/j.scitotenv.2015.06.100.

The first paper listed is open access while the second paper is behind a paywall.

I’m (mostly) giving the final comments to Dexter Johnson who in an August 20, 2015 posting on his Nanoclast blog (on the IEEE [Institute of Electrical and Electronics Engineers] website) had this to say (Note: Links have been removed),

It can take days for a supercomputer to unravel all the data contained in a single human genome. So it wasn’t long after mapping the first human genome that researchers coined the umbrella term “bioinformatics” in which a variety of methods and computer technologies are used for organizing and analyzing all that data.

Now teams of researchers led by scientists at Duke University believe that the field of nanotechnology has reached a critical mass of data and that a new field needs to be established, dubbed “nanoinformatics.”

While being able to better organize and analyze data to study the impact of nanomaterials on the environment should benefit the field, what seems to remain a more pressing concern is having the tools for measuring nanomaterials outside of a vacuum and in water and air environments.

I gather Christine Hendren has succeeded Mark Wiesner as CEINT’s executive director.

Big data, data visualization, and spatial relationships with computers

I’m going to tie together today’s previous postings (Sporty data science; Digitizing and visualizing the humanities; and Picture worth more than a thousand numbers? Yes and no) with a future-oriented Feb. 2010 TED talk by John Underkoffler (embedded below). I have mentioned this talk previously in my June 14, 2012 posting titled, Interacting with stories and/or with data. From his TED speaker’s webpage,

Remember the data interface from Minority Report? Well, it’s real, John Underkoffler invented it — as a point-and-touch interface called g-speak — and it’s about to change the way we interact with data.

When Tom Cruise put on his data glove and started whooshing through video clips of future crimes, how many of us felt the stirrings of geek lust? This iconic scene in Minority Report marked a change in popular thinking about interfaces — showing how sexy it could be to use natural gestures, without keyboard, mouse or command line.

John Underkoffler led the team that came up with this interface, called the g-speak Spatial Operating Environment. His company, Oblong Industries, was founded to move g-speak into the real world. Oblong is building apps for aerospace, bioinformatics, video editing and more. But the big vision is ubiquity: g-speak on every laptop, every desktop, every microwave oven, TV, dashboard. “It has to be like this,” he says. “We all of us every day feel that. We build starting there. We want to change it all.”

Before founding Oblong, Underkoffler spent 15 years at MIT’s Media Laboratory, working in holography, animation and visualization techniques, and building the I/O Bulb and Luminous Room Systems.

He’s talking about human-computer interfaces but I found the part where he manipulates massive amounts of data (from approx. 8 mins. – 9.5 mins.) particularly instructive. This video is longer (approx. 15.5 mins. as opposed to 5 mins. or less) than the videos I usually embed.

I think the real game changer for science (how it’s conducted, how it’s taught, and how it’s communicated) and other disciplines is data visualization.

ETA Aug. 3, 2012 1:20 pm PDT: For those who might want to see this video in its ‘native’ habitat, go here http://www.ted.com/talks/john_underkoffler_drive_3d_data_with_a_gesture.html.