
Machine learning programs learn bias

The notion of bias in artificial intelligence (AI)/algorithms/robots is gaining prominence (links to other posts featuring algorithms and bias are at the end of this post). The latest research concerns machine learning, where an artificial intelligence system trains itself on ordinary human language from the internet. From an April 13, 2017 American Association for the Advancement of Science (AAAS) news release on EurekAlert,

As artificial intelligence systems “learn” language from existing texts, they exhibit the same biases that humans do, a new study reveals. The results not only provide a tool for studying prejudicial attitudes and behavior in humans, but also emphasize how language is intimately intertwined with historical biases and cultural stereotypes. A common way to measure biases in humans is the Implicit Association Test (IAT), where subjects are asked to pair two concepts they find similar, in contrast to two concepts they find different; their response times can vary greatly, indicating how well they associated one word with another (for example, people are more likely to associate “flowers” with “pleasant,” and “insects” with “unpleasant”). Here, Aylin Caliskan and colleagues developed a similar way to measure biases in AI systems that acquire language from human texts; rather than measuring lag time, however, they used the statistical number of associations between words, analyzing roughly 2.2 million words in total. Their results demonstrate that AI systems retain biases seen in humans. For example, studies of human behavior show that the exact same resume is 50% more likely to result in an opportunity for an interview if the candidate’s name is European American rather than African-American. Indeed, the AI system was more likely to associate European American names with “pleasant” stimuli (e.g. “gift,” or “happy”). In terms of gender, the AI system also reflected human biases, where female words (e.g., “woman” and “girl”) were more associated than male words with the arts, compared to mathematics. In a related Perspective, Anthony G. Greenwald discusses these findings and how they could be used to further analyze biases in the real world.

There are more details about the research in this April 13, 2017 Princeton University news release on EurekAlert (also on ScienceDaily),

In debates over the future of artificial intelligence, many experts think of the new systems as coldly logical and objectively rational. But in a new study, researchers have demonstrated how machines can be reflections of us, their creators, in potentially problematic ways. Common machine learning programs, when trained with ordinary human language available online, can acquire cultural biases embedded in the patterns of wording, the researchers found. These biases range from the morally neutral, like a preference for flowers over insects, to the objectionable views of race and gender.

Identifying and addressing possible bias in machine learning will be critically important as we increasingly turn to computers for processing the natural language humans use to communicate, for instance in doing online text searches, image categorization and automated translations.

“Questions about fairness and bias in machine learning are tremendously important for our society,” said researcher Arvind Narayanan, an assistant professor of computer science and an affiliated faculty member at the Center for Information Technology Policy (CITP) at Princeton University, as well as an affiliate scholar at Stanford Law School’s Center for Internet and Society. “We have a situation where these artificial intelligence systems may be perpetuating historical patterns of bias that we might find socially unacceptable and which we might be trying to move away from.”

The paper, “Semantics derived automatically from language corpora contain human-like biases,” was published April 14 [2017] in Science. Its lead author is Aylin Caliskan, a postdoctoral research associate and a CITP fellow at Princeton; Joanna Bryson, a reader at the University of Bath and a CITP affiliate, is a coauthor.

As a touchstone for documented human biases, the study turned to the Implicit Association Test, used in numerous social psychology studies since its development at the University of Washington in the late 1990s. The test measures response times (in milliseconds) by human subjects asked to pair word concepts displayed on a computer screen. Response times are far shorter, the Implicit Association Test has repeatedly shown, when subjects are asked to pair two concepts they find similar, versus two concepts they find dissimilar.

Take flower types, like “rose” and “daisy,” and insects like “ant” and “moth.” These words can be paired with pleasant concepts, like “caress” and “love,” or unpleasant notions, like “filth” and “ugly.” People more quickly associate the flower words with pleasant concepts, and the insect terms with unpleasant ideas.

The Princeton team devised an experiment with a program where it essentially functioned like a machine learning version of the Implicit Association Test. Called GloVe, and developed by Stanford University researchers, the popular, open-source program is of the sort that a startup machine learning company might use at the heart of its product. The GloVe algorithm can represent the co-occurrence statistics of words in, say, a 10-word window of text. Words that often appear near one another have a stronger association than those words that seldom do.

The Stanford researchers turned GloVe loose on a huge trawl of contents from the World Wide Web, containing 840 billion words. Within this large sample of written human culture, Narayanan and colleagues then examined sets of so-called target words, like “programmer, engineer, scientist” and “nurse, teacher, librarian” alongside two sets of attribute words, such as “man, male” and “woman, female,” looking for evidence of the kinds of biases humans can unwittingly possess.

In the results, innocent, inoffensive biases, like for flowers over bugs, showed up, but so did examples along lines of gender and race. As it turned out, the Princeton machine learning experiment managed to replicate the broad substantiations of bias found in select Implicit Association Test studies over the years that have relied on live, human subjects.

For instance, the machine learning program associated female names more with familial attribute words, like “parents” and “wedding,” than male names. In turn, male names had stronger associations with career attributes, like “professional” and “salary.” Of course, results such as these are often just objective reflections of the true, unequal distributions of occupation types with respect to gender–like how 77 percent of computer programmers are male, according to the U.S. Bureau of Labor Statistics.

Yet this correctly distinguished bias about occupations can end up having pernicious, sexist effects. An example: when foreign languages are naively processed by machine learning programs, leading to gender-stereotyped sentences. The Turkish language uses a gender-neutral, third person pronoun, “o.” Plugged into the well-known, online translation service Google Translate, however, the Turkish sentences “o bir doktor” and “o bir hemşire” with this gender-neutral pronoun are translated into English as “he is a doctor” and “she is a nurse.”

“This paper reiterates the important point that machine learning methods are not ‘objective’ or ‘unbiased’ just because they rely on mathematics and algorithms,” said Hanna Wallach, a senior researcher at Microsoft Research New York City, who was not involved in the study. “Rather, as long as they are trained using data from society and as long as society exhibits biases, these methods will likely reproduce these biases.”

Another objectionable example harkens back to a well-known 2004 paper by Marianne Bertrand of the University of Chicago Booth School of Business and Sendhil Mullainathan of Harvard University. The economists sent out close to 5,000 identical resumes to 1,300 job advertisements, changing only the applicants’ names to be either traditionally European American or African American. The former group was 50 percent more likely to be offered an interview than the latter. In an apparent corroboration of this bias, the new Princeton study demonstrated that a set of African American names had more unpleasantness associations than a European American set.

Computer programmers might hope to prevent cultural stereotype perpetuation through the development of explicit, mathematics-based instructions for the machine learning programs underlying AI systems. Not unlike how parents and mentors try to instill concepts of fairness and equality in children and students, coders could endeavor to make machines reflect the better angels of human nature.

“The biases that we studied in the paper are easy to overlook when designers are creating systems,” said Narayanan. “The biases and stereotypes in our society reflected in our language are complex and longstanding. Rather than trying to sanitize or eliminate them, we should treat biases as part of the language and establish an explicit way in machine learning of determining what we consider acceptable and unacceptable.”
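For readers who’d like to see the mechanics behind the description above, here’s a minimal sketch (my own illustration, not code from the Caliskan et al. paper) of a word-embedding association test of the kind the researchers describe: each word is represented as a vector derived from co-occurrence statistics, and the bias score asks whether one set of target words (flowers) sits closer, by cosine similarity, to one attribute set (pleasant words) than another set of targets (insects) does. The toy three-dimensional vectors and the exact scoring details below are assumptions for illustration only; the actual study worked with GloVe vectors trained on 840 billion words of web text.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(word, attr_a, attr_b, vectors):
    """Mean similarity of a word to attribute set A minus its mean similarity to attribute set B."""
    sims_a = [cosine(vectors[word], vectors[a]) for a in attr_a]
    sims_b = [cosine(vectors[word], vectors[b]) for b in attr_b]
    return np.mean(sims_a) - np.mean(sims_b)

def effect_size(targets_x, targets_y, attr_a, attr_b, vectors):
    """WEAT-style effect size: how much more strongly targets X associate with A (vs. B) than targets Y do."""
    x_scores = [association(w, attr_a, attr_b, vectors) for w in targets_x]
    y_scores = [association(w, attr_a, attr_b, vectors) for w in targets_y]
    pooled_std = np.std(x_scores + y_scores, ddof=1)
    return (np.mean(x_scores) - np.mean(y_scores)) / pooled_std

# Tiny, hand-made 3-dimensional vectors invented purely for illustration;
# the study itself used GloVe vectors built from web-scale co-occurrence counts.
vectors = {
    "rose":   np.array([0.9, 0.1, 0.2]),
    "daisy":  np.array([0.8, 0.2, 0.1]),
    "ant":    np.array([0.1, 0.9, 0.3]),
    "moth":   np.array([0.2, 0.8, 0.2]),
    "caress": np.array([0.9, 0.2, 0.1]),
    "love":   np.array([0.8, 0.1, 0.3]),
    "filth":  np.array([0.1, 0.8, 0.1]),
    "ugly":   np.array([0.2, 0.9, 0.2]),
}

flowers, insects = ["rose", "daisy"], ["ant", "moth"]
pleasant, unpleasant = ["caress", "love"], ["filth", "ugly"]
print(f"Effect size: {effect_size(flowers, insects, pleasant, unpleasant, vectors):.2f}")
# A positive value means the flower words sit closer to the pleasant words than the insect words do.
```

With real embeddings, the same kind of calculation, applied to names or gendered words as targets, yields the race- and gender-related associations reported in the study.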

Here’s a link to and a citation for the Princeton paper,

Semantics derived automatically from language corpora contain human-like biases by Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan. Science 14 Apr 2017: Vol. 356, Issue 6334, pp. 183-186. DOI: 10.1126/science.aal4230

This paper appears to be open access.

Links to more cautionary posts about AI,

Aug 5, 2009: Autonomous algorithms; intelligent windows; pretty nano pictures

June 14, 2016:  Accountability for artificial intelligence decision-making

Oct. 25, 2016: Removing gender-based stereotypes from algorithms

March 1, 2017: Algorithms in decision-making: a government inquiry in the UK

There’s also a book that makes some of the current uses of AI programs and big data quite accessible reading: Cathy O’Neil’s ‘Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy’.

Big data in the Cascadia region: a University of British Columbia (Canada) and University of Washington (US state) collaboration

Before moving on to the news and for anyone unfamiliar with the concept of the Cascadia region, it is an informally proposed political region or a bioregion, depending on your perspective. Adding to the lack of clarity, the region generally includes the province of British Columbia in Canada and the two US states, Washington and Oregon, but Alaska (another US state) and the Yukon (a Canadian territory) may also be included, as well as parts of California, Wyoming, Idaho, and Montana. (You can read more about the Cascadia bioregion here and the proposed political region here.) While it sounds as if more of the US is part of the ‘Cascadia region’, British Columbia and the Yukon cover considerably more territory than all of the mentioned states combined, if you’re taking a landmass perspective.

Cascadia Urban Analytics Cooperative

There was some big news about the smallest version of the Cascadia region on Thursday, Feb. 23, 2017, when the University of British Columbia (UBC), the University of Washington (UW), and Microsoft announced the launch of the Cascadia Urban Analytics Cooperative. From the joint Feb. 23, 2017 news release (read on the UBC website or read on the UW website),

In an expansion of regional cooperation, the University of British Columbia and the University of Washington today announced the establishment of the Cascadia Urban Analytics Cooperative to use data to help cities and communities address challenges from traffic to homelessness. The largest industry-funded research partnership between UBC and the UW, the collaborative will bring faculty, students and community stakeholders together to solve problems, and is made possible thanks to a $1-million gift from Microsoft.

“Thanks to this generous gift from Microsoft, our two universities are poised to help transform the Cascadia region into a technological hub comparable to Silicon Valley and Boston,” said Professor Santa J. Ono, President of the University of British Columbia. “This new partnership transcends borders and strives to unleash our collective brain power, to bring about economic growth that enriches the lives of Canadians and Americans as well as urban communities throughout the world.”

“We have an unprecedented opportunity to use data to help our communities make decisions, and as a result improve people’s lives and well-being. That commitment to the public good is at the core of the mission of our two universities, and we’re grateful to Microsoft for making a community-minded contribution that will spark a range of collaborations,” said UW President Ana Mari Cauce.

Today’s announcement follows last September’s [2016] Emerging Cascadia Innovation Corridor Conference in Vancouver, B.C. The forum brought together regional leaders for the first time to identify concrete opportunities for partnerships in education, transportation, university research, human capital and other areas.

A Boston Consulting Group study unveiled at the conference showed the region between Seattle and Vancouver has “high potential to cultivate an innovation corridor” that competes on an international scale, but only if regional leaders work together. The study says that could be possible through sustained collaboration aided by an educated and skilled workforce, a vibrant network of research universities and a dynamic policy environment.

Microsoft President Brad Smith, who helped convene the conference, said, “We believe that joint research based on data science can help unlock new solutions for some of the most pressing issues in both Vancouver and Seattle. But our goal is bigger than this one-time gift. We hope this investment will serve as a catalyst for broader and more sustainable efforts between these two institutions.”

As part of the Emerging Cascadia conference, British Columbia Premier Christy Clark and Washington Governor Jay Inslee signed a formal agreement that committed the two governments to work closely together to “enhance meaningful and results-driven innovation and collaboration.”  The agreement outlined steps the two governments will take to collaborate in several key areas including research and education.

“Increasingly, tech is not just another standalone sector of the economy, but fully integrated into everything from transportation to social work,” said Premier Clark. “That’s why we’ve invested in B.C.’s thriving tech sector, but committed to working with our neighbours in Washington – and we’re already seeing the results.”

“This data-driven collaboration among some of our smartest and most creative thought-leaders will help us tackle a host of urgent issues,” Gov. Inslee said. “I’m encouraged to see our partnership with British Columbia spurring such interesting cross-border dialogue and excited to see what our students and researchers come up with.”

The Cascadia Urban Analytics Cooperative will revolve around four main programs:

  • The Cascadia Data Science for Social Good (DSSG) Summer Program, which builds on the success of the DSSG program at the UW eScience Institute. The cooperative will coordinate a joint summer program for students across UW and UBC campuses where they work with faculty to create and incubate data-intensive research projects that have concrete benefits for urban communities. One past DSSG project analyzed data from Seattle’s regional transportation system – ORCA – to improve its effectiveness, particularly for low-income transit riders. Another project sought to improve food safety by text mining product reviews to identify unsafe products.
  • Cascadia Data Science for Social Good Scholar Symposium, which will foster innovation and collaboration by bringing together scholars from UBC and the UW involved in projects utilizing technology to advance the social good. The first symposium will be hosted at UW in 2017.
  • Sustained Research Partnerships designed to establish the Pacific Northwest as a center of expertise and activity in urban analytics. The cooperative will support sustained research partnerships between UW and UBC researchers, providing technical expertise, stakeholder engagement and seed funding.
  • Responsible Data Management Systems and Services to ensure data integrity, security and usability. The cooperative will develop new software, systems and services to facilitate data management and analysis, as well as ensure projects adhere to best practices in fairness, accountability and transparency.

At UW, the Cascadia Urban Analytics Collaborative will be overseen by Urbanalytics (urbanalytics.uw.edu), a new research unit in the Information School focused on responsible urban data science. The Collaborative builds on previous investments in data-intensive science through the UW eScience Institute (escience.washington.edu) and investments in urban scholarship through Urban@UW (urban.uw.edu), and also aligns with the UW’s Population Health Initiative (uw.edu/populationhealth) that is addressing the most persistent and emerging challenges in human health, environmental resiliency and social and economic equity. The gift counts toward the UW’s Be Boundless – For Washington, For the World campaign (uw.edu/boundless).

The Collaborative also aligns with the UBC Sustainability Initiative (sustain.ubc.ca) that fosters partnerships beyond traditional boundaries of disciplines, sectors and geographies to address critical issues of our time, as well as the UBC Data Science Institute (dsi.ubc.ca), which aims to advance data science research to address complex problems across domains, including health, science and arts.

Brad Smith, President and Chief Legal Officer of Microsoft, wrote about the joint centre in a Feb. 23, 2017 posting on the Microsoft on the Issues blog,

The cities of Vancouver and Seattle share many strengths: a long history of innovation, world-class universities and a region rich in cultural and ethnic diversity. While both cities have achieved great success on their own, leaders from both sides of the border realize that tighter partnership and collaboration, through the creation of a Cascadia Innovation Corridor, will expand economic opportunity and prosperity well beyond what each community can achieve separately.

Microsoft supports this vision and today is making a $1 million investment in the Cascadia Urban Analytics Cooperative (CUAC), which is a new joint effort by the University of British Columbia (UBC) and the University of Washington (UW).  It will use data to help local cities and communities address challenges from traffic to homelessness and will be the region’s single largest university-based, industry-funded joint research project. While we recognize the crucial role that universities play in building great companies in the Pacific Northwest, whether it be in computing, life sciences, aerospace or interactive entertainment, we also know research, particularly data science, holds the key to solving some of Vancouver and Seattle’s most pressing issues. This grant will advance this work.

An Oct. 21, 2016 article by Hana Golightly for the Ubyssey newspaper provides a little more detail about the province/state agreement mentioned in the joint UBC/UW news release,

An agreement between BC Premier Christy Clark and Washington Governor Jay Inslee means UBC will be collaborating with the University of Washington (UW) more in the future.

At last month’s [Sept. 2016] Cascadia Conference, Clark and Inslee signed a Memorandum of Understanding with the goal of fostering the growth of the technology sector in both regions. Officially referred to as the Cascadia Innovation Corridor, this partnership aims to reduce boundaries across the region — economic and otherwise.

While the memorandum provides broad goals and is not legally binding, it sets a precedent of collaboration between businesses, governments and universities, encouraging projects that span both jurisdictions. Aiming to capitalize on the cultural commonalities of regional centres Seattle and Vancouver, the agreement prioritizes development in life sciences, clean technology, data analytics and high tech.

Metropolitan centres like Seattle and Vancouver have experienced a surge in growth that sees planners envisioning them as the next Silicon Valleys. Premier Clark and Governor Inslee want to strengthen the ability of their jurisdictions to compete in innovation on a global scale. Accordingly, the memorandum encourages the exploration of “opportunities to advance research programs in key areas of innovation and future technologies among the region’s major universities and institutes.”

A few more questions about the Cooperative

I had a few more questions about the Feb. 23, 2017 announcement, which were kindly answered by (from UBC) Gail C. Murphy, PhD, FRSC, Associate Vice President Research pro tem and Professor of Computer Science at UBC, and (from UW) Bill Howe, Associate Professor in the Information School, Adjunct Associate Professor in Computer Science & Engineering, Associate Director and Senior Data Science Fellow at the UW eScience Institute, and Program Director and Faculty Chair of the UW Data Science Master’s Degree (Gail Murphy’s replies are prefaced with [GM] and one indent; Bill Howe’s replies are prefaced with [BH] and two indents),

  • Do you have any projects currently underway? e.g. I see a summer programme is planned. Will there be one in summer 2017? What focus will it have?

[GM] UW and UBC will each be running the Data Science for Social Good program in the summer of 2017. UBC’s announcement of the program is available at: http://dsi.ubc.ca/data-science-social-good-dssg-fellowships

  • Is the $1M from Microsoft going to be given in cash or as ‘in kind goods’ or some combination?

[GM] The $1-million donation is in cash. Microsoft organized the Emerging Cascadia Innovation Corridor Conference in September 2016. It was at the conference that the idea for the partnership was hatched. Through this initiative, UBC and UW will continue to engage with Microsoft to further shared goals in promoting evidence-based innovation to improve life for people in the Cascadia region and beyond.

  • How will the money or goods be disbursed? e.g. Will each institution get 1/2 or is there some sort of joint account?

[GM] The institutions are sharing the funds but will be separately administering the funds they receive.

  • Is data going to be crossing borders? e.g. You mentioned some health care projects. In that case, will data from BC residents be accessed and subject to US rules and regulations? Will BC residents know that their data is being accessed by a 3rd party? What level of consent is required?

[GM] As you point out, there are many issues involved with transferring data across the border. Any projects involving private data will adhere to local laws and ethical frameworks set out by the institutions.

  • Privacy rules vary greatly between the US and Canada. How is that being addressed in this proposed new research?

[No Reply]

  • Will new software and other products be created and who will own them?

[GM] It is too soon for us to comment on whether new software or other products will be created. Any creation of software or other products within the institutions will be governed by institutional policy.

  • Will the research be made freely available?

[GM] UBC researchers must be able to publish the results of research as set out by UBC policy.

[BH] Research output at UW will be made available according to UW policy, but I’ll point out that Microsoft has long been a fantastic partner in advancing our efforts in open and reproducible science, open source software, and open access publishing. 

 UW’s discussion on open access policies is available online.

 

  • What percentage of public funds will be used to enable this project? Will the province of BC and the state of Washington be splitting the costs evenly?

[GM] It is too soon for us to report on specific percentages. At UBC, we will be looking to partner with appropriate funding agencies to support more research with this donation. Applications to funding agencies will involve review of any proposals as per the rules of the funding agency.

  • Will there be any social science and/or ethics component to this collaboration? The press conference referenced data science only.

[GM] We expect, but cannot yet confirm, that some of the projects will involve collaborations with faculty from a broad range of research areas at UBC.

[BH] We are indeed placing a strong emphasis on the intersection between data science, the social sciences, and data ethics.  As examples of activities in this space around UW:

* The Information School at UW (my home school) is actively recruiting a new faculty candidate in data ethics this year

* The Education Working Group at the eScience Institute has created a new campus-wide Data & Society seminar course.

* The Center for Statistics in the Social Sciences (CSSS), which represents the marriage of data science and the social sciences, has been a long-term partner in our activities.

More specifically for this collaboration, we are collecting requirements for new software that emphasizes responsible data science: properly managing sensitive data, combating algorithmic bias, protecting privacy, and more.

Microsoft has been a key partner in this work through their Civic Technology group, for which the Seattle arm is led by Graham Thompson.

  • What impact do you see the new US federal government’s current concerns over borders and immigrants hav[ing] on this project? e.g. Are people whose origins are in Iran, Syria, Yemen, etc. and who are residents of Canada going to be able to participate?

[GM] Students and others eligible to participate in research projects in Canada will be welcomed into the UBC projects. Our hope is that faculty and students working on the Cascadia Urban Analytics Cooperative will be able to exchange ideas freely and move freely back and forth across the border.

  • How will seed funding for ‘Sustained Research Partnerships’ be disbursed? Will there be a joint committee making these decisions?

[GM] We are in the process of elaborating this part of the program. At UBC, we are already experiencing, enjoying and benefitting from increased interaction with the University of Washington and look forward to elaborating more aspects of the program together as the year unfolds.

I had to make a few formatting changes when transferring the answers from emails to this posting: my numbered questions (1-11) became bulleted points and ‘have’ in what was question 10 was changed to ‘having’. The content for the answers has been untouched.

I’m surprised no one answered the privacy question but perhaps they thought the other answers sufficed. Despite an answer to my question *about the disbursement of funds*, I don’t understand how the universities are sharing the funds, but that may just mean I’m having a bad day. (Or perhaps the folks at UBC are being overly careful after the scandals rocking the Vancouver campus over the last 18 months to two years; see Sophie Sutcliffe’s Dec. 3, 2015 opinion piece for the Ubyssey for details about the scandals.)

Bill Howe’s response about open access (where you can read the journal articles for free) and open source (where you have free access to the software code) was interesting to me as I once worked for a company where the developers complained loud and long about Microsoft’s failure to embrace open source code. Howe’s response is particularly interesting given that Microsoft’s president is also the Chief Legal Officer whose portfolio of responsibilities (I imagine) includes patents.

Matt Day in a Feb. 23, 2017 article for The Seattle Times provides additional perspective (Note: Links have been removed),

Microsoft’s effort to nudge Seattle and Vancouver, B.C., a bit closer together got an endorsement Thursday [Feb. 23, 2017] from the leading university in each city.

The University of Washington and the University of British Columbia announced the establishment of a joint data-science research unit, called the Cascadia Urban Analytics Cooperative, funded by a $1 million grant from Microsoft.

The collaboration will support study of shared urban issues, from health to transit to homelessness, drawing on faculty and student input from both universities.

The partnership has its roots in a September [2016] conference in Vancouver organized by Microsoft’s public affairs and lobbying unit [emphasis mine]. That gathering was aimed at tying business, government and educational institutions in Microsoft’s home region in the Seattle area closer to its Canadian neighbor.

Microsoft last year [2016]* opened an expanded office in downtown Vancouver with space for 750 employees, an outpost partly designed to draw to the Northwest more engineers than the company can get through the U.S. guest worker system [emphasis mine].

There’s nothing wrong with a business offering to contribute to the social good, but it is worth remembering that a business’s primary agenda is not the social good. So in this case, it seems that public affairs and lobbying is really governmental affairs, and that Microsoft has anticipated, for some time, greater difficulty in getting workers from all sorts of countries across the US border to work in Washington state, making an outpost in British Columbia and closer relations between the constituencies quite advantageous. I wonder what else is on their agenda.

Getting back to UBC and UW, thank you to both Gail Murphy (in particular) and Bill Howe for taking the time to answer my questions. I very much appreciate it as answering 10 questions is a lot of work.

There was one area of interest (cities) that I did not broach with either academic but will mention here.

Cities and their increasing political heft

Clearly Microsoft is focused on urban issues, and that would seem to be the ‘flavour du jour’. There’s a May 31, 2016 piece on the TED website by Robert Muggah and Benjamin Barber titled ‘Why cities rule the world’ (there are video talks embedded in the piece),

Cities are the 21st century’s dominant form of civilization — and they’re where humanity’s struggle for survival will take place. Robert Muggah and Benjamin Barber spell out the possibilities.

Half the planet’s population lives in cities. They are the world’s engines, generating four-fifths of the global GDP. There are over 2,100 cities with populations of 250,000 people or more, including a growing number of mega-cities and sprawling, networked-city areas — conurbations, they’re called — with at least 10 million residents. As the economist Ed Glaeser puts it, “we are an urban species.”

But what makes cities so incredibly important is not just population or economics stats. Cities are humanity’s most realistic hope for future democracy to thrive, from the grassroots to the global. This makes them a stark contrast to so many of today’s nations, increasingly paralyzed by polarization, corruption and scandal.

In a less hyperbolic vein, Parag Khanna’s April 20, 2016 piece for Quartz describes why he (and others) believe that megacities are where the future lies (Note: A link has been removed),

Cities are mankind’s most enduring and stable mode of social organization, outlasting all empires and nations over which they have presided. Today cities have become the world’s dominant demographic and economic clusters.

As the sociologist Christopher Chase-Dunn has pointed out, it is not population or territorial size that drives world-city status, but economic weight, proximity to zones of growth, political stability, and attractiveness for foreign capital. In other words, connectivity matters more than size. Cities thus deserve more nuanced treatment on our maps than simply as homogeneous black dots.

Within many emerging markets such as Brazil, Turkey, Russia, and Indonesia, the leading commercial hub or financial center accounts for at least one-third or more of national GDP. In the UK, London accounts for almost half Britain’s GDP. And in America, the Boston-New York-Washington corridor and greater Los Angeles together combine for about one-third of America’s GDP.

By 2025, there will be at least 40 such megacities. The population of the greater Mexico City region is larger than that of Australia, as is that of Chongqing, a collection of connected urban enclaves in China spanning an area the size of Austria. Cities that were once hundreds of kilometers apart have now effectively fused into massive urban archipelagos, the largest of which is Japan’s Taiheiyo Belt that encompasses two-thirds of Japan’s population in the Tokyo-Nagoya-Osaka megalopolis.

Great and connected cities, Saskia Sassen argues, belong as much to global networks as to the country of their political geography. Today the world’s top 20 richest cities have forged a super-circuit driven by capital, talent, and services: they are home to more than 75% of the largest companies, which in turn invest in expanding across those cities and adding more to expand the intercity network. Indeed, global cities have forged a league of their own, in many ways as denationalized as Formula One racing teams, drawing talent from around the world and amassing capital to spend on themselves while they compete on the same circuit.

The rise of emerging market megacities as magnets for regional wealth and talent has been the most significant contributor to shifting the world’s focal point of economic activity. McKinsey Global Institute research suggests that from now until 2025, one-third of world growth will come from the key Western capitals and emerging market megacities, one-third from the heavily populous middle-weight cities of emerging markets, and one-third from small cities and rural areas in developing countries.

Khanna’s megacities all exist within one country. If Vancouver and Seattle (and perhaps Portland?) were to become a megacity, it would be one of the few, if not the only one, to cross national borders.

Khanna has been mentioned here before in a Jan. 27, 2016 posting about cities and technology and a public engagement exercise with the National Research Council of Canada (scroll down to the subsection titled: Cities rising in importance as political entities).

Muggah/Barber’s and Khanna’s 2016 pieces are well worth reading if you have the time.

For what it’s worth, I’m inclined to agree that cities are increasing, and will continue to increase, in political importance, along with this area of development:

Algorithms and big data

Concerns are being raised about how big data is being utilized, so I was happy to see specific initiatives to address ethics issues in Howe’s response. For anyone not familiar with the concerns, here’s an excerpt from Cathy O’Neil’s Oct. 18, 2016 article for Wired magazine,

The age of Big Data has generated new tools and ideas on an enormous scale, with applications spreading from marketing to Wall Street, human resources, college admissions, and insurance. At the same time, Big Data has opened opportunities for a whole new class of professional gamers and manipulators, who take advantage of people using the power of statistics.

I should know. I was one of them.

Information is power, and in the age of corporate surveillance, profiles on every active American consumer means that the system is slanted in favor of those with the data. This data helps build tailor-made profiles that can be used for or against someone in a given situation. Insurance companies, which historically sold car insurance based on driving records, have more recently started using such data-driven profiling methods. A Florida insurance company has been found to charge people with low credit scores and good driving records more than people with high credit scores and a drunk driving conviction. It’s become standard practice for insurance companies to charge people not what they represent as a risk, but what they can get away with. The victims, of course, are those least likely to be able to afford the extra cost, but who need a car to get to work.

Big data profiling techniques are exploding in the world of politics. It’s estimated that over $1 billion will be spent on digital political ads in this election cycle, almost 50 times as much as was spent in 2008; this field is a growing part of the budget for presidential as well as down-ticket races. Political campaigns build scoring systems on potential voters—your likelihood of voting for a given party, your stance on a given issue, and the extent to which you are persuadable on that issue. It’s the ultimate example of asymmetric information, and the politicians can use what they know to manipulate your vote or your donation.

I highly recommend reading O’Neil’s article and, if you have the time, her book ‘Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy’.

Finally

I look forward to hearing more about the Cascadia Urban Analytics Cooperative and the Cascadia Innovation Corridor as they develop. This has the potential to be very exciting, although I do have some concerns, such as Microsoft and its agendas, both stated and unstated. After all, the Sept. 2016 meeting was convened by Microsoft and its public affairs/lobbying group, and the topic was innovation, which is code for business; as hinted earlier, business is not synonymous with social good. Having said that, I’m not about to demonize business either. I just think a healthy dose of skepticism is called for. Good things can happen but we need to ensure they do.

Thankfully, my concerns regarding algorithms and big data seem to be shared in some quarters; unfortunately, none of those quarters appears to be located at the University of British Columbia. I hope that’s overcaution with regard to communication rather than a failure to recognize any pitfalls.

ETA Mar. 1, 2017: Interestingly, the UK House of Commons Select Committee on Science and Technology announced an inquiry into the use of algorithms in public and business decision-making on Feb. 28, 2017. As this posting is much too big already, I’ve written about the UK inquiry separately in a Mar. 1, 2017 posting.

*’2016′ added for clarity on March 24, 2017.

*’disbursement of funds’ added for clarity on Sept. 21, 2017.

Sniffing out disease (Na-Nose)

The ‘artificial nose’ is not a newcomer to this blog. The most recent post prior to this is a March 15, 2016 piece about Disney using an artificial nose for art conservation. Today’s (Jan. 9, 2017) piece concerns itself with work from Israel and ‘sniffing out’ disease, according to a Dec. 30, 2016 news item in Sputnik News,

A team from the Israel Institute of Technology has developed a device that from a single breath can identify diseases such as multiple forms of cancer, Parkinson’s disease, and multiple sclerosis. While the machine is still in the experimental stages, it has a high degree of promise for use in non-invasive diagnoses of serious illnesses.

The international team demonstrated that a medical theory first proposed by the Greek physician Hippocrates some 2,400 years ago is true: certain diseases leave a “breathprint” on the exhalations of those afflicted. The researchers created a prototype for a machine that can pick up on those diseases using the outgoing breath of a patient. The machine, called the Na-Nose, tests breath samples for the presence of trace amounts of chemicals that are indicative of 17 different illnesses.

A Dec. 22, 2016 Technion Israel Institute of Technology press release offers more detail about the work,

An international team of 56 researchers in five countries has confirmed a hypothesis first proposed by the ancient Greeks – that different diseases are characterized by different “chemical signatures” identifiable in breath samples. …

Diagnostic techniques based on breath samples have been demonstrated in the past, but until now, there has not been scientific proof of the hypothesis that different and unrelated diseases are characterized by distinct chemical breath signatures. And technologies developed to date for this type of diagnosis have been limited to detecting a small number of clinical disorders, without differentiation between unrelated diseases.

The study of more than 1,400 patients included 17 different and unrelated diseases: lung cancer, colorectal cancer, head and neck cancer, ovarian cancer, bladder cancer, prostate cancer, kidney cancer, stomach cancer, Crohn’s disease, ulcerative colitis, irritable bowel syndrome, Parkinson’s disease (two types), multiple sclerosis, pulmonary hypertension, preeclampsia and chronic kidney disease. Samples were collected between January 2011 and June 2014 from 14 departments at 9 medical centers in 5 countries: Israel, France, the USA, Latvia and China.

The researchers tested the chemical composition of the breath samples using an accepted analytical method (mass spectrometry), which enabled accurate quantitative detection of the chemical compounds they contained. 13 chemical components were identified, in different compositions, in all 17 of the diseases.

According to Prof. Haick, “each of these diseases is characterized by a unique fingerprint, meaning a different composition of these 13 chemical components.  Just as each of us has a unique fingerprint that distinguishes us from others, each disease has a chemical signature that distinguishes it from other diseases and from a normal state of health. These odor signatures are what enables us to identify the diseases using the technology that we developed.”

With a new technology called “artificially intelligent nanoarray,” developed by Prof. Haick, the researchers were able to corroborate the clinical efficacy of the diagnostic technology. The array enables fast and inexpensive diagnosis and classification of diseases, based on “smelling” the patient’s breath, and using artificial intelligence to analyze the data obtained from the sensors. Some of the sensors are based on layers of gold nanoscale particles and others contain a random network of carbon nanotubes coated with an organic layer for sensing and identification purposes.

The study also assessed the efficiency of the artificially intelligent nanoarray in detecting and classifying various diseases using breath signatures. To verify the reliability of the system, the team also examined the effect of various factors (such as gender, age, smoking habits and geographic location) on the sample composition, and found their effect to be negligible, and without impairment on the array’s sensitivity.

“Each of the sensors responds to a wide range of exhalation components,” explain Prof. Haick and his previous Ph.D student, Dr. Morad Nakhleh, “and integration of the information provides detailed data about the unique breath signatures characteristic of the various diseases. Our system has detected and classified various diseases with an average accuracy of 86%.

This is a new and promising direction for diagnosis and classification of diseases, which is characterized not only by considerable accuracy but also by low cost, low electricity consumption, miniaturization, comfort and the possibility of repeating the test easily.”

“Breath is an excellent raw material for diagnosis,” said Prof. Haick. “It is available without the need for invasive and unpleasant procedures, it’s not dangerous, and you can sample it again and again if necessary.”
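To make the ‘chemical signature’ idea a little more concrete, here’s a minimal sketch (my own illustration, not the team’s analysis pipeline) of how a breath sample, reduced to relative levels of the 13 identified compounds, could be matched to the condition whose average fingerprint it most resembles. The disease names, numbers, and the nearest-fingerprint rule below are assumptions made purely for illustration; the actual study used the artificially intelligent nanoarray and considerably more sophisticated pattern analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
N_COMPOUNDS = 13  # the study reports 13 compounds appearing, in different mixes, across all 17 diseases

# Invented "average fingerprints" (relative compound levels) for three example conditions.
# These numbers are placeholders, not measurements from the study.
fingerprints = {
    "lung cancer":     rng.uniform(0.0, 1.0, N_COMPOUNDS),
    "Parkinson's":     rng.uniform(0.0, 1.0, N_COMPOUNDS),
    "healthy control": rng.uniform(0.0, 1.0, N_COMPOUNDS),
}

def classify(sample, fingerprints):
    """Label a breath sample with the condition whose average fingerprint it most closely resembles."""
    distances = {name: np.linalg.norm(sample - fp) for name, fp in fingerprints.items()}
    return min(distances, key=distances.get)

# A hypothetical new breath sample: the lung-cancer fingerprint plus a little sensor noise.
new_sample = fingerprints["lung cancer"] + rng.normal(0.0, 0.05, N_COMPOUNDS)
print(classify(new_sample, fingerprints))  # expected to print: lung cancer
```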

Here’s a schematic of the study, which the researchers have made available,

Diagram: A schematic view of the study. Two breath samples were taken from each subject; one was sent for chemical mapping using mass spectrometry, and the other was analyzed in the new system, which produced a clinical diagnosis based on the chemical fingerprint of the breath sample. Courtesy: Technion

There is also a video, which covers much of the same ground as the press release but also includes information about the possible use of the Na-Nose technology in the European Union’s SniffPhone project.

Here’s a link to and a citation for the paper,

Diagnosis and Classification of 17 Diseases from 1404 Subjects via Pattern Analysis of Exhaled Molecules by Morad K. Nakhleh, Haitham Amal, Raneen Jeries, Yoav Y. Broza, Manal Aboud, Alaa Gharra, Hodaya Ivgi, Salam Khatib, Shifaa Badarneh, Lior Har-Shai, Lea Glass-Marmor, Izabella Lejbkowicz, Ariel Miller, Samih Badarny, Raz Winer, John Finberg, Sylvia Cohen-Kaminsky, Frédéric Perros, David Montani, Barbara Girerd, Gilles Garcia, Gérald Simonneau, Farid Nakhoul, Shira Baram, Raed Salim, Marwan Hakim, Maayan Gruber, Ohad Ronen, Tal Marshak, Ilana Doweck, Ofer Nativ, Zaher Bahouth, Da-you Shi, Wei Zhang, Qing-ling Hua, Yue-yin Pan, Li Tao, Hu Liu, Amir Karban, Eduard Koifman, Tova Rainis, Roberts Skapars, Armands Sivins, Guntis Ancans, Inta Liepniece-Karele, Ilze Kikuste, Ieva Lasina, Ivars Tolmanis, Douglas Johnson, Stuart Z. Millstone, Jennifer Fulton, John W. Wells, Larry H. Wilf, Marc Humbert, Marcis Leja, Nir Peled, and Hossam Haick. ACS Nano, Article ASAP DOI: 10.1021/acsnano.6b04930 Publication Date (Web): December 21, 2016

Copyright © 2017 American Chemical Society

This paper appears to be open access.

As for SniffPhone, they’re hoping that Na-Nose or something like it will allow them to modify smartphones so that the phones themselves can detect disease.

I can’t help wondering who will own the data if your smartphone detects a disease. If you think that’s an idle question, here’s an excerpt from Sue Halpern’s Dec. 22, 2016 review of two books (“Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy” by Cathy O’Neil and “Virtual Competition: The Promise and Perils of the Algorithm-Driven Economy” by Ariel Ezrachi and Maurice E. Stucke) for the New York Review of Books,

We give our data away. We give it away in drips and drops, not thinking that data brokers will collect it and sell it, let alone that it will be used against us. There are now private, unregulated DNA databases culled, in part, from DNA samples people supply to genealogical websites in pursuit of their ancestry. These samples are available online to be compared with crime scene DNA without a warrant or court order. (Police are also amassing their own DNA databases by swabbing cheeks during routine stops.) In the estimation of the Electronic Frontier Foundation, this will make it more likely that people will be implicated in crimes they did not commit.

Or consider the data from fitness trackers, like Fitbit. As reported in The Intercept:

During a 2013 FTC panel on “Connected Health and Fitness,” University of Colorado law professor Scott Peppet said, “I can paint an incredibly detailed and rich picture of who you are based on your Fitbit data,” adding, “That data is so high quality that I can do things like price insurance premiums or I could probably evaluate your credit score incredibly accurately.”

Halpern’s piece is well worth reading in its entirety.