Tag Archives: artificial intelligence systems

Being smart about using artificial intelligence in the field of medicine

Since my August 20, 2018 post featured an opinion piece about the possibly imminent replacement of radiologists with artificial intelligence systems and the latest research about employing them for diagnosing eye diseases, it seems like a good time to examine some of the mythology embedded in the discussion about AI and medicine.

Imperfections in medical AI systems

An August 15, 2018 article for Slate.com by W. Nicholson Price II (who teaches at the University of Michigan School of Law; in addition to his law degree he has a PhD in Biological Sciences from Columbia University) begins with the peppy, optimistic view before veering into more critical territory (Note: Links have been removed),

For millions of people suffering from diabetes, new technology enabled by artificial intelligence promises to make management much easier. Medtronic’s Guardian Connect system promises to alert users 10 to 60 minutes before they hit high or low blood sugar level thresholds, thanks to IBM Watson, “the same supercomputer technology that can predict global weather patterns.” Startup Beta Bionics goes even further: In May, it received Food and Drug Administration approval to start clinical trials on what it calls a “bionic pancreas system” powered by artificial intelligence, capable of “automatically and autonomously managing blood sugar levels 24/7.”

An artificial pancreas powered by artificial intelligence represents a huge step forward for the treatment of diabetes—but getting it right will be hard. Artificial intelligence (also known in various iterations as deep learning and machine learning) promises to automatically learn from patterns in medical data to help us do everything from managing diabetes to finding tumors in an MRI to predicting how long patients will live. But the artificial intelligence techniques involved are typically opaque. We often don’t know how the algorithm makes the eventual decision. And they may change and learn from new data—indeed, that’s a big part of the promise. But when the technology is complicated, opaque, changing, and absolutely vital to the health of a patient, how do we make sure it works as promised?

Price describes how a ‘closed loop’ artificial pancreas with AI would automate insulin levels for diabetic patients, flaws in the automated system, and how companies like to maintain a competitive advantage (Note: Links have been removed),

[…] a “closed loop” artificial pancreas, where software handles the whole issue, receiving and interpreting signals from the monitor, deciding when and how much insulin is needed, and directing the insulin pump to provide the right amount. The first closed-loop system was approved in late 2016. The system should take as much of the issue off the mind of the patient as possible (though, of course, that has limits). Running a close-loop artificial pancreas is challenging. The way people respond to changing levels of carbohydrates is complicated, as is their response to insulin; it’s hard to model accurately. Making it even more complicated, each individual’s body reacts a little differently.

Here’s where artificial intelligence comes into play. Rather than trying explicitly to figure out the exact model for how bodies react to insulin and to carbohydrates, machine learning methods, given a lot of data, can find patterns and make predictions. And existing continuous glucose monitors (and insulin pumps) are excellent at generating a lot of data. The idea is to train artificial intelligence algorithms on vast amounts of data from diabetic patients, and to use the resulting trained algorithms to run a closed-loop artificial pancreas. Even more exciting, because the system will keep measuring blood glucose, it can learn from the new data and each patient’s artificial pancreas can customize itself over time as it acquires new data from that patient’s particular reactions.

Here’s the tough question: How will we know how well the system works? Diabetes software doesn’t exactly have the best track record when it comes to accuracy. A 2015 study found that among smartphone apps for calculating insulin doses, two-thirds of the apps risked giving incorrect results, often substantially so. … And companies like to keep their algorithms proprietary for a competitive advantage, which makes it hard to know how they work and what flaws might have gone unnoticed in the development process.

There’s more,

These issues aren’t unique to diabetes care—other A.I. algorithms will also be complicated, opaque, and maybe kept secret by their developers. The potential for problems multiplies when an algorithm is learning from data from an entire hospital, or hospital system, or the collected data from an entire state or nation, not just a single patient. …

The [US Food and Drug Administraiont] FDA is working on this problem. The head of the agency has expressed his enthusiasm for bringing A.I. safely into medical practice, and the agency has a new Digital Health Innovation Action Plan to try to tackle some of these issues. But they’re not easy, and one thing making it harder is a general desire to keep the algorithmic sauce secret. The example of IBM Watson for Oncology has given the field a bit of a recent black eye—it turns out that the company knew the algorithm gave poor recommendations for cancer treatment but kept that secret for more than a year. …

While Price focuses on problems with algorithms and with developers and their business interests, he also hints at some of the body’s complexities.

Can AI systems be like people?

Susan Baxter, a medical writer with over 20 years experience, a PhD in health economics, and author of countless magazine articles and several books, offers a more person-centered approach to the discussion in her July 6, 2018 posting on susanbaxter.com,

The fascination with AI continues to irk, given that every second thing I read seems to be extolling the magic of AI and medicine and how It Will Change Everything. Which it will not, trust me. The essential issue of illness remains perennial and revolves around an individual for whom no amount of technology will solve anything without human contact. …

But in this world, or so we are told by AI proponents, radiologists will soon be obsolete. [my August 20, 2018 post] The adaptational learning capacities of AI mean that reading a scan or x-ray will soon be more ably done by machines than humans. The presupposition here is that we, the original programmers of this artificial intelligence, understand the vagaries of real life (and real disease) so wonderfully that we can deconstruct these much as we do the game of chess (where, let’s face it, Big Blue ate our lunch) and that analyzing a two-dimensional image of a three-dimensional body, already problematic, can be reduced to a series of algorithms.

Attempting to extrapolate what some “shadow” on a scan might mean in a flesh and blood human isn’t really quite the same as bishop to knight seven. Never mind the false positive/negatives that are considered an acceptable risk or the very real human misery they create.

Moravec called it

It’s called Moravec’s paradox, the inability of humans to realize just how complex basic physical tasks are – and the corresponding inability of AI to mimic it. As you walk across the room, carrying a glass of water, talking to your spouse/friend/cat/child; place the glass on the counter and open the dishwasher door with your foot as you open a jar of pickles at the same time, take a moment to consider just how many concurrent tasks you are doing and just how enormous the computational power these ostensibly simple moves would require.

Researchers in Singapore taught industrial robots to assemble an Ikea chair. Essentially, screw in the legs. A person could probably do this in a minute. Maybe two. The preprogrammed robots took nearly half an hour. And I suspect programming those robots took considerably longer than that.

Ironically, even Elon Musk, who has had major production problems with the Tesla cars rolling out of his high tech factory, has conceded (in a tweet) that “Humans are underrated.”

I wouldn’t necessarily go that far given the political shenanigans of Trump & Co. but in the grand scheme of things I tend to agree. …

Is AI going the way of gene therapy?

Susan draws a parallel between the AI and medicine discussion with the discussion about genetics and medicine (Note: Links have been removed),

On a somewhat similar note – given the extent to which genetics discourse has that same linear, mechanistic  tone [as AI and medicine] – it turns out all this fine talk of using genetics to determine health risk and whatnot is based on nothing more than clever marketing, since a lot of companies are making a lot of money off our belief in DNA. Truth is half the time we don’t even know what a gene is never mind what it actually does;  geneticists still can’t agree on how many genes there are in a human genome, as this article in Nature points out.

Along the same lines, I was most amused to read about something called the Super Seniors Study, research following a group of individuals in their 80’s, 90’s and 100’s who seem to be doing really well. Launched in 2002 and headed by Angela Brooks Wilson, a geneticist at the BC [British Columbia] Cancer Agency and SFU [Simon Fraser University] Chair of biomedical physiology and kinesiology, this longitudinal work is examining possible factors involved in healthy ageing.

Turns out genes had nothing to do with it, the title of the Globe and Mail article notwithstanding. (“Could the DNA of these super seniors hold the secret to healthy aging?” The answer, a resounding “no”, well hidden at the very [end], the part most people wouldn’t even get to.) All of these individuals who were racing about exercising and working part time and living the kind of life that makes one tired just reading about it all had the same “multiple (genetic) factors linked to a high probability of disease”. You know, the gene markers they tell us are “linked” to cancer, heart disease, etc., etc. But these super seniors had all those markers but none of the diseases, demonstrating (pretty strongly) that the so-called genetic links to disease are a load of bunkum. Which (she said modestly) I have been saying for more years than I care to remember. You’re welcome.

The fundamental error in this type of linear thinking is in allowing our metaphors (genes are the “blueprint” of life) and propensity towards social ideas of determinism to overtake common sense. Biological and physiological systems are not static; they respond to and change to life in its entirety, whether it’s diet and nutrition to toxic or traumatic insults. Immunity alters, endocrinology changes, – even how we think and feel affects the efficiency and effectiveness of physiology. Which explains why as we age we become increasingly dissimilar.

If you have the time, I encourage to read Susan’s comments in their entirety.

Scientific certainties

Following on with genetics, gene therapy dreams, and the complexity of biology, the June 19, 2018 Nature article by Cassandra Willyard (mentioned in Susan’s posting) highlights an aspect of scientific research not often mentioned in public,

One of the earliest attempts to estimate the number of genes in the human genome involved tipsy geneticists, a bar in Cold Spring Harbor, New York, and pure guesswork.

That was in 2000, when a draft human genome sequence was still in the works; geneticists were running a sweepstake on how many genes humans have, and wagers ranged from tens of thousands to hundreds of thousands. Almost two decades later, scientists armed with real data still can’t agree on the number — a knowledge gap that they say hampers efforts to spot disease-related mutations.

In 2000, with the genomics community abuzz over the question of how many human genes would be found, Ewan Birney launched the GeneSweep contest. Birney, now co-director of the European Bioinformatics Institute (EBI) in Hinxton, UK, took the first bets at a bar during an annual genetics meeting, and the contest eventually attracted more than 1,000 entries and a US$3,000 jackpot. Bets on the number of genes ranged from more than 312,000 to just under 26,000, with an average of around 40,000. These days, the span of estimates has shrunk — with most now between 19,000 and 22,000 — but there is still disagreement (See ‘Gene Tally’).

… the inconsistencies in the number of genes from database to database are problematic for researchers, Pruitt says. “People want one answer,” she [Kim Pruitt, a genome researcher at the US National Center for Biotechnology Information {NCB}] in Bethesda, Maryland] adds, “but biology is complex.”

I wanted to note that scientists do make guesses and not just with genetics. For example, Gina Mallet’s 2005 book ‘Last Chance to Eat: The Fate of Taste in a Fast Food World’ recounts the story of how good and bad levels of cholesterol were established—the experts made some guesses based on their experience. That said, Willyard’s article details the continuing effort to nail down the number of genes almost 20 years after the human genome project was completed and delves into the problems the scientists have uncovered.

Final comments

In addition to opaque processes with developers/entrepreneurs wanting to maintain their secrets for competitive advantages and in addition to our own poor understanding of the human body (how many genes are there anyway?), there are same major gaps (reflected in AI) in our understanding of various diseases. Angela Lashbrook’s August 16, 2018 article for The Atlantic highlights some issues with skin cancer and shade of your skin (Note: Links have been removed),

… While fair-skinned people are at the highest risk for contracting skin cancer, the mortality rate for African Americans is considerably higher: Their five-year survival rate is 73 percent, compared with 90 percent for white Americans, according to the American Academy of Dermatology.

As the rates of melanoma for all Americans continue a 30-year climb, dermatologists have begun exploring new technologies to try to reverse this deadly trend—including artificial intelligence. There’s been a growing hope in the field that using machine-learning algorithms to diagnose skin cancers and other skin issues could make for more efficient doctor visits and increased, reliable diagnoses. The earliest results are promising—but also potentially dangerous for darker-skinned patients.

… Avery Smith, … a software engineer in Baltimore, Maryland, co-authored a paper in JAMA [Journal of the American Medical Association] Dermatology that warns of the potential racial disparities that could come from relying on machine learning for skin-cancer screenings. Smith’s co-author, Adewole Adamson of the University of Texas at Austin, has conducted multiple studies on demographic imbalances in dermatology. “African Americans have the highest mortality rate [for skin cancer], and doctors aren’t trained on that particular skin type,” Smith told me over the phone. “When I came across the machine-learning software, one of the first things I thought was how it will perform on black people.”

Recently, a study that tested machine-learning software in dermatology, conducted by a group of researchers primarily out of Germany, found that “deep-learning convolutional neural networks,” or CNN, detected potentially cancerous skin lesions better than the 58 dermatologists included in the study group. The data used for the study come from the International Skin Imaging Collaboration, or ISIC, an open-source repository of skin images to be used by machine-learning algorithms. Given the rise in melanoma cases in the United States, a machine-learning algorithm that assists dermatologists in diagnosing skin cancer earlier could conceivably save thousands of lives each year.

… Chief among the prohibitive issues, according to Smith and Adamson, is that the data the CNN relies on come from primarily fair-skinned populations in the United States, Australia, and Europe. If the algorithm is basing most of its knowledge on how skin lesions appear on fair skin, then theoretically, lesions on patients of color are less likely to be diagnosed. “If you don’t teach the algorithm with a diverse set of images, then that algorithm won’t work out in the public that is diverse,” says Adamson. “So there’s risk, then, for people with skin of color to fall through the cracks.”

As Adamson and Smith’s paper points out, racial disparities in artificial intelligence and machine learning are not a new issue. Algorithms have mistaken images of black people for gorillas, misunderstood Asians to be blinking when they weren’t, and “judged” only white people to be attractive. An even more dangerous issue, according to the paper, is that decades of clinical research have focused primarily on people with light skin, leaving out marginalized communities whose symptoms may present differently.

The reasons for this exclusion are complex. According to Andrew Alexis, a dermatologist at Mount Sinai, in New York City, and the director of the Skin of Color Center, compounding factors include a lack of medical professionals from marginalized communities, inadequate information about those communities, and socioeconomic barriers to participating in research. “In the absence of a diverse study population that reflects that of the U.S. population, potential safety or efficacy considerations could be missed,” he says.

Adamson agrees, elaborating that with inadequate data, machine learning could misdiagnose people of color with nonexistent skin cancers—or miss them entirely. But he understands why the field of dermatology would surge ahead without demographically complete data. “Part of the problem is that people are in such a rush. This happens with any new tech, whether it’s a new drug or test. Folks see how it can be useful and they go full steam ahead without thinking of potential clinical consequences. …

Improving machine-learning algorithms is far from the only method to ensure that people with darker skin tones are protected against the sun and receive diagnoses earlier, when many cancers are more survivable. According to the Skin Cancer Foundation, 63 percent of African Americans don’t wear sunscreen; both they and many dermatologists are more likely to delay diagnosis and treatment because of the belief that dark skin is adequate protection from the sun’s harmful rays. And due to racial disparities in access to health care in America, African Americans are less likely to get treatment in time.

Happy endings

I’ll add one thing to Price’s article, Susan’s posting, and Lashbrook’s article about the issues with AI , certainty, gene therapy, and medicine—the desire for a happy ending prefaced with an easy solution. If the easy solution isn’t possible accommodations will be made but that happy ending is a must. All disease will disappear and there will be peace on earth. (Nod to Susan Baxter and her many discussions with me about disease processes and happy endings.)

The solutions, for the most part, are seen as technological despite the mountain of evidence suggesting that technology reflects our own imperfect understanding of health and disease therefore providing what is at best an imperfect solution.

Also, we tend to underestimate just how complex humans are not only in terms of disease and health but also with regard to our skills, understanding, and, perhaps not often enough, our ability to respond appropriately in the moment.

There is much to celebrate in what has been accomplished: no more black death, no more smallpox, hip replacements, pacemakers, organ transplants, and much more. Yes, we should try to improve our medicine. But, maybe alongside the celebration we can welcome AI and other technologies with a lot less hype and a lot more skepticism.

Hacking the human brain with a junction-based artificial synaptic device

Earlier today I published a piece featuring Dr. Wei Lu’s work on memristors and the movement to create an artificial brain (my June 28, 2017 posting: Dr. Wei Lu and bio-inspired ‘memristor’ chips). For this posting I’m featuring a non-memristor (if I’ve properly understood the technology) type of artificial synapse. From a June 28, 2017 news item on Nanowerk,

One of the greatest challenges facing artificial intelligence development is understanding the human brain and figuring out how to mimic it.

Now, one group reports in ACS Nano (“Emulating Bilingual Synaptic Response Using a Junction-Based Artificial Synaptic Device”) that they have developed an artificial synapse capable of simulating a fundamental function of our nervous system — the release of inhibitory and stimulatory signals from the same “pre-synaptic” terminal.

Unfortunately, the American Chemical Society news release on EurekAlert, which originated the news item, doesn’t provide too much more detail,

The human nervous system is made up of over 100 trillion synapses, structures that allow neurons to pass electrical and chemical signals to one another. In mammals, these synapses can initiate and inhibit biological messages. Many synapses just relay one type of signal, whereas others can convey both types simultaneously or can switch between the two. To develop artificial intelligence systems that better mimic human learning, cognition and image recognition, researchers are imitating synapses in the lab with electronic components. Most current artificial synapses, however, are only capable of delivering one type of signal. So, Han Wang, Jing Guo and colleagues sought to create an artificial synapse that can reconfigurably send stimulatory and inhibitory signals.

The researchers developed a synaptic device that can reconfigure itself based on voltages applied at the input terminal of the device. A junction made of black phosphorus and tin selenide enables switching between the excitatory and inhibitory signals. This new device is flexible and versatile, which is highly desirable in artificial neural networks. In addition, the artificial synapses may simplify the design and functions of nervous system simulations.

Here’s how I concluded that this is not a memristor-type device (from the paper [first paragraph, final sentence]; a link and citation will follow; Note: Links have been removed)),

The conventional memristor-type [emphasis mine](14-20) and transistor-type(21-25) artificial synapses can realize synaptic functions in a single semiconductor device but lacks the ability [emphasis mine] to dynamically reconfigure between excitatory and inhibitory responses without the addition of a modulating terminal.

Here’s a link to and a citation for the paper,

Emulating Bilingual Synaptic Response Using a Junction-Based Artificial Synaptic Device by
He Tian, Xi Cao, Yujun Xie, Xiaodong Yan, Andrew Kostelec, Don DiMarzio, Cheng Chang, Li-Dong Zhao, Wei Wu, Jesse Tice, Judy J. Cha, Jing Guo, and Han Wang. ACS Nano, Article ASAP DOI: 10.1021/acsnano.7b03033 Publication Date (Web): June 28, 2017

Copyright © 2017 American Chemical Society

This paper is behind a paywall.

An explanation of neural networks from the Massachusetts Institute of Technology (MIT)

I always enjoy the MIT ‘explainers’ and have been a little sad that I haven’t stumbled across one in a while. Until now, that is. Here’s an April 14, 201 neural network ‘explainer’ (in its entirety) by Larry Hardesty (?),

In the past 10 years, the best-performing artificial-intelligence systems — such as the speech recognizers on smartphones or Google’s latest automatic translator — have resulted from a technique called “deep learning.”

Deep learning is in fact a new name for an approach to artificial intelligence called neural networks, which have been going in and out of fashion for more than 70 years. Neural networks were first proposed in 1944 by Warren McCullough and Walter Pitts, two University of Chicago researchers who moved to MIT in 1952 as founding members of what’s sometimes called the first cognitive science department.

Neural nets were a major area of research in both neuroscience and computer science until 1969, when, according to computer science lore, they were killed off by the MIT mathematicians Marvin Minsky and Seymour Papert, who a year later would become co-directors of the new MIT Artificial Intelligence Laboratory.

The technique then enjoyed a resurgence in the 1980s, fell into eclipse again in the first decade of the new century, and has returned like gangbusters in the second, fueled largely by the increased processing power of graphics chips.

“There’s this idea that ideas in science are a bit like epidemics of viruses,” says Tomaso Poggio, the Eugene McDermott Professor of Brain and Cognitive Sciences at MIT, an investigator at MIT’s McGovern Institute for Brain Research, and director of MIT’s Center for Brains, Minds, and Machines. “There are apparently five or six basic strains of flu viruses, and apparently each one comes back with a period of around 25 years. People get infected, and they develop an immune response, and so they don’t get infected for the next 25 years. And then there is a new generation that is ready to be infected by the same strain of virus. In science, people fall in love with an idea, get excited about it, hammer it to death, and then get immunized — they get tired of it. So ideas should have the same kind of periodicity!”

Weighty matters

Neural nets are a means of doing machine learning, in which a computer learns to perform some task by analyzing training examples. Usually, the examples have been hand-labeled in advance. An object recognition system, for instance, might be fed thousands of labeled images of cars, houses, coffee cups, and so on, and it would find visual patterns in the images that consistently correlate with particular labels.

Modeled loosely on the human brain, a neural net consists of thousands or even millions of simple processing nodes that are densely interconnected. Most of today’s neural nets are organized into layers of nodes, and they’re “feed-forward,” meaning that data moves through them in only one direction. An individual node might be connected to several nodes in the layer beneath it, from which it receives data, and several nodes in the layer above it, to which it sends data.

To each of its incoming connections, a node will assign a number known as a “weight.” When the network is active, the node receives a different data item — a different number — over each of its connections and multiplies it by the associated weight. It then adds the resulting products together, yielding a single number. If that number is below a threshold value, the node passes no data to the next layer. If the number exceeds the threshold value, the node “fires,” which in today’s neural nets generally means sending the number — the sum of the weighted inputs — along all its outgoing connections.

When a neural net is being trained, all of its weights and thresholds are initially set to random values. Training data is fed to the bottom layer — the input layer — and it passes through the succeeding layers, getting multiplied and added together in complex ways, until it finally arrives, radically transformed, at the output layer. During training, the weights and thresholds are continually adjusted until training data with the same labels consistently yield similar outputs.

Minds and machines

The neural nets described by McCullough and Pitts in 1944 had thresholds and weights, but they weren’t arranged into layers, and the researchers didn’t specify any training mechanism. What McCullough and Pitts showed was that a neural net could, in principle, compute any function that a digital computer could. The result was more neuroscience than computer science: The point was to suggest that the human brain could be thought of as a computing device.

Neural nets continue to be a valuable tool for neuroscientific research. For instance, particular network layouts or rules for adjusting weights and thresholds have reproduced observed features of human neuroanatomy and cognition, an indication that they capture something about how the brain processes information.

The first trainable neural network, the Perceptron, was demonstrated by the Cornell University psychologist Frank Rosenblatt in 1957. The Perceptron’s design was much like that of the modern neural net, except that it had only one layer with adjustable weights and thresholds, sandwiched between input and output layers.

Perceptrons were an active area of research in both psychology and the fledgling discipline of computer science until 1959, when Minsky and Papert published a book titled “Perceptrons,” which demonstrated that executing certain fairly common computations on Perceptrons would be impractically time consuming.

“Of course, all of these limitations kind of disappear if you take machinery that is a little more complicated — like, two layers,” Poggio says. But at the time, the book had a chilling effect on neural-net research.

“You have to put these things in historical context,” Poggio says. “They were arguing for programming — for languages like Lisp. Not many years before, people were still using analog computers. It was not clear at all at the time that programming was the way to go. I think they went a little bit overboard, but as usual, it’s not black and white. If you think of this as this competition between analog computing and digital computing, they fought for what at the time was the right thing.”

Periodicity

By the 1980s, however, researchers had developed algorithms for modifying neural nets’ weights and thresholds that were efficient enough for networks with more than one layer, removing many of the limitations identified by Minsky and Papert. The field enjoyed a renaissance.

But intellectually, there’s something unsatisfying about neural nets. Enough training may revise a network’s settings to the point that it can usefully classify data, but what do those settings mean? What image features is an object recognizer looking at, and how does it piece them together into the distinctive visual signatures of cars, houses, and coffee cups? Looking at the weights of individual connections won’t answer that question.

In recent years, computer scientists have begun to come up with ingenious methods for deducing the analytic strategies adopted by neural nets. But in the 1980s, the networks’ strategies were indecipherable. So around the turn of the century, neural networks were supplanted by support vector machines, an alternative approach to machine learning that’s based on some very clean and elegant mathematics.

The recent resurgence in neural networks — the deep-learning revolution — comes courtesy of the computer-game industry. The complex imagery and rapid pace of today’s video games require hardware that can keep up, and the result has been the graphics processing unit (GPU), which packs thousands of relatively simple processing cores on a single chip. It didn’t take long for researchers to realize that the architecture of a GPU is remarkably like that of a neural net.

Modern GPUs enabled the one-layer networks of the 1960s and the two- to three-layer networks of the 1980s to blossom into the 10-, 15-, even 50-layer networks of today. That’s what the “deep” in “deep learning” refers to — the depth of the network’s layers. And currently, deep learning is responsible for the best-performing systems in almost every area of artificial-intelligence research.

Under the hood

The networks’ opacity is still unsettling to theorists, but there’s headway on that front, too. In addition to directing the Center for Brains, Minds, and Machines (CBMM), Poggio leads the center’s research program in Theoretical Frameworks for Intelligence. Recently, Poggio and his CBMM colleagues have released a three-part theoretical study of neural networks.

The first part, which was published last month in the International Journal of Automation and Computing, addresses the range of computations that deep-learning networks can execute and when deep networks offer advantages over shallower ones. Parts two and three, which have been released as CBMM technical reports, address the problems of global optimization, or guaranteeing that a network has found the settings that best accord with its training data, and overfitting, or cases in which the network becomes so attuned to the specifics of its training data that it fails to generalize to other instances of the same categories.

There are still plenty of theoretical questions to be answered, but CBMM researchers’ work could help ensure that neural networks finally break the generational cycle that has brought them in and out of favor for seven decades.

This image from MIT illustrates a ‘modern’ neural network,

Most applications of deep learning use “convolutional” neural networks, in which the nodes of each layer are clustered, the clusters overlap, and each cluster feeds data to multiple nodes (orange and green) of the next layer. Image: Jose-Luis Olivares/MIT

h/t phys.org April 17, 2017

One final note, I wish the folks at MIT had an ‘explainer’ archive. I’m not sure how to find any more ‘explainers on MIT’s website.

Machine learning programs learn bias

The notion of bias in artificial intelligence (AI)/algorithms/robots is gaining prominence (links to other posts featuring algorithms and bias are at the end of this post). The latest research concerns machine learning where an artificial intelligence system trains itself with ordinary human language from the internet. From an April 13, 2017 American Association for the Advancement of Science (AAAS) news release on EurekAlert,

As artificial intelligence systems “learn” language from existing texts, they exhibit the same biases that humans do, a new study reveals. The results not only provide a tool for studying prejudicial attitudes and behavior in humans, but also emphasize how language is intimately intertwined with historical biases and cultural stereotypes. A common way to measure biases in humans is the Implicit Association Test (IAT), where subjects are asked to pair two concepts they find similar, in contrast to two concepts they find different; their response times can vary greatly, indicating how well they associated one word with another (for example, people are more likely to associate “flowers” with “pleasant,” and “insects” with “unpleasant”). Here, Aylin Caliskan and colleagues developed a similar way to measure biases in AI systems that acquire language from human texts; rather than measuring lag time, however, they used the statistical number of associations between words, analyzing roughly 2.2 million words in total. Their results demonstrate that AI systems retain biases seen in humans. For example, studies of human behavior show that the exact same resume is 50% more likely to result in an opportunity for an interview if the candidate’s name is European American rather than African-American. Indeed, the AI system was more likely to associate European American names with “pleasant” stimuli (e.g. “gift,” or “happy”). In terms of gender, the AI system also reflected human biases, where female words (e.g., “woman” and “girl”) were more associated than male words with the arts, compared to mathematics. In a related Perspective, Anthony G. Greenwald discusses these findings and how they could be used to further analyze biases in the real world.

There are more details about the research in this April 13, 2017 Princeton University news release on EurekAlert (also on ScienceDaily),

In debates over the future of artificial intelligence, many experts think of the new systems as coldly logical and objectively rational. But in a new study, researchers have demonstrated how machines can be reflections of us, their creators, in potentially problematic ways. Common machine learning programs, when trained with ordinary human language available online, can acquire cultural biases embedded in the patterns of wording, the researchers found. These biases range from the morally neutral, like a preference for flowers over insects, to the objectionable views of race and gender.

Identifying and addressing possible bias in machine learning will be critically important as we increasingly turn to computers for processing the natural language humans use to communicate, for instance in doing online text searches, image categorization and automated translations.

“Questions about fairness and bias in machine learning are tremendously important for our society,” said researcher Arvind Narayanan, an assistant professor of computer science and an affiliated faculty member at the Center for Information Technology Policy (CITP) at Princeton University, as well as an affiliate scholar at Stanford Law School’s Center for Internet and Society. “We have a situation where these artificial intelligence systems may be perpetuating historical patterns of bias that we might find socially unacceptable and which we might be trying to move away from.”

The paper, “Semantics derived automatically from language corpora contain human-like biases,” published April 14  [2017] in Science. Its lead author is Aylin Caliskan, a postdoctoral research associate and a CITP fellow at Princeton; Joanna Bryson, a reader at University of Bath, and CITP affiliate, is a coauthor.

As a touchstone for documented human biases, the study turned to the Implicit Association Test, used in numerous social psychology studies since its development at the University of Washington in the late 1990s. The test measures response times (in milliseconds) by human subjects asked to pair word concepts displayed on a computer screen. Response times are far shorter, the Implicit Association Test has repeatedly shown, when subjects are asked to pair two concepts they find similar, versus two concepts they find dissimilar.

Take flower types, like “rose” and “daisy,” and insects like “ant” and “moth.” These words can be paired with pleasant concepts, like “caress” and “love,” or unpleasant notions, like “filth” and “ugly.” People more quickly associate the flower words with pleasant concepts, and the insect terms with unpleasant ideas.

The Princeton team devised an experiment with a program where it essentially functioned like a machine learning version of the Implicit Association Test. Called GloVe, and developed by Stanford University researchers, the popular, open-source program is of the sort that a startup machine learning company might use at the heart of its product. The GloVe algorithm can represent the co-occurrence statistics of words in, say, a 10-word window of text. Words that often appear near one another have a stronger association than those words that seldom do.

The Stanford researchers turned GloVe loose on a huge trawl of contents from the World Wide Web, containing 840 billion words. Within this large sample of written human culture, Narayanan and colleagues then examined sets of so-called target words, like “programmer, engineer, scientist” and “nurse, teacher, librarian” alongside two sets of attribute words, such as “man, male” and “woman, female,” looking for evidence of the kinds of biases humans can unwittingly possess.

In the results, innocent, inoffensive biases, like for flowers over bugs, showed up, but so did examples along lines of gender and race. As it turned out, the Princeton machine learning experiment managed to replicate the broad substantiations of bias found in select Implicit Association Test studies over the years that have relied on live, human subjects.

For instance, the machine learning program associated female names more with familial attribute words, like “parents” and “wedding,” than male names. In turn, male names had stronger associations with career attributes, like “professional” and “salary.” Of course, results such as these are often just objective reflections of the true, unequal distributions of occupation types with respect to gender–like how 77 percent of computer programmers are male, according to the U.S. Bureau of Labor Statistics.

Yet this correctly distinguished bias about occupations can end up having pernicious, sexist effects. An example: when foreign languages are naively processed by machine learning programs, leading to gender-stereotyped sentences. The Turkish language uses a gender-neutral, third person pronoun, “o.” Plugged into the well-known, online translation service Google Translate, however, the Turkish sentences “o bir doktor” and “o bir hem?ire” with this gender-neutral pronoun are translated into English as “he is a doctor” and “she is a nurse.”

“This paper reiterates the important point that machine learning methods are not ‘objective’ or ‘unbiased’ just because they rely on mathematics and algorithms,” said Hanna Wallach, a senior researcher at Microsoft Research New York City, who was not involved in the study. “Rather, as long as they are trained using data from society and as long as society exhibits biases, these methods will likely reproduce these biases.”

Another objectionable example harkens back to a well-known 2004 paper by Marianne Bertrand of the University of Chicago Booth School of Business and Sendhil Mullainathan of Harvard University. The economists sent out close to 5,000 identical resumes to 1,300 job advertisements, changing only the applicants’ names to be either traditionally European American or African American. The former group was 50 percent more likely to be offered an interview than the latter. In an apparent corroboration of this bias, the new Princeton study demonstrated that a set of African American names had more unpleasantness associations than a European American set.

Computer programmers might hope to prevent cultural stereotype perpetuation through the development of explicit, mathematics-based instructions for the machine learning programs underlying AI systems. Not unlike how parents and mentors try to instill concepts of fairness and equality in children and students, coders could endeavor to make machines reflect the better angels of human nature.

“The biases that we studied in the paper are easy to overlook when designers are creating systems,” said Narayanan. “The biases and stereotypes in our society reflected in our language are complex and longstanding. Rather than trying to sanitize or eliminate them, we should treat biases as part of the language and establish an explicit way in machine learning of determining what we consider acceptable and unacceptable.”

Here’s a link to and a citation for the Princeton paper,

Semantics derived automatically from language corpora contain human-like biases by Aylin Caliskan, Joanna J. Bryson, Arvind Narayanan. Science  14 Apr 2017: Vol. 356, Issue 6334, pp. 183-186 DOI: 10.1126/science.aal4230

This paper appears to be open access.

Links to more cautionary posts about AI,

Aug 5, 2009: Autonomous algorithms; intelligent windows; pretty nano pictures

June 14, 2016:  Accountability for artificial intelligence decision-making

Oct. 25, 2016 Removing gender-based stereotypes from algorithms

March 1, 2017: Algorithms in decision-making: a government inquiry in the UK

There’s also a book which makes some of the current use of AI programmes and big data quite accessible reading: Cathy O’Neil’s ‘Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy’.