
Being smart about using artificial intelligence in the field of medicine

Since my August 20, 2018 post featured both an opinion piece about the possibly imminent replacement of radiologists with artificial intelligence systems and the latest research on employing AI to diagnose eye diseases, this seems like a good time to examine some of the mythology embedded in the discussion about AI and medicine.

Imperfections in medical AI systems

An August 15, 2018 article for Slate.com by W. Nicholson Price II (who teaches at the University of Michigan School of Law; in addition to his law degree he has a PhD in Biological Sciences from Columbia University) begins with the peppy, optimistic view before veering into more critical territory (Note: Links have been removed),

For millions of people suffering from diabetes, new technology enabled by artificial intelligence promises to make management much easier. Medtronic’s Guardian Connect system promises to alert users 10 to 60 minutes before they hit high or low blood sugar level thresholds, thanks to IBM Watson, “the same supercomputer technology that can predict global weather patterns.” Startup Beta Bionics goes even further: In May, it received Food and Drug Administration approval to start clinical trials on what it calls a “bionic pancreas system” powered by artificial intelligence, capable of “automatically and autonomously managing blood sugar levels 24/7.”

An artificial pancreas powered by artificial intelligence represents a huge step forward for the treatment of diabetes—but getting it right will be hard. Artificial intelligence (also known in various iterations as deep learning and machine learning) promises to automatically learn from patterns in medical data to help us do everything from managing diabetes to finding tumors in an MRI to predicting how long patients will live. But the artificial intelligence techniques involved are typically opaque. We often don’t know how the algorithm makes the eventual decision. And they may change and learn from new data—indeed, that’s a big part of the promise. But when the technology is complicated, opaque, changing, and absolutely vital to the health of a patient, how do we make sure it works as promised?

Price describes how an AI-driven ‘closed loop’ artificial pancreas would automate insulin delivery for diabetic patients, the flaws in such automated systems, and how companies like to maintain a competitive advantage (Note: Links have been removed),

[…] a “closed loop” artificial pancreas, where software handles the whole issue, receiving and interpreting signals from the monitor, deciding when and how much insulin is needed, and directing the insulin pump to provide the right amount. The first closed-loop system was approved in late 2016. The system should take as much of the issue off the mind of the patient as possible (though, of course, that has limits). Running a closed-loop artificial pancreas is challenging. The way people respond to changing levels of carbohydrates is complicated, as is their response to insulin; it’s hard to model accurately. Making it even more complicated, each individual’s body reacts a little differently.

Here’s where artificial intelligence comes into play. Rather than trying explicitly to figure out the exact model for how bodies react to insulin and to carbohydrates, machine learning methods, given a lot of data, can find patterns and make predictions. And existing continuous glucose monitors (and insulin pumps) are excellent at generating a lot of data. The idea is to train artificial intelligence algorithms on vast amounts of data from diabetic patients, and to use the resulting trained algorithms to run a closed-loop artificial pancreas. Even more exciting, because the system will keep measuring blood glucose, it can learn from the new data and each patient’s artificial pancreas can customize itself over time as it acquires new data from that patient’s particular reactions.

Here’s the tough question: How will we know how well the system works? Diabetes software doesn’t exactly have the best track record when it comes to accuracy. A 2015 study found that among smartphone apps for calculating insulin doses, two-thirds of the apps risked giving incorrect results, often substantially so. … And companies like to keep their algorithms proprietary for a competitive advantage, which makes it hard to know how they work and what flaws might have gone unnoticed in the development process.
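As an aside, the arithmetic those dose-calculator apps are supposed to perform is not exotic, which makes the error rate in that 2015 study all the more striking. Here is a minimal sketch of the standard carbohydrate-counting bolus formula, with illustrative (not clinical) numbers; the function and the unit mix-up shown are my own example, not drawn from the study:

```python
# Sketch of the standard carbohydrate-counting bolus formula such apps
# implement, plus the kind of unit error that can make results wrong.
# All ratios and targets are illustrative, not clinical advice.

def bolus_units(carbs_g: float, bg_current: float, bg_target: float,
                carb_ratio: float, correction_factor: float) -> float:
    """Meal bolus = carb dose + correction dose.

    carb_ratio:        grams of carbohydrate covered by 1 unit of insulin
    correction_factor: drop in blood glucose (same units as bg_*) per 1 unit
    """
    meal_dose = carbs_g / carb_ratio
    correction = max(0.0, (bg_current - bg_target) / correction_factor)
    return meal_dose + correction

# Correct use, glucose in mg/dL:
print(bolus_units(60, 220, 100, carb_ratio=10, correction_factor=40))  # 9.0 units

# Unit mix-up: glucose entered in mmol/L against an mg/dL correction factor
# effectively erases the correction term, one way an app can be substantially off.
print(bolus_units(60, 12.2, 5.6, carb_ratio=10, correction_factor=40))  # ~6.2 units
```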

There’s more,

These issues aren’t unique to diabetes care—other A.I. algorithms will also be complicated, opaque, and maybe kept secret by their developers. The potential for problems multiplies when an algorithm is learning from data from an entire hospital, or hospital system, or the collected data from an entire state or nation, not just a single patient. …

The [US Food and Drug Administration] FDA is working on this problem. The head of the agency has expressed his enthusiasm for bringing A.I. safely into medical practice, and the agency has a new Digital Health Innovation Action Plan to try to tackle some of these issues. But they’re not easy, and one thing making it harder is a general desire to keep the algorithmic sauce secret. The example of IBM Watson for Oncology has given the field a bit of a recent black eye—it turns out that the company knew the algorithm gave poor recommendations for cancer treatment but kept that secret for more than a year. …

While Price focuses on problems with algorithms and with developers and their business interests, he also hints at some of the body’s complexities.
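To make the closed loop Price describes a little more concrete, here is a minimal sketch of one iteration: a stand-in ‘model’ predicts glucose a half-hour ahead from recent sensor readings, and a simple rule adjusts insulin delivery. Everything here (names, thresholds, the dosing rule) is hypothetical; real systems are regulated medical devices trained and validated per patient:

```python
# Hypothetical sketch of one iteration of a closed-loop artificial pancreas.
# The predictor, thresholds, and dosing rule are illustrative only.

from dataclasses import dataclass
from typing import List

@dataclass
class LoopConfig:
    low_mgdl: float = 70.0       # hypoglycemia threshold (mg/dL)
    high_mgdl: float = 180.0     # hyperglycemia threshold (mg/dL)
    basal_units_per_hr: float = 1.0

def predict_glucose(recent_readings: List[float]) -> float:
    """Stand-in for a trained model: naive linear extrapolation of the
    last two continuous-glucose-monitor readings, 30 minutes ahead."""
    if len(recent_readings) < 2:
        return recent_readings[-1]
    trend = recent_readings[-1] - recent_readings[-2]  # change per 5-min sample
    return recent_readings[-1] + trend * 6             # 6 samples ~= 30 min

def loop_step(recent_readings: List[float], cfg: LoopConfig) -> float:
    """Return the insulin rate (units/hour) for the next interval."""
    predicted = predict_glucose(recent_readings)
    if predicted < cfg.low_mgdl:
        return 0.0                           # suspend insulin before a predicted low
    if predicted > cfg.high_mgdl:
        return cfg.basal_units_per_hr * 1.5  # modestly increase delivery
    return cfg.basal_units_per_hr

print(loop_step([110.0, 118.0, 127.0], LoopConfig()))  # rising trend -> 1.5
```

The “customize itself over time” promise amounts to retraining the predictor on each patient’s accumulating data, which is exactly where the opacity and validation questions Price raises come in.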

Can AI systems be like people?

Susan Baxter, a medical writer with over 20 years’ experience, a PhD in health economics, and the author of countless magazine articles and several books, offers a more person-centered approach to the discussion in her July 6, 2018 posting on susanbaxter.com,

The fascination with AI continues to irk, given that every second thing I read seems to be extolling the magic of AI and medicine and how It Will Change Everything. Which it will not, trust me. The essential issue of illness remains perennial and revolves around an individual for whom no amount of technology will solve anything without human contact. …

But in this world, or so we are told by AI proponents, radiologists will soon be obsolete. [my August 20, 2018 post] The adaptational learning capacities of AI mean that reading a scan or x-ray will soon be more ably done by machines than humans. The presupposition here is that we, the original programmers of this artificial intelligence, understand the vagaries of real life (and real disease) so wonderfully that we can deconstruct these much as we do the game of chess (where, let’s face it, Big Blue ate our lunch) and that analyzing a two-dimensional image of a three-dimensional body, already problematic, can be reduced to a series of algorithms.

Attempting to extrapolate what some “shadow” on a scan might mean in a flesh and blood human isn’t really quite the same as bishop to knight seven. Never mind the false positive/negatives that are considered an acceptable risk or the very real human misery they create.

Moravec called it

It’s called Moravec’s paradox, the inability of humans to realize just how complex basic physical tasks are – and the corresponding inability of AI to mimic it. As you walk across the room, carrying a glass of water, talking to your spouse/friend/cat/child; place the glass on the counter and open the dishwasher door with your foot as you open a jar of pickles at the same time, take a moment to consider just how many concurrent tasks you are doing and just how enormous the computational power these ostensibly simple moves would require.

Researchers in Singapore taught industrial robots to assemble an Ikea chair. Essentially, screw in the legs. A person could probably do this in a minute. Maybe two. The preprogrammed robots took nearly half an hour. And I suspect programming those robots took considerably longer than that.

Ironically, even Elon Musk, who has had major production problems with the Tesla cars rolling out of his high tech factory, has conceded (in a tweet) that “Humans are underrated.”

I wouldn’t necessarily go that far given the political shenanigans of Trump & Co. but in the grand scheme of things I tend to agree. …

Is AI going the way of gene therapy?

Susan draws a parallel between the discussion about AI and medicine and the discussion about genetics and medicine (Note: Links have been removed),

On a somewhat similar note – given the extent to which genetics discourse has that same linear, mechanistic tone [as AI and medicine] – it turns out all this fine talk of using genetics to determine health risk and whatnot is based on nothing more than clever marketing, since a lot of companies are making a lot of money off our belief in DNA. Truth is half the time we don’t even know what a gene is never mind what it actually does; geneticists still can’t agree on how many genes there are in a human genome, as this article in Nature points out.

Along the same lines, I was most amused to read about something called the Super Seniors Study, research following a group of individuals in their 80’s, 90’s and 100’s who seem to be doing really well. Launched in 2002 and headed by Angela Brooks Wilson, a geneticist at the BC [British Columbia] Cancer Agency and SFU [Simon Fraser University] Chair of biomedical physiology and kinesiology, this longitudinal work is examining possible factors involved in healthy ageing.

Turns out genes had nothing to do with it, the title of the Globe and Mail article notwithstanding. (“Could the DNA of these super seniors hold the secret to healthy aging?” The answer, a resounding “no”, well hidden at the very [end], the part most people wouldn’t even get to.) All of these individuals who were racing about exercising and working part time and living the kind of life that makes one tired just reading about it all had the same “multiple (genetic) factors linked to a high probability of disease”. You know, the gene markers they tell us are “linked” to cancer, heart disease, etc., etc. But these super seniors had all those markers but none of the diseases, demonstrating (pretty strongly) that the so-called genetic links to disease are a load of bunkum. Which (she said modestly) I have been saying for more years than I care to remember. You’re welcome.

The fundamental error in this type of linear thinking is in allowing our metaphors (genes are the “blueprint” of life) and propensity towards social ideas of determinism to overtake common sense. Biological and physiological systems are not static; they respond to and change with life in its entirety, whether it’s diet and nutrition or toxic or traumatic insults. Immunity alters, endocrinology changes – even how we think and feel affects the efficiency and effectiveness of physiology. Which explains why as we age we become increasingly dissimilar.

If you have the time, I encourage you to read Susan’s comments in their entirety.

Scientific certainties

Following on with genetics, gene therapy dreams, and the complexity of biology, the June 19, 2018 Nature article by Cassandra Willyard (mentioned in Susan’s posting) highlights an aspect of scientific research not often mentioned in public,

One of the earliest attempts to estimate the number of genes in the human genome involved tipsy geneticists, a bar in Cold Spring Harbor, New York, and pure guesswork.

That was in 2000, when a draft human genome sequence was still in the works; geneticists were running a sweepstake on how many genes humans have, and wagers ranged from tens of thousands to hundreds of thousands. Almost two decades later, scientists armed with real data still can’t agree on the number — a knowledge gap that they say hampers efforts to spot disease-related mutations.

In 2000, with the genomics community abuzz over the question of how many human genes would be found, Ewan Birney launched the GeneSweep contest. Birney, now co-director of the European Bioinformatics Institute (EBI) in Hinxton, UK, took the first bets at a bar during an annual genetics meeting, and the contest eventually attracted more than 1,000 entries and a US$3,000 jackpot. Bets on the number of genes ranged from more than 312,000 to just under 26,000, with an average of around 40,000. These days, the span of estimates has shrunk — with most now between 19,000 and 22,000 — but there is still disagreement (See ‘Gene Tally’).

… the inconsistencies in the number of genes from database to database are problematic for researchers, Pruitt says. “People want one answer,” she [Kim Pruitt, a genome researcher at the US National Center for Biotechnology Information (NCBI) in Bethesda, Maryland] adds, “but biology is complex.”

I wanted to note that scientists do make guesses, and not just in genetics. For example, Gina Mallet’s 2005 book ‘Last Chance to Eat: The Fate of Taste in a Fast Food World’ recounts the story of how good and bad cholesterol levels were established: the experts made some guesses based on their experience. That said, Willyard’s article details the continuing effort to nail down the number of human genes almost 20 years after the Human Genome Project was completed and delves into the problems the scientists have uncovered.

Final comments

In addition to opaque processes, with developers/entrepreneurs wanting to maintain their secrets for competitive advantage, and in addition to our own poor understanding of the human body (how many genes are there anyway?), there are some major gaps (reflected in AI) in our understanding of various diseases. Angela Lashbrook’s August 16, 2018 article for The Atlantic highlights some issues with skin cancer and skin tone (Note: Links have been removed),

… While fair-skinned people are at the highest risk for contracting skin cancer, the mortality rate for African Americans is considerably higher: Their five-year survival rate is 73 percent, compared with 90 percent for white Americans, according to the American Academy of Dermatology.

As the rates of melanoma for all Americans continue a 30-year climb, dermatologists have begun exploring new technologies to try to reverse this deadly trend—including artificial intelligence. There’s been a growing hope in the field that using machine-learning algorithms to diagnose skin cancers and other skin issues could make for more efficient doctor visits and increased, reliable diagnoses. The earliest results are promising—but also potentially dangerous for darker-skinned patients.

… Avery Smith, … a software engineer in Baltimore, Maryland, co-authored a paper in JAMA [Journal of the American Medical Association] Dermatology that warns of the potential racial disparities that could come from relying on machine learning for skin-cancer screenings. Smith’s co-author, Adewole Adamson of the University of Texas at Austin, has conducted multiple studies on demographic imbalances in dermatology. “African Americans have the highest mortality rate [for skin cancer], and doctors aren’t trained on that particular skin type,” Smith told me over the phone. “When I came across the machine-learning software, one of the first things I thought was how it will perform on black people.”

Recently, a study that tested machine-learning software in dermatology, conducted by a group of researchers primarily out of Germany, found that “deep-learning convolutional neural networks,” or CNN, detected potentially cancerous skin lesions better than the 58 dermatologists included in the study group. The data used for the study come from the International Skin Imaging Collaboration, or ISIC, an open-source repository of skin images to be used by machine-learning algorithms. Given the rise in melanoma cases in the United States, a machine-learning algorithm that assists dermatologists in diagnosing skin cancer earlier could conceivably save thousands of lives each year.

… Chief among the prohibitive issues, according to Smith and Adamson, is that the data the CNN relies on come from primarily fair-skinned populations in the United States, Australia, and Europe. If the algorithm is basing most of its knowledge on how skin lesions appear on fair skin, then theoretically, lesions on patients of color are less likely to be diagnosed. “If you don’t teach the algorithm with a diverse set of images, then that algorithm won’t work out in the public that is diverse,” says Adamson. “So there’s risk, then, for people with skin of color to fall through the cracks.”

As Adamson and Smith’s paper points out, racial disparities in artificial intelligence and machine learning are not a new issue. Algorithms have mistaken images of black people for gorillas, misunderstood Asians to be blinking when they weren’t, and “judged” only white people to be attractive. An even more dangerous issue, according to the paper, is that decades of clinical research have focused primarily on people with light skin, leaving out marginalized communities whose symptoms may present differently.

The reasons for this exclusion are complex. According to Andrew Alexis, a dermatologist at Mount Sinai, in New York City, and the director of the Skin of Color Center, compounding factors include a lack of medical professionals from marginalized communities, inadequate information about those communities, and socioeconomic barriers to participating in research. “In the absence of a diverse study population that reflects that of the U.S. population, potential safety or efficacy considerations could be missed,” he says.

Adamson agrees, elaborating that with inadequate data, machine learning could misdiagnose people of color with nonexistent skin cancers—or miss them entirely. But he understands why the field of dermatology would surge ahead without demographically complete data. “Part of the problem is that people are in such a rush. This happens with any new tech, whether it’s a new drug or test. Folks see how it can be useful and they go full steam ahead without thinking of potential clinical consequences. …

Improving machine-learning algorithms is far from the only method to ensure that people with darker skin tones are protected against the sun and receive diagnoses earlier, when many cancers are more survivable. According to the Skin Cancer Foundation, 63 percent of African Americans don’t wear sunscreen; both they and many dermatologists are more likely to delay diagnosis and treatment because of the belief that dark skin is adequate protection from the sun’s harmful rays. And due to racial disparities in access to health care in America, African Americans are less likely to get treatment in time.
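For readers curious about the mechanics, the “deep-learning convolutional neural network” approach in the German study follows a now-standard recipe: take a network pretrained on general images and fine-tune it on labelled lesion photos. A minimal sketch in PyTorch, assuming ISIC-style images sorted into benign/ and malignant/ folders (the folder names, dataset layout, model choice, and hyperparameters are my assumptions, not the study’s actual setup):

```python
# Hypothetical fine-tuning sketch; not the study's actual model or data layout.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),               # CNNs expect a fixed input size
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],  # ImageNet channel statistics
                         [0.229, 0.224, 0.225]),
])

# Assumed layout: isic_train/benign/*.jpg and isic_train/malignant/*.jpg
train_set = datasets.ImageFolder("isic_train", transform=preprocess)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)    # new head: benign vs. malignant

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:                    # one training epoch, for brevity
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```

And the gap Smith and Adamson worry about only shows up if performance is reported separately for each skin-tone group rather than as one aggregate number. A hypothetical audit, with invented toy data standing in for predictions labelled by Fitzpatrick skin type, might look like this:

```python
# Hypothetical audit: per-group sensitivity (melanoma recall) by skin tone.
from collections import defaultdict

def sensitivity_by_group(records):
    """records: iterable of (skin_type, true_label, predicted_label),
    where label 1 = malignant. Returns {skin_type: recall on malignant cases}."""
    tp = defaultdict(int)
    pos = defaultdict(int)
    for skin_type, truth, pred in records:
        if truth == 1:
            pos[skin_type] += 1
            tp[skin_type] += (pred == 1)
    return {g: tp[g] / pos[g] for g in pos if pos[g] > 0}

# Toy data: the aggregate looks fine while the darker-skin group is missed.
data = [("I-II", 1, 1)] * 90 + [("I-II", 1, 0)] * 10 \
     + [("V-VI", 1, 1)] * 3 + [("V-VI", 1, 0)] * 7
print(sensitivity_by_group(data))  # {'I-II': 0.9, 'V-VI': 0.3}
```

In the toy numbers above, the overall sensitivity looks respectable while the darker-skin group is mostly missed: exactly the “fall through the cracks” scenario Adamson describes.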

Happy endings

I’ll add one thing to Price’s article, Susan’s posting, and Lashbrook’s article about the issues with AI, certainty, gene therapy, and medicine: the desire for a happy ending prefaced with an easy solution. If the easy solution isn’t possible, accommodations will be made, but that happy ending is a must. All disease will disappear and there will be peace on earth. (Nod to Susan Baxter and her many discussions with me about disease processes and happy endings.)

The solutions, for the most part, are seen as technological, despite the mountain of evidence suggesting that technology reflects our own imperfect understanding of health and disease and therefore provides, at best, an imperfect solution.

Also, we tend to underestimate just how complex humans are, not only in terms of disease and health but also with regard to our skills, our understanding, and, perhaps not acknowledged often enough, our ability to respond appropriately in the moment.

There is much to celebrate in what has been accomplished: no more Black Death, no more smallpox, hip replacements, pacemakers, organ transplants, and much more. Yes, we should try to improve our medicine. But maybe, alongside the celebration, we can welcome AI and other technologies with a lot less hype and a lot more skepticism.