Reading a virus like a book

Teaching grammar and syntax to artificial intelligence (AI) algorithms, specifically natural language processing (NLP) algorithms, has helped researchers understand and predict viral mutations more quickly. This capability is especially useful at a time when severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) seems to be mutating into more easily transmissible variants.

Will Douglas Heaven’s Jan. 14, 2021 article for the Massachusetts Institute of Technology’s MIT Technology Review describes the work that links AI, grammar, and mutating viruses (Note: Links have been removed):

Galileo once observed that nature is written in math. Biology might be written in words. Natural-language processing (NLP) algorithms are now able to generate protein sequences and predict virus mutations, including key changes that help the coronavirus evade the immune system.

The key insight making this possible is that many properties of biological systems can be interpreted in terms of words and sentences. “We’re learning the language of evolution,” says Bonnie Berger, a computational biologist at the Massachusetts Institute of Technology [MIT].

In the last few years, a handful of researchers—including teams from geneticist George Church’s [Professor of Health Sciences and Technology at Harvard University and MIT, etc.] lab and Salesforce [emphasis mine]—have shown that protein sequences and genetic codes can be modeled using NLP techniques.

In a study published in Science today, Berger and her colleagues pull several of these strands together and use NLP to predict mutations that allow viruses to avoid being detected by antibodies in the human immune system, a process known as viral immune escape. The basic idea is that the interpretation of a virus by an immune system is analogous to the interpretation of a sentence by a human.

Berger’s team uses two different linguistic concepts: grammar and semantics (or meaning). The genetic or evolutionary fitness of a virus—characteristics such as how good it is at infecting a host—can be interpreted in terms of grammatical correctness. A successful, infectious virus is grammatically correct; an unsuccessful one is not.

Similarly, mutations of a virus can be interpreted in terms of semantics. Mutations that make a virus appear different to things in its environment—such as changes in its surface proteins that make it invisible to certain antibodies—have altered its meaning. Viruses with different mutations can have different meanings, and a virus with a different meaning may need different antibodies to read it.

Instead of millions of sentences, they trained the NLP model on thousands of genetic sequences taken from three different viruses: 45,000 unique sequences for a strain of influenza, 60,000 for a strain of HIV, and between 3,000 and 4,000 for a strain of Sars-Cov-2, the virus that causes covid-19. “There’s less data for the coronavirus because there’s been less surveillance,” says Brian Hie, a graduate student at MIT, who built the models.

The overall aim of the approach is to identify mutations that might let a virus escape an immune system without making it less infectious—that is, mutations that change a virus’s meaning without making it grammatically incorrect.

But it’s also just the beginning. Treating genetic mutations as changes in meaning could be applied in different ways across biology. “A good analogy can go a long way,” says Bryson [Bryan Bryson, a biologist at MIT].
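
To make the 'change the meaning, keep the grammar' idea from the excerpt a little more concrete, here is a minimal sketch in Python. It is my own toy illustration and not the researchers' code: the candidate names, the two scores, and the choice to combine them by adding ranks are all assumptions, and the numbers are invented purely for demonstration. In the real work, scores of this kind would come from a trained language model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    mutation: str           # illustrative label only, not a real mutation
    grammaticality: float   # hypothetical score: higher = more likely still viable
    semantic_change: float  # hypothetical score: higher = looks more "different"


def rank_positions(values):
    """Return the rank of each value (1 = smallest), ties broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def rank_escape_candidates(candidates):
    """Order candidates so those that are both 'grammatical' and
    'semantically changed' come first (assumed rank-sum combination)."""
    g = rank_positions([c.grammaticality for c in candidates])
    s = rank_positions([c.semantic_change for c in candidates])
    combined = [gi + si for gi, si in zip(g, s)]
    order = sorted(range(len(candidates)), key=lambda i: combined[i], reverse=True)
    return [candidates[i] for i in order]


# Invented toy data, not measurements.
TOY = [
    Candidate("mut_a", 0.90, 0.02),  # viable, but looks the same to antibodies
    Candidate("mut_b", 0.05, 0.95),  # looks different, but probably not viable
    Candidate("mut_c", 0.80, 0.85),  # viable and looks different
    Candidate("mut_d", 0.10, 0.05),  # neither
]

if __name__ == "__main__":
    for c in rank_escape_candidates(TOY):
        print(c.mutation, c.grammaticality, c.semantic_change)
```

In this toy data, the candidate that scores well on both measures (mut_c) ends up at the top of the list, which is the kind of mutation the excerpt describes as a possible escape candidate.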

If you have time, I recommend reading Heaven’s Jan. 14, 2021 article in its entirety as it’s well written with clear explanations. As for the article’s mentions of George Church and Salesforce, the former is to be expected while the latter is not (at least not by me; I speak for no one else).

I find it fascinating that a company which describes itself (from What is Salesforce?) as providing “… customer relationship management, or CRM. It gives all your departments — including marketing, sales, commerce, and service — a shared view of your customers … ” seems to be conducting investigations into one (or more?) areas of biology.

For those who’d like to dive into the science as described in Heaven’s article, here’s a citation for the paper:

Learning the language of viral evolution and escape by Brian Hie, Ellen D. Zhong, Bonnie Berger, Bryan Bryson. Science 15 Jan 2021: Vol. 371, Issue 6526, pp. 284-288 DOI: 10.1126/science.abd7331

This paper appears to be open access, at least for now.

There is also a preprint version available on bioRxiv, which is an open access repository.
