Tag Archives: semantics

Reading a virus like a book

Teaching grammar and syntax to artificial intelligence (AI) algorithms, specifically natural language processing (NLP) algorithms, has helped researchers understand and predict viral mutations more quickly. This capability is especially useful at a time when the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus seems to be mutating into more easily transmissible variants.

Will Douglas Heaven’s Jan. 14, 2021 article for the Massachusetts Institute of Technology’s MIT Technology Review describes the work that links AI, grammar, and mutating viruses (Note: Links have been removed),

Galileo once observed that nature is written in math. Biology might be written in words. Natural-language processing (NLP) algorithms are now able to generate protein sequences and predict virus mutations, including key changes that help the coronavirus evade the immune system.

The key insight making this possible is that many properties of biological systems can be interpreted in terms of words and sentences. “We’re learning the language of evolution,” says Bonnie Berger, a computational biologist at the Massachusetts Institute of Technology [MIT].

In the last few years, a handful of researchers—including teams from geneticist George Church’s [Professor of Health Sciences and Technology at Harvard University and MIT, etc.] lab and Salesforce [emphasis mine]—have shown that protein sequences and genetic codes can be modeled using NLP techniques.

In a study published in Science today, Berger and her colleagues pull several of these strands together and use NLP to predict mutations that allow viruses to avoid being detected by antibodies in the human immune system, a process known as viral immune escape. The basic idea is that the interpretation of a virus by an immune system is analogous to the interpretation of a sentence by a human.

Berger’s team uses two different linguistic concepts: grammar and semantics (or meaning). The genetic or evolutionary fitness of a virus—characteristics such as how good it is at infecting a host—can be interpreted in terms of grammatical correctness. A successful, infectious virus is grammatically correct; an unsuccessful one is not.

Similarly, mutations of a virus can be interpreted in terms of semantics. Mutations that make a virus appear different to things in its environment—such as changes in its surface proteins that make it invisible to certain antibodies—have altered its meaning. Viruses with different mutations can have different meanings, and a virus with a different meaning may need different antibodies to read it.

Instead of millions of sentences, they trained the NLP model on thousands of genetic sequences taken from three different viruses: 45,000 unique sequences for a strain of influenza, 60,000 for a strain of HIV, and between 3,000 and 4,000 for a strain of Sars-Cov-2, the virus that causes covid-19. “There’s less data for the coronavirus because there’s been less surveillance,” says Brian Hie, a graduate student at MIT, who built the models.

The overall aim of the approach is to identify mutations that might let a virus escape an immune system without making it less infectious—that is, mutations that change a virus’s meaning without making it grammatically incorrect.

But it’s also just the beginning. Treating genetic mutations as changes in meaning could be applied in different ways across biology. “A good analogy can go a long way,” says Bryson [Bryan Bryson, a biologist at MIT].
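For readers who think better in code, here is a toy sketch of the idea in that excerpt. It scores every single-letter substitution in a short protein sequence by how much the substitution changes the sequence's 'meaning' and by how 'grammatical' the result remains, then favours substitutions that do well on both. Everything in it is an assumption made for illustration, not the NLP model Hie and his colleagues trained. 'Grammaticality' here is a smoothed per-position amino-acid frequency taken from a handful of made-up aligned training sequences. 'Meaning' is a vector of Kyte-Doolittle hydropathy values standing in for a learned representation. The two criteria are combined by adding their ranks, with a hypothetical weight `beta`. All function names are my own.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy values, used here only as a toy stand-in for "meaning".
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(HYDROPATHY)


def grammaticality(training, pos, aa, alpha=1.0):
    """Toy 'grammar': add-alpha smoothed log-frequency of `aa` at `pos` in aligned training sequences."""
    counts = Counter(seq[pos] for seq in training)
    return math.log((counts[aa] + alpha) / (len(training) + alpha * len(AMINO_ACIDS)))


def semantic_change(seq, mutant):
    """Toy 'meaning' shift: L1 distance between per-residue hydropathy vectors."""
    return sum(abs(HYDROPATHY[a] - HYDROPATHY[b]) for a, b in zip(seq, mutant))


def ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def rank_escape_candidates(seq, training, beta=1.0):
    """Score each single substitution by rank(semantic change) + beta * rank(grammaticality)."""
    mutants = [
        (pos, wt, aa)
        for pos, wt in enumerate(seq)
        for aa in AMINO_ACIDS
        if aa != wt
    ]
    change = [semantic_change(seq, seq[:p] + aa + seq[p + 1:]) for p, _, aa in mutants]
    gram = [grammaticality(training, p, aa) for p, _, aa in mutants]
    change_rank, gram_rank = ranks(change), ranks(gram)
    scored = [
        (change_rank[i] + beta * gram_rank[i], f"{wt}{p + 1}{aa}")
        for i, (p, wt, aa) in enumerate(mutants)
    ]
    return sorted(scored, reverse=True)  # highest combined score first


training = ["MKTL", "MKTI", "MRTL", "MKSL"]  # made-up aligned sequences
for score, name in rank_escape_candidates("MKTL", training, beta=10.0)[:3]:
    print(name, score)
```

With `beta=10.0` the 'grammar' term dominates. The three substitutions actually seen in the training sequences (K2R, T3S, L4I) rise to the top, ordered by how far they shift the toy hydropathy vector. With `beta=0.0` the ranking simply rewards the biggest 'meaning' change, whether or not such a variant has ever been observed. That is the trade-off the excerpt describes: change the meaning without breaking the grammar.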

If you have time, I recommend reading Heaven’s Jan. 14, 2021 article in its entirety, as it’s well written with clear explanations. As for the article’s mentions of George Church and Salesforce, the former was to be expected while the latter was not (at least not by me; I speak for no one else).

I find it fascinating that a company which describes itself (from What is Salesforce?) as providing “… customer relationship management, or CRM. It gives all your departments — including marketing, sales, commerce, and service — a shared view of your customers … ” seems to be conducting investigations into one (or more?) areas of biology.

For those who’d like to dive into the science as described in Heaven’s article, here’s a link to and a citation for the paper,

Learning the language of viral evolution and escape by Brian Hie, Ellen D. Zhong, Bonnie Berger, Bryan Bryson. Science 15 Jan 2021: Vol. 371, Issue 6526, pp. 284-288 DOI: 10.1126/science.abd7331

This paper appears to be open access (at least for now).

There is also a preprint version available on bioRxiv, which is an open access repository.

SEMANTICS, a major graphene project based in Ireland

A Jan. 28, 2015 news item on Nanowerk profiles SEMANTICS, a major graphene project based in Ireland (Note: A link has been removed),

Graphene is the strongest, most impermeable and conductive material known to man. Graphene sheets are just one atom thick, but 200 times stronger than steel. The European Union is investing heavily in the exploitation of graphene’s unique properties through a number of research initiatives such as the SEMANTICS project running at Trinity College Dublin.

A Dec. 16, 2014 European Commission press release, which originated the news item, provides an overview of the graphene enterprise in Europe,

It is no surprise that graphene, a substance with better electrical and thermal conductivity, mechanical strength and optical purity than any other, is being heralded as the ‘wonder material’ of the 21st century, as plastics were in the 20th century.

Graphene could be used to create ultra-fast electronic transistors, foldable computer displays and light-emitting diodes. It could increase and improve the efficiency of batteries and solar cells, help strengthen aircraft wings and even revolutionise tissue engineering and drug delivery in the health sector.

It is this huge potential which has convinced the European Commission to commit €1 billion to the Future and Emerging Technologies (FET) Graphene Flagship project, the largest-ever research initiative funded in the history of the EU. It has a guaranteed €54 million in funding for the first two years with much more expected over the next decade.

Sustained funding for the full duration of the Graphene Flagship project comes from the EU’s Research Framework Programmes, principally from Horizon 2020 (2014-2020).

The aim of the Graphene Flagship project, likened in scale to NASA’s mission to put a man on the moon in the 1960s, or the Human Genome project in the 1990s, is to take graphene and related two-dimensional materials such as silicene (a single layer of silicon atoms) from a state of raw potential to a point where they can revolutionise multiple industries and create economic growth and new jobs in Europe.

The research effort will cover the entire value chain, from materials production to components and system integration. It will help to develop the strong position Europe already has in the field and provide an opportunity for European initiatives to lead in global efforts to fully exploit graphene’s miraculous properties.

Under the EU plan, 126 academics and industry groups from 17 countries will work on 15 individual but connected projects.

The press release then goes on to describe a new project, SEMANTICS,

… this is not the only support being provided by the EU for research into the phenomenal potential of graphene. The SEMANTICS research project, led by Professor Jonathan Coleman at Trinity College Dublin, is funded by the European Research Council (ERC) and has already achieved some promising results.

The ERC does not assign funding to particular challenges or objectives, but selects the best scientists with the best ideas on the sole criterion of excellence. By providing complementary types of funding, both to individual scientists to work on their own ideas, and to large-scale consortia to coordinate top-down programmes, the EU is helping to progress towards a better knowledge and exploitation of graphene.

“It is no overestimation to state that graphene is one of the most exciting materials of our lifetime,” Prof. Coleman says. “It has the potential to provide answers to the questions that have so far eluded us. Technology, energy and aviation companies worldwide are racing to discover the full potential of graphene. Our research will be an important element in helping to realise that potential.”

With the help of European Research Council (ERC) Starting and Proof of Concept Grants, Prof. Coleman and his team are researching methods for obtaining single-atom layers of graphene and other layered compounds through exfoliation (peeling off) from the multilayers, followed by deposition on a range of surfaces to prepare films displaying specific behaviour.

“We’re working towards making graphene and other single-atom layers available on an economically viable industrial scale, and making it cheaply,” Prof. Coleman continues.

“At CRANN [Centre for Research on Adaptive Nanostructures and Nanodevices at Trinity College Dublin], we are developing nanosheets of graphene and other single-atom materials which can be made in very large quantities,” he adds. “When you put these sheets in plastic, for example, you make the plastic stronger. Not only that – you can massively increase its electrical properties, you can improve its thermal properties and you can make it less permeable to gases. The applications for industry could be endless.”

Prof. Coleman admits that scientists are regularly taken aback by the potential of graphene. “We are continually amazed at what graphene and other single-atom layers can do,” he reveals. “Recently it has been discovered that, when added to glue, graphene can make it more adhesive. Who would have thought that? It’s becoming clear that graphene just makes things a whole lot better,” he concludes.

So far, the project has developed a practical method for producing two-dimensional nanosheets in large quantities. Crucially, these nanosheets are already being used for a range of applications, including the production of reinforced plastics and metals, building super-capacitors and batteries which store energy, making cheap light detectors, and enabling ultra-sensitive position and motion sensors. As the number of applications grows, increased demand for these materials is anticipated. In response, the SEMANTICS team has scaled up the production process and is now producing 2D nanosheets at a rate more than 1000 times faster than was possible just a year ago.

I believe this new graphene production process is the ‘blender’ technique featured here in an April 23, 2014 post. There’s also a profile of the ‘blender’ project in a Dec. 10, 2014 article by Ben Deighton for the European Commission’s Horizon magazine (Horizon 2020 is the European Union’s framework science funding programme). Deighton’s article hosts a video of Jonathan Coleman speaking about nanotechnology, blenders, and more at TEDxBrussels on Dec. 1, 2014.

Physicists go beyond semantics by taking tourist walks in complex networks to discover word meanings in context

It was a bit shocking to find out that physicists have made a breakthrough in semantics, a field of interest I associate with writers and linguistics experts, but there it was in a July 3, 2013 news item on the Springer (publisher) Select website,

Two Brazilian physicists have now devised a method to automatically elucidate the meaning of words with several senses, based solely on their patterns of connectivity with nearby words in a given sentence – and not on semantics. Thiago Silva and Diego Amancio from the University of São Paulo, Brazil, reveal, in a paper about to be published in EPJ B [European Physical Journal B], how they modelled classic texts as complex networks in order to derive their meaning. This type of model plays a key role in several natural language processing tasks such as machine translation, information retrieval, content analysis and text processing.

Here’s more about the words the physicists used in their ‘tourist walk’ and the texts they tested (from the news item),

In this study, the authors chose a set of ten so-called polysemous words—words with multiple meanings—such as bear, jam, just, rock or present. They then verified their patterns of connectivity with nearby words in the text of literary classics such as Jane Austen’s Pride and Prejudice. Specifically, they established a model that consisted of a set of nodes representing words connected by their “edges,” if they are adjacent in a text. The authors then compared the results of their disambiguation exercise with the traditional semantic-based approach. They observed significant accuracy rates in identifying the suitable meanings when using both techniques. The approach described in this study, based on a so-called deterministic tourist walk characterisation, can therefore be considered a complementary methodology for distinguishing between word senses.
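The network in that excerpt, with words as nodes and an edge wherever two words sit side by side, is easy to build. Here is a minimal sketch of my own. It assumes lower-cased words, stripped punctuation, no stop-word removal or lemmatisation, and edge weights that count how often a pair turns up next to each other. The news item doesn't say how Silva and Amancio preprocessed their texts, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def word_adjacency_network(text):
    """Build an undirected word network: nodes are lower-cased words and an edge
    joins two words each time they appear next to each other in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    edges = defaultdict(int)  # frozenset({word_a, word_b}) -> number of adjacencies
    for a, b in zip(words, words[1:]):
        if a != b:  # skip self-loops such as "very very"
            edges[frozenset((a, b))] += 1
    return dict(edges)


def neighbours(edges, word):
    """Return the words linked to `word`, with the weight of each link."""
    return {
        next(iter(pair - {word})): weight
        for pair, weight in edges.items()
        if word in pair
    }


text = "I cannot bear the heat. The bear ate the jam."
network = word_adjacency_network(text)
print(sorted(neighbours(network, "bear").items()))  # [('ate', 1), ('cannot', 1), ('the', 2)]
```

Both senses of "bear" (the verb in "cannot bear" and the animal in "the bear ate") end up as one node whose links mix the two contexts. According to the news item, telling those contexts apart is the job of the tourist-walk characterisation built on top of this kind of network. Because punctuation is stripped, the sketch also links words across sentence boundaries ("heat" and "the"); a real implementation might choose otherwise.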

Not having come across the ‘tourist walk’ before, I went looking for a definition, which I found in a 2002 paper (Deterministic walks in random networks: an application to thesaurus graphs by O. Kinouchi, A. S. Martinez, G. F. Lima, G. M. Lourenço, and S. Risau-Gusman),

In a landscape composed of N randomly distributed sites in Euclidean space, a walker (“tourist”) goes to the nearest one that has not been visited in the last τ steps. This procedure leads to trajectories composed of a transient part and a final cyclic attractor of period p. The tourist walk presents a simple scaling with respect to τ and can be performed in a wide range of networks that can be viewed as ordinal neighborhood graphs. As an example, we show that graphs defined by thesaurus dictionaries share some of the statistical properties of low dimensional (d= 2) Euclidean graphs and are easily distinguished from random link networks which correspond to the d→ ∞ limit. This approach furnishes complementary information to the usual clustering coefficient and mean minimum separation length.

This gives me only the vaguest sense of what they mean by a tourist walk, but it does give some idea of how these physicists approached a problem that is linguistic and semantic in nature.
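Since that definition is essentially an algorithm, a short simulation may make it more concrete. This is my own minimal sketch of the deterministic walk described in the 2002 abstract, not code from either paper. It assumes points in a plane, ordinary Euclidean distance, ties broken in favour of the lower index, and a memory of the last τ visited sites that includes the current one. Those conventions may differ from the authors'. The walker's next move depends only on that memory window, so the walk becomes periodic as soon as a window repeats. The step at which the window first appeared gives the transient length, and the gap between the two appearances gives the period p. The function name and the test points are made up.

```python
import math
import random


def tourist_walk(points, start, tau):
    """Deterministic tourist walk over `points` (a list of (x, y) tuples).

    From the current site, the walker moves to the nearest site not among the
    last `tau` visited sites (the current site included). Because the next move
    depends only on that window, the walk is periodic once a window repeats.
    Returns (path, transient_length, period), with steps counted from `start`.
    """
    if not 1 <= tau < len(points):
        raise ValueError("tau must satisfy 1 <= tau < len(points)")
    path = [start]
    first_seen = {}  # full memory window -> step at which it first occurred
    while True:
        window = tuple(path[-tau:])
        step = len(path) - 1
        if len(window) == tau:
            if window in first_seen:
                t = first_seen[window]
                return path, t, step - t
            first_seen[window] = step
        here = points[path[-1]]
        allowed = [i for i in range(len(points)) if i not in window]
        # Nearest allowed site; ties go to the lower index so the walk stays deterministic.
        path.append(min(allowed, key=lambda i: (math.dist(here, points[i]), i)))


random.seed(0)
sites = [(random.random(), random.random()) for _ in range(50)]
for tau in (1, 2, 3):
    _, transient, period = tourist_walk(sites, start=0, tau=tau)
    print(f"tau={tau}: transient={transient}, period={period}")
```

With τ = 1 the walker just hops to its nearest neighbour each time. With randomly scattered points it always ends up bouncing between two sites that are each other's nearest neighbours (period 2). Larger memories force it to keep moving and allow longer cycles.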

Silva’s and Amancio’s paper in the European Physical Journal B is behind a paywall but there’s an earlier version of it freely available on arXiv.org,

Discriminating word senses with tourist walks in complex networks by Thiago C. Silva and Diego R. Amancio (submitted on 17 Jun 2013). DOI: 10.1140/epjb/e2013-40025-4. Cite as: arXiv:1306.3920 [cs.CL] (or arXiv:1306.3920v1 [cs.CL] for this version)

I gather this work was done in English. I wonder why there’s no mention of performing the research on texts in other languages, either in this study or in future ones. As you can see from this table on page 2 of the PDF available from arXiv.org, the researchers concentrated on 19th- and early 20th-century writers in the UK,

Table 2. List of books (and their respective authors) employed in the experiments aiming at discriminating the meaning of ambiguous words. The year of publication is specified after the title of the book.

Title                                   Author
Pride and Prejudice (1813)              J. Austen
American Notes (1842)                   C. Dickens
Coral Reefs (1842)                      C. Darwin
A Tale of Two Cities (1859)             C. Dickens
The Moonstone (1868)                    W. Collins
Expression of Emotions (1872)           C. Darwin
A Pair of Blue Eyes (1873)              T. Hardy
Jude the Obscure (1895)                 T. Hardy
Dracula’s Guest (1897)                  B. Stoker
Uncle Bernac (1897)                     A. C. Doyle
The Tragedy of the Korosko (1898)       A. C. Doyle
The Return of Sherlock Holmes (1903)    A. C. Doyle
Tales of St. Austin’s (1903)            P. G. Wodehouse
The Chronicles of Clovis (1911)         H. H. Munro
A Changed Man (1913)                    T. Hardy
Beasts and Super Beasts (1914)          H. H. Munro
The Wisdom of Father Brown (1914)       G. K. Chesterton
My Man Jeeves (1919)                    P. G. Wodehouse