Physicists go beyond semantics by taking tourist walks in complex networks to discover word meanings in context

It was a bit shocking to find out that physicists have made a breakthrough in semantics, a field of interest I associate with writers and linguistics experts, but there it was in a July 3, 2013 news item on the Springer (publisher) Select website,

Two Brazilian physicists have now devised a method to automatically elucidate the meaning of words with several senses, based solely on their patterns of connectivity with nearby words in a given sentence – and not on semantics. Thiago Silva and Diego Amancio from the University of São Paulo, Brazil, reveal, in a paper about to be published in EPJ B [European Physical Journal B], how they modelled classic texts as complex networks in order to derive their meaning. This type of model plays a key role in several natural language processing tasks such as machine translation, information retrieval, content analysis and text processing.

Here’s more about the words the physicists used in their ‘tourist walk’ and the texts they tested (from the news item),

In this study, the authors chose a set of ten so-called polysemous words – words with multiple meanings – such as bear, jam, just, rock or present. They then verified their patterns of connectivity with nearby words in the text of literary classics such as Jane Austen’s Pride and Prejudice. Specifically, they established a model that consisted of a set of nodes representing words connected by their “edges,” if they are adjacent in a text. The authors then compared the results of their disambiguation exercise with the traditional semantic-based approach. They observed significant accuracy rates in identifying the suitable meanings when using both techniques. The approach described in this study, based on a so-called deterministic tourist walk characterisation, can therefore be considered a complementary methodology for distinguishing between word senses.
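To make the network idea a little more concrete, here is a minimal Python sketch of my own (not the authors' code) that links words whenever they sit next to each other in a text. The function name `build_adjacency_network`, the sample sentence, the undirected edges, and the absence of any preprocessing such as stopword removal are all assumptions on my part; the news item doesn't go into that level of detail.

```python
import re
from collections import defaultdict


def build_adjacency_network(text):
    """Return {word: set of neighbouring words} for words adjacent in `text`.

    Assumptions (not taken from the paper): edges are undirected, words are
    lowercased, and no stopword removal or lemmatisation is applied.
    """
    words = re.findall(r"[a-z']+", text.lower())
    network = defaultdict(set)
    # Pair every word with the word that immediately follows it.
    for left, right in zip(words, words[1:]):
        if left != right:  # skip self-loops from repeated words
            network[left].add(right)
            network[right].add(left)
    return dict(network)


# Hypothetical sample sentence, invented purely for illustration.
sample = "the bear ate the jam and the bear slept"
network = build_adjacency_network(sample)
print(sorted(network["bear"]))  # words that appear next to 'bear'
```

In a network like this, an ambiguous word such as ‘bear’ ends up with different neighbourhoods depending on how it is used, which is presumably the kind of connectivity pattern the physicists were measuring.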

Not having come across the ‘tourist walk’ before, I went looking for a definition, which I found in a 2002 paper (Deterministic walks in random networks: an application to thesaurus graphs by O. Kinouchi, A. S. Martinez, G. F. Lima, G. M. Lourenço, and S. Risau-Gusman),

In a landscape composed of N randomly distributed sites in Euclidean space, a walker (“tourist”) goes to the nearest one that has not been visited in the last τ steps. This procedure leads to trajectories composed of a transient part and a final cyclic attractor of period p. The tourist walk presents a simple scaling with respect to τ and can be performed in a wide range of networks that can be viewed as ordinal neighborhood graphs. As an example, we show that graphs defined by thesaurus dictionaries share some of the statistical properties of low dimensional (d = 2) Euclidean graphs and are easily distinguished from random link networks which correspond to the d → ∞ limit. This approach furnishes complementary information to the usual clustering coefficient and mean minimum separation length.

This gives me only the vaguest sense of what they mean by tourist walk but it does give some idea of how these physicists approached a problem that is linguistic and semantic in nature.
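For anyone who finds code easier to follow than abstracts, here is a rough Python sketch of the rule quoted from the 2002 paper: the walker keeps moving to the nearest site it hasn't visited in the last τ steps until its route starts repeating. The function name `tourist_walk`, the convention that the current site counts as one of the last τ visited, the tie-breaking by site index, and the three sample points are my assumptions, and the sketch says nothing about how Silva and Amancio adapt the walk to word networks.

```python
import math


def tourist_walk(sites, start, tau):
    """Deterministic tourist walk over `sites`, a list of (x, y) points.

    At each step the walker moves to the nearest site that is not among the
    last `tau` sites on its path. Assumptions: the current site counts as
    part of that memory, and distance ties go to the lower site index.
    Returns (transient, cycle) as lists of site indices.
    """
    if not 1 <= tau < len(sites):
        raise ValueError("need 1 <= tau < number of sites")
    path = [start]
    seen = {}  # memory state -> path index where it first occurred
    while True:
        state = tuple(path[-tau:])  # the walker's current memory
        if state in seen:
            # The memory state has repeated, so the walk is now periodic.
            cycle_start = seen[state] - (tau - 1)
            cycle_end = (len(path) - 1) - (tau - 1)
            return path[:cycle_start], path[cycle_start:cycle_end]
        seen[state] = len(path) - 1
        here = sites[path[-1]]
        candidates = [i for i in range(len(sites)) if i not in state]
        path.append(min(candidates, key=lambda i: (math.dist(here, sites[i]), i)))


# Three hypothetical sites on a line; the walker starts at the far one.
sites = [(0, 0), (1, 0), (5, 0)]
transient, cycle = tourist_walk(sites, start=2, tau=1)
print("transient:", transient, "cycle:", cycle, "period:", len(cycle))
```

With a memory of one step the walker in this toy example leaves the distant site and ends up bouncing between the two close neighbours, which is the ‘transient part’ followed by the ‘cyclic attractor’ that the abstract mentions.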

Silva and Amancio’s paper in the European Physical Journal B is behind a paywall, but there’s an earlier version of it freely available on arXiv.org,

Discriminating word senses with tourist walks in complex networks by Thiago C. Silva, Diego R. Amancio. (Submitted on 17 Jun 2013) DOI: 10.1140/epjb/e2013-40025-4 Cite as: arXiv:1306.3920 [cs.CL] (or arXiv:1306.3920v1 [cs.CL] for this version)

I gather this work was done in English. I wonder why there’s no mention of the research being performed on texts in other languages, either for this study or for future studies. As you can see from this table on page 2 of the PDF available from arXiv.org, the researchers concentrated on 19th-century and early 20th-century writers in the UK,

Table 2.
List of books (and their respective authors) employed in the experiments aiming at discriminating the meaning of ambiguous words. The year of publication is specified after the title of the book.

Title Author
Pride and Prejudice (1813) J. Austen
American Notes (1842) C. Dickens
Coral Reefs (1842) C. Darwin
A Tale of Two Cities (1859) C. Dickens
The Moonstone (1868) W. Collins
Expression of Emotions (1872) C. Darwin
A Pair of Blue Eyes (1873) T. Hardy
Jude the Obscure (1895) T. Hardy
Dracula’s Guest (1897) B. Stoker
Uncle Bernac (1897) A. C. Doyle
The Tragedy of the Korosko (1898) A. C. Doyle
The Return of Sherlock Holmes (1903) A. C. Doyle
Tales of St. Austin’s (1903) P. G. Wodehouse
The Chronicles of Clovis (1911) H. H. Munro
A Changed Man (1913) T. Hardy
Beasts and Super Beasts (1914) H. H. Munro
The Wisdom of Father Brown (1914) G. K. Chesterton
My Man Jeeves (1919) P. G. Wodehouse