Tag Archives: Galileo

Punctuation: a universal complement to the mathematical perfection of language

Before getting to the research into mathematics and punctuation, I’m setting the scene with snippets from a February 13, 2023 online article by Dan Falk for Aperio magazine, which seems to function both as a magazine and an advertisement for postdoctoral work in Israel funded by the Azrieli Foundation,

Four centuries ago, Galileo famously described the physical world as a realm that was rooted in mathematics. The universe, he wrote, “cannot be read until we have learnt the language and become familiar with the characters in which it is written. It is written in mathematical language, and the letters are triangles, circles and other geometrical figures, without which means it is humanly impossible to comprehend a single word.”

Since Galileo’s time, scientists and philosophers have continued to ponder the question of why mathematics is so shockingly effective at describing physical phenomena. No one would deny that this is a deep question, but for philosopher Balthasar Grabmayr, an Azrieli International Postdoctoral Fellow at the University of Haifa, even deeper questions lie beneath it. Why does mathematics work at all? Does mathematics have limits? And if it does, what can we say about those limits?

Grabmayr found his way to this field from a very different passion: music. Growing up in Vienna, he attended a music conservatory and was set on becoming a classical musician. Eventually, he began to think about what made music work, and then began to think about musical structure. “I started to realize that, actually, what I’m interested in — what I found so attractive in music — is basically mathematics,” he recalls. “Mathematics is the science of structure. I was completely captured by that.”

One of Grabmayr’s main areas of research involves Gödel coding, a technique that, roughly put, allows mathematics to study itself. Gödel coding lets you convert statements about a system of rules or axioms into statements within the original system.
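To make the idea a little more concrete: one classic way to Gödel-code a formula is prime-power encoding. Each symbol gets a code number c, and the string s1 s2 ... sn becomes 2^c(s1) · 3^c(s2) · 5^c(s3) · ..., so a claim about a formula turns into a claim about a whole number. Because prime factorization is unique, the number can always be decoded back into the formula. The Python below is a minimal sketch; the six-symbol alphabet and its code numbers are a hypothetical toy, not Gödel's own table, and a real treatment codes the formulas and proofs of a full formal system of arithmetic.

```python
from itertools import count

# Hypothetical toy alphabet: S is "successor", so S0 stands for 1.
SYMBOLS = {"0": 1, "S": 2, "=": 3, "+": 4, "(": 5, ")": 6}
DECODE = {code: sym for sym, code in SYMBOLS.items()}


def prime_stream():
    """Yield 2, 3, 5, 7, ... by trial division (adequate for short formulas)."""
    found = []
    for candidate in count(2):
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate


def godel_number(formula):
    """Code symbols s1..sn as 2**c(s1) * 3**c(s2) * 5**c(s3) * ..."""
    number = 1
    for p, symbol in zip(prime_stream(), formula):
        number *= p ** SYMBOLS[symbol]
    return number


def decode(number):
    """Read the exponents of successive primes back off as symbols."""
    symbols = []
    for p in prime_stream():
        if number == 1:
            break
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        symbols.append(DECODE[exponent])  # a KeyError means "not a valid code"
    return "".join(symbols)


print(godel_number("S0=S0"))          # 808500
print(decode(godel_number("S0=S0")))  # S0=S0
```

The formula "S0=S0" (that is, 1 = 1) becomes 2^2 · 3^1 · 5^3 · 7^2 · 11^1 = 808500, and arithmetic on that number can stand in for reasoning about the formula.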

Gödel coding is named for the Austrian logician Kurt Gödel, who in the 1930s developed his famous “incompleteness theorems,” which point to the inherent limitations of mathematics. Although expressed as an equation, Gödel’s proof was based on the idea that a sentence such as “This statement is unprovable” is both true and unprovable. As Rebecca Goldstein’s biography of Gödel declares, he “demonstrated that in every formal system of arithmetic there are true statements that nevertheless cannot be proved. The result was an upheaval that spread far beyond mathematics, challenging conceptions of the nature of the mind.”

Grabmayr’s work builds on the program that Gödel began nearly a century ago. “What I’m really interested in is what the limitations of mathematics are,” he says. “What are the limits of what we can prove? What are the limits of what we can express in formal languages? And what are the limits of what we can calculate using computers?” (That last remark shows that Gödel coding is of interest well beyond the philosophy of mathematics. “We’re surrounded by it,” says Grabmayr. “I mean, without Gödel coding there wouldn’t be any computers.”)

Another potential application is in cognitive science and the study of the mind. Psychologists and other scientists have long debated to what extent the mind is, or is not, like a computer. When we “think,” are we manipulating symbols the way a computer does? The jury is still out on that question, but Grabmayr believes his work can at least point toward some answers. “Cognitive science is based on the premise that we can use computational models to capture certain phenomena of the brain,” he says. “Artificial intelligence, also, is very much concerned with trying to formally capture our reasoning, our thinking processes.”

Albert Visser, a philosopher and logician at Utrecht University in the Netherlands and one of Grabmayr’s PhD supervisors, sees a number of potential payoffs for this research. “Balthasar’s work has some overspill to computer science and linguistics, since it involves a systematic reflection both on coding and on the nature of syntax,” he says. “The discussion of ideas from computer science and linguistics in Balthasar’s work is also beneficial in the other direction.” [emphases mine]

Now for the research into punctuation in European languages. From an April 19, 2023 Henryk Niewodniczański Institute of Nuclear Physics Polish Academy of Sciences press release (also on EurekAlert but published April 20, 2023),

A moment’s hesitation… Yes, a full stop here – but shouldn’t there be a comma there? Or would a hyphen be better? Punctuation can be a nuisance; it is often simply neglected. Wrong! The most recent statistical analyses paint a different picture: punctuation seems to “grow out” of the foundations shared by all the (examined) languages, and its features are far from trivial.

To many, punctuation appears as a necessary evil, to be happily ignored whenever possible. Recent analyses of literature written in the world’s current major languages require us to alter this opinion. In fact, the same statistical features of punctuation usage patterns have been observed in several hundred works written in seven, mainly Western, languages. Punctuation, all ten representatives of which can be found in the introduction to this text, turns out to be a universal and indispensable complement to the mathematical perfection of every language studied. Such a remarkable conclusion about the role of mere commas, exclamation marks or full stops comes from an article by scientists from the Institute of Nuclear Physics of the Polish Academy of Sciences (IFJ PAN) in Cracow, published in the journal Chaos, Solitons & Fractals.

“The present analyses are an extension of our earlier results on the multifractal features of sentence length variation in works of world literature. After all, what is sentence length? It is nothing more than the distance to the next specific punctuation mark – the full stop. So now we have taken all punctuation marks under a statistical magnifying glass, and we have also looked at what happens to punctuation during translation,” says Prof. Stanislaw Drozdz (IFJ PAN, Cracow University of Technology).
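The quantity behind all of this is simple to state: the number of words between one punctuation mark and the next, with sentence length as the special case where only the full stop counts. A minimal Python sketch follows, assuming a hand-picked set of marks and a crude letters-only word pattern; the release does not describe the study's actual tokenization, and dropping zero-word stretches is also an assumption.

```python
import re

# Assumed inventory of marks (full stop, comma, semicolon, colon, !, ?, brackets,
# straight and curly quotes, ellipsis, en and em dashes); the paper's may differ.
PUNCTUATION = ".,;:!?()\"\u2026\u2013\u2014\u201c\u201d"
# Crude word pattern: runs of letters, allowing an internal apostrophe (don't, moment's).
WORD = re.compile(r"[^\W\d_]+(?:['\u2019][^\W\d_]+)*")


def sequence_lengths(text, marks=PUNCTUATION):
    """Number of words in each stretch of text between consecutive punctuation marks."""
    splitter = "[" + re.escape(marks) + "]"
    lengths = []
    for chunk in re.split(splitter, text):
        n_words = len(WORD.findall(chunk))
        if n_words:  # drop empty stretches, e.g. between "?" and a closing quote
            lengths.append(n_words)
    return lengths


excerpt = "A moment\u2019s hesitation\u2026 Yes, a full stop here \u2013 but shouldn\u2019t there be a comma there?"
print(sequence_lengths(excerpt))                                 # [3, 1, 4, 7]
print(sequence_lengths("One two three. Four five.", marks="."))  # [3, 2]
```

Passing marks="." reduces the measurement to ordinary sentence length, which is the earlier study's quantity; the default set corresponds to looking at all punctuation marks at once.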

Two sets of texts were studied. The main analyses concerning punctuation within each language were carried out on 240 highly popular literary works written in seven major Western languages: English (44), German (34), French (32), Italian (32), Spanish (32), Polish (34) and Russian (32). This particular selection of languages was based on a criterion: the researchers assumed that no fewer than 50 million people should speak the language in question, and that the works written in it should have been awarded no fewer than five Nobel Prizes for Literature. In addition, for the statistical validity of the research results, each book had to contain at least 1,500 word sequences separated by punctuation marks. A separate collection was prepared to observe the stability of punctuation in translation. It contained 14 works, each of which was available in each of the languages studied (two of the 98 language versions, however, were omitted due to their unavailability). In total, authors in both collections included such writers as Conrad, Dickens, Doyle, Hemingway, Kipling, Orwell, Salinger, Woolf, Grass, Kafka, Mann, Nietzsche, Goethe, La Fayette, Dumas, Hugo, Proust, Verne, Eco, Cervantes, Sienkiewicz and Reymont.

The attention of the Cracow researchers was primarily drawn to the statistical distribution of the distance between consecutive punctuation marks. It soon became evident that in all the languages studied, it was best described by one of the precisely defined variants of the Weibull distribution. A curve of this type has a characteristic shape: it grows rapidly at first and then, after reaching a maximum value, descends somewhat more slowly to a certain critical value, below which it reaches zero with small and constantly decreasing dynamics. The Weibull distribution is usually used to describe survival phenomena (e.g. population as a function of age), but also various physical processes, such as increasing fatigue of materials.
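The release does not spell out which variant of the Weibull distribution was used. As an assumption for illustration, the sketch below uses the discrete Weibull distribution, a common choice for whole-number counts such as words, with P(K = k) = (1 - p)^((k - 1)^β) - (1 - p)^(k^β) for k = 1, 2, 3, ... and parameters 0 < p < 1, β > 0. For β > 1 and small p the probabilities rise to a peak and then tail off, the shape described above. The sampler, the grid-search maximum-likelihood fit and the parameter values are all invented for this example, not the paper's procedure or estimates.

```python
import math
import random


def discrete_weibull_pmf(k, p, beta):
    """P(K = k) for k = 1, 2, ... under the assumed discrete Weibull (q = 1 - p)."""
    q = 1.0 - p
    return q ** ((k - 1) ** beta) - q ** (k ** beta)


def sample(p, beta, n, rng):
    """Draw n lengths by inverting the survival function P(K > k) = q ** (k ** beta)."""
    log_q = math.log(1.0 - p)
    draws = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
        draws.append(max(1, math.ceil((math.log(u) / log_q) ** (1.0 / beta))))
    return draws


def log_likelihood(lengths, p, beta):
    """Sum of log-probabilities; -inf if the parameters rule out some length."""
    total = 0.0
    for k in lengths:
        prob = discrete_weibull_pmf(k, p, beta)
        if prob <= 0.0:
            return float("-inf")
        total += math.log(prob)
    return total


def fit_grid(lengths, ps, betas):
    """Crude maximum-likelihood fit: the (p, beta) pair on the grid that scores best."""
    return max(((p, b) for p in ps for b in betas),
               key=lambda pb: log_likelihood(lengths, pb[0], pb[1]))


# Synthetic lengths with invented parameters, then a fit on a coarse grid.
lengths = sample(0.1, 1.4, 5000, random.Random(0))
print(fit_grid(lengths, [0.05, 0.1, 0.2, 0.4], [0.8, 1.0, 1.4, 2.0]))
```

On synthetic data the grid search should land on, or very near, the generating values. Real text would supply the lengths instead, and "slightly different values for the distribution parameters" per language would show up as different (p, β) pairs.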

“The concordance of the distribution of word sequence lengths between punctuation marks with the functional form of the Weibull distribution was better the more types of punctuation marks we included in the analyses; for all marks the concordance turned out to be almost complete. At the same time, some differences in the distributions are apparent between the different languages, but these merely amount to the selection of slightly different values for the distribution parameters, specific to the language in question. Punctuation thus seems to be an integral part of all the languages studied,” notes Prof. Drozdz, only to add after a moment with some amusement: “…and since the Weibull distribution is concerned with phenomena such as survival, it can be said with not too much tongue-in-cheek that punctuation has in its nature a literally embedded struggle for survival.”

The next stage of the analyses consisted of determining the hazard function. In the case of punctuation, it describes how the conditional probability of success – i.e. the probability of the next punctuation mark – changes if no such mark has yet appeared in the analysed sequence. The results here are clear: the language characterised by the lowest propensity to use punctuation is English, with Spanish not far behind; Slavic languages proved to be the most punctuation-dependent. The hazard function curves for punctuation marks in six of the seven languages studied appeared to follow a similar pattern; they differed mainly in vertical shift.
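Under the same discrete-Weibull assumption used for illustration above (not confirmed by the release), the hazard at word k is the probability that a mark arrives at word k given that none has appeared in the first k - 1 words: h(k) = P(K = k) / P(K ≥ k) = 1 - (1 - p)^(k^β - (k - 1)^β). With β = 1 the hazard is constant, so the text has no memory of how long the current stretch has run; with β > 1 it rises with k. The sketch below computes that model hazard alongside the empirical one from a list of observed lengths. It does not reproduce the paper's cross-language comparison, and the function names are hypothetical.

```python
from collections import Counter


def model_hazard(k, p, beta):
    """h(k) = 1 - (1 - p)**(k**beta - (k - 1)**beta) for the assumed discrete Weibull."""
    return 1.0 - (1.0 - p) ** (k ** beta - (k - 1) ** beta)


def empirical_hazard(lengths, max_k):
    """Fraction of stretches still open at word k that are closed by a mark at word k."""
    counts = Counter(lengths)
    at_risk = len(lengths)
    hazard = []
    for k in range(1, max_k + 1):
        hazard.append(counts[k] / at_risk if at_risk else float("nan"))
        at_risk -= counts[k]
    return hazard


print(empirical_hazard([1, 2, 2, 3], 3))  # [0.25, 0.6666666666666666, 1.0]
# Rising values: with beta > 1 a mark becomes more likely the longer a stretch runs.
print([round(model_hazard(k, 0.1, 1.5), 3) for k in range(1, 5)])
```

A language whose curve sits higher reaches for punctuation sooner at every word count, which is one way to read the comparison between English, Spanish and the Slavic languages.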

German proved to be the exception. Its hazard function is the only one that intersects most of the curves constructed for the other languages. German punctuation thus seems to combine the punctuation features of many languages, making it a kind of Esperanto punctuation. The above observation dovetails with the next analysis, which was to see whether the punctuation features of original literary works can be seen in their translations. As expected, the language most faithfully transforming punctuation from the original language to the target language turned out to be German.

In spoken communication, pauses can be justified by human physiology, such as the need to catch one’s breath or to take a moment to structure what is to be said next in one’s mind. And in written communication?

“Creating a sentence by adding one word after another while ensuring that the message is clear and unambiguous is a bit like tightening the string of a bow: it is easy at first, but becomes more demanding with each passing moment. If there are no ordering elements in the text (and this is the role of punctuation), the difficulty of interpretation increases as the string of words lengthens. A bow that is too tight can break, and a sentence that is too long can become unintelligible. Therefore, the author is faced with the necessity of ‘freeing the arrow’, i.e. closing a passage of text with some sort of punctuation mark. This observation applies to all the languages analysed, so we are dealing with what could be called a linguistic law,” states Dr Tomasz Stanisz (IFJ PAN), first author of the article in question.

Finally, it is worth noting that the invention of punctuation is relatively recent – punctuation marks did not occur at all in old texts. The emergence of optimal punctuation patterns in modern written languages can therefore be interpreted as the result of their evolutionary advancement. However, the excessive need for punctuation is not necessarily a sign of such sophistication. English and Spanish, contemporarily the most universal languages, appear, in the light of the above studies, to be less strict about the frequency of punctuation use. It is likely that these languages are so formalised in terms of sentence construction that there is less room for ambiguity that would need to be resolved with punctuation marks.

The Henryk Niewodniczański Institute of Nuclear Physics (IFJ PAN) is currently one of the largest research institutes of the Polish Academy of Sciences. A wide range of research carried out at IFJ PAN covers basic and applied studies, from particle physics and astrophysics, through hadron physics, high-, medium-, and low-energy nuclear physics, condensed matter physics (including materials engineering), to various applications of nuclear physics in interdisciplinary research, covering medical physics, dosimetry, radiation and environmental biology, environmental protection, and other related disciplines. The average yearly publication output of IFJ PAN includes over 600 scientific papers in high-impact international journals. Each year the Institute hosts about 20 international and national scientific conferences. One of the most important facilities of the Institute is the Cyclotron Centre Bronowice (CCB), which is an infrastructure unique in Central Europe, serving as a clinical and research centre in the field of medical and nuclear physics. In addition, IFJ PAN runs four accredited research and measurement laboratories. IFJ PAN is a member of the Marian Smoluchowski Kraków Research Consortium: “Matter-Energy-Future”, which in the years 2012-2017 enjoyed the status of the Leading National Research Centre (KNOW) in physics. In 2017, the European Commission granted the Institute the HR Excellence in Research award. As a result of the categorization of the Ministry of Education and Science, the Institute has been classified into the A+ category (the highest scientific category in Poland) in the field of physical sciences.

Here’s a link to and a citation for the paper,

Universal versus system-specific features of punctuation usage patterns in major Western languages by Tomasz Stanisz, Stanisław Drożdż, and Jarosław Kwapień. Chaos, Solitons & Fractals Volume 168, March 2023, 113183 DOI: https://doi.org/10.1016/j.chaos.2023.113183

This paper is behind a paywall but the publishers do offer a preview of sorts.

There is also an earlier, less polished, open access version on arXiv, a free preprint repository (preprints there are not peer reviewed),

Universal versus system-specific features of punctuation usage patterns in major Western languages by Tomasz Stanisz, Stanislaw Drozdz, and Jaroslaw Kwapien. arXiv:2212.11182 [cs.CL] (or arXiv:2212.11182v1 [cs.CL] for this version) DOI: https://doi.org/10.48550/arXiv.2212.11182 Posted Wed, 21 Dec 2022 16:52:10 UTC (1,073 KB)

Reading a virus like a book

Teaching grammar and syntax to artificial intelligence (AI) algorithms, specifically natural language processing (NLP) algorithms, has helped researchers understand and predict viral mutations more quickly. This is especially useful at a time when severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) seems to be mutating into more easily transmissible variants.

Will Douglas Heaven’s Jan. 14, 2021 article for the Massachusetts Institute of Technology’s MIT Technology Review describes the work that links AI, grammar, and mutating viruses (Note: Links have been removed),

Galileo once observed that nature is written in math. Biology might be written in words. Natural-language processing (NLP) algorithms are now able to generate protein sequences and predict virus mutations, including key changes that help the coronavirus evade the immune system.

The key insight making this possible is that many properties of biological systems can be interpreted in terms of words and sentences. “We’re learning the language of evolution,” says Bonnie Berger, a computational biologist at the Massachusetts Institute of Technology [MIT].

In the last few years, a handful of researchers—including teams from geneticist George Church’s [Professor of Health Sciences and Technology at Harvard University and MIT, etc.] lab and Salesforce [emphasis mine]—have shown that protein sequences and genetic codes can be modeled using NLP techniques.

In a study published in Science today, Berger and her colleagues pull several of these strands together and use NLP to predict mutations that allow viruses to avoid being detected by antibodies in the human immune system, a process known as viral immune escape. The basic idea is that the interpretation of a virus by an immune system is analogous to the interpretation of a sentence by a human.

Berger’s team uses two different linguistic concepts: grammar and semantics (or meaning). The genetic or evolutionary fitness of a virus—characteristics such as how good it is at infecting a host—can be interpreted in terms of grammatical correctness. A successful, infectious virus is grammatically correct; an unsuccessful one is not.

Similarly, mutations of a virus can be interpreted in terms of semantics. Mutations that make a virus appear different to things in its environment—such as changes in its surface proteins that make it invisible to certain antibodies—have altered its meaning. Viruses with different mutations can have different meanings, and a virus with a different meaning may need different antibodies to read it.

Instead of millions of sentences, they trained the NLP model on thousands of genetic sequences taken from three different viruses: 45,000 unique sequences for a strain of influenza, 60,000 for a strain of HIV, and between 3,000 and 4,000 for a strain of Sars-Cov-2, the virus that causes covid-19. “There’s less data for the coronavirus because there’s been less surveillance,” says Brian Hie, a graduate student at MIT, who built the models.

The overall aim of the approach is to identify mutations that might let a virus escape an immune system without making it less infectious—that is, mutations that change a virus’s meaning without making it grammatically incorrect.
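The grammar/meaning split can be sketched as a ranking problem. Roughly, the Science paper's approach (which it calls constrained semantic change search) prioritises candidate mutations by adding the rank of how much a mutation changes the model's internal representation of the sequence (semantic change) to the rank of how probable the model finds the mutation (grammaticality), with a weight β on the second term. The toy Python below only shows that combination step under that reading; the mutation labels and scores are hypothetical placeholders, whereas in the paper both quantities come from a trained language model.

```python
def ranks(values):
    """1-based ascending ranks: the smallest value gets rank 1 (no tie handling)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def escape_priority(mutations, semantic_change, grammaticality, beta=1.0):
    """Order mutations by rank(semantic change) + beta * rank(grammaticality), best first."""
    scores = [s + beta * g for s, g in zip(ranks(semantic_change), ranks(grammaticality))]
    ranked = sorted(zip(scores, mutations), key=lambda pair: pair[0], reverse=True)
    return [mutation for _, mutation in ranked]


# Hypothetical candidates: a large change in "meaning" only helps escape
# if the mutated sequence still looks "grammatical" (viable) to the model.
mutations = ["mut_A", "mut_B", "mut_C", "mut_D"]
semantic_change = [0.9, 0.7, 0.1, 0.5]
grammaticality = [0.10, 0.40, 0.02, 0.25]
print(escape_priority(mutations, semantic_change, grammaticality))
# ['mut_B', 'mut_A', 'mut_D', 'mut_C']
```

Adding ranks rather than raw scores keeps either quantity from swamping the other just because it is measured on a larger scale; raising β favours mutations that stay "grammatical" over ones that change "meaning" the most.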

But it’s also just the beginning. Treating genetic mutations as changes in meaning could be applied in different ways across biology. “A good analogy can go a long way,” says Bryson [Bryan Bryson, a biologist at MIT].

If you have time, I recommend reading Heaven’s Jan. 14, 2021 article in its entirety as it’s well written with clear explanations. As for the article’s mentions of George Church and Salesforce, the former was to be expected; the latter was not (at least, not by me; I speak for no one else).

I find it fascinating that a company which describes itself (from What is Salesforce?) as providing “… customer relationship management, or CRM. It gives all your departments — including marketing, sales, commerce, and service — a shared view of your customers … ” seems to be conducting investigations into one (or more?) areas of biology.

For those who’d like to dive into the science as described in Heaven’s article, here’s a link to and a citation for the paper,

Learning the language of viral evolution and escape by Brian Hie, Ellen D. Zhong, Bonnie Berger, Bryan Bryson. Science 15 Jan 2021: Vol. 371, Issue 6526, pp. 284-288 DOI: 10.1126/science.abd7331

This paper appears to be open access (or it is, at least for now).

There is also a preprint version available on bioRxiv, which is an open access repository.

3D picture language for mathematics

There’s a new 3D picture language for mathematics called ‘quon’, according to a March 3, 2017 news item on phys.org,

Galileo called mathematics the “language with which God wrote the universe.” He described a picture-language, and now that language has a new dimension.

The Harvard trio of Arthur Jaffe, the Landon T. Clay Professor of Mathematics and Theoretical Science, postdoctoral fellow Zhengwei Liu, and researcher Alex Wozniakowski has developed a 3-D picture-language for mathematics with potential as a tool across a range of topics, from pure math to physics.

Though not the first pictorial language of mathematics, the new one, called quon, holds promise for being able to transmit not only complex concepts, but also vast amounts of detail in relatively simple images. …

A March 2, 2017 Harvard University news release by Peter Reuell, which originated the news item, provides more context for the research,

“It’s a big deal,” said Jacob Biamonte of the Quantum Complexity Science Initiative after reading the research. “The paper will set a new foundation for a vast topic.”

“This paper is the result of work we’ve been doing for the past year and a half, and we regard this as the start of something new and exciting,” Jaffe said. “It seems to be the tip of an iceberg. We invented our language to solve a problem in quantum information, but we have already found that this language led us to the discovery of new mathematical results in other areas of mathematics. We expect that it will also have interesting applications in physics.”

When it comes to the “language” of mathematics, humans start with the basics — by learning their numbers. As we get older, however, things become more complex.

“We learn to use algebra, and we use letters to represent variables or other values that might be altered,” Liu said. “Now, when we look at research work, we see fewer numbers and more letters and formulas. One of our aims is to replace ‘symbol proof’ by ‘picture proof.’”

The new language relies on images to convey the same information that is found in traditional algebraic equations — and in some cases, even more.

“An image can contain information that is very hard to describe algebraically,” Liu said. “It is very easy to transmit meaning through an image, and easy for people to understand what they see in an image, so we visualize these concepts and instead of words or letters can communicate via pictures.”

“So this pictorial language for mathematics can give you insights and a way of thinking that you don’t see in the usual, algebraic way of approaching mathematics,” Jaffe said. “For centuries there has been a great deal of interaction between mathematics and physics because people were thinking about the same things, but from different points of view. When we put the two subjects together, we found many new insights, and this new language can take that into another dimension.”

In their most recent work, the researchers moved their language into a more literal realm, creating 3-D images that, when manipulated, can trigger mathematical insights.

“Where before we had been working in two dimensions, we now see that it’s valuable to have a language that’s Lego-like, and in three dimensions,” Jaffe said. “By pushing these pictures around, or working with them like an object you can deform, the images can have different mathematical meanings, and in that way we can create equations.”

Among their pictorial feats, Jaffe said, are the complex equations used to describe quantum teleportation. The researchers have pictures for the Pauli matrices, which are fundamental components of quantum information protocols. This shows that the standard protocols are topological, and also leads to discovery of new protocols.

“It turns out one picture is worth 1,000 symbols,” Jaffe said.

“We could describe this algebraically, and it might require an entire page of equations,” Liu added. “But we can do that in one picture, so it can capture a lot of information.”
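For comparison with the picture approach, here is the conventional algebraic bookkeeping for the standard textbook teleportation protocol, written as a plain state-vector simulation in Python: a shared Bell pair, a CNOT and a Hadamard on Alice's side, then Pauli X and Z corrections on Bob's qubit chosen by Alice's two measurement results. It is a sketch of the ordinary circuit description, not of the quon formalism, and the qubit ordering and function names are choices made for this illustration.

```python
import math

# Pauli X (bit flip) and Z (phase flip) as 2x2 matrices.
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
S = 1 / math.sqrt(2)


def apply(matrix, vec):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [matrix[0][0] * vec[0] + matrix[0][1] * vec[1],
            matrix[1][0] * vec[0] + matrix[1][1] * vec[1]]


def teleport(alpha, beta):
    """Teleport alpha|0> + beta|1>; return Bob's corrected qubit for every outcome.

    Qubit order is (q0, q1, q2) with basis index 4*q0 + 2*q1 + q2. Alice holds
    q0 (the state to send) and q1; Bob holds q2. q1 and q2 start in the Bell
    pair (|00> + |11>)/sqrt(2).
    """
    state = [0j] * 8
    for q0, amp in ((0, alpha), (1, beta)):
        state[4 * q0 + 0] = amp * S  # q1 q2 = 00
        state[4 * q0 + 3] = amp * S  # q1 q2 = 11
    # CNOT, control q0, target q1: flip q1 wherever q0 = 1.
    state[4], state[6] = state[6], state[4]
    state[5], state[7] = state[7], state[5]
    # Hadamard on q0.
    state = ([S * (state[r] + state[4 + r]) for r in range(4)]
             + [S * (state[r] - state[4 + r]) for r in range(4)])
    corrected = {}
    for m0 in (0, 1):          # Alice's measurement result for q0
        for m1 in (0, 1):      # Alice's measurement result for q1
            base = 4 * m0 + 2 * m1
            bob = state[base:base + 2]
            norm = math.sqrt(abs(bob[0]) ** 2 + abs(bob[1]) ** 2)
            bob = [c / norm for c in bob]  # renormalize after the measurement
            if m1:
                bob = apply(X, bob)        # undo the bit flip
            if m0:
                bob = apply(Z, bob)        # undo the phase flip
            corrected[(m0, m1)] = bob
    return corrected


for outcome, qubit in teleport(0.6, 0.8j).items():
    print(outcome, [complex(round(c.real, 6), round(c.imag, 6)) for c in qubit])
```

Every outcome should print the original amplitudes (0.6 and 0.8j), up to rounding and the sign of zero; the Pauli corrections are exactly what makes the four cases agree, and tracking them by hand is the page of algebra the quon pictures aim to compress.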

Having found a fit with quantum information, the researchers are now exploring how their language might also be useful in a number of other subjects in mathematics and physics.

“We don’t want to make claims at this point,” Jaffe said, “but we believe and are thinking about quite a few other areas where this picture-language could be important.”

Sadly, there are no artistic images illustrating quon, but this is from the paper,

An n-quon is represented by n hemispheres. We call the flat disc on the boundary of each hemisphere a boundary disc. Each hemisphere contains a neutral diagram with four boundary points on its boundary disk. The dotted box designates the internal structure that specifies the quon vector. For example, the 3-quon is represented as

Courtesy: PNAS and Harvard University

I gather the term ‘quon’ is meant to suggest quantum particles.

Here’s a link to and a citation for the paper,

Quon 3D language for quantum information by Zhengwei Liu, Alex Wozniakowski, and Arthur M. Jaffe. Proceedings of the National Academy of Sciences. Published online before print February 6, 2017, doi: 10.1073/pnas.1621345114 PNAS March 7, 2017 vol. 114 no. 10

This paper appears to be open access.

Science as a post-truth concept

The word of 2016, according to Oxford Dictionaries, is ‘post-truth’, and Steve Fuller, a professor at the University of Warwick (UK), has written an intriguing Dec. 15, 2016 essay for the Guardian tracing the origins of post-truth as it relates to the sciences (Note: Links have been removed),

Even today, more than fifty years after its first edition, Thomas Kuhn’s The Structure of Scientific Revolutions remains the first port of call to learn about the history, philosophy or sociology of science. This is the book famous for talking about science as governed by ‘paradigms’ until overtaken by ‘revolutions’.

Kuhn argued that the way that both scientists and the general public need to understand the history of science is Orwellian. He is alluding to 1984, in which the protagonist’s job is to rewrite newspapers from the past to make it seem as though the government’s current policy is where it had been heading all along. In this perpetually airbrushed version of history, the public never sees the U-turns, switches of allegiance and errors of judgement that might cause them to question the state’s progressive narrative. Confidence in the status quo is maintained and new recruits are inspired to follow in its lead. Kuhn claimed that what applies to totalitarian 1984 also applies to science united under the spell of a paradigm.

What makes Kuhn’s account of science ‘post-truth’ is that truth is no longer the arbiter of legitimate power but rather the mask of legitimacy that is worn by everyone in pursuit of power. Truth is just one more – albeit perhaps the most important – resource in a power game without end. In this respect, science differs from politics only in that the masks of its players rarely drop.

The explanation for what happens behind the masks lies in the work of the Italian political economist Vilfredo Pareto (1848-1923), devotee of Machiavelli, admired by Mussolini and one of sociology’s forgotten founders. Kuhn spent his formative years at Harvard in the late 1930s when the local kingmaker, biochemist Lawrence Henderson, not only taught the first history of science courses but also convened an interdisciplinary ‘Pareto Circle’ to get the university’s rising stars acquainted with the person he regarded as Marx’s only true rival.

For Pareto, what passes for social order is the result of the interplay of two sorts of elites, which he called, following Machiavelli, ‘lions’ and ‘foxes’. The lions acquire legitimacy from tradition, which in science is based on expertise rather than lineage or custom. Yet, like these earlier forms of legitimacy, expertise derives its authority from the cumulative weight of intergenerational experience. This is exactly what Kuhn meant by a ‘paradigm’ in science – a set of conventions by which knowledge builds in an orderly fashion to complete a certain world-view established by a founding figure – say, Newton or Darwin. Each new piece of knowledge is anointed by a process of ‘peer review’.

As in 1984, the lions normally dictate the historical narrative. But on the cutting room floor lie the activities of the other set of elites, the foxes. In today’s politics of science, they are known by a variety of names, ranging from ‘mavericks’ to ‘social constructivists’ to ‘pseudoscientists’. Foxes are characterised by dissent and unrest, thriving in a world of openness and opportunity.

Foxes stress the present as an ecstatic moment in which there is everything to play for. This includes a decisive break with ‘the past’, which they know has been fictionalized anyway, as in 1984. Self-styled visionaries present themselves, like Galileo, as the first to see what is in plain sight. Expertise appears as a repository of corrupt judgement designed to suppress promising alternatives to already bankrupt positions. For Kuhn, the scientific foxes get the upper hand whenever cracks appear in the lions’ smooth narrative, the persistent ‘anomalies’ that can’t be explained by the ruling paradigm.

But the foxes have their own Achilles Heel: They are strong in opposition but divisively self-critical in office. …

I encourage you to read the essay in its entirety, although I don’t necessarily subscribe to some of the statements. For example, I wouldn’t lump ‘mavericks’, ‘social constructivists’, and ‘pseudoscientists’ together without some discussion about ‘pseudoscience’. It’s true that an accusation of ‘pseudoscience’ is often leveled at people who are challenging the status quo, but there are also situations where people use science as a mask to legitimate some fairly hinky work.