
An explanation of neural networks from the Massachusetts Institute of Technology (MIT)

I always enjoy the MIT ‘explainers’ and have been a little sad that I haven’t stumbled across one in a while. Until now, that is. Here’s an April 14, 2017 neural network ‘explainer’ (in its entirety) by Larry Hardesty (?),

In the past 10 years, the best-performing artificial-intelligence systems — such as the speech recognizers on smartphones or Google’s latest automatic translator — have resulted from a technique called “deep learning.”

Deep learning is in fact a new name for an approach to artificial intelligence called neural networks, which have been going in and out of fashion for more than 70 years. Neural networks were first proposed in 1944 by Warren McCulloch and Walter Pitts, two University of Chicago researchers who moved to MIT in 1952 as founding members of what’s sometimes called the first cognitive science department.

Neural nets were a major area of research in both neuroscience and computer science until 1969, when, according to computer science lore, they were killed off by the MIT mathematicians Marvin Minsky and Seymour Papert, who a year later would become co-directors of the new MIT Artificial Intelligence Laboratory.

The technique then enjoyed a resurgence in the 1980s, fell into eclipse again in the first decade of the new century, and has returned like gangbusters in the second, fueled largely by the increased processing power of graphics chips.

“There’s this idea that ideas in science are a bit like epidemics of viruses,” says Tomaso Poggio, the Eugene McDermott Professor of Brain and Cognitive Sciences at MIT, an investigator at MIT’s McGovern Institute for Brain Research, and director of MIT’s Center for Brains, Minds, and Machines. “There are apparently five or six basic strains of flu viruses, and apparently each one comes back with a period of around 25 years. People get infected, and they develop an immune response, and so they don’t get infected for the next 25 years. And then there is a new generation that is ready to be infected by the same strain of virus. In science, people fall in love with an idea, get excited about it, hammer it to death, and then get immunized — they get tired of it. So ideas should have the same kind of periodicity!”

Weighty matters

Neural nets are a means of doing machine learning, in which a computer learns to perform some task by analyzing training examples. Usually, the examples have been hand-labeled in advance. An object recognition system, for instance, might be fed thousands of labeled images of cars, houses, coffee cups, and so on, and it would find visual patterns in the images that consistently correlate with particular labels.

Modeled loosely on the human brain, a neural net consists of thousands or even millions of simple processing nodes that are densely interconnected. Most of today’s neural nets are organized into layers of nodes, and they’re “feed-forward,” meaning that data moves through them in only one direction. An individual node might be connected to several nodes in the layer beneath it, from which it receives data, and several nodes in the layer above it, to which it sends data.

To each of its incoming connections, a node will assign a number known as a “weight.” When the network is active, the node receives a different data item — a different number — over each of its connections and multiplies it by the associated weight. It then adds the resulting products together, yielding a single number. If that number is below a threshold value, the node passes no data to the next layer. If the number exceeds the threshold value, the node “fires,” which in today’s neural nets generally means sending the number — the sum of the weighted inputs — along all its outgoing connections.
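
For the code-minded, here’s roughly what that single-node arithmetic looks like in Python with NumPy (my own illustrative sketch; the inputs, weights, and threshold are made up and nothing here comes from the MIT article):

```python
import numpy as np

# Illustrative sketch of the node computation described above: weight each
# incoming value, add the products together, and compare the sum to a threshold.
def node_output(inputs, weights, threshold):
    weighted_sum = np.dot(inputs, weights)  # multiply each input by its weight, then add
    if weighted_sum > threshold:
        return weighted_sum                 # the node "fires" and passes the sum along
    return 0.0                              # below the threshold, nothing is passed on

# Three incoming connections with made-up values, weights, and a threshold of 1.0
print(node_output(np.array([0.5, 0.9, 0.2]), np.array([0.8, 1.1, -0.4]), 1.0))
```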

When a neural net is being trained, all of its weights and thresholds are initially set to random values. Training data is fed to the bottom layer — the input layer — and it passes through the succeeding layers, getting multiplied and added together in complex ways, until it finally arrives, radically transformed, at the output layer. During training, the weights and thresholds are continually adjusted until training data with the same labels consistently yield similar outputs.
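
The article doesn’t say how that adjustment is done; the classic perceptron learning rule is one simple instance of it. Here’s a hedged sketch for a single node, with made-up toy data (Python/NumPy, my own illustration):

```python
import numpy as np

# Start with random weights and a random threshold, then nudge them whenever the
# node's output disagrees with the label. This is the classic perceptron rule,
# shown only as one concrete example of "continually adjusted" training.
rng = np.random.default_rng(0)
weights = rng.normal(size=2)
threshold = rng.normal()

# Toy labeled data: four examples with two inputs each and binary labels
data = np.array([[0.1, 0.2], [0.9, 0.8], [0.2, 0.9], [0.8, 0.1]])
labels = np.array([0, 1, 1, 0])

for _ in range(100):                        # repeated passes over the training set
    for x, y in zip(data, labels):
        fired = 1 if np.dot(x, weights) > threshold else 0
        error = y - fired                   # +1, 0, or -1
        weights += 0.1 * error * x          # pull the weights toward the correct output
        threshold -= 0.1 * error            # lowering the threshold makes firing easier

print([1 if np.dot(x, weights) > threshold else 0 for x in data])  # should match the labels
```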

Minds and machines

The neural nets described by McCulloch and Pitts in 1944 had thresholds and weights, but they weren’t arranged into layers, and the researchers didn’t specify any training mechanism. What McCulloch and Pitts showed was that a neural net could, in principle, compute any function that a digital computer could. The result was more neuroscience than computer science: The point was to suggest that the human brain could be thought of as a computing device.

Neural nets continue to be a valuable tool for neuroscientific research. For instance, particular network layouts or rules for adjusting weights and thresholds have reproduced observed features of human neuroanatomy and cognition, an indication that they capture something about how the brain processes information.

The first trainable neural network, the Perceptron, was demonstrated by the Cornell University psychologist Frank Rosenblatt in 1957. The Perceptron’s design was much like that of the modern neural net, except that it had only one layer with adjustable weights and thresholds, sandwiched between input and output layers.

Perceptrons were an active area of research in both psychology and the fledgling discipline of computer science until 1969, when Minsky and Papert published a book titled “Perceptrons,” which demonstrated that executing certain fairly common computations on Perceptrons would be impractically time-consuming.

“Of course, all of these limitations kind of disappear if you take machinery that is a little more complicated — like, two layers,” Poggio says. But at the time, the book had a chilling effect on neural-net research.

“You have to put these things in historical context,” Poggio says. “They were arguing for programming — for languages like Lisp. Not many years before, people were still using analog computers. It was not clear at all at the time that programming was the way to go. I think they went a little bit overboard, but as usual, it’s not black and white. If you think of this as this competition between analog computing and digital computing, they fought for what at the time was the right thing.”

Periodicity

By the 1980s, however, researchers had developed algorithms for modifying neural nets’ weights and thresholds that were efficient enough for networks with more than one layer, removing many of the limitations identified by Minsky and Papert. The field enjoyed a renaissance.
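
The article doesn’t name those algorithms, but backpropagation with gradient descent, popularized in the 1980s, is the standard example. Here’s a rough sketch of my own (Python/NumPy; the layer sizes, learning rate, and random initialization are assumptions) showing a two-layer network learning XOR, a task a single layer cannot solve:

```python
import numpy as np

# Rough sketch of multi-layer training by backpropagation (one example of the
# kind of weight-adjustment algorithm described above; details are my own).
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # XOR inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # hidden layer: 4 nodes
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output layer: 1 node

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for step in range(5000):
    h = sigmoid(X @ W1 + b1)                    # forward pass, hidden layer
    out = sigmoid(h @ W2 + b2)                  # forward pass, output layer
    if step in (0, 4999):
        print("mean squared error:", float(np.mean((out - y) ** 2)))
    d_out = (out - y) * out * (1 - out)         # error signal at the output
    d_h = (d_out @ W2.T) * h * (1 - h)          # error propagated back to the hidden layer
    W2 -= 0.5 * h.T @ d_out
    b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h
    b1 -= 0.5 * d_h.sum(axis=0)
# The second error printed should be much smaller than the first.
```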

But intellectually, there’s something unsatisfying about neural nets. Enough training may revise a network’s settings to the point that it can usefully classify data, but what do those settings mean? What image features is an object recognizer looking at, and how does it piece them together into the distinctive visual signatures of cars, houses, and coffee cups? Looking at the weights of individual connections won’t answer that question.

In recent years, computer scientists have begun to come up with ingenious methods for deducing the analytic strategies adopted by neural nets. But in the 1980s, the networks’ strategies were indecipherable. So around the turn of the century, neural networks were supplanted by support vector machines, an alternative approach to machine learning that’s based on some very clean and elegant mathematics.

The recent resurgence in neural networks — the deep-learning revolution — comes courtesy of the computer-game industry. The complex imagery and rapid pace of today’s video games require hardware that can keep up, and the result has been the graphics processing unit (GPU), which packs thousands of relatively simple processing cores on a single chip. It didn’t take long for researchers to realize that the architecture of a GPU is remarkably like that of a neural net.

Modern GPUs enabled the one-layer networks of the 1960s and the two- to three-layer networks of the 1980s to blossom into the 10-, 15-, even 50-layer networks of today. That’s what the “deep” in “deep learning” refers to — the depth of the network’s layers. And currently, deep learning is responsible for the best-performing systems in almost every area of artificial-intelligence research.

Under the hood

The networks’ opacity is still unsettling to theorists, but there’s headway on that front, too. In addition to directing the Center for Brains, Minds, and Machines (CBMM), Poggio leads the center’s research program in Theoretical Frameworks for Intelligence. Recently, Poggio and his CBMM colleagues have released a three-part theoretical study of neural networks.

The first part, which was published last month in the International Journal of Automation and Computing, addresses the range of computations that deep-learning networks can execute and when deep networks offer advantages over shallower ones. Parts two and three, which have been released as CBMM technical reports, address the problems of global optimization, or guaranteeing that a network has found the settings that best accord with its training data, and overfitting, or cases in which the network becomes so attuned to the specifics of its training data that it fails to generalize to other instances of the same categories.

There are still plenty of theoretical questions to be answered, but CBMM researchers’ work could help ensure that neural networks finally break the generational cycle that has brought them in and out of favor for seven decades.

This image from MIT illustrates a ‘modern’ neural network,

Most applications of deep learning use “convolutional” neural networks, in which the nodes of each layer are clustered, the clusters overlap, and each cluster feeds data to multiple nodes (orange and green) of the next layer. Image: Jose-Luis Olivares/MIT
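
As a rough illustration of that overlapping-cluster idea (my own sketch in Python/NumPy, not connected to the MIT image), a one-dimensional convolution slides a small set of shared weights across the input, so neighbouring output nodes see overlapping patches:

```python
import numpy as np

# One-dimensional convolution: a small window of shared weights slides over the
# input, so each output value draws on a patch that overlaps its neighbours'.
inputs = np.array([0.2, 0.5, 0.1, 0.9, 0.4, 0.7])   # made-up input values
kernel = np.array([0.5, -0.2, 0.3])                 # one cluster's shared weights

outputs = np.array([np.dot(inputs[i:i + 3], kernel)
                    for i in range(len(inputs) - 2)])
print(outputs)   # four outputs, each fed by an overlapping 3-element patch
```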

h/t phys.org April 17, 2017

One final note: I wish the folks at MIT had an ‘explainer’ archive. I’m not sure how to find any more ‘explainers’ on MIT’s website.

Deep learning and some history from the Swiss National Science Foundation (SNSF)

A June 27, 2016 news item on phys.org provides a measured analysis of deep learning and its current state of development (from a Swiss perspective),

In March 2016, the world Go champion Lee Sedol lost 1-4 against the artificial intelligence AlphaGo. For many, this was yet another defeat for humanity at the hands of the machines. Indeed, the success of the AlphaGo software was forged in an area of artificial intelligence that has seen huge progress over the last decade. Deep learning, as it’s called, uses artificial neural networks to process algorithmic calculations. This software architecture therefore mimics biological neural networks.

Much of the progress in deep learning is thanks to the work of Jürgen Schmidhuber, director of the IDSIA (Istituto Dalle Molle di Studi sull’Intelligenza Artificiale) which is located in the suburbs of Lugano. The IDSIA doctoral student Shane Legg and a group of former colleagues went on to found DeepMind, the startup acquired by Google in early 2014 for USD 500 million. The DeepMind algorithms eventually wound up in AlphaGo.

“Schmidhuber is one of the best at deep learning,” says Boi Faltings of the EPFL Artificial Intelligence Lab. “He never let go of the need to keep working at it.” According to Stéphane Marchand-Maillet of the University of Geneva computing department, “he’s been in the race since the very beginning.”

A June 27, 2016 SNSF news release (first published as a story in Horizons no. 109 June 2016) by Fabien Goubet, which originated the news item, goes on to provide a brief history,

The real strength of deep learning is structural recognition, and winning at Go is just an illustration of this, albeit a rather resounding one. Elsewhere, and for some years now, we have seen it applied to an entire spectrum of areas, such as visual and vocal recognition, online translation tools and smartphone personal assistants. One underlying principle of machine learning is that algorithms must first be trained using copious examples. Naturally, this has been helped by the deluge of user-generated content spawned by smartphones and web 2.0, stretching from Facebook photo comments to official translations published on the Internet. By feeding a machine thousands of accurately tagged images of cats, for example, it learns first to recognise those cats and later any image of a cat, including those it hasn’t been fed.

Deep learning isn’t new; it just needed modern computers to come of age. As far back as the early 1950s, biologists tried to lay out formal principles to explain the working of the brain’s cells. In 1957, the psychologist Frank Rosenblatt of the Cornell Aeronautical Laboratory published a numerical model based on these concepts, thereby creating the very first artificial neural network. Once implemented on a computer, it learned to recognise rudimentary images.

“This network only contained eight neurones organised in a single layer. It could only recognise simple characters”, says Claude Touzet of the Adaptive and Integrative Neuroscience Laboratory of Aix-Marseille University. “It wasn’t until 1985 that we saw the second generation of artificial neural networks featuring multiple layers and much greater performance”. This breakthrough was made simultaneously by three researchers: Yann LeCun in Paris, Geoffrey Hinton in Toronto and Terrence Sejnowski in Baltimore.

Byte-size learning

In multilayer networks, each layer learns to recognise the precise visual characteristics of a shape. The deeper the layer, the more abstract the characteristics. With cat photos, the first layer analyses pixel colour, and the following layer recognises the general form of the cat. This structural design can support calculations being made upon thousands of layers, and it was this aspect of the architecture that gave rise to the name ‘deep learning’.

Marchand-Maillet explains: “Each artificial neurone is assigned an input value, which it computes using a mathematical function, only firing if the output exceeds a pre-defined threshold”. In this way, it reproduces the behaviour of real neurones, which only fire and transmit information when the input signal (the potential difference across the entire neural circuit) reaches a certain level. In the artificial model, the results of a single layer are weighted, added up and then sent as the input signal to the following layer, which processes that input using different functions, and so on and so forth.
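
In code, that layer-to-layer hand-off might look something like this (an illustrative Python/NumPy sketch of my own; the layer sizes, random weights, and thresholding function are assumptions, not anything from the SNSF article):

```python
import numpy as np

# Each layer weights and adds the previous layer's results, and only values
# above a threshold are passed on as input to the following layer.
def layer(values, weights, threshold=0.0):
    summed = weights @ values                         # weight and add the incoming signals
    return np.where(summed > threshold, summed, 0.0)  # only "firing" neurones transmit

rng = np.random.default_rng(1)
x = rng.random(4)                        # input values, e.g. pixel colours
h = layer(x, rng.normal(size=(5, 4)))    # first layer processes the raw input
y = layer(h, rng.normal(size=(3, 5)))    # the following layer processes that result
print(y)
```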

For example, if a system is trained with great quantities of photos of apples and watermelons, it will progressively learn to distinguish them on the basis of diameter, says Marchand-Maillet. If it cannot decide (e.g., when processing a picture of a tiny watermelon), the subsequent layers take over by analysing the colours or textures of the fruit in the photo, and so on. In this way, every step in the process further refines the assessment.

Video games to the rescue

For decades, the limits of computing power held back more complex applications, even at the cutting edge. Industry walked away, and deep learning only survived thanks to the video games sector, which eventually began producing graphics chips, or GPUs, with unprecedented power at accessible prices: up to 6 teraflops (i.e., 6 trillion calculations per second) for a few hundred dollars. “There’s no doubt that it was this calculating power that laid the ground for the quantum leap in deep learning”, says Touzet. GPUs are also very good at parallel calculations, a useful function for executing the innumerable simultaneous operations required by neural networks.

Although image analysis is getting great results, things are more complicated for sequential data objects such as natural spoken language and video footage. This has formed part of Schmidhuber’s work since 1989, and his response has been to develop recurrent neural networks in which neurones communicate with each other in loops, feeding processed data back into the initial layers.

Such sequential data analysis is highly dependent on context and precursory data. In Lugano, networks have been instructed to memorise the order of a chain of events. Long Short Term Memory (LSTM) networks can distinguish ‘boat’ from ‘float’ by recalling the sound that preceded ‘oat’ (i.e., either ‘b’ or ‘fl’). “Recurrent neural networks are more powerful than other approaches such as the Hidden Markov models”, says Schmidhuber, who also notes that Google Voice integrated LSTMs in 2015. “With looped networks, the number of layers is potentially infinite”, says Faltings [?].
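
To make the ‘loop’ concrete, here’s a bare-bones recurrent step in Python/NumPy (my own sketch of a plain recurrent network, not Schmidhuber’s LSTM; the sizes and random weights are assumptions):

```python
import numpy as np

# At each time step the new input is mixed with the previous hidden state, so
# earlier items in the sequence (a 'b' or an 'fl') can influence later ones.
rng = np.random.default_rng(2)
W_in = rng.normal(size=(8, 3))      # weights on the current input
W_rec = rng.normal(size=(8, 8))     # weights on the looped-back hidden state

hidden = np.zeros(8)
sequence = rng.random((5, 3))       # five time steps of made-up 3-dimensional features
for x in sequence:
    hidden = np.tanh(W_in @ x + W_rec @ hidden)   # processed state fed back in
print(hidden)                       # a summary of the sequence seen so far
```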

For Schmidhuber, deep learning is just one aspect of artificial intelligence; the real thing will lead to “the most important change in the history of our civilisation”. But Marchand-Maillet sees deep learning as “a bit of hype, leading us to believe that artificial intelligence can learn anything provided there’s data. But it’s still an open question as to whether deep learning can really be applied to every last domain”.

It’s nice to get a historical perspective, and it’s eye-opening to realize that scientists have been working on these concepts since the 1950s.