Tag Archives: perceptron

An explanation of neural networks from the Massachusetts Institute of Technology (MIT)

I always enjoy the MIT ‘explainers’ and have been a little sad that I haven’t stumbled across one in a while. Until now, that is. Here’s an April 14, 201 neural network ‘explainer’ (in its entirety) by Larry Hardesty (?),

In the past 10 years, the best-performing artificial-intelligence systems — such as the speech recognizers on smartphones or Google’s latest automatic translator — have resulted from a technique called “deep learning.”

Deep learning is in fact a new name for an approach to artificial intelligence called neural networks, which have been going in and out of fashion for more than 70 years. Neural networks were first proposed in 1944 by Warren McCullough and Walter Pitts, two University of Chicago researchers who moved to MIT in 1952 as founding members of what’s sometimes called the first cognitive science department.

Neural nets were a major area of research in both neuroscience and computer science until 1969, when, according to computer science lore, they were killed off by the MIT mathematicians Marvin Minsky and Seymour Papert, who a year later would become co-directors of the new MIT Artificial Intelligence Laboratory.

The technique then enjoyed a resurgence in the 1980s, fell into eclipse again in the first decade of the new century, and has returned like gangbusters in the second, fueled largely by the increased processing power of graphics chips.

“There’s this idea that ideas in science are a bit like epidemics of viruses,” says Tomaso Poggio, the Eugene McDermott Professor of Brain and Cognitive Sciences at MIT, an investigator at MIT’s McGovern Institute for Brain Research, and director of MIT’s Center for Brains, Minds, and Machines. “There are apparently five or six basic strains of flu viruses, and apparently each one comes back with a period of around 25 years. People get infected, and they develop an immune response, and so they don’t get infected for the next 25 years. And then there is a new generation that is ready to be infected by the same strain of virus. In science, people fall in love with an idea, get excited about it, hammer it to death, and then get immunized — they get tired of it. So ideas should have the same kind of periodicity!”

Weighty matters

Neural nets are a means of doing machine learning, in which a computer learns to perform some task by analyzing training examples. Usually, the examples have been hand-labeled in advance. An object recognition system, for instance, might be fed thousands of labeled images of cars, houses, coffee cups, and so on, and it would find visual patterns in the images that consistently correlate with particular labels.

Modeled loosely on the human brain, a neural net consists of thousands or even millions of simple processing nodes that are densely interconnected. Most of today’s neural nets are organized into layers of nodes, and they’re “feed-forward,” meaning that data moves through them in only one direction. An individual node might be connected to several nodes in the layer beneath it, from which it receives data, and several nodes in the layer above it, to which it sends data.

To each of its incoming connections, a node will assign a number known as a “weight.” When the network is active, the node receives a different data item — a different number — over each of its connections and multiplies it by the associated weight. It then adds the resulting products together, yielding a single number. If that number is below a threshold value, the node passes no data to the next layer. If the number exceeds the threshold value, the node “fires,” which in today’s neural nets generally means sending the number — the sum of the weighted inputs — along all its outgoing connections.

When a neural net is being trained, all of its weights and thresholds are initially set to random values. Training data is fed to the bottom layer — the input layer — and it passes through the succeeding layers, getting multiplied and added together in complex ways, until it finally arrives, radically transformed, at the output layer. During training, the weights and thresholds are continually adjusted until training data with the same labels consistently yield similar outputs.

Minds and machines

The neural nets described by McCullough and Pitts in 1944 had thresholds and weights, but they weren’t arranged into layers, and the researchers didn’t specify any training mechanism. What McCullough and Pitts showed was that a neural net could, in principle, compute any function that a digital computer could. The result was more neuroscience than computer science: The point was to suggest that the human brain could be thought of as a computing device.

Neural nets continue to be a valuable tool for neuroscientific research. For instance, particular network layouts or rules for adjusting weights and thresholds have reproduced observed features of human neuroanatomy and cognition, an indication that they capture something about how the brain processes information.

The first trainable neural network, the Perceptron, was demonstrated by the Cornell University psychologist Frank Rosenblatt in 1957. The Perceptron’s design was much like that of the modern neural net, except that it had only one layer with adjustable weights and thresholds, sandwiched between input and output layers.

Perceptrons were an active area of research in both psychology and the fledgling discipline of computer science until 1959, when Minsky and Papert published a book titled “Perceptrons,” which demonstrated that executing certain fairly common computations on Perceptrons would be impractically time consuming.

“Of course, all of these limitations kind of disappear if you take machinery that is a little more complicated — like, two layers,” Poggio says. But at the time, the book had a chilling effect on neural-net research.

“You have to put these things in historical context,” Poggio says. “They were arguing for programming — for languages like Lisp. Not many years before, people were still using analog computers. It was not clear at all at the time that programming was the way to go. I think they went a little bit overboard, but as usual, it’s not black and white. If you think of this as this competition between analog computing and digital computing, they fought for what at the time was the right thing.”


By the 1980s, however, researchers had developed algorithms for modifying neural nets’ weights and thresholds that were efficient enough for networks with more than one layer, removing many of the limitations identified by Minsky and Papert. The field enjoyed a renaissance.

But intellectually, there’s something unsatisfying about neural nets. Enough training may revise a network’s settings to the point that it can usefully classify data, but what do those settings mean? What image features is an object recognizer looking at, and how does it piece them together into the distinctive visual signatures of cars, houses, and coffee cups? Looking at the weights of individual connections won’t answer that question.

In recent years, computer scientists have begun to come up with ingenious methods for deducing the analytic strategies adopted by neural nets. But in the 1980s, the networks’ strategies were indecipherable. So around the turn of the century, neural networks were supplanted by support vector machines, an alternative approach to machine learning that’s based on some very clean and elegant mathematics.

The recent resurgence in neural networks — the deep-learning revolution — comes courtesy of the computer-game industry. The complex imagery and rapid pace of today’s video games require hardware that can keep up, and the result has been the graphics processing unit (GPU), which packs thousands of relatively simple processing cores on a single chip. It didn’t take long for researchers to realize that the architecture of a GPU is remarkably like that of a neural net.

Modern GPUs enabled the one-layer networks of the 1960s and the two- to three-layer networks of the 1980s to blossom into the 10-, 15-, even 50-layer networks of today. That’s what the “deep” in “deep learning” refers to — the depth of the network’s layers. And currently, deep learning is responsible for the best-performing systems in almost every area of artificial-intelligence research.

Under the hood

The networks’ opacity is still unsettling to theorists, but there’s headway on that front, too. In addition to directing the Center for Brains, Minds, and Machines (CBMM), Poggio leads the center’s research program in Theoretical Frameworks for Intelligence. Recently, Poggio and his CBMM colleagues have released a three-part theoretical study of neural networks.

The first part, which was published last month in the International Journal of Automation and Computing, addresses the range of computations that deep-learning networks can execute and when deep networks offer advantages over shallower ones. Parts two and three, which have been released as CBMM technical reports, address the problems of global optimization, or guaranteeing that a network has found the settings that best accord with its training data, and overfitting, or cases in which the network becomes so attuned to the specifics of its training data that it fails to generalize to other instances of the same categories.

There are still plenty of theoretical questions to be answered, but CBMM researchers’ work could help ensure that neural networks finally break the generational cycle that has brought them in and out of favor for seven decades.

This image from MIT illustrates a ‘modern’ neural network,

Most applications of deep learning use “convolutional” neural networks, in which the nodes of each layer are clustered, the clusters overlap, and each cluster feeds data to multiple nodes (orange and green) of the next layer. Image: Jose-Luis Olivares/MIT

h/t phys.org April 17, 2017

One final note, I wish the folks at MIT had an ‘explainer’ archive. I’m not sure how to find any more ‘explainers on MIT’s website.

Plastic memristors for neural networks

There is a very nice explanation of memristors and computing systems from the Moscow Institute of Physics and Technology (MIPT). First their announcement, from a Jan. 27, 2016 news item on ScienceDaily,

A group of scientists has created a neural network based on polymeric memristors — devices that can potentially be used to build fundamentally new computers. These developments will primarily help in creating technologies for machine vision, hearing, and other machine sensory systems, and also for intelligent control systems in various fields of applications, including autonomous robots.

The authors of the new study focused on a promising area in the field of memristive neural networks – polymer-based memristors – and discovered that creating even the simplest perceptron is not that easy. In fact, it is so difficult that up until the publication of their paper in the journal Organic Electronics, there were no reports of any successful experiments (using organic materials). The experiments conducted at the Nano-, Bio-, Information and Cognitive Sciences and Technologies (NBIC) centre at the Kurchatov Institute by a joint team of Russian and Italian scientists demonstrated that it is possible to create very simple polyaniline-based neural networks. Furthermore, these networks are able to learn and perform specified logical operations.

A Jan. 27, 2016 MIPT press release on EurekAlert, which originated the news item, offers an explanation of memristors and a description of the research,

A memristor is an electric element similar to a conventional resistor. The difference between a memristor and a traditional element is that the electric resistance in a memristor is dependent on the charge passing through it, therefore it constantly changes its properties under the influence of an external signal: a memristor has a memory and at the same time is also able to change data encoded by its resistance state! In this sense, a memristor is similar to a synapse – a connection between two neurons in the brain that is able, with a high level of plasticity, to modify the efficiency of signal transmission between neurons under the influence of the transmission itself. A memristor enables scientists to build a “true” neural network, and the physical properties of memristors mean that at the very minimum they can be made as small as conventional chips.

Some estimates indicate that the size of a memristor can be reduced up to ten nanometers, and the technologies used in the manufacture of the experimental prototypes could, in theory, be scaled up to the level of mass production. However, as this is “in theory”, it does not mean that chips of a fundamentally new structure with neural networks will be available on the market any time soon, even in the next five years.

The plastic polyaniline was not chosen by chance. Previous studies demonstrated that it can be used to create individual memristors, so the scientists did not have to go through many different materials. Using a polyaniline solution, a glass substrate, and chromium electrodes, they created a prototype with dimensions that, at present, are much larger than those typically used in conventional microelectronics: the strip of the structure was approximately one millimeter wide (they decided to avoid miniaturization for the moment). All of the memristors were tested for their electrical characteristics: it was found that the current-voltage characteristic of the devices is in fact non-linear, which is in line with expectations. The memristors were then connected to a single neuromorphic network.

A current-voltage characteristic (or IV curve) is a graph where the horizontal axis represents voltage and the vertical axis the current. In conventional resistance, the IV curve is a straight line; in strict accordance with Ohm’s Law, current is proportional to voltage. For a memristor, however, it is not just the voltage that is important, but the change in voltage: if you begin to gradually increase the voltage supplied to the memristor, it will increase the current passing through it not in a linear fashion, but with a sharp bend in the graph and at a certain point its resistance will fall sharply.

Then if you begin to reduce the voltage, the memristor will remain in its conducting state for some time, after which it will change its properties rather sharply again to decrease its conductivity. Experimental samples with a voltage increase of 0.5V hardly allowed any current to pass through (around a few tenths of a microamp), but when the voltage was reduced by the same amount, the ammeter registered a figure of 5 microamps. Microamps are of course very small units, but in this case it is the contrast that is most significant: 0.1 μA to 5 μA is a difference of fifty times! This is more than enough to make a clear distinction between the two signals.

After checking the basic properties of individual memristors, the physicists conducted experiments to train the neural network. The training (it is a generally accepted term and is therefore written without inverted commas) involves applying electric pulses at random to the inputs of a perceptron. If a certain combination of electric pulses is applied to the inputs of a perceptron (e.g. a logic one and a logic zero at two inputs) and the perceptron gives the wrong answer, a special correcting pulse is applied to it, and after a certain number of repetitions all the internal parameters of the device (namely memristive resistance) reconfigure themselves, i.e. they are “trained” to give the correct answer.

The scientists demonstrated that after about a dozen attempts their new memristive network is capable of performing NAND logical operations, and then it is also able to learn to perform NOR operations. Since it is an operator or a conventional computer that is used to check for the correct answer, this method is called the supervised learning method.

Needless to say, an elementary perceptron of macroscopic dimensions with a characteristic reaction time of tenths or hundredths of a second is not an element that is ready for commercial production. However, as the researchers themselves note, their creation was made using inexpensive materials, and the reaction time will decrease as the size decreases: the first prototype was intentionally enlarged to make the work easier; it is physically possible to manufacture more compact chips. In addition, polyaniline can be used in attempts to make a three-dimensional structure by placing the memristors on top of one another in a multi-tiered structure (e.g. in the form of random intersections of thin polymer fibers), whereas modern silicon microelectronic systems, due to a number of technological limitations, are two-dimensional. The transition to the third dimension would potentially offer many new opportunities.

The press release goes to explain what the researchers mean when they mention a fundamentally different computer,

The common classification of computers is based either on their casing (desktop/laptop/tablet), or on the type of operating system used (Windows/MacOS/Linux). However, this is only a very simple classification from a user perspective, whereas specialists normally use an entirely different approach – an approach that is based on the principle of organizing computer operations. The computers that we are used to, whether they be tablets, desktop computers, or even on-board computers on spacecraft, are all devices with von Neumann architecture; without going into too much detail, they are devices based on independent processors, random access memory (RAM), and read only memory (ROM).

The memory stores the code of a program that is to be executed. A program is a set of instructions that command certain operations to be performed with data. Data are also stored in the memory* and are retrieved from it (and also written to it) in accordance with the program; the program’s instructions are performed by the processor. There may be several processors, they can work in parallel, data can be stored in a variety of ways – but there is always a fundamental division between the processor and the memory. Even if the computer is integrated into one single chip, it will still have separate elements for processing information and separate units for storing data. At present, all modern microelectronic systems are based on this particular principle and this is partly the reason why most people are not even aware that there may be other types of computer systems – without processors and memory.

*) if physically different elements are used to store data and store a program, the computer is said to be built using Harvard architecture. This method is used in certain microcontrollers, and in small specialized computing devices. The chip that controls the function of a refrigerator, lift, or car engine (in all these cases a “conventional” computer would be redundant) is a microcontroller. However, neither Harvard, nor von Neumann architectures allow the processing and storage of information to be combined into a single element of a computer system.

However, such systems do exist. Furthermore, if you look at the brain itself as a computer system (this is purely hypothetical at the moment: it is not yet known whether the function of the brain is reducible to computations), then you will see that it is not at all built like a computer with von Neumann architecture. Neural networks do not have a specialized computer or separate memory cells. Information is stored and processed in each and every neuron, one element of the computer system, and the human brain has approximately 100 billion of these elements. In addition, almost all of them are able to work in parallel (simultaneously), which is why the brain is able to process information with great efficiency and at such high speed. Artificial neural networks that are currently implemented on von Neumann computers only emulate these processes: emulation, i.e. step by step imitation of functions inevitably leads to a decrease in speed and an increase in energy consumption. In many cases this is not so critical, but in certain cases it can be.

Devices that do not simply imitate the function of neural networks, but are fundamentally the same could be used for a variety of tasks. Most importantly, neural networks are capable of pattern recognition; they are used as a basis for recognising handwritten text for example, or signature verification. When a certain pattern needs to be recognised and classified, such as a sound, an image, or characteristic changes on a graph, neural networks are actively used and it is in these fields where gaining an advantage in terms of speed and energy consumption is critical. In a control system for an autonomous flying robot every milliwatt-hour and every millisecond counts, just in the same way that a real-time system to process data from a collider detector cannot take too long to “think” about highlighting particle tracks that may be of interest to scientists from among a large number of other recorded events.

Bravo to the writer!

Here’s a link to and a citation for the paper,

Hardware elementary perceptron based on polyaniline memristive devices by V.A. Demin. V. V. Erokhin, A.V. Emelyanov, S. Battistoni, G. Baldi, S. Iannotta, P.K. Kashkarov, M.V. Kovalchuk. Organic Electronics Volume 25, October 2015, Pages 16–20 doi:10.1016/j.orgel.2015.06.015

This paper is behind a paywall.