Tag Archives: Larry Hardesty

An explanation of neural networks from the Massachusetts Institute of Technology (MIT)

I always enjoy the MIT ‘explainers’ and have been a little sad that I haven’t stumbled across one in a while. Until now, that is. Here’s an April 14, 2017 neural network ‘explainer’ (in its entirety) by Larry Hardesty,

In the past 10 years, the best-performing artificial-intelligence systems — such as the speech recognizers on smartphones or Google’s latest automatic translator — have resulted from a technique called “deep learning.”

Deep learning is in fact a new name for an approach to artificial intelligence called neural networks, which have been going in and out of fashion for more than 70 years. Neural networks were first proposed in 1944 by Warren McCullough and Walter Pitts, two University of Chicago researchers who moved to MIT in 1952 as founding members of what’s sometimes called the first cognitive science department.

Neural nets were a major area of research in both neuroscience and computer science until 1969, when, according to computer science lore, they were killed off by the MIT mathematicians Marvin Minsky and Seymour Papert, who a year later would become co-directors of the new MIT Artificial Intelligence Laboratory.

The technique then enjoyed a resurgence in the 1980s, fell into eclipse again in the first decade of the new century, and has returned like gangbusters in the second, fueled largely by the increased processing power of graphics chips.

“There’s this idea that ideas in science are a bit like epidemics of viruses,” says Tomaso Poggio, the Eugene McDermott Professor of Brain and Cognitive Sciences at MIT, an investigator at MIT’s McGovern Institute for Brain Research, and director of MIT’s Center for Brains, Minds, and Machines. “There are apparently five or six basic strains of flu viruses, and apparently each one comes back with a period of around 25 years. People get infected, and they develop an immune response, and so they don’t get infected for the next 25 years. And then there is a new generation that is ready to be infected by the same strain of virus. In science, people fall in love with an idea, get excited about it, hammer it to death, and then get immunized — they get tired of it. So ideas should have the same kind of periodicity!”

Weighty matters

Neural nets are a means of doing machine learning, in which a computer learns to perform some task by analyzing training examples. Usually, the examples have been hand-labeled in advance. An object recognition system, for instance, might be fed thousands of labeled images of cars, houses, coffee cups, and so on, and it would find visual patterns in the images that consistently correlate with particular labels.

Modeled loosely on the human brain, a neural net consists of thousands or even millions of simple processing nodes that are densely interconnected. Most of today’s neural nets are organized into layers of nodes, and they’re “feed-forward,” meaning that data moves through them in only one direction. An individual node might be connected to several nodes in the layer beneath it, from which it receives data, and several nodes in the layer above it, to which it sends data.

To each of its incoming connections, a node will assign a number known as a “weight.” When the network is active, the node receives a different data item — a different number — over each of its connections and multiplies it by the associated weight. It then adds the resulting products together, yielding a single number. If that number is below a threshold value, the node passes no data to the next layer. If the number exceeds the threshold value, the node “fires,” which in today’s neural nets generally means sending the number — the sum of the weighted inputs — along all its outgoing connections.
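The weighted-sum-and-threshold behaviour described above can be sketched in a few lines of Python. This is an illustrative toy of my own (the weights and threshold are arbitrary), not code from any of the systems discussed:

```python
def node_fires(inputs, weights, threshold):
    """Compute one node's weighted sum and apply a hard threshold.

    inputs    -- the data items arriving on the incoming connections
    weights   -- one weight per incoming connection
    threshold -- the node fires only if the weighted sum exceeds this
    """
    total = sum(x * w for x, w in zip(inputs, weights))
    # If the sum exceeds the threshold, the node "fires" and passes the
    # sum along its outgoing connections; otherwise it sends nothing.
    return total if total > threshold else None

# Arbitrary example values:
print(node_fires([1.0, 0.5], [0.6, 0.4], threshold=0.7))  # fires (sum is about 0.8)
print(node_fires([1.0, 0.5], [0.2, 0.4], threshold=0.7))  # stays silent (None)
```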

When a neural net is being trained, all of its weights and thresholds are initially set to random values. Training data is fed to the bottom layer — the input layer — and it passes through the succeeding layers, getting multiplied and added together in complex ways, until it finally arrives, radically transformed, at the output layer. During training, the weights and thresholds are continually adjusted until training data with the same labels consistently yield similar outputs.

Minds and machines

The neural nets described by McCullough and Pitts in 1944 had thresholds and weights, but they weren’t arranged into layers, and the researchers didn’t specify any training mechanism. What McCullough and Pitts showed was that a neural net could, in principle, compute any function that a digital computer could. The result was more neuroscience than computer science: The point was to suggest that the human brain could be thought of as a computing device.

Neural nets continue to be a valuable tool for neuroscientific research. For instance, particular network layouts or rules for adjusting weights and thresholds have reproduced observed features of human neuroanatomy and cognition, an indication that they capture something about how the brain processes information.

The first trainable neural network, the Perceptron, was demonstrated by the Cornell University psychologist Frank Rosenblatt in 1957. The Perceptron’s design was much like that of the modern neural net, except that it had only one layer with adjustable weights and thresholds, sandwiched between input and output layers.
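Rosenblatt’s training procedure can be sketched with a toy single-layer perceptron. This is my own minimal illustration (not the historical hardware or code): whenever the output is wrong, each weight is nudged in proportion to its input, and here the perceptron learns the logical AND function:

```python
def train_perceptron(examples, epochs=20, lr=0.1):
    """Train one perceptron; the threshold is folded into a bias term."""
    n = len(examples[0][0])
    w = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            s = sum(x * wi for x, wi in zip(inputs, w)) + bias
            output = 1 if s > 0 else 0
            error = target - output
            # Nudge each weight in proportion to its input and the error.
            w = [wi + lr * error * x for wi, x in zip(w, inputs)]
            bias += lr * error
    return w, bias

# Logical AND: fires only when both inputs are 1.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(data)
for inputs, target in data:
    s = sum(x * wi for x, wi in zip(inputs, w)) + b
    print(inputs, "->", 1 if s > 0 else 0)
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop eventually classifies every example correctly.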

Perceptrons were an active area of research in both psychology and the fledgling discipline of computer science until 1969, when Minsky and Papert published a book titled “Perceptrons,” which demonstrated that executing certain fairly common computations on Perceptrons would be impractically time consuming.

“Of course, all of these limitations kind of disappear if you take machinery that is a little more complicated — like, two layers,” Poggio says. But at the time, the book had a chilling effect on neural-net research.
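A concrete instance of what the extra layer buys: XOR, the textbook example of a computation a single-layer perceptron cannot perform, falls out easily with one hidden layer. The weights below are hand-picked for illustration:

```python
def step(x):
    """Hard threshold at zero, as in a classic perceptron unit."""
    return 1 if x > 0 else 0

def xor(a, b):
    # Hidden layer: one unit computes OR, the other NAND.
    h_or = step(a + b - 0.5)      # fires if a OR b
    h_nand = step(-a - b + 1.5)   # fires unless a AND b
    # Output layer: AND of the two hidden units yields XOR.
    return step(h_or + h_nand - 1.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor(a, b))
```

No choice of weights for a single threshold unit can reproduce this truth table, which is exactly the kind of limitation the book made famous.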

“You have to put these things in historical context,” Poggio says. “They were arguing for programming — for languages like Lisp. Not many years before, people were still using analog computers. It was not clear at all at the time that programming was the way to go. I think they went a little bit overboard, but as usual, it’s not black and white. If you think of this as this competition between analog computing and digital computing, they fought for what at the time was the right thing.”


By the 1980s, however, researchers had developed algorithms for modifying neural nets’ weights and thresholds that were efficient enough for networks with more than one layer, removing many of the limitations identified by Minsky and Papert. The field enjoyed a renaissance.

But intellectually, there’s something unsatisfying about neural nets. Enough training may revise a network’s settings to the point that it can usefully classify data, but what do those settings mean? What image features is an object recognizer looking at, and how does it piece them together into the distinctive visual signatures of cars, houses, and coffee cups? Looking at the weights of individual connections won’t answer that question.

In recent years, computer scientists have begun to come up with ingenious methods for deducing the analytic strategies adopted by neural nets. But in the 1980s, the networks’ strategies were indecipherable. So around the turn of the century, neural networks were supplanted by support vector machines, an alternative approach to machine learning that’s based on some very clean and elegant mathematics.

The recent resurgence in neural networks — the deep-learning revolution — comes courtesy of the computer-game industry. The complex imagery and rapid pace of today’s video games require hardware that can keep up, and the result has been the graphics processing unit (GPU), which packs thousands of relatively simple processing cores on a single chip. It didn’t take long for researchers to realize that the architecture of a GPU is remarkably like that of a neural net.

Modern GPUs enabled the one-layer networks of the 1960s and the two- to three-layer networks of the 1980s to blossom into the 10-, 15-, even 50-layer networks of today. That’s what the “deep” in “deep learning” refers to — the depth of the network’s layers. And currently, deep learning is responsible for the best-performing systems in almost every area of artificial-intelligence research.

Under the hood

The networks’ opacity is still unsettling to theorists, but there’s headway on that front, too. In addition to directing the Center for Brains, Minds, and Machines (CBMM), Poggio leads the center’s research program in Theoretical Frameworks for Intelligence. Recently, Poggio and his CBMM colleagues have released a three-part theoretical study of neural networks.

The first part, which was published last month in the International Journal of Automation and Computing, addresses the range of computations that deep-learning networks can execute and when deep networks offer advantages over shallower ones. Parts two and three, which have been released as CBMM technical reports, address the problems of global optimization, or guaranteeing that a network has found the settings that best accord with its training data, and overfitting, or cases in which the network becomes so attuned to the specifics of its training data that it fails to generalize to other instances of the same categories.

There are still plenty of theoretical questions to be answered, but CBMM researchers’ work could help ensure that neural networks finally break the generational cycle that has brought them in and out of favor for seven decades.

This image from MIT illustrates a ‘modern’ neural network,

Most applications of deep learning use “convolutional” neural networks, in which the nodes of each layer are clustered, the clusters overlap, and each cluster feeds data to multiple nodes (orange and green) of the next layer. Image: Jose-Luis Olivares/MIT
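The overlapping clusters in the caption correspond to what practitioners call receptive fields: each node in the next layer looks at a small window of the previous layer, adjacent windows overlap, and the same weights are reused at every position. A minimal 1D sketch (my own illustration, not MIT’s code):

```python
def conv1d(signal, kernel):
    """Slide a small shared-weight window across an input layer.

    Each output node sees len(kernel) adjacent inputs, and adjacent
    output nodes' windows overlap -- the defining trait of a
    convolutional layer.
    """
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A 3-wide edge-detecting kernel applied to a step signal:
print(conv1d([0, 0, 0, 1, 1, 1], [-1, 0, 1]))  # responds where the signal changes
```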

h/t phys.org April 17, 2017

One final note: I wish the folks at MIT had an ‘explainer’ archive. I’m not sure how to find any more ‘explainers’ on MIT’s website.

Ear-powered batteries

According to the Nov. 7, 2012 news item on phys.org, the idea of powering batteries with vibrations from the inner ear is not new (Note: I have removed a link),

“In the past, people have thought that the space where the high potential is located is inaccessible for implantable devices, because potentially it’s very dangerous if you encroach on it,” Stankovic [Konstantina Stankovic, an otologic surgeon at MEEI {Massachusetts Eye and Ear Infirmary}] says. “We have known for 60 years that this battery exists and that it’s really important for normal hearing, but nobody has attempted to use this battery to power useful electronics.”

Larry Hardesty’s Nov. 7, 2012 news release for the Massachusetts Institute of Technology (MIT), which originated the news item, provides more technical detail about how the researchers have reduced the risk associated with this type of implant,

In experiments, Konstantina Stankovic, an otologic surgeon at MEEI, and HST [Harvard-MIT Division of Health Sciences and Technology] graduate student Andrew Lysaght implanted electrodes in the biological batteries in guinea pigs’ ears. Attached to the electrodes were low-power electronic devices developed by MIT’s Microsystems Technology Laboratories (MTL). After the implantation, the guinea pigs responded normally to hearing tests, and the devices were able to wirelessly transmit data about the chemical conditions of the ear to an external receiver.

The ear converts a mechanical force — the vibration of the eardrum — into an electrochemical signal that can be processed by the brain; the biological battery is the source of that signal’s current. Located in the part of the ear called the cochlea, the battery chamber is divided by a membrane, some of whose cells are specialized to pump ions. An imbalance of potassium and sodium ions on opposite sides of the membrane, together with the particular arrangement of the pumps, creates an electrical voltage.
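The voltage created by an ion imbalance across a membrane can be estimated with the Nernst equation, E = (RT/zF)·ln([C]outside/[C]inside). The concentrations below are generic textbook-style values for a tenfold potassium gradient, not measurements from the cochlea, so treat this purely as an illustration of the principle:

```python
import math

R = 8.314    # gas constant, J/(mol*K)
T = 310.0    # body temperature, K
F = 96485.0  # Faraday constant, C/mol
z = 1        # charge of a potassium ion

def nernst_mV(c_out, c_in):
    """Equilibrium potential (in mV) for a given ion concentration ratio."""
    return (R * T / (z * F)) * math.log(c_out / c_in) * 1000

# Illustrative concentrations (mM): a 10-fold potassium imbalance
print(round(nernst_mV(150.0, 15.0), 1))  # about 61.5 mV
```

The point is only that a sustained concentration gradient, maintained by ion pumps, translates directly into a standing voltage on the order of tens of millivolts.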

Although the voltage is the highest in the body (outside of individual cells, at least), it’s still very low. Moreover, in order not to disrupt hearing, a device powered by the biological battery can harvest only a small fraction of its power. Low-power chips, however, are precisely the area of expertise of Anantha Chandrakasan’s group at MTL.

The MTL researchers — Chandrakasan, who heads MIT’s Department of Electrical Engineering and Computer Science; his former graduate student Patrick Mercier, who’s now an assistant professor at the University of California at San Diego; and Saurav Bandyopadhyay, a graduate student in Chandrakasan’s group — equipped their chip with an ultralow-power radio transmitter: After all, an implantable medical monitor wouldn’t be much use if there were no way to retrieve its measurements.

But while the radio is much more efficient than those found in cellphones, it still couldn’t run directly on the biological battery. So the MTL chip also includes power-conversion circuitry — like that in the boxy converters at the ends of many electronic devices’ power cables — that gradually builds up charge in a capacitor. The voltage of the biological battery fluctuates, but it would take the control circuit somewhere between 40 seconds and four minutes to amass enough charge to power the radio. The frequency of the signal was thus itself an indication of the electrochemical properties of the inner ear.
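The charge-then-transmit behaviour is easy to reason about: a capacitor fed by a roughly constant current I needs t = C·ΔV/I seconds to gain ΔV volts. The component values below are hypothetical (the news release doesn’t give them) and were chosen simply to show how nanoamp-scale harvesting currents lead to transmission intervals of tens of seconds to a few minutes:

```python
def charge_time_s(capacitance_f, delta_v, current_a):
    """Seconds for a constant current to raise a capacitor's voltage by delta_v."""
    return capacitance_f * delta_v / current_a

# Hypothetical values: a 200 nF capacitor charged through 1 V
# by harvesting currents between 1 nA and 5 nA.
for i_na in (5.0, 1.0):
    t = charge_time_s(200e-9, 1.0, i_na * 1e-9)
    print(f"{i_na} nA -> {t:.0f} s")
```

A fluctuating source current therefore shows up directly as a fluctuating interval between radio bursts, which is why the signal frequency itself carries information about the ear.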

To reduce its power consumption, the control circuit had to be drastically simplified, but like the radio, it still required a higher voltage than the biological battery could provide. Once the control circuit was up and running, it could drive itself; the problem was getting it up and running.

The MTL researchers solve that problem with a one-time burst of radio waves. “In the very beginning, we need to kick-start it,” Chandrakasan says. “Once we do that, we can be self-sustaining. The control runs off the output.”

Stankovic, who still maintains an affiliation with HST, and Lysaght implanted electrodes attached to the MTL chip on both sides of the membrane in the biological battery of each guinea pig’s ear. In the experiments, the chip itself remained outside the guinea pig’s body, but it’s small enough to nestle in the cavity of the middle ear.

The researchers seem to think that this kind of device might be used as a monitor for people with hearing difficulties or balance problems, or even to deliver therapies. Regardless of any possible future uses, we are still a long way from human clinical trials.

Magnetically cleaning up oil spills

Researchers at the Massachusetts Institute of Technology (MIT) have developed a promising technique for cleaning up oil spills with magnets, one they say is more efficient and more environmentally friendly than current methods.

ETA Sept. 14, 2012: For some reason the embedded video keeps disappearing, so here’s the link: http://youtu.be/ZaP7XOjsCHQ

The Sept. 12, 2012 news item on Nanowerk notes,

The researchers will present their work at the International Conference on Magnetic Fluids in January. Shahriar Khushrushahi, a postdoc in MIT’s Department of Electrical Engineering and Computer Science, is lead author on the paper, joined by Markus Zahn, the Thomas and Gerd Perkins Professor of Electrical Engineering, and T. Alan Hatton, the Ralph Landau Professor of Chemical Engineering. The team has also filed two patents on its work.

In the MIT researchers’ scheme, water-repellent ferrous nanoparticles would be mixed with the oil, which could then be separated from the water using magnets. The researchers envision that the process would take place aboard an oil-recovery vessel, to prevent the nanoparticles from contaminating the environment. Afterward, the nanoparticles could be magnetically removed from the oil and reused.

Larry Hardesty’s Sept. 12, 2012 MIT news release, which originated the news item, provides detail about the standard technique for using magnetic nanoparticles and the new technique,

According to Zahn, there’s a good deal of previous research on separating water and so-called ferrofluids — fluids with magnetic nanoparticles suspended in them. Typically, these involve pumping a water-and-ferrofluid mixture through a channel, while magnets outside the channel direct the flow of the ferrofluid, perhaps diverting it down a side channel or pulling it through a perforated wall.

This approach can work if the concentration of the ferrofluid is known in advance and remains constant. But in water contaminated by an oil spill, the concentration can vary widely. Suppose that the separation system consists of a branching channel with magnets along one side. If the oil concentration were zero, the water would naturally flow down both branches. By the same token, if the oil concentration is low, a lot of the water will end up flowing down the branch intended for the oil; if the oil concentration is high, a lot of the oil will end up flowing down the branch intended for the water.

The MIT researchers vary the conventional approach in two major ways: They orient their magnets perpendicularly to the flow of the stream, not parallel to it; and they immerse the magnets in the stream, rather than positioning them outside of it.

The magnets are permanent magnets, and they’re cylindrical. Because a magnet’s magnetic field is strongest at its edges, the tips of each cylinder attract the oil much more powerfully than its sides do. In experiments the MIT researchers conducted in the lab, the bottoms of the magnets were embedded in the base of a reservoir that contained a mixture of water and magnetic oil; consequently, oil couldn’t collect around them. The tops of the magnets were above water level, and the oil shot up the sides of the magnets, forming beaded spheres around the magnets’ ends.

The design is simple, but it provides excellent separation between oil and water. Moreover, Khushrushahi says, simplicity is an advantage in a system that needs to be manufactured on a large scale and deployed at sea for days or weeks, where electrical power is scarce and maintenance facilities limited.

…

In their experiments, the MIT researchers used a special configuration of magnets, called a Halbach array, to extract the oil from the tops of the cylindrical magnets. When attached to the cylinders, the Halbach array looks kind of like a model-train boxcar mounted on pilings. The magnets in a Halbach array are arranged so that on one side of the array, the magnetic field is close to zero, but on the other side, it’s roughly doubled. In the researchers’ experiments, the oil in the reservoir wasn’t attracted to the bottom of the array, but the top of the array pulled the oil off of the cylindrical magnets.
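The one-sided field of a Halbach array can be demonstrated with a toy simulation: sum the fields of a row of idealized 2D point dipoles whose magnetization rotates 90 degrees per element, then compare the field magnitude at matching points above and below the row. This is my own sketch of the general principle, not a model of the MIT apparatus:

```python
import math

def dipole_field(px, py, mx, my, ox, oy):
    """2D point-dipole field (physical constants dropped) at observer (ox, oy)."""
    rx, ry = ox - px, oy - py
    r2 = rx * rx + ry * ry
    r = math.sqrt(r2)
    ux, uy = rx / r, ry / r
    dot = mx * ux + my * uy
    return (2 * dot * ux - mx) / r2, (2 * dot * uy - my) / r2

def array_field(oy, n=8):
    """Field magnitude above (oy > 0) or below (oy < 0) the array's centre."""
    bx = by = 0.0
    for i in range(n):
        # Halbach pattern: magnetization rotates 90 degrees per element.
        theta = i * math.pi / 2
        fx, fy = dipole_field(i, 0.0, math.cos(theta), math.sin(theta),
                              (n - 1) / 2, oy)
        bx += fx
        by += fy
    return math.hypot(bx, by)

strong, weak = sorted([array_field(1.5), array_field(-1.5)], reverse=True)
# Nearly all of the field ends up on one side of the row.
print(strong / weak)
```

Running this shows a field magnitude many times larger on one side of the row than the other, which is the effect the researchers exploit.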

While this work is promising, there are still a lot of issues to be addressed, including how water will be removed from the recovered oil (oil and water can mix to some degree depending on their relative densities).

Folding, origami, and shapeshifting and an article with over 50,000 authors

I’m on a metaphor kick these days, so here goes: origami (Japanese paper folding) and shapeshifting are metaphors used to describe a biological process that fascinates nanoscientists from fields not usually associated with biology: protein folding.


Take for example a research team at the California Institute of Technology (Caltech) working to exploit the electronic properties of carbon nanotubes (mentioned in a Nov. 9, 2010 news item on Nanowerk). One of the big issues is that, since all of the tubes in a sample are made of carbon, getting one tube to react on its own without activating the others is quite challenging when you’re trying to create nanoelectronic circuits. The research team decided to use a technique developed in a bioengineering lab (from the news item),

DNA origami is a type of self-assembled structure made from DNA that can be programmed to form nearly limitless shapes and patterns (such as smiley faces or maps of the Western Hemisphere or even electrical diagrams). Exploiting the sequence-recognition properties of DNA base pairing, DNA origami are created from a long single strand of viral DNA and a mixture of different short synthetic DNA strands that bind to and “staple” the viral DNA into the desired shape, typically about 100 nanometers (nm) on a side.
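The sequence-recognition step rests on Watson-Crick base pairing: a staple strand binds wherever the scaffold contains its reverse complement. A toy search over hypothetical sequences (an illustration only, not real origami design software):

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """The sequence a strand will hybridize to, read in the opposite direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def binding_sites(scaffold, staple):
    """Indices on the scaffold where the staple can base-pair."""
    target = reverse_complement(staple)
    return [i for i in range(len(scaffold) - len(target) + 1)
            if scaffold[i:i + len(target)] == target]

# Hypothetical 'viral scaffold' fragment and a short staple strand:
scaffold = "ATGCGTACCGGTTAGC"
staple = "ACGCAT"  # its reverse complement is "ATGCGT"
print(binding_sites(scaffold, staple))  # binds at index 0
```

Designing an origami amounts to choosing staples whose binding sites pull distant parts of the scaffold together into the target shape.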

Single-wall carbon nanotubes are molecular tubes composed of a rolled-up hexagonal mesh of carbon atoms. With diameters measuring less than 2 nm and yet with lengths of many microns, they have a reputation as some of the strongest, most heat-conductive, and most electronically interesting materials that are known. For years, researchers have been trying to harness their unique properties in nanoscale devices, but precisely arranging them into desirable geometric patterns has been a major stumbling block.

… To integrate the carbon nanotubes into this system, the scientists colored some of those pixels anti-red, and others anti-blue, effectively marking the positions where they wanted the color-matched nanotubes to stick. They then designed the origami so that the red-labeled nanotubes would cross perpendicular to the blue nanotubes, making what is known as a field-effect transistor (FET), one of the most basic devices for building semiconductor circuits.

Although their process is conceptually simple, the researchers had to work out many kinks, such as separating the bundles of carbon nanotubes into individual molecules and attaching the single-stranded DNA; finding the right protection for these DNA strands so they remained able to recognize their partners on the origami; and finding the right chemical conditions for self-assembly.

After about a year, the team had successfully placed crossed nanotubes on the origami; they were able to see the crossing via atomic force microscopy. These systems were removed from solution and placed on a surface, after which leads were attached to measure the device’s electrical properties. When the team’s simple device was wired up to electrodes, it indeed behaved like a field-effect transistor.


For another, more recent example (from an August 5, 2010 article on physorg.com by Larry Hardesty, “Shape-shifting robots”),

By combining origami and electrical engineering, researchers at MIT and Harvard are working to develop the ultimate reconfigurable robot — one that can turn into absolutely anything. The researchers have developed algorithms that, given a three-dimensional shape, can determine how to reproduce it by folding a sheet of semi-rigid material with a distinctive pattern of flexible creases. To test out their theories, they built a prototype that can automatically assume the shape of either an origami boat or a paper airplane when it receives different electrical signals. The researchers reported their results in the July 13 issue of the Proceedings of the National Academy of Sciences.

As director of the Distributed Robotics Laboratory at the Computer Science and Artificial Intelligence Laboratory (CSAIL), Professor Daniela Rus researches systems of robots that can work together to tackle complicated tasks. One of the big research areas in distributed robotics is what’s called “programmable matter,” the idea that small, uniform robots could snap together like intelligent Legos to create larger, more versatile robots.

Here’s a video from this site at MIT (Massachusetts Institute of Technology) describing the process,

Folding and over 50,000 authors

With all this I’ve been leading up to a fascinating project, a game called Foldit, whose results a team from the University of Washington published in the journal Nature (“Predicting protein structures with a multiplayer online game”), Aug. 5, 2010.

With over 50,000 authors, this study is a really good example of citizen science (discussed in my May 14, 2010 posting and elsewhere here) and how to use games to solve science problems while exploiting a fascination with folding and origami. From the Aug. 5, 2010 news item on Nanowerk,

The game, Foldit, turns one of the hardest problems in molecular biology into a game a bit reminiscent of Tetris. Thousands of people have now played a game that asks them to fold a protein rather than stack colored blocks or rescue a princess.

Scientists know the pieces that make up a protein but cannot predict how those parts fit together into a 3-D structure. And since proteins act like locks and keys, the structure is crucial.

At any moment, thousands of computers are working away at calculating how physical forces would cause a protein to fold. But no computer in the world is big enough, and computers may not take the smartest approach. So the UW team tried to make it into a game that people could play and compete. Foldit turns protein-folding into a game and awards points based on the internal energy of the 3-D protein structure, dictated by the laws of physics.
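Foldit’s scoring idea can be caricatured in a few lines: give every candidate structure an energy dictated by simple physical rules, and reward players for finding lower energies. The function below is a crude stand-in of my own (a clash penalty plus a contact reward), nothing like the Rosetta energy function the real game uses:

```python
import math

def toy_energy(coords):
    """Score a chain of 2D 'residue' positions: lower is better.

    Penalize steric clashes (non-bonded pairs closer than 1.0) and
    reward compactness (pairs within contact range 2.0).
    """
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):  # skip bonded neighbours
            d = math.dist(coords[i], coords[j])
            if d < 1.0:
                energy += 10.0   # steric clash
            elif d < 2.0:
                energy -= 1.0    # favourable contact
    return energy

stretched = [(i, 0.0) for i in range(5)]                    # extended chain
folded = [(0, 0), (1, 0), (1.5, 1.2), (1, 2.4), (0, 2.4)]   # hairpin-like fold
print(toy_energy(stretched), toy_energy(folded))  # the fold scores lower (better)
```

Players who bend the chain into compact, clash-free shapes drive the energy down, which is the essence of how the game turns folding into a competition.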

Tens of thousands of players have taken the challenge. The author list for the paper includes an acknowledgment of more than 57,000 Foldit players, which may be unprecedented on a scientific publication.

“It’s a new kind of collective intelligence, as opposed to individual intelligence, that we want to study,” Popoviç [principal investigator Zoran Popoviç, a UW associate professor of computer science and engineering] said. “We’re opening eyes in terms of how people think about human intelligence and group intelligence, and what the possibilities are when you get huge numbers of people together to solve a very hard problem.”

There’s more at Nanowerk, including a video about the gamers and the scientists. I think most of us take folding for granted, and yet it stimulates all kinds of research and ideas.