As the amount of information about chemicals and molecules continues to explode, scientists at the US National Institute of Standards and Technology (NIST) have devised a type of ‘Facebook for molecules’ that should make searching through the data much easier, according to a July 18, 2013 news item on ScienceDaily:
Social media has expanded to reach an unlikely new target: molecules. Scientists at the National Institute of Standards and Technology (NIST) have created networks of molecular data similar to Facebook’s recently debuted graph search feature. While graph search would allow Facebook users to find all their New York-living, beer-drinking buddies in one quick search, the NIST-designed networks could help scientists rapidly sift through enormous chemical and biological data sets to find substances with specific properties, for example all 5-ring chemicals with an affinity for enzyme A. The search approach could help speed up the development of new drugs and designer materials.
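To make the "5-ring chemicals with an affinity for enzyme A" example concrete, here is a minimal sketch of that kind of multi-criteria lookup over a tiny property graph. Everything in it is an assumption for illustration: the molecule names, the `ring_count` and `binds` relations, and the `build_index` and `search` helpers are hypothetical and are not taken from the NIST system.

```python
from collections import defaultdict

# Hypothetical toy data: (molecule, relation, value) edges of a small graph.
TRIPLES = [
    ("mol-1", "ring_count", 5),
    ("mol-1", "binds", "enzyme A"),
    ("mol-2", "ring_count", 5),
    ("mol-2", "binds", "enzyme B"),
    ("mol-3", "ring_count", 3),
    ("mol-3", "binds", "enzyme A"),
]


def build_index(triples):
    """Map each (relation, value) pair to the set of molecules that have it."""
    index = defaultdict(set)
    for subject, relation, value in triples:
        index[(relation, value)].add(subject)
    return index


def search(index, **criteria):
    """Return the molecules that satisfy every relation=value criterion."""
    matches = [index.get((rel, val), set()) for rel, val in criteria.items()]
    return set.intersection(*matches) if matches else set()


INDEX = build_index(TRIPLES)

# "All 5-ring chemicals with an affinity for enzyme A" in this toy data set.
print(sorted(search(INDEX, ring_count=5, binds="enzyme A")))
```

The query is just a set intersection over shared properties. It only works if every record describes those properties with the same words, which is the vocabulary problem discussed next.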
There are vocabulary issues associated with creating a search function (from the news item):
Molecules don’t maintain their own online profiles, so a key challenge for the NIST research team was to develop a standard language for scientists to describe their research subjects. For example, one research group may describe a material’s properties as glassy while another team might use the word vitreous, even though the two words have the same meaning, explained Ursula Kattner, a researcher in the Materials Science and Engineering Division at NIST.
One approach to the problem could be to define a standard set of words, but NIST scientists opted for a more flexible approach that could evolve with time. The search language they developed is similar to Indo-European languages like Sanskrit and Latin, which use short roots to build words based on a set of rules, said Talapady Bhat, a research chemist at NIST who has been leading the effort to develop a shared vocabulary for NIST’s scientific databases. He gives the example of the Sanskrit word “yoga,” which is based on the roots “Y(uj),” which means to join, “O,” which means creator, God, or brain, and “Ga,” which means motion or initiation. Similarly, scientists could take the three simple root words “red,” “laser,” and “light,” and combine them into a single compound word “red-laser-light” that conveys a new concept. Using the root and rule-based approach will mean that scientists who know the roots can figure out the meaning of unfamiliar terms, and it also gives scientists flexibility to develop easily understandable new terms in the future.
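The root-and-rule idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the root list, the synonym table (which maps "vitreous" to "glassy", as in the earlier example), and the `normalize`, `compose`, and `decompose` helpers are hypothetical, and the single rule used here (join roots with hyphens) is borrowed from the "red-laser-light" example rather than from the actual NIST rule set.

```python
# Hypothetical root vocabulary and synonym table (assumptions for illustration).
ROOTS = {"red", "laser", "light", "glassy"}
SYNONYMS = {"vitreous": "glassy"}


def normalize(word):
    """Lower-case a word and map a known synonym onto its root."""
    word = word.strip().lower()
    return SYNONYMS.get(word, word)


def compose(*words):
    """Build a compound term by joining known roots with hyphens."""
    roots = [normalize(w) for w in words]
    unknown = [r for r in roots if r not in ROOTS]
    if unknown:
        raise ValueError(f"unknown roots: {unknown}")
    return "-".join(roots)


def decompose(term):
    """Split a compound term back into its normalized roots."""
    return [normalize(part) for part in term.split("-")]


print(compose("red", "laser", "light"))  # red-laser-light
print(decompose("red-laser-light"))      # ['red', 'laser', 'light']
print(compose("Vitreous"))               # glassy
```

Because a compound term can always be split back into its roots, a reader (or a search index) that knows the roots can interpret a term it has never seen before.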
The NIST team has already applied their root-based vocabulary rules to the chemical structures in PubChem, a “monstrous database” of millions of compounds and chemical substances, to the world wide protein data bank (PDB), and to specific NIST-based databases, said John Elliott, a biophysicist and another member of the team. While the scientific databases haven’t reached a Facebook-like level of more than a billion users, they are actively used by many scientists in the NIST community and beyond.
You can read more about the issues associated with getting precise search results on ScienceDaily. You may also be able to access an abstract (I keep getting an error) of the researchers’ (Talapady Bhat, John Elliott, Carelyn Campbell, Ursula Kattner, Shir Boger, Anne Plant) presentation, Challenges and Solutions for Enabling Facebook like Graph-search on Small and Macro-molecular Structural Data, which was given at the 2013 American Crystallographic Association (ACA) meeting.