University of Maryland researchers used natural language processing methods to analyze the motions of protein molecules and, from that analysis, developed an abstract language that specifies the various shapes a protein molecule can take, as well as how and when it transitions from one shape to another.
A protein’s activity is frequently determined by its shape and structure. Understanding the dynamics that influence shape and structure can therefore help us better understand everything from how proteins function to the origins of disease and how to create effective targeted drug therapies.
The method’s success also offers insights that could further advance artificial intelligence (AI). This is the first time a machine learning algorithm has been used to study biomolecular dynamics in this way. A research paper on this work was published on October 9, 2020, in the journal Nature Communications.
“Here we show the same AI architectures used to complete sentences when writing emails can be used to uncover a language spoken by the molecules of life,” said the paper’s senior author, Pratyush Tiwary, an assistant professor in UMD’s Department of Chemistry and Biochemistry and Institute for Physical Science and Technology.
“We show that the movement of these molecules can be mapped into an abstract language, and that AI techniques can be used to generate biologically truthful stories out of the resulting abstract words.”
The molecules that make up living things are continually moving and shifting, and the way they fold and twist determines their shape. A molecule may hold one shape for seconds or days before abruptly springing open and refolding into a different form or structure.
The change from one shape to another resembles a twisted coil that gradually stretches open. As different parts of the coil release and unfold, the molecule passes through many intermediate conformations.
Because the transition from one form to another occurs in picoseconds (trillionths of a second) or faster, it is difficult for experimental techniques such as high-powered microscopy and spectroscopy to capture exactly how the unfolding happens, which factors affect it, and which shapes are possible. The answers to those questions make up the biological story that Tiwary’s new method can reveal.
Using powerful supercomputers, including UMD’s Deepthought2, Tiwary and his team applied Newton’s laws of motion, which can predict how the atoms within a molecule move, to develop statistical physics models that simulate the shape, movement and trajectory of individual molecules.
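To make "applying Newton’s laws of motion" concrete, here is a minimal sketch of the kind of time-stepping loop a molecular dynamics simulation runs, shrunk to a single particle moving in one dimension. The double-well potential U(x) = (x^2 - 1)^2, in which the two wells stand in for two molecular shapes, as well as the unit mass, time step and starting conditions, are all illustrative assumptions. Real simulations track many thousands of atoms with empirical force fields and temperature control.

```python
from typing import List, Tuple


def potential(x: float) -> float:
    """Toy double-well energy U(x) = (x**2 - 1)**2 (an assumption, not a real force field)."""
    return (x * x - 1.0) ** 2


def force(x: float) -> float:
    """Force F = -dU/dx for the toy potential above."""
    return -4.0 * x * (x * x - 1.0)


def velocity_verlet(x: float, v: float, mass: float, dt: float,
                    n_steps: int) -> List[Tuple[float, float]]:
    """Integrate Newton's second law, m * a = F(x), with the velocity Verlet scheme.

    Returns the (position, velocity) pair after every step, i.e. a trajectory.
    """
    trajectory = []
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt  # advance the position
        a_new = force(x) / mass             # force at the new position
        v = v + 0.5 * (a + a_new) * dt      # advance velocity with the averaged acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory


def total_energy(x: float, v: float, mass: float) -> float:
    """Kinetic plus potential energy; a good integrator keeps this nearly constant."""
    return 0.5 * mass * v * v + potential(x)


# Start in the left well with enough kinetic energy to cross the barrier at x = 0.
traj = velocity_verlet(x=-1.0, v=2.5, mass=1.0, dt=0.01, n_steps=2000)
energies = [total_energy(x, v, 1.0) for x, v in traj]
print(f"energy drift: {max(energies) - min(energies):.2e}")
```

Velocity Verlet is used here because it is simple and conserves energy well over long runs; the recorded list of positions is the raw "trajectory" that later analysis works from.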
The output of these simulations was then fed into a machine learning algorithm similar to the one Gmail uses to automatically complete words as you type. The algorithm treated the simulations as if they were written in a language in which each molecular movement is a letter that can be strung together with other movements to form words and sentences.
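The "letters" idea can be sketched in a few lines: split the range of some coordinate recorded along a simulated trajectory into bins, give each bin a letter, and read the trajectory off as a string. The bin edges, the three-letter alphabet, the sample values and the context length below are hypothetical choices for illustration, not the settings used in the study. Pairing each short run of letters with the letter that follows it yields the same kind of training example an autocomplete model learns from.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def to_letters(values: Sequence[float], edges: Sequence[float], alphabet: str) -> str:
    """Map each continuous value to the letter of the bin it falls into.

    `edges` must be sorted in increasing order, and `alphabet` needs exactly
    one letter per bin (len(edges) + 1 letters). A value equal to an edge
    goes into the bin above it.
    """
    if len(alphabet) != len(edges) + 1:
        raise ValueError("alphabet must have exactly len(edges) + 1 letters")
    return "".join(alphabet[bisect_right(edges, v)] for v in values)


def training_pairs(text: str, context: int) -> List[Tuple[str, str]]:
    """Pair every `context`-letter window with the letter that comes right after it."""
    return [(text[i:i + context], text[i + context]) for i in range(len(text) - context)]


# Hypothetical coordinate values sampled along a trajectory.
coordinate = [-1.2, -0.9, -0.1, 0.3, 1.1, 1.4]
letters = to_letters(coordinate, edges=[-0.5, 0.5], alphabet="ABC")
print(letters)  # AABBCC
print(training_pairs(letters, 2))
```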
By learning the rules of syntax and grammar that determine which shapes and motions can follow one another and which cannot, the program predicts how the protein untangles as it changes shape and the variety of forms it takes along the way.
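The notion of a grammar that says which letters may follow which can be illustrated with something far simpler than the network used in the study: a first-order transition table that counts how often each letter in a string is followed by each other letter. This is a swapped-in technique (a plain Markov chain, not the researchers’ neural network), and the example string is made up. Transitions never seen in the data get probability zero, so text generated from the table never contains them, a toy version of learning which motions do not follow one another.

```python
import random
from collections import Counter, defaultdict
from typing import Dict

Transitions = Dict[str, Dict[str, float]]


def fit_transitions(text: str) -> Transitions:
    """Estimate P(next letter | current letter) from adjacent letter pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(text, text[1:]):
        counts[current][nxt] += 1
    table: Transitions = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        table[current] = {letter: n / total for letter, n in followers.items()}
    return table


def generate(table: Transitions, start: str, length: int, seed: int = 0) -> str:
    """Sample a letter string by repeatedly drawing the next letter from the table."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length:
        followers = table.get(out[-1])
        if not followers:  # letter never seen with a successor: stop early
            break
        letters = list(followers)
        weights = [followers[letter] for letter in letters]
        out.append(rng.choices(letters, weights=weights)[0])
    return "".join(out)


# Made-up letter string: A and C are only ever connected through B.
table = fit_transitions("AAABBBCCCBBBAAA")
print(table["A"])  # {'A': 0.8, 'B': 0.2}
print(generate(table, "A", 30))
```

A table like this only looks one letter back; recurrent networks such as the one described below can condition on much longer histories.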
To show that their method works, the scientists applied it to a small biomolecule called a riboswitch, which had previously been studied using spectroscopy. The results, which revealed the many configurations the riboswitch can adopt when stretched, agreed with those of the spectroscopic experiments.
“One of the most important uses of this, I hope, is to develop drugs that are very targeted,” Tiwary said. “You want to have potent drugs that bind very strongly, but only to the thing that you want them to bind to. We can achieve that if we can understand the different forms that a given biomolecule of interest can take, because we can make drugs that bind only to one of those specific forms at the appropriate time and only for as long as we want.”
A crucial component of this research is understanding the language-processing system Tiwary and his team used, which is known generally as a recurrent neural network and, in this case, as a long short-term memory (LSTM) network.
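For readers unfamiliar with long short-term memory networks, the sketch below implements the forward pass of one common formulation of an LSTM cell in plain Python. At each step the cell reads one input letter (one-hot encoded) and updates a hidden state and a separate cell state through input, forget and output gates; the forget gate is what lets information persist over many steps. The weights here are random and untrained, and the sizes, initialization range, alphabet and helper names are assumptions for illustration only, not the study’s network.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Dict[str, Tuple[Matrix, Matrix, Vector]]  # gate -> (W, U, b)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def init_params(input_size: int, hidden_size: int, seed: int = 0) -> Params:
    """Random input/recurrent weights for the i, f, o and g blocks; biases start at zero."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {gate: (mat(hidden_size, input_size), mat(hidden_size, hidden_size), [0.0] * hidden_size)
            for gate in ("i", "f", "o", "g")}


def lstm_step(p: Params, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
    """One LSTM update: returns the new hidden state h and cell state c."""
    def pre(gate: str) -> Vector:
        w, u, b = p[gate]
        return [wx + uh + bk for wx, uh, bk in zip(matvec(w, x), matvec(u, h), b)]

    i = [sigmoid(z) for z in pre("i")]    # how much new information to write
    f = [sigmoid(z) for z in pre("f")]    # how much of the old cell state to keep
    o = [sigmoid(z) for z in pre("o")]    # how much of the cell state to expose
    g = [math.tanh(z) for z in pre("g")]  # candidate values to write
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def one_hot(letter: str, alphabet: str) -> Vector:
    return [1.0 if letter == a else 0.0 for a in alphabet]


def run(p: Params, text: str, alphabet: str, hidden_size: int) -> List[Vector]:
    """Feed a letter string through the cell, returning the hidden state after each letter."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    states = []
    for letter in text:
        h, c = lstm_step(p, one_hot(letter, alphabet), h, c)
        states.append(h)
    return states


params = init_params(input_size=3, hidden_size=4, seed=1)
for letter, h in zip("AABBCC", run(params, "AABBCC", "ABC", hidden_size=4)):
    print(letter, [round(v, 3) for v in h])
```

In a next-letter model, each hidden state would feed an output layer (for example a softmax over the alphabet), and all weights would be learned from training data; that layer and the training loop are omitted here.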
As the network learned to speak the language of molecular motion, the researchers examined the mathematics underlying it. They found that the network followed rules resembling path entropy, a key concept in statistical physics. Recognizing this opens possibilities for future advances in recurrent neural networks.
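Path entropy measures how spread out probability is over whole trajectories rather than over single snapshots: S = -Σ P(path) ln P(path), summed over every possible path. The sketch below computes it for a made-up three-state Markov chain (an assumption standing in for letter-to-letter dynamics), once by enumerating every path of a given length and once from the closed form for a Markov chain started in its stationary distribution π, S = H(π) + (L - 1) h, where h is the per-step entropy rate. A standard information-theory result is that h is also the lowest average next-letter cross-entropy any predictor can reach on such a chain, which offers one intuition for why a network trained to predict the next letter might track path-entropy-like quantities; this connection is illustrative, not the paper’s derivation.

```python
import itertools
import math
from typing import List, Sequence

Matrix = List[List[float]]


def stationary(P: Matrix, iters: int = 10_000) -> List[float]:
    """Stationary distribution by power iteration (assumes an ergodic, aperiodic chain)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        total = sum(pi)
        pi = [x / total for x in pi]  # guard against slow floating-point drift
    return pi


def path_entropy_bruteforce(P: Matrix, pi: Sequence[float], length: int) -> float:
    """S = -sum P(path) ln P(path) over all paths of `length` states, starting from pi."""
    n = len(P)
    s = 0.0
    for path in itertools.product(range(n), repeat=length):
        p = pi[path[0]]
        for a, b in zip(path, path[1:]):
            p *= P[a][b]
        if p > 0.0:
            s -= p * math.log(p)
    return s


def entropy_rate(P: Matrix, pi: Sequence[float]) -> float:
    """h = -sum_i pi_i sum_j P_ij ln P_ij: average uncertainty about the next state."""
    n = len(P)
    return -sum(pi[i] * P[i][j] * math.log(P[i][j])
                for i in range(n) for j in range(n) if P[i][j] > 0.0)


def path_entropy_formula(P: Matrix, pi: Sequence[float], length: int) -> float:
    """Closed form for a stationary Markov chain: S = H(pi) + (length - 1) * h."""
    h0 = -sum(p * math.log(p) for p in pi if p > 0.0)
    return h0 + (length - 1) * entropy_rate(P, pi)


# Made-up chain: states 0 and 2 are only connected through state 1.
P = [[0.90, 0.10, 0.00],
     [0.05, 0.90, 0.05],
     [0.00, 0.10, 0.90]]
pi = stationary(P)
print([round(x, 6) for x in pi])  # [0.25, 0.5, 0.25]
print(path_entropy_bruteforce(P, pi, 5), path_entropy_formula(P, pi, 5))
```

Brute-force enumeration grows as (number of states)^L, which is why the closed form, or a trained model that captures the same transition structure, is needed for realistic path lengths.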
“It is natural to ask if there are governing physical principles making AI tools successful,” Tiwary said.
“Here we discover that, indeed, it is because the AI is learning path entropy. Now that we know this, it opens up more knobs and gears we can tune to do better AI for biology and perhaps, ambitiously, even improve AI itself. Anytime you understand a complex system such as AI, it becomes less of a black-box and gives you new tools for using it more effectively and reliably.”