It took decades of painstaking research to map the structure of just 17 percent of the proteins used in the human body, but less than a year for UK-based artificial intelligence firm DeepMind to raise that figure to 98.5 percent. The company is making all of this data freely available, which could lead to rapid progress in developing new drugs.
Determining the complex, crumpled shapes of proteins from the sequences of amino acids that make them up has been a huge scientific hurdle. Some amino acids are attracted to others, some are repelled by water, and the chains fold into complex shapes that are difficult to calculate accurately. Understanding these structures enables the design of new, highly targeted drugs that bind to specific parts of proteins.
Genetic research has long made it possible to determine a protein's amino acid sequence, but an efficient way of working out its shape, which is essential to understanding its properties, has proved elusive. Supercomputers and distributed computing projects have been applied to the problem, but with limited success.
Last year, DeepMind published research demonstrating that AI could solve the problem quickly. Its AlphaFold neural network was trained on sections of previously resolved protein structures and learned to infer the structures of new sequences, and those predictions were then checked against experimental data.
Since then, the company has been improving the technology and applying it to thousands of proteins, ranging from human proteins to those related to COVID-19 and others of immediate research value. It is now releasing the results in a database created in partnership with the European Molecular Biology Laboratory.
DeepMind has mapped the structures of 98.5 percent of the 20,000 or so proteins in the human body. For 35.7 percent of those, the algorithm's confidence in its predicted shape was above 90 percent.
The company has released more than 350,000 protein structure predictions in total, including predictions for 20 additional model organisms important for biological research, from Escherichia coli to yeast. The team hopes that within months they will be able to add nearly every protein sequence known to science – more than 100 million structures.
The emergence of AI in protein folding was a “profound surprise”, says John Moult at the University of Maryland.
“It’s revolutionary in the sense that it’s hard to get your mind around,” he says. “If you’ve been working on some rare disease and have never had a structure, now you’ll be able to go and look at structural information that was basically very difficult or impossible to get before.”
Demis Hassabis, CEO and founder of DeepMind, says AlphaFold, which is made up of about 32 separate algorithms and has been made open source, can now work out protein shapes in minutes or, in some cases, seconds, using hardware no more sophisticated than a standard graphics card.
“It takes one [graphics processing unit] a few minutes to fold one protein, which of course would have taken years of experimental work,” he says. “We’re going to put this treasure trove of data out there. It’s rather amazing because going from creating a system that could do that to actually producing all the data took only a few months. We hope that it will become a kind of standard tool used by all biologists around the world.”
The team also attached a confidence measure to every structure prediction, which Hassabis says was vital because the results will form the basis of further research. Hassabis believes that low confidence scores for some parts of human proteins could be due to sequencing errors or perhaps “something intrinsic in biology”, such as proteins that are inherently disordered or unpredictable. The remaining 1.5 percent of human proteins, for which no structures have been published, are those with sequences longer than 2,700 amino acids; these are currently excluded to reduce computation time.
Journal reference: Nature, DOI: https://www.nature.com/articles/s41586-021-03828-1