It took AI to solve protein folding

Artistic rendering of various proteins on a T-cell. Their shape (folding) is what gives them their properties. Finding their shape has been one of the most challenging problems of the last 50 years. Image credit: Juan Gaertner – Science Photo Library, via Getty Images

Chemistry has a lot to do with geometry, particularly when we look at the chemistry of life. We have learnt a bit about it during the pandemic: the Covid-19 virus has a protein, the spike protein, made up of 1,273 amino acids folded into a 3D structure kept in place by 23 sugar molecules. The shape of this spike protein is able to create a breach in our cells’ membranes, letting the virus infect them. It is not the chemical composition that does the trick, it is the shape. If the protein had a different shape, most likely it would not be able to enter the cells. The vaccines target that shape, teaching our immune system to recognise it and fight it.
Protein shapes are crucial because they determine how proteins bind with other molecules, and hence the macro effects that we perceive. Knowing the composition of a protein, that is, the amino acids composing it, is something we have learnt over two hundred years of organic chemistry. However, knowing the shape of a protein is much more complex. It requires plenty of geometry and physics, so much in fact that we can’t do it with paper and pencil: we need a computer.

The Rosetta project (remember that the Rosetta Stone allowed the decoding of the Egyptian hieroglyphs) was started in 2005 by the Baker laboratory at the University of Washington. It called for broad cooperation to pool computing resources to calculate the shape of proteins. It is still running as Rosetta Commons and it has been able to enlist the processing capacity of private, residential computers: as of September 2020 the average processing power shared through Rosetta@home was 487,946 GigaFLOPS, and during the Covid-19 pandemic it peaked at 1.7 PetaFLOPS on March 28th while focussing on the folding of the spike protein.
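The two figures above are quoted in different units, which makes them hard to compare at a glance. As a rough sketch (the function and variable names are mine, chosen only for illustration), the conversion can be worked out in a few lines of Python:

```python
# Unit prefixes for floating-point operations per second (FLOPS).
GIGA = 10**9
PETA = 10**15


def gigaflops_to_petaflops(gigaflops: float) -> float:
    """Convert a figure in GigaFLOPS to PetaFLOPS."""
    return gigaflops * GIGA / PETA


# Figures quoted in the text above.
average_gflops = 487_946  # average shared processing power, September 2020
peak_pflops = 1.7         # peak reached on March 28th

average_pflops = gigaflops_to_petaflops(average_gflops)
peak_to_average = peak_pflops / average_pflops

print(f"Average: {average_pflops:.3f} PetaFLOPS")
print(f"Peak is about {peak_to_average:.1f} times the average")
```

In other words, the average is a bit less than half a PetaFLOPS, and the pandemic peak was roughly three and a half times that.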

Lately, Google has enlisted AI to tackle the discovery of protein folding (project AlphaFold), and now DeepMind, the Google company that applies AI in various fields, has announced that it will soon release the full database of the 100 million proteins known to exist, along with their shapes. The shapes have been worked out using artificial intelligence.

The first published results of AlphaFold 2, the new version of AlphaFold with a progressively self-enhancing AI, have shown an accuracy in folding prediction down to the atomic level (the location of a specific atom in relation to all others in 3D space) of 36%, increasing to over 50% if one accepts a precision that is suitable for evaluating the functionality of the protein (and this is the level that is of interest in designing new drugs and vaccines).

Note that AI “predicts” the folding; laboratory analyses then have to take place to confirm (or disprove) it. The prediction is no small feat, and it is an enormous help to biologists since it is much easier to confirm a “folding” than to discover it.

We can expect a significant acceleration in drug creation in the second part of this decade. This will be a required technology for moving into personalised medicine, with drugs designed from the specific genome of a person.

About Roberto Saracco

Roberto Saracco fell in love with technology and its implications a long time ago. His background is in math and computer science. Until April 2017 he led the EIT Digital Italian Node and then was head of the Industrial Doctoral School of EIT Digital up to September 2018. Previously, up to December 2011, he was the Director of the Telecom Italia Future Centre in Venice, looking at the interplay of technology evolution, economics and society. At the turn of the century he led a World Bank-Infodev project to stimulate entrepreneurship in Latin America. He is a senior member of IEEE where he leads the New Initiative Committee and co-chairs the Digital Reality Initiative. He is a member of the IEEE in 2050 Ad Hoc Committee. He teaches a Master's course on Technology Forecasting and Market impact at the University of Trento. He has published over 100 papers in journals and magazines and 14 books.