Chemistry has a lot to do with geometry, particularly when we look at the chemistry of life. We learnt a bit about it during the pandemic: the virus that causes Covid-19 has a protein, the spike protein, made up of 1,273 amino acids folded into a 3D structure kept in place by 23 sugar molecules. The shape of this spike protein lets it breach our cells’ membrane, allowing the virus to infect them. It is not the chemical composition that does the trick, it is the shape. With a different shape, it would most likely not be able to enter the cells. Vaccines target that shape, teaching our immune system to recognise it and fight it.
Protein shapes are crucial because they determine how a protein binds to other molecules and, through that, the macro effects that we perceive. Knowing the composition of a protein, that is the amino acids composing it, is something we have learnt over two hundred years of organic chemistry. However, knowing the shape of a protein is far more complex. It requires plenty of geometry and physics, so much in fact that we can’t do it with paper and pencil: we need a computer.
The Rosetta project (remember, the Rosetta Stone allowed the decoding of Egyptian hieroglyphs) started in 2005, proposed by the Baker laboratory at the University of Washington, and called for broad cooperation to pool computing resources to calculate the shape of proteins. It is still running as Rosetta Commons and it has been able to enrol the processing capacity of private, residential computers: as of September 2020 the average processing power shared through Rosetta@home was 487,946 GigaFLOPS. During the Covid-19 pandemic it peaked at 1.7 PetaFLOPS on March 28th, focussing on the folding of the spike protein.
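To put the two figures above on the same scale, here is a small sketch of the unit conversion. It only uses the numbers quoted in the text and the standard relation 1 PetaFLOPS = 1,000,000 GigaFLOPS; the function and variable names are made up for illustration.

```python
# Figures quoted in the text above.
AVERAGE_GIGAFLOPS = 487_946  # average shared processing power, September 2020
PEAK_PETAFLOPS = 1.7         # peak during the pandemic, March 28th

GIGA_PER_PETA = 1_000_000    # 1 PetaFLOPS = 10^6 GigaFLOPS


def gigaflops_to_petaflops(gigaflops: float) -> float:
    """Convert a GigaFLOPS figure to PetaFLOPS."""
    return gigaflops / GIGA_PER_PETA


average_petaflops = gigaflops_to_petaflops(AVERAGE_GIGAFLOPS)
peak_to_average = PEAK_PETAFLOPS / average_petaflops

print(f"average: {average_petaflops:.2f} PetaFLOPS")      # average: 0.49 PetaFLOPS
print(f"peak is {peak_to_average:.1f}x the average")      # peak is 3.5x the average
```

In other words, the pandemic peak was roughly three and a half times the average capacity reported a few months later.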
Lately, Google has enlisted AI to tackle protein folding (project AlphaFold), and now DeepMind, the Google company that applies AI in various fields, has announced it will soon release the full database of the 100 million proteins known to exist, along with their shapes. The shapes have been predicted using artificial intelligence (watch the clip).
The first published results of AlphaFold 2, the new version of AlphaFold with a progressively self-enhancing AI, have shown an accuracy in folding prediction down to the atomic level (the location of a specific atom in relation to all others in 3D space) of 36%, increasing to over 50% if one accepts a precision that is sufficient for evaluating the functionality of the protein (and this is the one of interest in designing new drugs and vaccines).
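To give a feel for what comparing atom positions means, here is a minimal sketch of one common distance measure, the root-mean-square deviation (RMSD) between a predicted and an experimentally determined set of coordinates. This is an illustration only: the text does not say which metric lies behind the percentages above, the coordinates are toy values, and the sketch assumes the two structures have already been superimposed and list their atoms in the same order.

```python
import math


def rmsd(predicted, experimental):
    """Root-mean-square deviation between two lists of (x, y, z) atom positions.

    Assumes both lists hold the same atoms in the same order and that the
    structures are already superimposed (no alignment is done here).
    """
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_distances = 0.0
    for p, e in zip(predicted, experimental):
        squared_distances += sum((a - b) ** 2 for a, b in zip(p, e))
    return math.sqrt(squared_distances / len(predicted))


# Toy coordinates in angstroms (hypothetical, not real protein data).
predicted = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 0.0)]
experimental = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 2.0)]

print(rmsd(predicted, experimental))  # 1.0
```

Here three atoms match exactly and one is off by 2 angstroms, which gives an RMSD of 1 angstrom. The smaller the value, the closer the prediction is to the measured structure.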
Notice that AI “predicts” the folding; laboratory analyses then have to take place to confirm (or disprove) it. Even so, the prediction is no small feat, and it is an enormous help to biologists, since it is much easier to confirm a “folding” than to discover it.
We can expect a significant acceleration in drug creation in the second part of this decade. This will be a required technology for moving into personalised medicine, with drugs designed from the specific genome of a person.