Open data key to solving the protein structure prediction problem

Human protein TRPC5

Thousands of AI-predicted protein structures join collections of experimentally determined structures in data-sharing resources.

Proteins are the building blocks for all living things, providing structure and managing processes in cells. Understanding how proteins fold into three-dimensional (3D) shapes is important to understanding their function. But determining these structures experimentally requires expensive equipment and a lot of time, which limits research and development progress.

Trying to predict a protein’s 3D structure from its amino acid sequence is a challenge that’s been with us for 50 years. A new artificial intelligence programme called AlphaFold has managed to accurately predict protein structure in minutes, solving this decades-old challenge.

DeepMind, the developers of AlphaFold, have made the AlphaFold code and protein structure predictions openly available to the global scientific community. This could mark a major change for both fundamental research and a range of applications – developing new drugs, designing crops resistant to climate change and advancing bio-based technologies.

About the project

The success of AlphaFold is built on the availability of thousands of experimentally determined protein structures, which is a result of long-term research funding, infrastructure investment and data-sharing policies supported by funders and journals.

There is great benefit in sharing experimentally determined structures, rather than researchers and innovators having to replicate work independently.

To meet the needs of this open and efficient approach, UKRI is supporting world-leading infrastructures for biological information through the Biotechnology and Biological Sciences Research Council (BBSRC) and the Medical Research Council (MRC), along with the Wellcome Trust. The resources developed include:

  • the European Molecular Biology Laboratory-European Bioinformatics Institute’s (EMBL-EBI) Protein Data Bank in Europe (PDBe)
  • University College London’s protein structure classification database Class, Architecture, Topology, Homologous superfamily (CATH).

The EMBL-EBI Protein Data Bank in Europe is a repository for experimentally determined protein structures. DeepMind made the AlphaFold code and protein structure predictions openly available through EMBL-EBI, unlocking new research opportunities.

The CATH database categorises those protein structures, using common folding sections called domains to identify evolutionary relationships.

Impacts of the project

Thousands of new predicted structures are now openly available to the research and development community through a collaboration between EMBL-EBI and DeepMind, unlocking new research opportunities.

The software has already been used in the search for enzymes that could recycle single-use plastics. Future applications could include understanding disease, designing climate change resilient crops, accelerating drug development, and tackling antimicrobial resistance.

The AlphaFold protein structure prediction data will allow CATH to treble the number of structures for human proteins, giving a much more complete and accurate picture of the functional impacts of mutations. These mutations could be influencing disease progression and drug resistance.

“This will be one of the most important datasets since the mapping of the Human Genome,” said Ewan Birney, Director at EMBL-EBI. “Making AlphaFold predictions accessible to the international scientific community opens up so many new research avenues, from neglected diseases to new enzymes for biotechnology and everything in between.”

Find out more

Great expectations – the potential impacts of AlphaFold database on the EMBL website.

BBSRC data sharing policy.

Information on the evolutionary relationships of protein domains on the CATH database website.

Top image:  Structure of a human protein used to transport calcium across membranes in the nervous system in complex with an inhibitor. Credit: Protein Data Bank in Europe.
