Case Study: Max Planck Institute
Breakthrough in Carbon Fixation
Improved GCC Enzyme Performance with 2.8x Higher Reaction Rate and 60% Lower Energy Consumption
Wet lab testing of 10 variants
The Challenge
Advancing Carbon Fixation in Photosynthesis by overcoming Limitations of Traditional Protein Engineering Methods
A) The Inherent Problem with Rubisco
Photosynthesis, crucial for life on Earth, relies on the Calvin cycle to convert carbon dioxide into sugars, with the carbon-fixing step catalyzed by the enzyme Rubisco. However, Rubisco also reacts with oxygen, triggering energy-intensive photorespiration, which releases previously fixed carbon. This inefficiency limits plant growth and has broader environmental implications.
B) Innovating Beyond Nature: The TaCo Pathway
To overcome this, Prof. Tobias Erb's team developed the TaCo pathway, which enhances carbon fixation during photorespiration. They introduced the novel enzyme glycolyl-CoA carboxylase (GCC), derived from propionyl-CoA carboxylase (PCC). Despite a 1,000-fold improvement in catalytic activity, GCC was still energy-inefficient, requiring four ATP molecules per carboxylation event compared to PCC's one.
C) The Limitations of Traditional Methods
Initially, the team used directed evolution based on error-prone PCR to create plasmid libraries of randomly mutated GCC M5 variants. However, after screening over 15,000 sequences, the desired improvements were not achieved. Recognizing these limitations, the group partnered with Exazyme to leverage machine learning (ML) for optimizing enzyme performance.
The Approach
Streamlined Protein Engineering and Superior Results by Leveraging ML
A) Data and Training
We started by evaluating a targeted set of protein sequences through functional assays to identify areas for improvement. To build a robust training dataset, we revisited the directed evolution data and sequenced additional variants beyond the top performers. This yielded a dataset of 161 variants, which was essential for training our ML pipeline.
B) Embedding
These protein sequences are transformed into numerical representations using embedding models such as ProtBert or ESM. The resulting embeddings are vectors that capture the core sequence information of each protein.
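The "sequence in, vector out" idea can be sketched as follows. This is a minimal illustration that assumes a plain one-hot encoding as a stand-in; it is not ProtBert or ESM, and the helper names and the five-residue toy sequence are hypothetical rather than taken from the project.

```python
# Stand-in embedding: a one-hot encoding, NOT ProtBert or ESM.
# It only illustrates how a sequence becomes a fixed-length vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_embed(sequence):
    """Return a flat vector of length 20 * len(sequence)."""
    vector = [0.0] * (len(sequence) * len(AMINO_ACIDS))
    for position, residue in enumerate(sequence):
        vector[position * len(AMINO_ACIDS) + INDEX[residue]] = 1.0
    return vector


def apply_substitution(sequence, mutation):
    """Apply a point mutation written like 'G2R' (1-based position)."""
    wild_type, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
    if sequence[position - 1] != wild_type:
        raise ValueError(f"expected {wild_type} at position {position}")
    return sequence[:position - 1] + mutant + sequence[position:]


parent = "MGLSK"  # hypothetical toy sequence, not GCC
variant = apply_substitution(parent, "G2R")
parent_vec = one_hot_embed(parent)
variant_vec = one_hot_embed(variant)

# A single substitution changes exactly two entries of the one-hot vector.
changed = sum(1 for a, b in zip(parent_vec, variant_vec) if a != b)
print(variant, len(parent_vec), changed)
```

A learned embedding model would replace `one_hot_embed` with a function that returns a dense vector per sequence; the downstream regression step only needs that contract.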
C) Regression
Once the sequences are encoded numerically, our predictive model relates them directly to their functional attributes. After testing several ML approaches, including Gaussian process (GP) and UniRep-based models, we identified the GP model as optimal for predicting carboxylation rates of GCC M5 variants.
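To make the regression step concrete, here is a minimal Gaussian process sketch in plain Python. The RBF kernel, the noise level, the length scale, and the one-dimensional toy data are all assumptions made for illustration; the case study does not state which kernel or hyperparameters were used.

```python
import math


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel (an assumed choice, not from the source)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def cholesky(matrix):
    """Lower-triangular L with L @ L.T == matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def solve_lower(lower, rhs):
    """Forward substitution for L y = rhs."""
    y = []
    for i in range(len(rhs)):
        s = sum(lower[i][k] * y[k] for k in range(i))
        y.append((rhs[i] - s) / lower[i][i])
    return y


def solve_lower_transposed(lower, rhs):
    """Back substitution for L.T x = rhs."""
    n = len(rhs)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(lower[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (rhs[i] - s) / lower[i][i]
    return x


def gp_predict(train_x, train_y, query, noise=1e-2, length_scale=1.0):
    """Posterior mean and standard deviation at one query vector."""
    n = len(train_x)
    mean_y = sum(train_y) / n
    centered = [y - mean_y for y in train_y]
    gram = [[rbf_kernel(train_x[i], train_x[j], length_scale)
             + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    lower = cholesky(gram)
    alpha = solve_lower_transposed(lower, solve_lower(lower, centered))
    k_star = [rbf_kernel(x, query, length_scale) for x in train_x]
    mean = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(lower, k_star)
    variance = rbf_kernel(query, query, length_scale) - sum(t * t for t in v)
    return mean, math.sqrt(max(variance, 0.0))


# Toy data: 1-D "embeddings" and made-up activity values.
train_x = [[0.0], [1.0], [2.0]]
train_y = [1.0, 2.0, 1.5]
mean_near, std_near = gp_predict(train_x, train_y, [1.0])
mean_far, std_far = gp_predict(train_x, train_y, [10.0])
print(mean_near, std_near, mean_far, std_far)
```

The property that matters for the next step is that the model returns an uncertainty alongside each prediction: it is small near the training data and grows for sequences unlike anything measured so far.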
D) Ranking
The prediction algorithm is integrated into a Bayesian optimization loop to rank GCC sequence candidates based on predicted performance. Structural criteria, including the type and location of substitutions near critical functional sites such as the active site and cofactor binding areas, further guided filtering. Subsequently, ten variants were selected for biochemical characterization in vitro.
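The ranking step can be sketched with a standard Bayesian optimization acquisition function. The example below assumes expected improvement as the acquisition rule and a placeholder position-based structural filter; the case study names neither, and the candidate names, positions, and predicted values are invented for illustration.

```python
import math


def expected_improvement(mean, std, best_observed):
    """Expected improvement over the best measured value (maximization)."""
    if std <= 0.0:
        return max(mean - best_observed, 0.0)
    z = (mean - best_observed) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best_observed) * cdf + std * pdf


def rank_candidates(predictions, best_observed, top_k,
                    structural_filter=lambda position: True):
    """Rank (name, position, mean, std) tuples by expected improvement.

    structural_filter is a hypothetical hook: it receives the substitution
    position and returns False to drop a candidate before ranking.
    """
    scored = [
        (expected_improvement(mean, std, best_observed), name)
        for name, position, mean, std in predictions
        if structural_filter(position)
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Invented predictions: (name, substitution position, mean, std).
toy_predictions = [
    ("cand_A", 12, 1.20, 0.05),
    ("cand_B", 40, 1.00, 0.40),
    ("cand_C", 77, 0.80, 0.10),
    ("cand_D", 95, 1.50, 0.30),
]

unfiltered = rank_candidates(toy_predictions, best_observed=1.0, top_k=2)
filtered = rank_candidates(toy_predictions, best_observed=1.0, top_k=2,
                           structural_filter=lambda position: position != 95)
print(unfiltered, filtered)
```

Expected improvement rewards both a high predicted mean and high uncertainty, which is why `cand_B`, predicted only to match the current best, still outranks the confidently poor `cand_C`.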
The Results
Exazyme's variants show a 2.8-fold higher carboxylation rate and 60% lower energy demand
A) Breakthrough in Protein Performance
Of the nine active variants we characterized, two stood out. G20R showed a 2.8-fold increase in carboxylation rate, substantially accelerating glycolyl-CoA conversion, while matching GCC M5 in ATP consumption per carboxylation. L100N, meanwhile, cut ATP consumption per carboxylation event by 60% compared to GCC M5. Although its specific activity is slightly lower, L100N's reduced energy demand enhances the TaCo pathway's efficiency as a photorespiratory bypass, a significant advance in energy-efficient enzymatic processes.
B) Enhanced Efficiency and Success Rates
Training on the 161 selected data points allowed the model to narrow an initial sequence space of 10,000 candidates to 10 for in vitro characterization. Screening these in cell lysate showed a 90% activity rate, with only M64R inactive, far surpassing traditional methods, where fewer than 20% of variants were active. Of the nine active variants, two significantly improved enzymatic activity or energy efficiency, while seven matched GCC M5. This 20% discovery rate (2 of 10 tested variants) underscores ML's advantage over conventional screening, where fewer than 0.1% of variants typically show enhanced properties.
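The rates quoted above follow directly from the counts in this section. The short calculation below reproduces them; the variable names are ours, and the conventional baseline is taken at its stated upper bound of 0.1%, so the resulting enrichment is a lower bound.

```python
# Counts taken from the text above.
candidate_space = 10_000   # initial sequence space
tested = 10                # variants selected for in vitro characterization
active = 9                 # all but M64R
improved = 2               # G20R and L100N

activity_rate = active / tested       # fraction of tested variants that were active
discovery_rate = improved / tested    # fraction of tested variants that improved on GCC M5
space_reduction = candidate_space / tested

# "Fewer than 0.1%" is an upper bound on the conventional hit rate,
# so this ratio is a lower bound on the enrichment.
conventional_upper_bound = 0.001
min_fold_enrichment = discovery_rate / conventional_upper_bound

print(f"activity rate: {activity_rate:.0%}")
print(f"discovery rate: {discovery_rate:.0%}")
print(f"space reduction: {space_reduction:.0f}x")
print(f"enrichment over conventional screening: at least {min_fold_enrichment:.0f}x")
```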
C) Enhanced Structural Insights
Following detailed biochemical characterization, cryo-EM structures of G20R and L100N were analyzed to uncover the basis of their catalytic enhancements. G20R, located in the β-subunit's GCC loop, stabilizes interactions with the α-subunit, optimizing CoA positioning. L100N, positioned at the active site periphery, improves substrate orientation, minimizing carboxybiotin decarboxylation. Because the ML-driven approach surfaces a small number of improved variants, structural analysis can be focused on them, providing rapid insights into protein mechanisms for efficient enzyme optimization.
The Results
Pushing the boundaries of protein engineering
Prof. Dr. Tobias Erb
Director, Department of Biochemistry and Synthetic Metabolism,
Max Planck Institute