Case Study: Max Planck Institute

Breakthrough in Carbon Fixation

Improved GCC Enzyme Performance: 2.8x Higher Reaction Rate and 60% Lower Energy Consumption

Before Exazyme
Rational design and directed evolution
Wet lab testing of 15,000+ variants
~20% active variants
Enhanced properties: ~1 in 8,000

After Exazyme
Algorithmic design and wet lab validation
Wet lab testing of 10 variants
90% active variants
Enhanced properties: 1 in 5

The Challenge

Advancing Carbon Fixation in Photosynthesis by Overcoming the Limitations of Traditional Protein Engineering Methods

A) The Inherent Problem with Rubisco
Photosynthesis, crucial for life on Earth, relies on the Calvin cycle, in which the enzyme Rubisco fixes carbon dioxide so that it can be built into sugars. However, Rubisco also reacts with oxygen. This triggers photorespiration, an energy-intensive recycling process that releases previously fixed carbon as CO2. The inefficiency limits plant growth and has broader environmental implications.

B) Innovating Beyond Nature: The TaCo Pathway
To overcome this, Prof. Tobias Erb's team developed the TaCo pathway, a synthetic photorespiratory bypass that fixes carbon instead of losing it. At its core is glycolyl-CoA carboxylase (GCC), a novel enzyme engineered from propionyl-CoA carboxylase (PCC). Engineering improved GCC's catalytic activity 1,000-fold, but the enzyme remained energy-inefficient, consuming four ATP molecules per carboxylation event compared with PCC's one.

C) The Limitations of Traditional Methods
Initially, the team tried to improve GCC further by directed evolution, using error-prone PCR to create plasmid libraries of randomly mutated variants of the parent enzyme, GCC M5. However, screening more than 15,000 sequences did not yield the desired improvements. Recognizing the limits of this approach, the group partnered with Exazyme to optimize enzyme performance with machine learning (ML).

“After several rounds of directed evolution we gave up on further improvement.”

The Approach

Streamlined Protein Engineering and Superior Results by Leveraging ML


A) Data and Training
We start by evaluating a targeted set of protein sequences in functional assays to identify where improvement is possible. To build a robust training dataset, we revisited the directed evolution data and sequenced additional variants beyond the top performers, so that less active sequences were represented as well. This produced a dataset of 161 characterized variants, which served as the training data for our ML pipeline.
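Variant datasets like this are usually stored as substitutions relative to the parent sequence. They use the same notation this case study applies to G20R, L100N and M64R: original residue, 1-based position, new residue. The Python sketch below shows one way to expand such labels into full sequences before embedding. The helper name `apply_mutations` and the toy parent sequence are hypothetical and are not taken from the study.

```python
import re

# Point-mutation notation: wild-type residue, 1-based position, new residue (e.g. "G20R").
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(parent: str, labels: list[str]) -> str:
    """Return the parent sequence with each labelled substitution applied (hypothetical helper)."""
    seq = list(parent)
    for label in labels:
        match = MUTATION.match(label)
        if match is None:
            raise ValueError(f"not a point mutation: {label!r}")
        wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(seq):
            raise ValueError(f"position {pos} outside sequence of length {len(seq)}")
        # Reject labels that disagree with the parent, a common data-entry error.
        if seq[pos - 1] != wt:
            raise ValueError(f"{label}: parent has {seq[pos - 1]} at position {pos}, not {wt}")
        seq[pos - 1] = new
    return "".join(seq)


# Toy parent sequence (illustrative only, not the GCC M5 sequence).
parent = "MKGLAVT"
print(apply_mutations(parent, ["G3R", "L4N"]))  # MKRNAVT
```

The wild-type check catches labels that were recorded against the wrong parent. That matters when data from several evolution rounds are merged into one training set.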


B) Embedding
These protein sequences are converted into numerical representations (embeddings) using pretrained protein language models such as ProtBert or ESM. Each embedding is a vector that captures the core sequence information of a protein in a form a regression model can work with.
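ProtBert and ESM are pretrained models that need their own packages and weights. To keep the idea self-contained, the Python sketch below uses a much simpler stand-in: a flattened one-hot encoding. This is our substitution, not the embedding used in the study. It shows the same contract, where each sequence becomes a fixed-length numeric vector. The function names are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_embed(sequence: str) -> np.ndarray:
    """Flattened one-hot encoding: one 20-dimensional indicator vector per position."""
    matrix = np.zeros((len(sequence), len(AMINO_ACIDS)))
    for pos, aa in enumerate(sequence):
        matrix[pos, INDEX[aa]] = 1.0
    return matrix.ravel()


def embed_all(sequences: list[str]) -> np.ndarray:
    """Stack equal-length sequences into an (n_sequences, length * 20) feature matrix."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("flattened one-hot features need equal-length sequences")
    return np.stack([one_hot_embed(s) for s in sequences])


X = embed_all(["MKGLAVT", "MKRLAVT", "MKGNAVT"])
print(X.shape)  # (3, 140)
```

A language-model embedding would replace `one_hot_embed` with learned per-residue vectors, often averaged into a single vector per sequence. Everything downstream only needs the resulting feature matrix.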


C) Regression
With the sequences encoded numerically, our predictive model learns the relationship between each protein sequence and its functional attributes. After testing several ML approaches, including Gaussian process (GP) regression and UniRep models, we found that the GP model best predicted carboxylation rates in GCC M5 variants.
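The source does not describe the kernel or hyperparameters. The following is therefore a generic sketch of exact GP regression in NumPy, with an assumed RBF kernel and fixed, untuned hyperparameters. It returns both a predicted mean and an uncertainty for each input. That uncertainty is what the Bayesian optimization step relies on.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return variance * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


def gp_posterior(X_train, y_train, X_test, length_scale=1.0, variance=1.0, noise=1e-2):
    """Exact GP regression: posterior mean and standard deviation at X_test.

    Uses a constant prior mean equal to the average training target.
    """
    y_mean = y_train.mean()
    K = rbf_kernel(X_train, X_train, length_scale, variance) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, length_scale, variance)
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} (y - y_mean), computed via the Cholesky factor instead of an explicit inverse.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    mean = y_mean + K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # The prior variance of the RBF kernel at any point is `variance`.
    var = variance - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


# Toy 1-D example: learn y = sin(x) from 13 evenly spaced points.
X_train = np.linspace(-3.0, 3.0, 13)[:, None]
y_train = np.sin(X_train[:, 0])
X_test = np.array([[0.0], [1.25], [10.0]])
mean, std = gp_posterior(X_train, y_train, X_test)
print(np.round(mean, 2), np.round(std, 2))
```

In the GCC setting, `X_train` would hold sequence embeddings and `y_train` the measured carboxylation rates. The sine data here only checks that the code behaves sensibly. Uncertainty is low near training points and returns to the prior far away from them. In practice the hyperparameters would be fitted, for example by maximizing the marginal likelihood.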


D) Ranking
The trained model is integrated into a Bayesian optimization loop that ranks GCC sequence candidates by predicted performance. Candidates were then filtered using structural criteria, such as the type of each substitution and its location relative to critical functional sites like the active site and cofactor-binding regions. Finally, ten variants were selected for biochemical characterization in vitro.
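The source does not name the acquisition function or the exact filtering rules. The Python sketch below assumes an upper confidence bound (UCB) acquisition, which is one common choice in Bayesian optimization, and models the structural filter as a set of excluded residue positions. All labels and numbers are hypothetical. It also simplifies batch selection to a single greedy top-k pick, with no refitting between picks.

```python
import numpy as np


def upper_confidence_bound(mean, std, beta=2.0):
    """UCB acquisition: predicted performance plus a bonus for model uncertainty."""
    return mean + beta * std


def select_candidates(labels, mean, std, excluded_positions, k=10, beta=2.0):
    """Return the k highest-scoring point mutants whose position is not excluded.

    `labels` are point mutations such as "A5V"; `excluded_positions` is a stand-in
    for structural rules, e.g. residues in the active site or cofactor-binding site.
    """
    scores = upper_confidence_bound(np.asarray(mean), np.asarray(std), beta)
    chosen = []
    for i in np.argsort(-scores):  # highest score first
        position = int(labels[i][1:-1])  # "A5V" -> 5
        if position in excluded_positions:
            continue
        chosen.append(labels[i])
        if len(chosen) == k:
            break
    return chosen


# Hypothetical candidates and GP predictions (illustrative numbers, not study data).
labels = ["A5V", "K33E", "T41S", "D57N", "V88I"]
mean = np.array([1.0, 1.2, 0.9, 1.1, 0.8])
std = np.array([0.1, 0.3, 0.2, 0.02, 0.4])
print(select_candidates(labels, mean, std, excluded_positions={88}, k=3))
# ['K33E', 'T41S', 'A5V']
```

The `beta` parameter trades off exploitation (high predicted rate) against exploration (high uncertainty). A larger value favors candidates the model knows less about.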

The Results

Exazyme’s variants deliver a 2.8-fold higher carboxylation rate (G20R) and a 60% lower energy demand (L100N)


A) Breakthrough in Protein Performance

From our characterization of the nine active variants, two emerged as exceptional. G20R showed a remarkable 2.8-fold increase in carboxylation rate, greatly accelerating glycolyl-CoA conversion while consuming the same amount of ATP per carboxylation as GCC M5. L100N, meanwhile, cut ATP consumption per carboxylation event by 60% compared with GCC M5. Although its specific activity is slightly lower, L100N's reduced energy demand makes the TaCo pathway a more efficient photorespiratory bypass, a groundbreaking advance in energy-efficient enzymatic processes.

B) Enhanced Efficiency and Success Rates

A model trained on 161 selected data points narrowed an initial pool of 10,000 candidate sequences down to 10 for in vitro characterization. When these were screened in cell lysate, 9 of the 10 variants (90%) were active, and only M64R was inactive. This far exceeds traditional methods, where only about 20% of variants were active. Of the nine active variants, two significantly improved either carboxylation rate or energy efficiency, while seven performed on par with GCC M5. This 20% discovery rate for improved properties (2 of 10) underscores ML's advantage over conventional screening, where typically less than 0.1% of variants show enhanced properties.
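The rates quoted above follow directly from the counts in this case study. The quick Python check below uses only those figures. The final ratio compares two different campaigns with different libraries, so treat it as indicative rather than as a controlled comparison.

```python
from fractions import Fraction

# Figures quoted in this case study.
ml_tested, ml_active, ml_improved = 10, 9, 2
traditional_improved_rate = Fraction(1, 8000)  # "~1 in 8,000" from the before/after comparison

ml_active_rate = Fraction(ml_active, ml_tested)      # 9/10
ml_improved_rate = Fraction(ml_improved, ml_tested)  # 2/10 = 1/5

print(f"active: {float(ml_active_rate):.0%}")                           # active: 90%
print(f"improved: {float(ml_improved_rate):.0%}")                       # improved: 20%
print(f"traditional improved: {float(traditional_improved_rate):.4%}")  # traditional improved: 0.0125%
print(f"ratio of improved rates: {ml_improved_rate / traditional_improved_rate}x")  # 1600x
```

Using `Fraction` keeps the ratios exact, so 1 in 5 versus 1 in 8,000 gives exactly 1600 with no rounding.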

C) Enhanced Structural Insights

Following detailed biochemical characterization, we analyzed the cryo-EM structures of G20R and L100N to uncover the basis of their catalytic improvements. G20R sits in the GCC loop of the β-subunit. There it stabilizes interactions with the α-subunit and optimizes the positioning of CoA. L100N sits at the periphery of the active site, where it improves substrate orientation. This minimizes the unproductive decarboxylation of carboxybiotin, which wastes ATP without fixing CO2. Because the ML-driven approach yields a small set of strongly improved variants, structural studies can focus on just those few, giving faster insight into the mechanisms behind efficient enzyme optimization.


Pushing the boundaries of protein engineering

“We were amazed by the results. We were surprised to see such a strong performance jump with the variant Exazyme suggested.”

Prof. Dr. Tobias Erb

Director, Department of Biochemistry and Synthetic Metabolism,

Max Planck Institute

Contact us to see how we can support you in your protein optimization challenge!
