Strain Design by Wishful Thinking

Context
Economic challenge
One of the big challenges for precision fermentation, shared globally, is price parity with conventional products, i.e. the technoeconomics. In the case of precision-fermented (PF) food, 30% of US consumers would pay 25% more for PF products (Hartman Group, 2023). In Germany, 14% of consumers are willing to pay more for PF food (Kühl et al., 2024). In the UK, the potential market share of PF foods is projected to fall from 22% to 2% when the price doubles relative to conventional products (Slade & Thomas, 2023).
Designing more productive strains is one crucial step toward better PF economics. That means making more product from less feedstock (higher yield) in less time (higher productivity), which together amount to a higher titer (concentration) within a defined bioprocess time.
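The three metrics can be tied together with a little arithmetic. The sketch below assumes a single batch run described by four hypothetical inputs (product mass, feedstock mass, broth volume, run time); the function name and the numbers are illustrative only and do not come from any cited study.

```python
def bioprocess_metrics(product_g, feedstock_g, volume_l, time_h):
    """Return titer (g/L), yield (g product per g feedstock) and
    volumetric productivity (g/L/h) for one batch run."""
    titer = product_g / volume_l          # concentration at harvest
    yield_ = product_g / feedstock_g      # product made per unit feedstock
    productivity = titer / time_h         # titer reached per hour of run time
    return titer, yield_, productivity


# Illustrative numbers only: 50 g product from 500 g feedstock in 10 L over 48 h
titer, yield_, productivity = bioprocess_metrics(50.0, 500.0, 10.0, 48.0)
print(f"titer={titer:.1f} g/L, yield={yield_:.2f} g/g, "
      f"productivity={productivity:.3f} g/L/h")
```

Fixing the run time makes titer and productivity move together, which is why a higher titer "for a defined bioprocess time" captures both.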
Strain design is complex due to multiple factors:
- unclear what the bottlenecks for titer are
- the bottlenecks to titer are distributed across 100s to 1000s of interacting cellular processes
- the number of possible gene-edit combinations is vast, so testing all of them is infeasible
- strain behavior differs between small-scale and commercial reactors
Let's imagine we're focused on the first three challenges for now.
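To get a feel for the third challenge, here is a rough count of candidate designs. It is a sketch under stated assumptions: a hypothetical genome of 4,000 genes, three edit types per gene (knockout, activation, repression), and at most one edit per gene. The function name `count_designs` is made up for this illustration.

```python
from math import comb


def count_designs(n_genes, n_edits, edit_types=3):
    """Number of distinct designs that apply exactly `n_edits` edits,
    each of one of `edit_types` kinds, to `n_edits` different genes."""
    return comb(n_genes, n_edits) * edit_types ** n_edits


# Assumed genome size of 4,000 genes, chosen only for illustration
for k in (1, 3, 5, 10):
    print(f"{k:>2} edits: {count_designs(4000, k):.3e} designs")
```

Even a handful of simultaneous edits pushes the count far beyond what can be built and cultured, which is the motivation for searching in simulation.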
Strain design challenge
We want to design a microbe that expresses a protein product that gets secreted at high titers. We're allowed to make gene edits to the chassis strain, including gene knockouts, activation and repression. Let's assume the protein sequence is fixed, as we need to stay within the natural products space.
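One possible way to represent the allowed edits in code is sketched below. The class names `EditType` and `GeneEdit` and the gene identifiers are hypothetical, introduced only to show the shape of a design; the protein sequence is deliberately not part of the design because it is fixed.

```python
from dataclasses import dataclass
from enum import Enum


class EditType(Enum):
    KNOCKOUT = "knockout"
    ACTIVATION = "activation"
    REPRESSION = "repression"


@dataclass(frozen=True)
class GeneEdit:
    gene: str            # gene identifier in the chassis strain
    edit_type: EditType  # which of the three allowed edits to apply


# Hypothetical gene names, used only to show what a design looks like
design = [
    GeneEdit("geneA", EditType.KNOCKOUT),
    GeneEdit("geneB", EditType.ACTIVATION),
]
for edit in design:
    print(f"{edit.edit_type.value}: {edit.gene}")
```

Making `GeneEdit` frozen keeps edits hashable, so designs can be stored in sets or used as dictionary keys when comparing candidates.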
As computational biologists, how would we address this design challenge?
We're essentially designing a program to execute an algorithm to design a strain. So, we can take inspiration from the programming world, which also develops complex programs with many moving parts and intricate code with complex layers of dependencies.
Computational Strain Design by Wishful Thinking
This approach is inspired by Programming by Wishful Thinking. Put simply: design your program imagining you already have any function you wish to solve your problem. This simplifies code that uses those functions, even if the functions themselves are complex (c2.com).
Despite the name, wishful thinking does not indicate lack of (scientific) rigor. Rather, it provides a useful framework to approach complex problems by clarifying logic flow before implementing the technical details. It ensures we've verified the outcomes we want first, and then choose the best way to achieve these outcomes.
This is a useful way to design strains with code. Here's how we can apply this framework to our strain design challenge.
Start from the End
def suggest_strain_modifications(protein, organism):
    """ Ultimately, we want strain design recommendations. """
Ultimately, we want the program to recommend how to modify our strain to maximize protein titer.
Next, let's call on required functions, imagining they're already implemented.
Implement your strain design approach
We'll use a simple iterative approach: make several gene edits, culture the strain, and measure protein titer. Then repeat, adding or removing gene edits in each round, until titer stops improving. It's essentially a trial-and-error approach, but sped up 1,000x since we're simulating everything on a computer.
Here's a barebones implementation:
def suggest_strain_modifications(protein, organism_model, maxIter=100):
    """ Ultimately, we want strain design recommendations. """
    # Initial titer is zero
    final_titer = 0
    # No gene edits initially
    final_gene_edits = []
    # Iterate until max iterations
    for _ in range(maxIter):
        # Suggest new gene edits, starting from the best design so far
        gene_edits = suggest_gene_edits_for_max_titer(
            current_edits=final_gene_edits, number_of_edits=3)
        # Predict titer for the suggested design
        titer = predict_titer(protein, organism_model, gene_edits)
        if titer > final_titer:
            # Keep new design if better than previous one
            final_titer = titer
            final_gene_edits = gene_edits
        else:
            # If no improvement found, return best strategy so far
            return final_gene_edits, final_titer
    return final_gene_edits, final_titer
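To check that the logic flow holds together before any real simulator exists, the loop can be run against toy stand-ins for the two wishful functions. Everything in the sketch below is an assumption made for illustration: the gene names, the made-up additive effects in `TOY_EFFECTS`, and the toy bodies of `predict_titer` and `suggest_gene_edits_for_max_titer`. The loop is written as a compact variant of the one above so that the block runs on its own.

```python
# Toy stand-ins (assumptions): each gene has a fixed, made-up effect on titer.
TOY_EFFECTS = {"g1": 2.0, "g2": 1.5, "g3": -1.0, "g4": 0.5, "g5": -0.5}


def predict_titer(protein, organism_model, gene_edits):
    """Toy predictor: baseline titer plus the summed effect of each edit."""
    return 1.0 + sum(TOY_EFFECTS[g] for g in gene_edits)


def suggest_gene_edits_for_max_titer(current_edits, number_of_edits):
    """Toy suggester: add up to `number_of_edits` unused genes, best first."""
    unused = [g for g in TOY_EFFECTS if g not in current_edits]
    unused.sort(key=TOY_EFFECTS.get, reverse=True)
    return list(current_edits) + unused[:number_of_edits]


def suggest_strain_modifications(protein, organism_model, maxIter=100):
    """Iterate: suggest edits, predict titer, keep the design if it improves."""
    final_titer, final_gene_edits = 0, []
    for _ in range(maxIter):
        gene_edits = suggest_gene_edits_for_max_titer(
            current_edits=final_gene_edits, number_of_edits=3)
        titer = predict_titer(protein, organism_model, gene_edits)
        if titer <= final_titer:
            break  # no improvement: stop and keep the best design so far
        final_titer, final_gene_edits = titer, gene_edits
    return final_gene_edits, final_titer


edits, titer = suggest_strain_modifications("my_protein", "toy_model")
print(edits, titer)
```

In this toy run the second round can only add harmful edits, so the loop stops and returns the first design. Swapping in real implementations later should not require touching the loop itself, which is the point of the wishful-thinking approach.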
Wishful functions to fill later
The implementation above calls two key functions. We declare them now at a high level and defer the details, recording what each will need in its docstring.
def predict_titer(protein, organism_model, gene_edits):
    """ Will need:
    - simulate titer given protein sequence and organism model
    - simulate protein expression
    - simulate protein secretion / translocation
    - account for gene edits to modify organism model
    - dynamic simulation to predict titer (not just yield)
    """
    pass

def suggest_gene_edits_for_max_titer(current_edits, number_of_edits):
    """ Will need:
    - optimization method with gene edits as decision variables
    - number of edits allowed
    """
    pass
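The last item in the `predict_titer` docstring, dynamic simulation rather than yield alone, can be illustrated with a deliberately simple batch model. This is a sketch under loud assumptions, not a proposed implementation: it uses Monod growth with growth-associated product formation, integrates with a fixed-step Euler scheme, and treats a gene edit as nothing more than a change in the maximum growth rate `mu_max`. All parameter names and default values are hypothetical.

```python
def simulate_batch_titer(mu_max, t_end, ks=0.5, y_xs=0.5, alpha=0.2,
                         s0=20.0, x0=0.1, dt=0.01):
    """Toy batch culture integrated with fixed-step Euler.

    mu_max: max specific growth rate (1/h); t_end: run time (h)
    ks: Monod constant (g/L); y_xs: biomass yield on substrate (g/g)
    alpha: product formed per unit biomass formed (g/g)
    s0, x0: initial substrate and biomass (g/L)
    Returns the product titer (g/L) at t_end.
    """
    s, x, p = s0, x0, 0.0
    for _ in range(round(t_end / dt)):
        mu = mu_max * s / (ks + s)        # Monod growth rate
        dx = min(mu * x * dt, y_xs * s)   # cannot use more substrate than remains
        x += dx
        s = max(s - dx / y_xs, 0.0)       # substrate consumed for growth
        p += alpha * dx                   # growth-associated product
    return p


# Two hypothetical strains with identical yield but different growth rates
slow = simulate_batch_titer(mu_max=0.3, t_end=10)
fast = simulate_batch_titer(mu_max=0.6, t_end=10)
print(f"titer at 10 h: slow={slow:.2f} g/L, fast={fast:.2f} g/L")
```

Both strains convert substrate to product with the same yield, yet the faster one reaches a higher titer within the fixed run time. A yield-only calculation would not distinguish them.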
Next steps
Now that the logic flow and design approach are defined, here are the next steps.
- Implement the core functions
- Add supporting functions
- Verify everything with test data
- Run program with real data
When implementing the core functions, you have a choice of simulators and optimization algorithms. Options range from flux balance analysis, multi-scale simulators, and dynamic bioprocess simulation to data-driven surrogate machine learning models, bilevel optimization algorithms for strain design, and more.
We'll discuss these points in a subsequent article.
Sidenotes
- the trial-and-error approach can be improved by formulating it as one big optimization problem. For many gene edits (10 or more), we'd likely need to use scalable methods like iterative local search (Lun et al., 2009) or linearization techniques (Yang et al., 2011).
- simulating protein expression and secretion can be achieved using multiscale metabolism and macromolecular expression (ME) models (Lloyd et al., 2018).
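To make the local search idea in the first sidenote concrete, here is a generic hill-climbing sketch. It is not the method of Lun et al. (2009) or Yang et al. (2011). It only shows the general pattern of exploring designs that differ from the current one by a single edit. The scoring function, gene names, and effect values are assumptions for illustration, with a made-up penalty so that the two individually best edits do not combine well.

```python
def local_search(genes, score, start=frozenset(), max_rounds=50):
    """Greedy local search: move to the best neighbor (one gene toggled)
    until no neighbor improves the score."""
    current = frozenset(start)
    best = score(current)
    for _ in range(max_rounds):
        neighbors = [current ^ {g} for g in genes]  # toggle one gene on or off
        candidate = max(neighbors, key=score)
        if score(candidate) <= best:
            break  # local optimum reached
        current, best = candidate, score(candidate)
    return current, best


# Toy score (assumption): additive effects plus a penalty coupling g1 and g2
EFFECTS = {"g1": 2.0, "g2": 1.5, "g3": -1.0, "g4": 0.5}


def toy_score(edits):
    penalty = 3.0 if {"g1", "g2"} <= set(edits) else 0.0
    return sum(EFFECTS[g] for g in edits) - penalty


best_edits, best_score = local_search(list(EFFECTS), toy_score)
print(sorted(best_edits), best_score)
```

Because neighbors can also remove an edit, the search can back out of a choice, unlike a loop that only ever adds edits. It can still get stuck in local optima, which is one reason the cited methods use more structured formulations.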