You say poTAYto; I say poTAHto. Degenerate pronunciations didn’t prevent Fred Astaire and Ginger Rogers from tripping the light fantastic.
The degeneracy of the genetic code, however, with 61 sense codons specifying 20 amino acids, can trip up the best-designed synthesis schemes: strange DNA syntax, in the form of a non-preferred codon within a coding sequence, can disrupt the dance of translation and drastically alter overall yields.
Just like there’s more than one way to skin a cat, multiple sequences can code for Cysteine-Alanine-Threonine. When the ribosome reads the mRNA, however, it doesn’t encounter every possible triplet for a given amino acid with equal frequency. This unequal use of synonymous codons, in which a few favored triplets account for most occurrences of a particular amino acid, is called codon bias, and it varies both between and within genomes. The rapidly replicating cellular workhorses of synthetic biology, Escherichia coli and Saccharomyces cerevisiae, display particular predilections for common codons, especially in highly expressed genes, likely due to a combination of selection for translational efficiency, genomic GC content, and mutational biases. The relative contributions of selection and drift to the observed biases remain disputed, although recent results published in PLOS Genetics might help clarify the debate. Regardless of the underlying mechanism, synthetic biologists can leverage codon usage bias to their advantage as an additional layer of regulation in synthetic circuits.
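One common way to put a number on codon bias is relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if every synonymous codon for that amino acid were used equally. A value of 1.0 means no preference, values above 1 mark favored codons, and values below 1 mark avoided ones. The minimal sketch below assumes in-frame coding sequences read with the standard genetic code; the function names are illustrative and not taken from any particular tool.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq):
    """Split an in-frame coding sequence into triplets (DNA or RNA, any case)."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def rscu(seqs):
    """Relative synonymous codon usage pooled over a list of coding sequences."""
    counts = Counter(c for s in seqs for c in codons(s) if c in CODON_TABLE)
    # Group the 64 codons into synonymous families, one per amino acid (or stop).
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    result = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the sample: RSCU undefined
        expected = total / len(family)  # count expected under equal usage
        for c in family:
            result[c] = counts[c] / expected
    return result


# Toy example: five CTG and one CTA among the six leucine codons.
usage = rscu(["CTGCTGCTGCTGCTGCTA"])
print(usage["CTG"], usage["CTA"], usage["TTA"])  # 5.0 1.0 0.0
```

Running rscu separately on a set of highly expressed genes and on the rest of a genome is one simple way to see the within-genome differences described above.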
Petabytes of publicly available sequence data, combined with easy-to-implement genome engineering, allow for increasingly accurate cost/benefit analyses of biased codon usage, which can inform synthetic strategies. Early experiments in E. coli measured blunt readouts of physiological stress in large batch cultures expressing high levels of heterologous proteins riddled with clusters of disfavored AGA (arginine) codons. Multiple facets of cell physiology, each with pleiotropic effects, ultimately determine protein yields under a given set of conditions, which is one reason that induction strategies that work wonders in small culture volumes sometimes fail spectacularly at industrial scale.
Since the Codon Adaptation Index (CAI) was first described in 1987, a number of freely available sequence-based computational tools, such as ATGme, have been developed to help synthetic biologists predict protein expression levels, identify rare codons, and optimize synthetic gene design.
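As a rough illustration of what the CAI measures: in the 1987 definition by Sharp and Li, each codon receives a relative adaptiveness w, its count in a reference set of highly expressed genes divided by the count of the most-used synonymous codon, and a gene's CAI is the geometric mean of w over its codons, skipping stop codons and single-codon amino acids (Met, Trp). The minimal sketch below follows that outline. The 0.5 pseudocount for codons absent from the reference set is an assumed convention for avoiding log(0), and published tools differ in such details; the function names are hypothetical.

```python
import math
from collections import Counter

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, ... GGG ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq):
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """w = codon count / count of its most-used synonym, from reference genes."""
    counts = Counter(c for s in reference_seqs for c in codons(s) if c in CODON_TABLE)
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    w = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue  # stop codons, Met and Trp offer no synonymous choice
        best = max(counts[c] for c in family)
        if best == 0:
            continue  # amino acid missing from the reference set
        for c in family:
            # Assumed convention: an unseen codon counts as `pseudocount`, so w > 0.
            w[c] = max(counts[c], pseudocount) / best
    return w


def cai(seq, w):
    """Geometric mean of w over the gene's scorable codons."""
    logs = [math.log(w[c]) for c in codons(seq) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


w = relative_adaptiveness(["CTGCTGCTGCTGCTA"])  # toy "highly expressed" reference
print(cai("ATGCTGCTG", w))  # 1.0: only preferred Leu codons (ATG is skipped)
print(cai("CTGCTA", w))     # about 0.5: geometric mean of 1.0 and 0.25
```

The geometric mean means a single very rare codon pulls the score down more than an arithmetic average would, which matches the intuition that one bottleneck codon can matter.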
So-called “silent” changes in the DNA sequence can dramatically alter phenotypes. Recoding viral genomes with non-preferred synonymous codons can wreak havoc with the temporal regulation of viral gene expression needed to complete the infection cycle, a strategy that has been used to generate attenuated strains of RNA viruses such as dengue virus and poliovirus.
Rare codons can also prevent problems. Oddball trinucleotides tend to cluster at the beginning of ORFs in highly expressed genes, and this “rare codon ramp” counterintuitively increases yield by promoting efficient translation initiation, independent of any effect on ribosome processivity. Rare codons at the 5′ end of the message tend to reduce potential base-pairing interactions that could otherwise fold the mRNA into loops or hairpins, especially in organisms with high genomic GC content. Limiting secondary structure keeps the ribosome binding site accessible to the translation machinery.
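Whether a given ORF shows a ramp can be checked crudely by comparing its first codons with the remainder. This sketch reports GC content and the fraction of user-specified rare codons in a 5′ window versus the rest of the ORF. GC content is only a proxy for folding potential; a proper analysis would compute mRNA folding energies with an RNA structure predictor. The default 40-codon window and the example rare-codon set (AGA, which the text notes is disfavored in E. coli, plus AGG) are illustrative assumptions to be tuned for the host organism.

```python
def gc_fraction(seq):
    """Fraction of G and C bases (a crude proxy for base-pairing potential)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def rare_fraction(seq, rare_codons):
    """Fraction of in-frame codons in `seq` found in `rare_codons` (uppercase DNA)."""
    seq = seq.upper()
    trips = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    return sum(t in rare_codons for t in trips) / len(trips) if trips else 0.0


def ramp_profile(cds, rare_codons, window=40):
    """Compare the first `window` codons of an ORF with the rest of it."""
    head, tail = cds[:3 * window], cds[3 * window:]
    return {
        "head_gc": gc_fraction(head),
        "tail_gc": gc_fraction(tail),
        "head_rare": rare_fraction(head, rare_codons),
        "tail_rare": rare_fraction(tail, rare_codons),
    }


# Toy ORF: two AGA codons up front, GC-rich codons afterwards (window shrunk to 2).
profile = ramp_profile("AGAAGA" + "GCC" * 4, rare_codons={"AGA", "AGG"}, window=2)
print(profile)
```

A ramp would show up as a higher head_rare and lower head_gc than the corresponding tail values; averaging these profiles across many highly expressed genes is more informative than inspecting a single ORF.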
Rare codons in the middle of ORFs can cause ribosomes to pause because their cognate tRNAs are relatively scarce. Inspired by this phenomenon, researchers used a strategically placed rare codon to build an inducible expression system in E. coli controlled at both the transcriptional and translational levels. An N-terminal tag containing a rare codon prevented reporter protein production, even though the mRNA was expressed at high levels; overexpression of the cognate aminoacyl-tRNA synthetase restored robust protein synthesis. The device could even control several fatty acid metabolism genes at once, offering an additional level of control for metabolic engineering.
Although the occasionally disruptive effects of rare codons on protein expression have long been recognized, it remains controversial whether decreased translational speed or decreased accuracy underlies the observed drop in yield. Ribosomes paused at rare codons may incorporate incorrect residues into the nascent polypeptide chain. Either the pause or the error could decrease protein stability, whether at the level of overall structure or by perturbing co-translational folding.
To precisely parse out the fitness defects associated with rare codon usage in a highly expressed gene in Salmonella typhimurium, Brandis and Hughes recently undertook an odyssey of systematic codon switching, gene synthesis, and strain construction. They designed 18 fully functional alleles of the tufA gene (tufA encodes EF-Tu, which makes up fully 9% of the proteome in exponentially growing S. typhimurium cells). Each variant harbored between 12 and 25 synonymous substitutions that replaced combinations of the optimal codons for leucine, valine, proline, and arginine with one of 10 different non-preferred codons. Some alleles carried clusters of substitutions confined to either the beginning or the end of the open reading frame; others had substitutions dispersed throughout the gene. None of the constructs altered the first 40 codons of tufA, in recognition of the observed enrichment of rare codons in the N-terminal regions of highly expressed genes.
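The core design constraint here, changing codons without changing the protein, is easy to enforce in code. This hypothetical helper swaps preferred codons for non-preferred synonyms, leaves the first 40 codons untouched (mirroring the protected N-terminal region described above), and refuses any swap table that would alter the translated sequence. The CTG to CTA swap in the usage example is illustrative only and is not the substitution set Brandis and Hughes used.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, ... GGG ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate an in-frame DNA coding sequence with the standard code."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def recode(cds, swaps, protected=40, positions=None):
    """Hypothetical helper: replace codons according to `swaps`
    (preferred -> non-preferred synonym), never touching the first `protected`
    codons, and optionally only at the codon indices listed in `positions`."""
    cds = cds.upper()
    out = []
    for idx in range(len(cds) // 3):
        codon = cds[3 * idx:3 * idx + 3]
        if idx >= protected and codon in swaps and (positions is None or idx in positions):
            codon = swaps[codon]
        out.append(codon)
    recoded = "".join(out)
    # Guard: a synonymous recoding must leave the protein sequence unchanged.
    if translate(recoded) != translate(cds):
        raise ValueError("swap table changes the encoded protein")
    return recoded


# Illustrative only: swap Leu CTG for CTA outside the protected N-terminal codons.
orf = "ATG" + "CTG" * 44
variant = recode(orf, {"CTG": "CTA"})
print(sum(a != b for a, b in zip(orf, variant)))  # 5 (one base per swapped codon)
```

Passing a set of codon indices as positions is one way to build the clustered (5′ or 3′) versus dispersed designs described above from the same swap table.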
To measure minute differences in cellular fitness associated with non-preferred codon usage, they replaced the endogenous tuf loci with the alternative alleles, reciprocally marked strains carrying the wild-type and mutant versions of the gene with chromosomal YFP and BFP cassettes, and calculated fine-scale competitive indices, using flow cytometry to monitor the abundance of each strain in co-culture. Although individual alleles varied, they measured an average selective disadvantage of 2 × 10⁻⁴ per rare codon per generation, a handicap of roughly 6 × 10⁻⁴ doublings per hour.
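One standard way to turn co-culture counts into a per-generation selection coefficient is to fit the slope of ln(mutant/wild type) against generations; dividing by the number of substituted codons then gives a per-codon cost. The following is a minimal sketch of that arithmetic with synthetic numbers, not a reconstruction of the authors' analysis. Averaging the two reciprocal marker orientations, and treating costs as additive across codons, are assumptions stated in the code. The two quoted figures are consistent with roughly three generations per hour, since 2 × 10⁻⁴ × 3 = 6 × 10⁻⁴.

```python
import math


def selection_coefficient(generations, mutant_counts, wt_counts):
    """Least-squares slope of ln(mutant / wild type) versus generations.
    A negative value means the mutant loses ground each generation."""
    ys = [math.log(m / w) for m, w in zip(mutant_counts, wt_counts)]
    n = len(generations)
    mean_x = sum(generations) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in generations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(generations, ys))
    return sxy / sxx


def per_codon_cost(s_orientation_a, s_orientation_b, n_rare_codons):
    """Average the two reciprocal marker orientations (e.g. mutant-YFP vs
    wild-type-BFP and the reverse), then divide by the number of substitutions.
    Assumes the cost is spread evenly and additively across codons."""
    return (s_orientation_a + s_orientation_b) / 2 / n_rare_codons


# Synthetic flow-cytometry counts; only the mutant:wild-type ratio matters.
gens = [0, 10, 20, 30, 40]
wt = [50_000] * len(gens)
mut = [50_000 * math.exp(-0.003 * g) for g in gens]
s = selection_coefficient(gens, mut, wt)
print(round(s, 6))                         # -0.003
print(round(per_codon_cost(s, s, 15), 6))  # -0.0002 with 15 substituted codons
```

Fitting a slope over several time points, rather than comparing only the first and last samples, makes the estimate less sensitive to noise in any single cytometry measurement.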
Importantly, the positions of the synonymous changes did not appear to influence the observed selective disadvantages, suggesting that the effects arose from slower translation rather than misincorporation. Although the fitness defect associated with a single non-preferred codon might seem tiny, small differences snowball into significant effects over billions of years of evolutionary time, or in a thousand-liter batch fermenter.
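To see how a tiny per-codon cost compounds, suppose costs add across codons (an assumption) and the mutant:wild-type ratio falls as exp(s·g) over g generations. With s = −2 × 10⁻⁴ per codon, a single rare codon halves the ratio in about ln 2 / (2 × 10⁻⁴) ≈ 3,466 generations, while 20 rare codons do so in about 173. A short sketch of that arithmetic:

```python
import math


def ratio_after(generations, s_per_codon, n_codons, start_ratio=1.0):
    """Mutant:wild-type ratio after `generations`, assuming per-codon costs add
    and the ratio changes as exp(s * g)."""
    return start_ratio * math.exp(s_per_codon * n_codons * generations)


def generations_to_halve(s_per_codon, n_codons):
    """Generations until the mutant:wild-type ratio drops to half its start."""
    return math.log(2) / abs(s_per_codon * n_codons)


S = -2e-4  # average per-codon cost per generation reported above
for n in (1, 20):
    print(n, round(generations_to_halve(S, n)))  # 1 3466, then 20 173
print(round(ratio_after(1000, S, 20), 3))        # 0.018
```

Under these assumptions, a 20-codon variant starting at parity would make up less than 2% of the wild-type population after 1,000 generations.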
Coding poTAYto instead of poTAHto could be the perfect strategy for protein expression. Increasingly accurate quantitative measurements of the processes that keep life pirouetting along can inform rational design to optimize synthetic circuits. Let’s not call the whole codon-usage bias thing off!
References and further reading:
Daniel E, Onwukwe GU, Wierenga RK, Quaggin SE, Vainio SJ, Krause M. ATGme: Open-source web application for rare codon identification and custom DNA sequence optimization. BMC Bioinformatics. 2015;16(1):303.