A New Language for Life


All the beautiful, remarkable complexities of life that we see around us are, believe it or not, encoded at the most basic level by an alphabet just five letters long. The DNA code, which is shared by all life on Earth, is formed from molecules known as nucleotides, which come in just four forms: Adenine, Cytosine, Guanine and Thymine. RNA, the single-stranded cousin of DNA which is important in translating DNA into protein, adds a fifth letter – Uracil. It is truly one of the most impressive feats of evolution that such a simple alphabet can generate such diversity and adaptation. Recently, however, scientists from The Scripps Research Institute, California have engineered a life form with an expanded vocabulary.


In nature, double-stranded DNA helices are formed from pairings of nucleotides bound together by hydrogen bonding. These pairings are quite precise: A pairs with T and C with G. However, not long after the structure of DNA was revealed, scientists were considering the possibility of alternative pairings, and in 1989, successful base pairing was achieved between new structural conformations of the normal nucleotides guanine and cytosine. A few years later, scientists discovered that the hydrogen bonding between bases was not necessarily required, and that pairs could be formed simply through conformational compatibility and hydrophobic interactions. In 2012, researchers successfully developed two new bases: d5SICS and dNaM (yes, I agree they could have given them catchier names). The team were able to get them to replicate in vitro with 99% fidelity. And this year they’ve got the pair to do all that inside a living organism (a bacterium; E. coli, to be precise). Achieving this remarkable feat meant clearing two hurdles:

1. Get the artificial nucleotides into the bacterium

Having engineered the artificial nucleotides, the authors set about engineering a bacterium that would accept them. They engineered E. coli to carry the gene for an algal nucleotide triphosphate transporter (NTT), which would import the artificial nucleotides into the bacterium. They then placed the artificial base pair within a plasmid (a loop of DNA commonly found in bacteria) and introduced this plasmid, containing 3 base pair types (A-T, C-G and d5SICS-dNaM), into the modified E. coli, where the NTT could bring in the artificial nucleotides needed to copy it.

2. Get the bacterium to replicate them

Next they needed to ensure successful replication of the plasmid within its new bacterial home. Replication of DNA is performed by a class of proteins called polymerases, and the authors had one specific polymerase in mind for the job. DNA polymerase I naturally works to proofread and fill in gaps in the DNA code and has been used to replicate the d5SICS-dNaM base pair before, under laboratory conditions. So the authors placed their artificial nucleotides within a region of the plasmid they expected to be dealt with by polymerase I, in the hope that the artificial base pair would be maintained within the plasmid and replicated within the bacterium. And it worked – using a variety of sophisticated methods, the authors showed that the artificial base pair was still in their plasmid and replicating successfully several days later, achieving replication fidelities above 99%.
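To get a feel for what a per-replication fidelity means over many rounds of copying, here is a minimal Python sketch. It assumes, purely for illustration, that the artificial pair is either kept or lost independently at each doubling, so retention simply compounds. The function name, the fidelity values and the number of doublings are all hypothetical and are not taken from the study.

```python
def retention(fidelity_per_doubling: float, doublings: int) -> float:
    """Fraction of plasmids still carrying the artificial base pair.

    Assumes each doubling keeps the pair with the same probability,
    independently of earlier doublings (a simplification).
    """
    return fidelity_per_doubling ** doublings


if __name__ == "__main__":
    # Illustrative values only: not measurements from the paper.
    for fidelity in (0.99, 0.995, 0.999):
        kept = retention(fidelity, doublings=24)
        print(f"fidelity {fidelity:.3f} per doubling -> {kept:.1%} retained after 24 doublings")
```

Under this simple model, even a fidelity that sounds very high loses a noticeable share of the artificial pairs after a couple of dozen doublings, which is why small differences in fidelity matter so much.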

The Future of Artificial Nucleotides

More work is needed to ensure the long-term retention of unnatural base pairs in living organisms. However, this research is providing a foundation for a whole array of possibilities in the future which will change the way we think about medicine, biotechnology and even life itself. The new base pairs don’t currently do anything, in the sense that they, unlike their natural counterparts, don’t translate into a protein product. But many important biological processes are governed directly by the shape of nucleotide complexes (e.g. RNA), so even in their current form, artificial nucleotides could be used to augment existing RNA elements and RNA-protein complexes. They could be inserted into regulatory regions to construct new regulatory architectures. The ultimate goal, however, must surely be to add functionality to new base pairs by making the final link, from DNA to protein. This would involve designing novel codons (the three-letter ‘words’ in which bases are read) which link to artificial transfer RNA molecules, culminating in the incorporation of novel or non-standard amino acids and the generation of entirely new protein products. This could open up the genetic code for new medical applications and the potential for an explosion of new genomic evolution.
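A quick way to see how much room two extra letters would create is to count the possible three-letter codons. The Python sketch below does this; the letters X and Y are placeholder labels chosen here to stand in for d5SICS and dNaM, not official symbols, and the count says nothing about which codons a cell could actually read.

```python
from itertools import product

NATURAL = "ACGT"
# X and Y are hypothetical one-letter stand-ins for d5SICS and dNaM.
EXPANDED = NATURAL + "XY"


def codons(alphabet: str) -> list[str]:
    """Every three-letter word that can be spelled with the given alphabet."""
    return ["".join(triplet) for triplet in product(alphabet, repeat=3)]


natural_codons = codons(NATURAL)
expanded_codons = codons(EXPANDED)
# Codons that contain at least one artificial letter.
novel_codons = [c for c in expanded_codons if set(c) & set("XY")]

if __name__ == "__main__":
    print(len(natural_codons), "codons with four letters")
    print(len(expanded_codons), "codons with six letters")
    print(len(novel_codons), "of those use an artificial letter")
```

Four letters give 4 × 4 × 4 = 64 codons, while six letters give 6 × 6 × 6 = 216, so 152 of the expanded set would be entirely new words available for assignment.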

All this leaves one question. If it is comparatively easy to generate new base pairs, then why did nature stop at just two? It is entirely possible that as this new technology develops we will discover the answer to this question – for instance, perhaps organisms with expanded genetic alphabets tend to be less fit because of reductions in replication efficiency or more frequent translation errors? Only time will tell…

