Reverse Translation Enables DNA-Based Protein Sequencing

For decades, reading proteins has remained an analog challenge in a digital world.

Researchers at Stanford University have developed a “reverse translation” method that converts protein sequences into DNA, allowing standard DNA sequencers to identify individual molecules with extreme sensitivity.

The proteomics bottleneck

Proteins act as the functional machinery of life, carrying out tasks that their DNA instructions alone cannot fully explain.

While DNA provides the blueprint, proteins perform the work, from cell signaling to immune responses and other essential functions. Understanding how these molecules work is the core of proteomics, yet reading protein sequences is much harder than reading DNA.

A massive technical gap exists between genomics and proteomics. DNA sequencing is fast, cheap, and scalable. In contrast, protein sequencing remains slow and resource-heavy.

“In nature, proteins are made from DNA. Over the past two decades, our society has created amazing technologies to sequence a lot of DNA really quickly and inexpensively,” said corresponding author Dr. Hyongsok Tom Soh, a professor of bioengineering at Stanford University. “But unfortunately, we haven’t really made similar progress for sequencing proteins.”

Current tools such as mass spectrometry (MS) usually require billions of molecules, so researchers often miss rare, low-abundance proteins that could be key to understanding disease. Emerging single-molecule methods based on nanopores or fluorescence have struggled with high error rates.

Researchers need a way to read single protein molecules with the speed and accuracy that DNA sequencing already offers. The new study aimed to achieve this by converting protein sequence information into a format that DNA sequencers can read.

Reverse translating peptides into DNA for high-throughput proteomics

The Stanford team developed a process that converts a peptide sequence into a DNA library.

“We created a technology that can convert protein sequences back into DNA sequences. It’s kind of like running the natural process—in reverse—so that we can leverage powerful DNA sequencing technology that is already available,” explained Soh.

The process involves three main steps:

1. Peptides are fixed in place and tagged with a unique DNA barcode.
2. A chemical process called Edman degradation removes amino acids one by one from the end of the chain.
3. The released molecules are identified by antibodies; when an antibody binds to an amino acid, it triggers a DNA ligation event.

The team utilized a proximity extension assay to ensure that DNA ligation only occurs when the correct antibody binds to its target amino acid. This creates a “DNA reporter” that records the identity of the amino acid, its position in the chain, and the peptide from which it originated.
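To make the bookkeeping concrete, here is a minimal Python sketch of what a DNA reporter carries and how sequenced reporters could be reassembled into peptide sequences. It is an illustration only, not the authors' pipeline: the `Reporter` record, the function names, and the barcodes `BC01` and `BC02` are hypothetical, and the simulation assumes idealized Edman cycles with no antibody errors or missed ligations.

```python
from collections import defaultdict
from typing import NamedTuple


class Reporter(NamedTuple):
    """Hypothetical DNA reporter: one record per released amino acid."""
    peptide_barcode: str  # which immobilized peptide molecule it came from
    cycle: int            # Edman cycle number, i.e. position in the chain
    residue: str          # amino acid called by the antibody that bound


def reverse_translate(barcode: str, peptide: str) -> list[Reporter]:
    """Simulate idealized Edman cycles: each cycle releases one residue
    and emits one reporter for it (no errors or dropouts are modeled)."""
    return [Reporter(barcode, cycle, aa)
            for cycle, aa in enumerate(peptide, start=1)]


def reconstruct(reads: list[Reporter]) -> dict[str, str]:
    """Group sequenced reporters by barcode and order them by cycle.
    Positions with no read are shown as '?'."""
    by_peptide: dict[str, dict[int, str]] = defaultdict(dict)
    for read in reads:
        by_peptide[read.peptide_barcode][read.cycle] = read.residue
    return {
        barcode: "".join(calls.get(i, "?") for i in range(1, max(calls) + 1))
        for barcode, calls in by_peptide.items()
    }


# Two peptide molecules, pooled; a sequencer returns reads in no set order.
reads = reverse_translate("BC01", "GAVL") + reverse_translate("BC02", "KEY")
reads.reverse()
print(reconstruct(reads))
```

Because every reporter carries its own barcode and cycle number, reads from millions of peptides can be pooled and sequenced together, then sorted back into per-molecule sequences afterwards.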

“This is the breakthrough,” said Soh. “We can sequence individual proteins at a single molecule level, requiring very little sample, which can get us to individual cells.”

The findings show that the team can sequence millions of peptide molecules in a single run. Unlike traditional MS, this method can see almost every molecule in a sample.

“With MS, you’re shooting 1 billion to 10 billion protein molecules and see, typically, a million molecules out of it. With our method, you can potentially see 1,000 times that amount,” said first author Dr. Liwei Zheng, a research engineer at Stanford University.
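The figures in Zheng's quote can be turned into rough fractions. The short Python sketch below takes the quoted numbers at face value and assumes, purely for comparison, that both approaches start from the same number of molecules; the source does not state this, and the function name is illustrative.

```python
def observed_fraction(observed: int, loaded: int) -> float:
    """Fraction of loaded molecules that are observed, capped at 100%."""
    return min(observed / loaded, 1.0)


MS_OBSERVED = 1_000_000      # "typically, a million molecules" (quoted)
FOLD_IMPROVEMENT = 1_000     # "1,000 times that amount" (quoted)
new_observed = MS_OBSERVED * FOLD_IMPROVEMENT

# Quoted input range for MS: 1 billion to 10 billion molecules.
# Assumption: the same input is used for both methods.
for loaded in (1_000_000_000, 10_000_000_000):
    ms = observed_fraction(MS_OBSERVED, loaded)
    rt = observed_fraction(new_observed, loaded)
    print(f"{loaded:.0e} loaded: MS {ms:.2%}, reverse translation up to {rt:.0%}")
```

Under these assumptions MS observes between 0.01% and 0.1% of the molecules loaded, while a 1,000-fold gain would correspond to between 10% and 100%.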

This high resolution allows researchers to access full peptide sequence coverage and even detect chemical modifications on the proteins.

Expanding the reach and accessibility of next-generation proteomics

Many labs already own DNA sequencers, and this method would allow those labs to perform advanced protein analysis without buying expensive, specialized MS equipment. It could also lead to single-cell proteomics, where scientists can see how proteins differ from one cell to the next within the same tissue.

However, challenges remain. The system relies on a library of antibodies to recognize each of the 20 amino acids, and sequencing accuracy depends on those antibodies being highly specific. The chemical cycles must also be highly efficient to ensure that longer protein chains don’t lose their structure during the process.

Future work should focus on expanding the recognition library to include more types of protein modifications and automating the entire workflow.

The goal is to turn this laboratory process into a simple, commercial tool. Soh envisions a system where researchers can “put in a sample, press a button, and have it go.”

“Once you convert everything to DNA, you can think about all the naturally evolved machinery that can manipulate DNA—lengthening DNA, copying DNA—all these things will become possible for processing protein sequences,” said Zheng.

This could lead to better personalized medicine, such as helping doctors understand why certain cancer treatments work for some patients but not others by looking at the rare proteins in their immune cells.

Reference: Zheng L, Sun Y, Hein LA, Eisenstein M, Soh HT. Single-molecule peptide sequencing through reverse translation of peptides into DNA. Nat Biotechnol. 2026. doi: 10.1038/s41587-026-03061-z

This article is a rework of a press release issued by Stanford University. Material has been edited for length and content.
