Development of better, cheaper, and more clinically relevant genomic analysis continues apace, but what about the proteome? If anything, the proteome—the full set of expressed proteins—would tell us more about wellness and ill health than the genome. True, genes constitute the foundation of life’s chemical hierarchy, but proteins (to mix metaphors) are where the rubber meets the road. Besides catalyzing the chemical reactions that sustain life, proteins are directly engaged in critical functions such as growth, differentiation, and repair; defense against pathogens; cellular housekeeping; and myriad structural duties.

So how hard could compiling a proteome be? After all, genome sequencing has become almost routine, and only about 1.5% of the genome codes for proteins. Well, this is where things get tricky. While there are only about 20,000–25,000 human genes, scientists have already identified more than 100,000 human proteins, and many more no doubt remain unidentified. What’s more, alternative RNA splicing and post-translational modification give the human proteome millions of protein variants. And even that’s not all: many so-called noncoding RNAs also appear to give rise to physiologically relevant micropeptides.

If all that doesn’t sound daunting enough, consider this: Aberrant proteins that are indicative of disease or disease propensity are often present in extremely minute quantities. How could such proteins serve as biomarkers if they are so few and far between that they remain, essentially, invisible? Alas, no technology exists that could do for these proteins what the polymerase chain reaction (PCR) does for targeted regions of DNA. That is, no means exist to boost the concentrations of selected proteins and thereby enhance signal strengths.

Taking the measure of the proteome remains a huge challenge, but many researchers insist on trying. For example, a team of researchers at Arizona State University’s Biodesign Institute led by Stuart Lindsay, Ph.D., is refining a technique for single-molecule protein sequencing. The technique, known as recognition tunneling, adapts technology that the team had used to sequence DNA. It involves threading a peptide through a nanopore, an extremely tiny eyelet that separates two electrodes coated with a layer of recognition molecules. When a molecule is held between the electrodes, changes in the electron tunneling current between them are measured. The current “signature” is then analyzed to identify the amino acid. (Conceivably, this technique could be applied to peptides, reading them amino acid by amino acid.)
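To make the idea of a current “signature” concrete, here is a minimal Python sketch of a first step such an analysis might take: picking out the bursts (spikes) of tunneling current that occur when a molecule bridges the electrodes, and summarizing each one by a few numbers. The fixed threshold, the function names, and the chosen features (peak current, mean current, and duration in samples) are illustrative assumptions, not the Lindsay team’s actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class Spike:
    start: int    # index of the first sample above the threshold
    end: int      # index one past the last sample above the threshold
    peak: float   # largest current value inside the spike
    mean: float   # average current value inside the spike

    @property
    def duration(self) -> int:
        """Spike length in samples."""
        return self.end - self.start


def _summarize(trace: list[float], start: int, end: int) -> Spike:
    segment = trace[start:end]
    return Spike(start, end, max(segment), sum(segment) / len(segment))


def find_spikes(trace: list[float], threshold: float) -> list[Spike]:
    """Return each contiguous run of samples whose current exceeds threshold.

    Hypothetical helper: a fixed threshold is assumed for simplicity.
    """
    spikes = []
    start = None
    for i, value in enumerate(trace):
        if value > threshold and start is None:
            start = i                      # a spike begins
        elif value <= threshold and start is not None:
            spikes.append(_summarize(trace, start, i))
            start = None                   # the spike has ended
    if start is not None:                  # trace ended in the middle of a spike
        spikes.append(_summarize(trace, start, len(trace)))
    return spikes


if __name__ == "__main__":
    # Hypothetical current trace in arbitrary units.
    trace = [0.1, 0.2, 5.0, 7.0, 6.0, 0.1, 0.0, 3.0, 0.2]
    for spike in find_spikes(trace, threshold=1.0):
        print(spike, "duration:", spike.duration)
```

For the hypothetical trace above, find_spikes reports two spikes: one covering samples 2 through 4 with a peak of 7.0, and a one-sample spike at index 7. Real recordings would need noise filtering and a data-driven threshold before features like these could feed a classifier.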

The procedure is described in detail in a paper entitled “Single-molecule spectroscopy of amino acids and peptides by recognition tunneling,” which appeared April 6 in Nature Nanotechnology. According to this paper, signal analysis turned out to be fairly complex, requiring a machine learning algorithm. This algorithm, a support vector machine (SVM), was used to train a computer to make sense of the signals emitted when amino acids formed bonds in the tunnel junction and current flowed between the electrodes.
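The training step can be illustrated with a minimal linear SVM written from scratch in Python. This sketch uses a deterministic variant of the Pegasos subgradient method, cycling through the data in a fixed order instead of sampling at random, to minimize the L2-regularized hinge loss, and it separates only two classes. It is not the researchers’ implementation: the two-dimensional features (for example, spike peak current and duration), the labels, and all parameter values are assumptions for illustration. Telling many amino acids apart would require a multiclass scheme such as one-vs-rest, and real data would likely call for feature scaling and possibly a nonlinear kernel.

```python
def train_linear_svm(features, labels, lam=0.01, epochs=200):
    """Fit a linear SVM with Pegasos-style subgradient descent.

    features: list of equal-length numeric tuples.
    labels:   +1 or -1 for each feature vector.
    lam:      L2 regularization strength.
    A constant 1.0 is appended to every vector so the bias is learned as an
    extra weight (and is therefore regularized too, a simplification).
    """
    w = [0.0] * (len(features[0]) + 1)
    t = 0
    for _ in range(epochs):
        for x, y in zip(features, labels):   # fixed order keeps runs reproducible
            t += 1
            eta = 1.0 / (lam * t)             # Pegasos step-size schedule
            xb = list(x) + [1.0]
            margin = y * sum(wi * xi for wi, xi in zip(w, xb))
            # Gradient step on the regularizer shrinks the weights...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ...and a hinge-loss subgradient step is added if the margin is violated.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, xb)]
    return w


def predict(w, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Hypothetical spike features: (peak current, duration) for two molecule classes.
    X = [(5.0, 2.0), (6.0, 2.5), (5.5, 1.8), (1.0, 6.0), (1.5, 7.0), (0.8, 6.5)]
    y = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print([predict(w, x) for x in X])
```

The appeal of an SVM here is that it looks for the boundary with the widest margin between classes, which tends to generalize well when only a modest number of labeled signals is available.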

The algorithm, the same one used by the IBM computer Watson to defeat human champions on Jeopardy!, helped the computer learn to discriminate between the different signals that a single molecule can produce. Such variety arises because many molecules can bind in the tunnel junction in more than one way, and each binding mode yields its own signal. Also, as indicated in the paper, recognition tunneling let the researchers “identify D and L enantiomers, a methylated amino acid, isobaric isomers, and short peptides.”
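Because each molecule produces many spikes, and those spikes differ with binding mode, one simple way to reach a single identification is to let every spike vote. The Python sketch below assumes, hypothetically, a classifier trained with a separate class for each binding mode; a lookup table maps each mode label back to its molecule before the votes are counted. The voting scheme, the label format (such as "Gly/a"), and the 50% threshold are illustrative assumptions, not the method reported in the paper.

```python
from collections import Counter


def call_molecule(spike_labels, mode_to_molecule=None, min_fraction=0.5):
    """Combine per-spike classifier calls into one call for the molecule.

    spike_labels:     one classifier output per spike.
    mode_to_molecule: optional dict mapping a binding-mode label to its molecule,
                      so that all modes of the same molecule vote together.
    Returns the winning molecule if it holds more than min_fraction of the
    votes, otherwise None (no confident call).
    """
    if not spike_labels:
        return None
    mapping = mode_to_molecule or {}
    votes = Counter(mapping.get(label, label) for label in spike_labels)
    molecule, count = votes.most_common(1)[0]
    return molecule if count / len(spike_labels) > min_fraction else None


if __name__ == "__main__":
    # Hypothetical mode labels for glycine (two modes) and alanine (one mode).
    modes = {"Gly/a": "Gly", "Gly/b": "Gly", "Ala/a": "Ala"}
    calls = ["Gly/a", "Gly/b", "Ala/a", "Gly/b"]
    print(call_molecule(calls, modes))   # modes pooled before voting
    print(call_molecule(calls))          # modes counted separately
```

Without the mode table, those four calls split glycine’s votes across two labels and no label reaches a majority; with it, glycine wins three of the four votes.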

The results of their work, reported the researchers, “suggest that direct electronic sequencing of single proteins could be possible by sequentially measuring the products of processive exopeptidase digestion, or by using a molecular motor to pull proteins through a tunnel junction integrated with a nanopore.”

“The ability of recognition tunneling to pinpoint abnormalities on a single molecule basis,” asserted Dr. Lindsay, “could be a complete game changer in proteomics.” Dr. Lindsay added that the kind of work accomplished by his team, which couples innovative strategies for handling single molecules with startling advances in computing power, may open up horizons that were inconceivable only a short time ago.

By showing that the kinds of tools that made the $1,000 genome feasible are applicable to proteome profiling, Dr. Lindsay’s team may even hearten those bold enough to anticipate a $1,000 proteome. “Why not?” Dr. Lindsay asked. “People think it’s crazy, but the technical tools are there. And what will work for DNA sequencing will work for protein sequencing.”

While the tunneling measurements have until now been made using a complex laboratory instrument known as a scanning tunneling microscope, Dr. Lindsay and his colleagues are currently working on a solid-state device that may be capable of fast, cost-effective, and clinically applicable recognition tunneling of amino acids and other analytes. Eventual application of such solid-state devices in massively parallel systems could make clinical proteomics a practical reality.
