Sequencing Proteins by Chemical Techniques
Early Studies. Life processes require the accurate translation of genetic information, which is basically linear in nature, into proteins, which generally only function correctly when they have folded to their proper conformation.
Thus, proteins have both a linear component (amino acid sequence; referred to in the earlier literature as the primary structure) that is dictated by genomic DNA (via RNA copies) and a three-dimensional structure that is dictated by the sequence (although often helped by auxiliary proteins and covalent modifications, usually termed posttranslational modifications or PTMs) (Anfinsen, 1993).
However, while the elucidation of protein sequences has, as described below, evolved into a relatively simple task during the last 75 years, it is still not possible to predict the exact structure that any given unknown sequence will adopt, although considerable progress has been made toward this goal.
The realization, some 50 years ago, that proteins with recognizably related sequences also had related structures spurred the development of molecular evolution and, as the numbers of protein structures from both X-ray crystallography and two-dimensional NMR increased, so did the number of predicted structures from model building.
Today, with the availability of vast amounts of predicted protein sequence data, derived from high-throughput (or next-generation) nucleic acid sequencing of an ever-widening collection of organisms, the challenges of understanding protein structure/function properties tend to be ones of identifying and understanding PTMs, particularly transient ones, and the dynamic nature of the multiplicity of protein-protein interactions that characterize cellular processes.
The elucidation of peptide and protein sequences began in earnest about the time of the Second World War. At that juncture, the covalent structure of peptides (the Hofmeister-Fischer hypothesis) was well accepted (Rosenfeld, 2012) but there was some uncertainty about whether they were composed of random or ordered sequences of amino acids. It was also uncertain whether they had 'open' N- and C-termini or occurred as closed loops that lacked free ends.
These issues were definitively resolved by Sanger and his colleagues with the determination of the sequence of insulin (Sanger, 1964), which in its mature state is composed of two distinct polypeptide chains covalently linked by two interchain disulfide bonds. These studies, which required some 12 years to complete, depended on the earlier work of Martin and Synge (1941), who introduced chromatography (initially on paper) as a means of separating complex mixtures, and on the development of a method for tagging free amino groups with fluorodinitrobenzene (now known as Sanger's reagent).
Using strong and weak (partial) acid hydrolysis and, eventually, proteolytic enzymes (pepsin, chymotrypsin, and trypsin), and expanding their fractionation techniques to include ionophoresis and ion-exchange chromatography, they completed the sequence determination and elucidated the pairing of the three disulfide bonds (one of which was an intrachain linkage). They also introduced the strategy of deducing the full sequence from overlapping peptides, which became the basis for protein sequencing for the next 25 years (Schroeder, 1968; Blackburn, 1970).
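The logic of the overlapping-peptide strategy can be illustrated with a minimal sketch. It assumes that two different cleavage methods have produced peptide fragments whose ends overlap, and it joins them greedily by their longest shared suffix/prefix. The function names (`overlap`, `assemble`), the `min_len` threshold, and the toy 18-residue sequence are all hypothetical and chosen for illustration only; this is not the procedure Sanger's group followed at the bench, where overlaps were established by chemical analysis and manual reasoning.

```python
def overlap(a, b, min_len=2):
    """Return the length of the longest suffix of `a` that is a prefix of `b`.

    Overlaps shorter than `min_len` residues are ignored (returns 0).
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(fragments, min_len=2):
    """Greedily merge peptide fragments by their largest end overlaps.

    Returns a list of assembled sequences; a single element means that
    all fragments could be joined into one chain.
    """
    unique = sorted(set(fragments))  # sorted for deterministic behaviour
    # Discard fragments that are wholly contained in another fragment.
    frags = [f for f in unique if not any(f != g and f in g for g in unique)]

    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps; fragments stay separate
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j)] + [merged]
    return frags


# Hypothetical fragments from two different cleavage methods (toy data).
digest_a = ["ALKGDW", "TEFRHP", "MNQSVY"]
digest_b = ["GDWTEF", "RHPMNQ"]
print(assemble(digest_a + digest_b))
```

Neither digest alone fixes the order of its peptides; the second set of fragments supplies the overlaps that link the first. A greedy merge of this kind can be misled by repeated stretches of sequence, which is one reason real determinations relied on several independent cleavage methods.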
Date added: 2024-06-13