B M B 400, Part Three

Gene Expression and Protein Synthesis

Section IV = Chapter 13

GENETIC CODE

 

     Overview for Genetic Code and Translation:

 

    Once transcription and processing of rRNAs, tRNAs and snRNAs are completed, the RNAs are ready to be used in the cell: they are assembled into ribosomes or snRNPs and used in splicing and protein synthesis.  The mature mRNA, however, is not itself the functional product; it must be translated into the encoded protein.  The rules for translating from the "language" of nucleic acids to that of proteins are the genetic code.  Experiments testing the effects of frameshift mutations showed that the deletion or addition of 1 or 2 nucleotides caused a loss of function, whereas deletion or addition of 3 nucleotides allowed retention of considerable function.  This demonstrated that the coding unit is 3 nucleotides.  The nucleotide triplet that encodes an amino acid is called a codon; each codon encodes one amino acid.  Since there are 64 combinations of 4 nucleotides taken three at a time and only 20 amino acids, the code is degenerate (more than one codon per amino acid, in most cases).  The adaptor molecule for translation is tRNA.  A charged tRNA has an amino acid at one end, and at the other end it has an anticodon for matching a codon in the mRNA; i.e. it "speaks the language" of nucleic acids at one end and the "language" of proteins at the other end.  The machinery for synthesizing proteins under the direction of template mRNA is the ribosome.

 

 

Figure 3.4.1. tRNAs serve as an adaptor for translating from nucleic acid to protein

              

 

 

 

 

 

 

 

A.        Size of a codon: 3 nucleotides

 

1.  Three is the minimum number of nucleotides per codon needed to encode 20 amino acids.

 

a.  20 amino acids are encoded by combinations of 4 nucleotides

 

b.  If a codon were two nucleotides, the set of all combinations could encode only 4x4 = 16 amino acids.

 

c.  With three nucleotides, the set of all combinations can encode up to 4x4x4 = 64 amino acids (i.e. 64 different combinations of four nucleotides taken three at a time).
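The counting argument in item 1 can be checked with a few lines of Python. This is only an illustrative sketch; the function names n_codons and min_codon_length are invented here and are not part of the original notes.

```python
BASES = "ACGU"          # the four RNA nucleotides
N_AMINO_ACIDS = 20      # number of amino acids that must be encoded


def n_codons(length: int) -> int:
    """Number of distinct codons that can be built from `length` nucleotides."""
    return len(BASES) ** length


def min_codon_length(n_targets: int = N_AMINO_ACIDS) -> int:
    """Smallest codon length whose combinations cover `n_targets` amino acids."""
    length = 1
    while n_codons(length) < n_targets:
        length += 1
    return length


for k in (1, 2, 3):
    print(f"codon length {k}: {n_codons(k)} combinations")
print("minimum codon length:", min_codon_length())
```

A doublet code falls short (16 < 20), while a triplet code has capacity to spare (64 > 20), which is the origin of the degeneracy discussed later.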

 

2.  Results of combinations of frameshift mutations show that the code is in triplets.

 

Length-altering mutations that add or delete one or two nucleotides have a severely defective phenotype: they change the reading frame, so the entire amino acid sequence after the mutation is altered.  But those that add or delete three nucleotides have little or no effect.  In the latter case, the reading frame is maintained, with an insertion or deletion of an amino acid at one site.  Combinations of three different single nucleotide deletions (or insertions), each of which has a loss-of-function phenotype individually, can restore substantial function to a gene.  The wild-type reading frame is restored after the 3rd deletion (or insertion).
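The effect of frameshifts on the reading frame can be illustrated with a short Python sketch. The mRNA sequence below is hypothetical (made up for this example), and the helper names translate and delete are likewise assumptions of the sketch. Amino acids are shown in one-letter code, and the codon table is the standard genetic code.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(mrna: str) -> str:
    """Translate from the first nucleotide; stop at a termination codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def delete(mrna: str, positions) -> str:
    """Return `mrna` with the nucleotides at the given 0-based positions removed."""
    drop = set(positions)
    return "".join(nt for i, nt in enumerate(mrna) if i not in drop)


# Hypothetical message, invented for illustration only.
wild_type = "AUGGCUAAACGUGAACUGUUCGGUCAUUGGUAA"

print("wild type       ", translate(wild_type))
print("1 deletion      ", translate(delete(wild_type, [4])))
print("2 deletions     ", translate(delete(wild_type, [4, 7])))
print("3 deletions     ", translate(delete(wild_type, [4, 7, 10])))
print("1 codon deleted ", translate(delete(wild_type, [6, 7, 8])))
```

In this made-up example, one or two single-nucleotide deletions scramble or truncate everything downstream, whereas the third deletion brings the downstream sequence back into the wild-type frame; only the short stretch between the first and last deletion differs.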

 

 

 

B.  Experiments to decipher the code

 

 

1.  Several different cell‑free systems have been developed that catalyze protein synthesis.  This ability to carry out translation in vitro was one of the technical advances needed to allow investigators to determine the genetic code.

 

a.  Mammalian (rabbit) reticulocytes:  ribosomes actively making lots of globin.

 

b.  Wheat germ extracts

 

c.  Bacterial extracts

 

 

 

 

 

 

 

 

 

2.  The ability to synthesize random polynucleotides was another key development to allow the experiments to decipher the code.

 

S. Ochoa isolated the enzyme polynucleotide phosphorylase, and showed that it was capable of linking nucleoside diphosphates (NDPs) into polymers of NMPs (RNA) in a reversible reaction.

 

                         n NDP ⇌ (NMP)n + n Pi

 

The physiological function of polynucleotide phosphorylase is to catalyze the reverse reaction, which is used in RNA degradation. However, in a cell-free system, the forward reaction is very useful for making random RNA polymers.

 

 

3.  Homopolymers program synthesis of specific homo-polypeptides (Nirenberg and Matthaei, 1961).

 

a.   If you provide only UDP as a substrate for polynucleotide phosphorylase, the product will be a homopolymer poly(U).

 

b.   Addition of poly(U) to an in vitro translation system (e.g. E. coli lysates) results in a newly synthesized polypeptide that is polyphenylalanine.

 

c.  Thus UUU encodes Phe.

 

d.  Likewise, poly(A) programmed synthesis of poly‑Lys;  AAA encodes Lys.

                        Poly(C) programmed synthesis of poly‑Pro;  CCC encodes Pro.

                        Poly(G) programmed synthesis of poly‑Gly;  GGG encodes Gly.
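The homopolymer results can be reproduced "in silico" once the code is known. The sketch below translates each homopolymer with the standard genetic code; the names translate and THREE_LETTER are illustrative assumptions, not part of the original experiments.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Three-letter names for the four amino acids that appear in this example.
THREE_LETTER = {"F": "Phe", "K": "Lys", "P": "Pro", "G": "Gly"}


def translate(rna: str) -> str:
    """Translate consecutive triplets starting at the first nucleotide."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


for base in "UACG":
    peptide = translate(base * 30)          # a 30-nucleotide homopolymer
    print(f"poly({base}) -> poly-{THREE_LETTER[peptide[0]]}  ({peptide})")
```

Because a homopolymer contains the same triplet in every reading frame, the product does not depend on where translation starts, which is what made these templates so informative.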

 

 

 

4.  Use of mixed co‑polymers

 

a.   If two NDPs are mixed in a known ratio, polynucleotide phosphorylase will make a mixed co-polymer in which each nucleotide is incorporated at a frequency proportional to its presence in the original mixture.

 

b.   For example, consider a 5:1 mixture of A:C.  The enzyme will use ADP 5/6 of the time, and CDP 1/6 of the time.  An example of a possible product is:

 

      AACAAAAACAACAAAAAAAACAAAAAACAAAC...

 

 

     Table 3.4.1.  Frequency of triplets in a poly(AC) (5:1) random copolymer

     Composition    Number    Probability    Relative frequency
     3 A            1         0.578          1.0
     2 A, 1 C       3         3 x 0.116      3 x 0.20
     1 A, 2 C       3         3 x 0.023      3 x 0.04
     3 C            1         0.005          0.01

 

 

c.   So the frequency that AAA will occur in the co‑polymer is

(5/6)(5/6)(5/6) = 0.578.

 

This will be the most frequently occurring codon, and can be normalized to 1.0 (0.578/0.578 = 1.0)

 

d.  The frequency that a codon with 2 A's and 1 C will occur is

(5/6)(5/6)(1/6) = 0.116.

 

There are three ways to have 2 A's and 1 C, i.e. AAC, ACA and CAA.

So the frequency of occurrence of all the A2C codons is 3 x 0.116.

Normalizing to AAA having a relative frequency of 1.0, the frequency of A2C codons is 3 x (0.116/0.578) = 3 x 0.2.

 

e.   Similar logic shows that the expected frequency of AC2 codons is 3 x 0.04, and the expected frequency of CCC is 0.01.

 

 

Table 3.4.2.  Amino acid incorporation with poly(AC) (5:1) as a template

Radioactive      Precipitable cpm              Observed         Theoretical
amino acid       - template     + template     incorporation    incorporation
Lysine           60             4615           100.0            100
Threonine        44             1250           26.5             24
Asparagine       47             1146           24.2             20
Glutamine        39             1117           23.7             20
Proline          14             342            7.2              4.8
Histidine        282            576            6.5              4

 

These data are from Speyer et al. (1963) Cold Spring Harbor Symposia on Quantitative Biology, 28:559.  The theoretical incorporation is the expected value given the genetic code as it was subsequently determined.

 

 

f.    When this mixture of mixed copolymers is used to program in vitro translation, Lys is incorporated most frequently, which can be expressed as 100.  This confirms that AAA encodes Lys.

 

g.   Relative to Lys incorporation as 100, Thr, Asn, and Gln are incorporated with values of 24 to 26, very close to the expectation for amino acids encoded by one of the A2C codons.  However, these data do not show which of the A2C codons encodes each specific amino acid.  We now know that ACA encodes Thr, AAC encodes Asn, and CAA encodes Gln.

 

h.   Pro and His are incorporated with values of about 7 and 6, respectively, which is close to the expected 4 to 5 for amino acids encoded by AC2 codons.  E.g. CCA encodes Pro, CAC encodes His.  ACC encodes Thr, but this incorporation is overshadowed by the incorporation at ACA.  More accurately, the observed "26.5" for Thr ≈ 20 (ACA) + 4 (ACC).
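The arithmetic in items c through h can be sketched in Python. The block below computes the expected incorporation for a 5:1 A:C random copolymer from the standard genetic code, and recomputes the observed values from the cpm in Table 3.4.2. The assumption that the observed value is (cpm with template minus cpm without template), normalized to Lys = 100, is ours; it reproduces the tabulated values to within rounding. The function and variable names are illustrative only.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def relative_incorporation(fractions, reference="K"):
    """Expected incorporation per amino acid (one-letter code) for a random
    copolymer with the given nucleotide fractions, with `reference` set to 100."""
    by_aa = defaultdict(float)
    for triplet in product(fractions, repeat=3):
        p = 1.0
        for nt in triplet:
            p *= fractions[nt]            # independent incorporation of each nucleotide
        by_aa[CODON_TABLE["".join(triplet)]] += p
    return {aa: 100 * p / by_aa[reference] for aa, p in by_aa.items()}


theoretical = relative_incorporation({"A": 5 / 6, "C": 1 / 6})

# Precipitable cpm (- template, + template), copied from Table 3.4.2.
CPM = {"K": (60, 4615), "T": (44, 1250), "N": (47, 1146),
       "Q": (39, 1117), "P": (14, 342), "H": (282, 576)}
net = {aa: plus - minus for aa, (minus, plus) in CPM.items()}
observed = {aa: 100 * n / net["K"] for aa, n in net.items()}

for aa in CPM:
    print(f"{aa}: observed {observed[aa]:5.1f}   theoretical {theoretical[aa]:5.1f}")
```

Summing over codons rather than over compositions is what gives Thr an expected value of 24 (ACA plus ACC) and Pro an expected value of 4.8 (CCA plus CCC).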

 

 

 

5.   Defined trinucleotide codons stimulate binding of aminoacyl‑tRNAs to ribosomes

 

a.   At high concentrations of Mg cations, the normal initiation mechanism, requiring f-Met-tRNAf, can be overridden, and defined trinucleotides can be used to direct binding of particular, labeled aminoacyl-tRNAs to ribosomes.

 

b.   For example, if ribosomes are mixed with UUU and radiolabeled Phe-tRNAphe under these conditions, a ternary complex forms that sticks to nitrocellulose (the "Millipore assay", named after the manufacturer of the nitrocellulose).

 

c.  One can then test all possible combinations of triplet nucleotides.

 

Fig. 3.4.2.

 

 

Data from Nirenberg and Leder (1964) Science 145:1399.

 

 

6.  Repeating sequence synthetic polynucleotides (Khorana)

 

a.   Alternating copolymers: e.g. (UC)n programs the incorporation of Ser and Leu.

 

      So UCU and CUC encode Ser and Leu, but these data alone cannot tell which is which.  In combination with other data, e.g. the random mixed copolymers in section 4 above, one can make some definitive determinations.  Such subsequent work showed that UCU encodes Ser and CUC encodes Leu.

 

            b.  poly(AUG) programs incorporation of poly-Met and poly-Asp at high Mg concentrations.  Depending on the reading frame, this polymer is read as repeating AUG, UGA or GAU.  AUG encodes Met and UGA is a stop, so GAU must encode Asp.
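The codons present in a repeating-sequence polymer can be listed frame by frame. The Python sketch below does this for (UC)n and (AUG)n using the standard genetic code; the function name frames and its parameters are assumptions made for this illustration.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def frames(unit: str, n_codons: int = 6) -> dict:
    """Read a repeating polymer (unit)n in each of the three reading frames.

    Returns {frame: (list of codons, one-letter translation)}; '*' marks a stop.
    """
    rna = unit * (3 * n_codons + 3)       # long enough for every frame
    result = {}
    for frame in range(3):
        codons = [rna[i:i + 3] for i in range(frame, frame + 3 * n_codons, 3)]
        result[frame] = (codons, "".join(CODON_TABLE[c] for c in codons))
    return result


for unit in ("UC", "AUG"):
    print(f"poly({unit})")
    for frame, (codons, peptide) in frames(unit).items():
        print(f"  frame {frame}: {' '.join(codons)}  ->  {peptide}")
```

A dinucleotide repeat gives the same alternating pair of codons in every frame (hence an alternating copolypeptide), whereas a trinucleotide repeat gives a different single codon in each frame (hence up to three different homopolypeptides).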

 

C.        The genetic code

 

1.  By compiling observations from experiments such as those outlined in the previous section, the coding capacity of each group of 3 nucleotides was determined.  This is referred to as the genetic code.  It is summarized in Table 3.4.4.  This tells us how the cell  translates from the "language" of nucleic acids (polymers of nucleotides) to that of proteins (polymers of amino acids). 

 

Knowledge of the genetic code allows one to predict the amino acid sequence of any sequenced gene.  The complete genome sequences of several organisms have revealed genes coding for many previously unknown proteins.  A major current task is trying to assign activities and functions to these newly discovered proteins.

 

 

 

 

 

Table 3.4.4.   The Genetic Code

                            Position in Codon
1st                                2nd                                    3rd
         U              C              A              G
U     UUU  Phe       UCU  Ser       UAU  Tyr       UGU  Cys       U
      UUC  Phe       UCC  Ser       UAC  Tyr       UGC  Cys       C
      UUA  Leu       UCA  Ser       UAA  Term      UGA  Term      A
      UUG  Leu       UCG  Ser       UAG  Term      UGG  Trp       G

C     CUU  Leu       CCU  Pro       CAU  His       CGU  Arg       U
      CUC  Leu       CCC  Pro       CAC  His       CGC  Arg       C
      CUA  Leu       CCA  Pro       CAA  Gln       CGA  Arg       A
      CUG  Leu       CCG  Pro       CAG  Gln       CGG  Arg       G

A     AUU  Ile       ACU  Thr       AAU  Asn       AGU  Ser       U
      AUC  Ile       ACC  Thr       AAC  Asn       AGC  Ser       C
      AUA  Ile       ACA  Thr       AAA  Lys       AGA  Arg       A
      AUG* Met       ACG  Thr       AAG  Lys       AGG  Arg       G

G     GUU  Val       GCU  Ala       GAU  Asp       GGU  Gly       U
      GUC  Val       GCC  Ala       GAC  Asp       GGC  Gly       C
      GUA  Val       GCA  Ala       GAA  Glu       GGA  Gly       A
      GUG* Val       GCG  Ala       GAG  Glu       GGG  Gly       G

* Sometimes used as initiator codons.

 

 

 

 

 

2.  Of the total of 64 codons, 61 encode amino acids and 3 specify termination of translation.
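The tally of sense and termination codons, and the number of codons assigned to each amino acid, can be checked directly against the standard genetic code. The Python sketch below is illustrative only; the variable names are assumptions, and amino acids are given in one-letter code with '*' standing for termination.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Count how many codons specify each amino acid (or termination).
codons_per_residue = Counter(CODON_TABLE.values())
n_stop = codons_per_residue.pop("*")            # termination codons
n_sense = sum(codons_per_residue.values())      # codons that encode amino acids

# Group amino acids by the number of codons that specify them.
by_degeneracy = defaultdict(list)
for aa, n in sorted(codons_per_residue.items()):
    by_degeneracy[n].append(aa)

print(f"{n_sense} sense codons, {n_stop} termination codons, "
      f"{len(codons_per_residue)} amino acids")
for n in sorted(by_degeneracy, reverse=True):
    print(f"{n} codon(s): {' '.join(by_degeneracy[n])}")
```

Only two amino acids come out with a single codon; all the others have two, three, four or six.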

 

 

3.  Degeneracy

 

a.  The degeneracy of the genetic code refers to the fact that most amino acids are specified by more than one codon.  The exceptions are methionine (AUG) and tryptophan (UGG).

 

b.  The degeneracy is found primarily at the third position.  Consequently, single nucleotide substitutions at the third position may not lead to a change in the amino acid encoded.  These are called silent or synonymous nucleotide substitutions.  They do not alter the encoded protein.  This is discussed in more detail below.

 

c.    The pattern of degeneracy allows one to organize the codons into "families" and "pairs".  In 8 groups of codons, the nucleotides at the first two positions are sufficient to specify a unique amino acid, and any nucleotide (abbreviated N) at the third position encodes that same amino acid.  These comprise 8 codon "families" (UCN, CUN, CCN, CGN, ACN, GUN, GCN and GGN in Table 3.4.4).  An example is ACN encoding threonine.

 

            There are 13 codon "pairs", in which the nucleotides at the first two positions are sufficient to specify two amino acids.  A purine (R) nucleotide at the third position specifies one amino acid, whereas a pyrimidine (Y) nucleotide at the third position specifies the other amino acid. 

 

            These examples add to more than 20 (the number of amino acids) because leucine (encoded by UUR and CUN), serine (encoded by UCN and AGY) and arginine (encoded by CGN and AGR) are encoded by both a codon family and a codon pair.  The UAR codons specify