This chapter was published in Disorders of Hemoglobin:  Genetics, Pathophysiology, Clinical Management in 2001, Cambridge University Press, Cambridge, UK

Editors: Martin H. Steinberg, Bernard G. Forget, Douglas R. Higgs, and Ronald L. Nagel

Copyright: Cambridge University Press and Ross Hardison

 

Chapter 5:  Organization, evolution and regulation of the globin genes

 

Original version completed: June 17, 1998

Updated May 09, 2000

Published in 2001.

The revised version of this book (completed in 2007) leaves out most of the material in this chapter, and thus I am posting it on the internet to keep the material available. Please reference the chapter and book if you use this information.

 

Author: Ross C. Hardison

 

Department of Biochemistry and Molecular Biology, Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, 304 Wartik Laboratory, University Park, PA, 16802 USA

Phone: 814-863-0113

E-mail: rch8@psu.edu


5.1 Introduction

 

            Hemoglobin genes are ancient, dating back perhaps as far as the origins of cellular life.  The familiar class of hemoglobins used for oxygen transport illustrates only one function of hemoglobins.  This chapter will review some of the principal events in the evolution of vertebrate globin gene clusters within the context of their long history.  This evolutionary framework provides some insights into important issues such as the origin and function of the locus control regions, the contrasting chromatin structure of alpha- and beta-like globin gene clusters, and the prospects for targeting delta- or gamma-globin gene expression in therapies for beta-globin gene defects.

 

5.2. Broad distribution of hemoglobins in the biosphere

 

            Hemoglobins similar to human Hb A are found in erythrocytes of all vertebrates (Dickerson and Geis 1983).  Each is a heterotetramer with two subunits related to the alpha-globin subfamily (referred to here as alpha-like globins) and two subunits related to the beta-globin subfamily (beta-like globins).  The globin polypeptides bind heme, which in turn allows the hemoglobin in erythrocytes to bind oxygen reversibly and transport it from the lungs to respiring tissues.  In all species studied, different alpha-like and beta-like globins are synthesized at progressive stages of development to produce hemoglobins characteristic of primitive (embryonic) and definitive (fetal and adult) erythroid cells (Bunn and Forget 1986).  However, the vertebrate hemoglobins comprise only a small part of the hemoglobin family (Fig. 1).  A close relative, the monomeric myoglobin, stores oxygen in tissues such as muscle (Wittenberg and Wittenberg 1987).  As illustrated by the summary phylogenetic tree in Fig. 1, the amino acid sequences of the alpha- and beta-globins and myoglobin are related to each other, indicating a common ancestor in early vertebrates approximately 500 million years ago (Goodman et al. 1987).  The 3-dimensional structure of myoglobin was one of the first solved, revealing a series of alpha-helices that form the heme-binding pocket.  This structure, the globin fold, is seen in myoglobin, alpha-globin, and beta-globin;  it is characteristic of all members of the hemoglobin family of proteins (Dickerson and Geis 1983).

 

 

Fig. 1.  Broad distribution and diverse functions of hemoglobins.  The phylogenetic tree on the left is a summary of trees generated by aligning amino acid sequences of hemoglobins from species representative of each taxon (using CLUSTAL W) and computing trees based on parsimony (PAUP) and analysis of distance measures by neighbor joining and UPGMA.  The latter two used the MEGA suite of programs (Kumar et al. 1993).  Trees of the same topology were generated by all three methods.  The summary tree shows that topology but is not drawn to scale.  This and subsequent trees indicate the relative time of the divergences;  nodes more to the left indicate a relatively earlier time.  Functions and induction agents are also listed.  More details and references have been reviewed (Hardison 1998).

 

            Hemoglobins are also used for oxygen transport in invertebrates (Riggs 1991; Dixon et al. 1992; Sherman et al. 1992).  Many non-vertebrates have gigantic extracellular hemoglobins, in some species formed by as many as 200 monodomain subunits in a multimeric protein, and in others by covalent linkage into very long polypeptide chains (reviewed in Terwilliger 1998).  The invertebrate hemoglobins are homologous to the vertebrate hemoglobins, and they form a distinct branch in a phylogenetic tree of hemoglobins (Fig. 1).  Hemoglobins are present in plants, both the leghemoglobins with specialized functions in root nodules (Brisson and Verma 1982; Appleby 1984) as well as the broadly distributed, nonsymbiotic hemoglobins (Andersson et al. 1996).  Interestingly, the genes for plant and invertebrate hemoglobins have a similar structure.  Both groups of genes have three introns separating four exons, with at least two introns in identical positions (Fig. 2).  The similarities in gene structure and the amino acid sequences of the encoded proteins strongly support the hypothesis of a common ancestor to both groups of hemoglobins, showing that the evolutionary history of hemoglobin genes predates the divergence of plants and animals, roughly 1.3 billion years ago (Feng et al. 1997).  It is likely the middle intron was lost prior to the divergence of the vertebrate globin genes, all of which have only two introns.

 

 

Fig. 2.  Intron/exon structure during evolution of hemoglobin genes.

The structures of illustrative contemporary hemoglobin genes are shown on the right, with exons denoted by dark boxes and introns by white boxes.  The position in the hemoglobin alpha-helical structure of the amino acid encoded at the site of interruption is indicated over the intron, and the loss of the central intron in the ancestor to vertebrates is marked by a vertical arrow.  The evolutionary pathway is indicated by the other arrows.  This "tree" is a gene tree, and grouping of a yeast hemoglobin gene with bacterial hemoglobin genes may reflect a horizontal gene transfer.  Estimated times of divergence in millions of years (Myr) are given at selected nodes.

 

            Given that the hemoglobins in the major groups of multicellular organisms - plants, invertebrates and vertebrate animals - are used for storage and transport of oxygen, one might have expected hemoglobins to be absent from unicellular organisms.  It was thought that simple diffusion was sufficient to provide adequate oxygen inside the cells of unicellular, free-living organisms.  However, hemoglobins have now been characterized in several species of eubacteria, the fungus Saccharomyces cerevisiae, and protists such as the alga Chlamydomonas and the protozoan Paramecium (reviewed in Hardison 1996; Hardison 1998).  These hemoglobins from unicellular organisms appear to play roles distinctly different from those of vertebrate hemoglobins.  The familiar functions in oxygen transport and storage require reversible binding of oxygen, and that occurs only when the iron in the heme stays in the reduced (+2) oxidation state, i.e. when it is a ferrous ion.  Biochemical analysis has shown that hemoglobins from Chlamydomonas (Couture and Guertin 1996), Saccharomyces (Zhu and Riggs 1992) and the bacterium Alcaligenes (Cramm et al. 1994) can participate in electron transfer reactions in vitro, with the heme-bound iron changing cyclically between the +2 and +3 oxidation states.  The latter two hemoglobins are actually two-domain proteins, one domain binding heme and the other binding flavin cofactors, which usually play a role in redox reactions.  Also, the hemoglobin from Vitreoscilla can serve as a terminal electron acceptor during respiration in vivo (Dikshit et al. 1992).

Recent studies clearly show that the hemoglobins in unicellular organisms have enzymatic functions and are not oxygen-transporting proteins. The flavohemoglobins from the enteric bacteria Escherichia coli and Salmonella typhimurium  (Crawford and Goldberg, 1998; Gardner et al., 1998; Hausladen et al., 1998) and from yeast (Liu et al., 2000) are enzymes protecting these microorganisms from the highly reactive free radical compound, nitric oxide. Each of these flavohemoglobins is a nitric oxide dioxygenase, catalyzing the conversion of nitric oxide to nitrate. Other functions have also been proposed for bacterial hemoglobins. For instance, the hemoglobin from Vitreoscilla can serve as a terminal electron acceptor during respiration in vivo (Dikshit et al., 1992).

            Hemoglobins involved in catalytic conversions of nitric oxide and oxygen are not limited to microorganisms. A hemoglobin found in the perienteric fluid of the parasitic worm Ascaris lumbricoides also catalyzes reactions between oxygen and nitric oxide, producing nitrate (Minning et al., 1999). However, the chemical mechanism is different from that of the microbial flavohemoglobins, and Minning et al. (1999) propose that this hemoglobin functions to remove oxygen from the perienteric fluid via a series of reactions driven by nitric oxide.

In mammals, hemoglobins not only transport oxygen, but they also help regulate nitric oxide levels. Gow et al. (1998) show that at physiological concentrations, nitric oxide will bind to a cysteine in hemoglobin to form S-nitrosohemoglobin. This binding is favored in oxyhemoglobin, and retains the bioactivity of nitric oxide. Nitric oxide can subsequently be released from deoxyhemoglobin (Jia et al., 1996; Stamler et al., 1997). Since nitric oxide is a major regulator of blood pressure, these new findings indicate that hemoglobin is involved in the control of blood pressure in ways that may facilitate efficient delivery of oxygen to tissues. Furthermore, the interplay between binding of oxygen and nitric oxide to hemoglobin and effects on vasodilation and constriction may have therapeutic applications (e.g., Bonaventura et al., 1999; Gladwin et al., 1999; Nagel, 1999).

            The variety of functions now found for hemoglobins raises the issue of whether the microbial proteins are truly homologous to the hemoglobins from plants and animals.  The amino acid sequence comparisons certainly support a common ancestor to all these sequences, as illustrated in the summary phylogenetic tree (Fig. 1).  Despite the low percent identity (e.g. 25%) between the more dissimilar members of the family, different types of phylogenetic analysis generate trees of the same topology.  The three-dimensional structures strongly support the conclusion that all these hemoglobins share a common ancestor.  The structures of the bacterial hemoglobins from Vitreoscilla (Tarricone et al. 1997) and Alcaligenes (Ermler et al. 1995) both have the globin fold first characterized in vertebrate myoglobin.  Indeed, hemoglobins may be part of a larger family of hemoproteins.  For instance, the light harvesting biliprotein, C-phycocyanin, from the cyanobacterium Mastigocladus laminosus has a three-dimensional structure very similar to that of a globin (Schirmer et al. 1985).  Although this is not a heme binding protein per se, it does bind a linear tetrapyrrole pigment derived from heme. The structural comparisons indicate that genes for at least some other hemoproteins share a common ancestor with hemoglobin genes (Fig. 1).
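The percent identity quoted above can be made concrete with a small calculation.  The following is a minimal sketch, assuming two sequences that have already been aligned (for example by a program such as CLUSTAL W) and that use "-" for gaps;  the function name and the convention of skipping gapped columns are illustrative assumptions, not the procedure used to build Fig. 1.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over aligned columns that have no gap.

    Both inputs must come from the same alignment, so they have equal length.
    Columns where either sequence has a gap ("-") are ignored (an assumption;
    other conventions divide by the full alignment length instead).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1

    return 100.0 * matches / compared if compared else 0.0
```

Different conventions for handling gaps give somewhat different values, which is one reason a single percent identity is a coarser measure than the agreement among tree-building methods described above.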

            These observations all indicate that the gene encoding hemoglobin is truly ancient, i.e. it appears to have been present in the ancestor to eubacteria and eukaryotes, which is the earliest proposed divergence since the origin of cellular organisms.  This divergence has been dated at approximately 3.9 billion years ago (Feng et al. 1997).  At this early time, very little oxygen was present in the Earth's atmosphere. Hence the primordial function of hemoglobin may have had little to do with molecular oxygen (Hardison, 1998). The enzymatic functions of hemoglobins found in contemporary microorganisms and nematodes, involving nitric oxide metabolism, provide some insight into the early functions of hemoglobins (Durner et al., 1999). Minning et al. (1999) describe a scenario in which hemoglobins present in contemporary bacteria, which catalyze the enzymatic detoxification of nitric oxide, represent an ancestral function. These ancestral hemoglobins may have evolved into enzymes that catalyze the nitric oxide-mediated consumption of oxygen, as now observed as a "deoxygenase" in Ascaris. This "deoxygenase" may have evolved into contemporary mammalian hemoglobins, with their limited enzymatic function but the ability to bind and transport both oxygen and nitric oxide.

            Because the separation between archaebacteria and eukaryotes appears to have occurred after the divergence of eubacteria from eukaryotes (both of which have hemoglobins), one may anticipate finding homologs to hemoglobins in archaea as well.  An automated analysis of genome sequences has included an archaebacterial gene (from Methanococcus jannaschii) in an orthologous group (Tatusov et al. 1997) containing proteins related to hemoglobins (see http://www.ncbi.nlm.nih.gov/COG/).  Further investigation of this and other archaebacterial genes related to hemoglobins, revealed by whole genome sequencing, should provide even more insights into the origin and range of functions of hemoglobins.

            An issue that has received much attention is the age of the introns, and whether they serve to separate genes into exons that encode distinct protein domains (Gilbert 1978).  The three introns of globin genes in plants and invertebrates, dating back approximately 1.3 billion years, are between the segments of the gene encoding measurable domains of protein structure (Go 1981).  This has lent support to the model that introns are old, and are the remnants of a process that combined exons to generate genes with new structures and functions (Gilbert 1978).  However, the hemoglobin genes from protists have introns in positions unique to many of the species, and those of eubacteria have no introns (Fig. 2).  Attempts to explain this degree of heterogeneity as the result of differential loss of introns require a very large number of introns to be proposed in the ancestral gene.  An alternative explanation is that at least some of the introns in the protist hemoglobin genes arose by insertion of new introns in each lineage, consistent with the "introns late" model (Stoltzfus et al. 1994).  Thus it seems unlikely that all the introns in contemporary hemoglobin genes were present in the ancestral gene (i.e. preceding the divergence of eubacteria, archaebacteria and eukaryotes), and hence the "introns early" hypothesis is not adequate to explain all of the introns.  However, this does not rule out the possibility that some introns, perhaps those still in hemoglobin genes in multicellular organisms, were in the ancestral gene.

 

5.3. Evolution of alpha- and beta-globin gene clusters in vertebrates

 

            Human hemoglobins are encoded at two separate loci, the beta-like globin gene cluster on chromosome 11p15.5 (Deisseroth et al. 1978) and the alpha-like globin gene cluster close to the terminus of chromosome 16p (Deisseroth et al. 1977).  As shown in Fig. 3, the genes in each cluster are in the same transcriptional orientation and are arranged in the order of their expression during development, with the active beta-like globin genes arranged 5'-epsilon (embryonic)-Ggamma (fetal)-Agamma (fetal)-delta (minor adult)-beta (major adult)-3' (Fritsch et al. 1980) and the active alpha-like globin genes arranged 5'-zeta (embryonic)-alpha2 (fetal and adult)-alpha1 (fetal and adult)-3' (Lauer et al. 1980).  [Lower levels of the gamma- and alpha-globins are also produced in embryonic red cells.]  This section reviews some of the key events in the evolution of this arrangement of globin genes.

 

Fig. 3.  Evolutionary pathways for alpha- and beta-globin gene clusters in vertebrates.  Each gene is indicated simply by a Greek letter.  Contemporary gene clusters are on the right (Hardison 1991 and references therein), and the deduced course of evolution to them is shown by a series of arrows.  The ancestor to alpha- and beta-globin genes is indicated as pro-alpha/beta.  LCR = locus control region, HS-40 = the distal major control region of mammalian alpha-globin genes, HSs = DNase hypersensitive sites, En = enhancer.

 

            The effective transport of oxygen between tissues by hemoglobin is accomplished by highly cooperative binding of oxygen when its concentration is high (e.g. in the lungs), followed by cooperative dissociation when its concentration is low (e.g. in respiring tissues in the periphery).  In vertebrates, this cooperativity is accomplished by the interactions between the alpha- and beta-globin subunits of hemoglobin (see chapter 8 by M. Perutz).  Vertebrate hemoglobins are kept at high concentrations inside erythrocytes, specialized cells devoted to the task of oxygen transport.  Thus the divergence of the ancestral globin gene into the alpha-globin and beta-globin genes (Fig. 3), and expression of these genes at a high level only in erythroid cells, were key steps in the evolution of cooperativity in hemoglobin and efficient oxygen transport in vertebrates.  These goals have been accomplished by different mechanisms in other evolutionary lineages.  For instance, the basis for cooperativity in non-vertebrate hemoglobins is quite different, in many cases involving reversible dissociation of hemoglobin subunits upon oxygenation (Riggs 1998). 

            Vertebrate alpha- and beta-globin genes likely arose by the duplication and subsequent divergence of an ancestral globin gene in early vertebrates.  This would have generated a linked set of alpha- and beta-globin genes (Fig. 3), which is the arrangement seen in contemporary globin gene clusters of the teleost zebrafish Danio rerio (Chan et al. 1997) and in the amphibian Xenopus (Hosbach et al. 1983).  The alpha-globin gene cluster is thought to have separated from the beta-globin gene cluster prior to the divergence of birds and mammals, since these gene clusters are on separate chromosomes in both groups of animals (Deisseroth et al. 1976; Hughes et al. 1979).  Gene duplication and divergence continued independently in each of these lineages to generate the contemporary gene clusters.  This is illustrated by the avian and mammalian beta-globin gene clusters, which contain multiple genes expressed differentially in development (Fig. 3).  In both chickens and humans the epsilon-globin gene is expressed in embryos and the beta-globin gene is expressed in adults.  However, the sequence of each chicken beta-like globin gene is equally similar to each human gene (Goodman et al. 1987; Reitman et al. 1993), so that, for example, chicken epsilon-globin is no more similar to human epsilon-globin than to human beta-globin.  This indicates that the gene duplications generating these beta-globin gene clusters occurred after the species diverged.

            Much is now known about the organization of alpha- and beta-globin gene clusters in contemporary mammals, which can be understood in terms of descent from common gene clusters in an ancestral eutherian mammal.  Fig. 4 shows beta-globin gene clusters in species from five orders of eutherian mammals and a marsupial.  Analysis of DNA sequences showed that a given globin gene in one species is usually more related to a gene in another mammal than to other globin genes in the same species (Hardison 1983; Goodman et al. 1984; Hardies et al. 1984; Hardison 1984; Townes et al. 1984).  This indicated that these genes are orthologous, i.e., they are similar because of descent from the same gene in the last common ancestor to the two species.  The exceptions to this observation could be explained by gene duplications within a single mammalian lineage, e.g. the duplication of gamma-globin genes in the ancestor to simian primates to produce paralogous genes (similar because of duplication of the ancestral gene), such as the Ggamma- and Agamma-globin genes in humans (Shen et al. 1981; Fitch et al. 1991).  Finding orthologs to epsilon-, gamma-, eta-, delta- and beta-globin genes, in that order, in virtually all eutherian mammals suggested that the ancestral eutherian had at least this set of genes (Fig. 4).  This hypothesis was strongly supported by the observation of substantial regions of sequence similarity outside the coding regions of the genes, in the introns and flanking regions (Hardies et al. 1984; Hardison 1984; Margot et al. 1989; Shehee et al. 1989; Hardison and Miller 1993).  As will be discussed more extensively below, some but not all of these matching sequences are strong candidates for regulatory function.  The long regions of matching sequences outside functional regions, and thus not subject to any obvious selection, were key observations in establishing this model for evolution of the mammalian globin gene clusters.  
Deletions, conversions and duplications of both single genes and blocks of genes have occurred in each mammalian order to generate the current gene clusters (reviewed in Collins and Weissman 1984; Hardison 1991).
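The reasoning used above to distinguish orthologous from paralogous genes can be sketched in a few lines of code.  This is a minimal illustration only:  the species labels, gene names and percent identities are invented for the example, and the rule "closest relative in another species versus in the same species" is a simplification of the published analyses, which also relied on phylogenetic trees and on matches in introns and flanking sequences.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Gene = Tuple[str, str]  # (species label, gene name)

# Hypothetical pairwise percent identities, invented for illustration only.
IDENTITY: Dict[FrozenSet[Gene], float] = {
    frozenset({("species1", "epsilon"), ("species2", "epsilon")}): 85.0,
    frozenset({("species1", "beta"), ("species2", "beta")}): 83.0,
    frozenset({("species1", "epsilon"), ("species1", "beta")}): 70.0,
    frozenset({("species2", "epsilon"), ("species2", "beta")}): 69.0,
    frozenset({("species1", "epsilon"), ("species2", "beta")}): 68.0,
    frozenset({("species1", "beta"), ("species2", "epsilon")}): 67.0,
    frozenset({("species1", "gammaG"), ("species1", "gammaA")}): 99.0,
    frozenset({("species1", "gammaG"), ("species2", "gamma")}): 80.0,
    frozenset({("species1", "gammaA"), ("species2", "gamma")}): 80.0,
}


def closest_relative(gene: Gene, identity: Dict[FrozenSet[Gene], float]) -> Optional[Gene]:
    """Return the gene with the highest recorded identity to `gene`, if any."""
    best: Optional[Gene] = None
    best_score = -1.0
    for pair, score in identity.items():
        if gene in pair:
            (other,) = pair - {gene}  # the partner gene in this pair
            if score > best_score:
                best, best_score = other, score
    return best


def classify(gene: Gene, identity: Dict[FrozenSet[Gene], float]) -> str:
    """Label a gene by where its closest relative is found.

    "ortholog": closest relative is in another species (similar by descent
    from the same ancestral gene).  "paralog": closest relative is in the
    same species (similar because of duplication within that lineage).
    """
    other = closest_relative(gene, identity)
    if other is None:
        raise ValueError(f"no comparisons recorded for {gene}")
    return "ortholog" if other[0] != gene[0] else "paralog"
```

With these invented values, the epsilon gene of species1 is labeled "ortholog" because its best match is the epsilon gene of species2, whereas gammaG of species1 is labeled "paralog" because its best match is gammaA in the same species, mirroring the pattern described for the duplicated gamma-globin genes of simian primates.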

 

Fig. 4.  Pathways to contemporary mammalian beta-globin gene clusters.  Genes are indicated by boxes, and orthologous genes have the same type of fill.  The presumptive presence of an LCR in marsupials is indicated by the grey outline (R. Hope, personal communication).  The stage of expression is indicated as E = embryonic, F = fetal, and A = adult.  References are in the text and in reviews (Collins and Weissman 1984; Hardison 1991). a=alpha, b=beta, z=zeta, e=epsilon, g=gamma, d=delta, r=rho, q=theta

 

            The proposed epsilon-gamma-eta-delta-beta globin gene cluster in the ancestral eutherian mammal was generated by earlier gene duplications.  Estimates based on rates of divergence indicated that the epsilon-, gamma- and eta-globin genes arose from duplications of one ancestral gene, whereas the delta- and beta-globin genes arose by duplication of a different gene, perhaps prior to the divergence of eutherian and metatherian (marsupials and monotreme) mammals (Goodman et al. 1984; Hardison 1984).  This prediction was verified by genomic analysis of marsupials (Koop and Goodman 1988; Cooper et al. 1996), which have two genes in their beta-globin gene clusters, one most related to eutherian epsilon-globin genes and the other most related to beta-globin genes.  Thus the model shown in Fig. 4 is robust, in that it has been supported not only by deductions from analysis of contemporary species, but also by tests of predictions made by the model.

            One important ramification of this model is that orthologous genes have not retained the same time of expression during development in all mammalian orders.  The gamma-globin gene in most mammals is expressed in embryonic erythroid cells, but in simian primates, including humans, it is expressed predominantly in fetal erythroid cells.  Concomitantly with the fetal recruitment of the gamma-globin gene, expression of the beta-globin gene has been delayed in higher primates so that in humans it is expressed primarily in post-natal life.  In other mammals, the beta-globin gene is expressed in both fetal and adult erythroid cells. In goats, the gamma-globin gene has been deleted, and subsequent expansion of the gene cluster by triplication of a four-gene set (Townes et al. 1984) allowed expression of the resulting paralogous beta-globin genes in fetal life (betaF), adult life (betaA) or under conditions of erythropoietic stress (betaC).  The delta-globin gene is expressed at low levels in adult humans, but is silent in some mammals, and is expressed at high levels in others.  In contrast, the epsilon-globin gene in each mammalian species is expressed only in embryonic erythroid cells derived from the yolk sac.  Within the beta-globin gene clusters of mammals, conservation of stage-specific expression is seen only for this gene, which is located closest to the distal locus control region (LCR, see below and chapter 6 by B. Forget).  Perhaps the embryonic restriction of epsilon-globin gene expression is related to this spatial relationship, with active expression in the embryonic lineage due to its proximity to the LCR, followed by silencing in the fetal and adult (definitive) lineage of erythroid cells (see chapter 14 by G. Stamatoyannopoulos).  Both the proximity to the LCR and the embryonic restriction of expression are conserved in all mammalian epsilon-globin genes examined.

            The beta-globin gene clusters of humans and mice are embedded within a large cluster of olfactory receptor genes, or ORGs (Bulger et al., 1999). This arrangement suggests that the beta-globin genes transposed into a pre-existing array of ORGs. A related ORG is found on the 3' side of the chicken beta-globin gene cluster (Bulger et al., 1999), but an erythroid-specific folate receptor gene is located on the 5' side, separated from the beta-globin gene cluster by an insulator (Prioleau et al., 1999). The 3' breakpoints of at least two deletions causing hereditary persistence of fetal hemoglobin (HPFH) in humans are close to ORGs located 3' to the beta-globin genes (Kosteas et al., 1997; Feingold et al., 1999). The ORG close to the HPFH-1 breakpoint is in an open chromatin domain in human erythroid cells (Elder et al., 1990). Enhancer sequences from this ORG are brought in proximity to the gamma-globin genes by the HPFH-1 deletion, and this may play a role in the increased expression of gamma-globin genes in adults carrying this deletion (Feingold and Forget, 1989).

Fig. 5.  Pathways to contemporary mammalian alpha-globin gene clusters.  Genes are indicated by boxes, and orthologous genes have the same type of fill.  The presumptive presence of a homolog to HS-40 (the distal major control region) in mammals besides human and mouse is indicated by the grey outline.  References are in the text and in reviews (Collins and Weissman 1984; Hardison 1991; Hardison and Miller 1993). a=alpha, b=beta, z=zeta, e=epsilon, g=gamma, d=delta, r=rho, q=theta

 

            The evolution of alpha-globin gene clusters in contemporary mammals is not as well understood, in part because less information is available on gene organization and sequence in non-human mammals, and in part because the rate of sequence change in this gene cluster appears to be higher than in the beta-globin gene cluster (Hardison et al. 1991).  Fig. 5 summarizes the arrangement of alpha-like globin gene clusters in representatives of 5 orders of eutherian mammals.  Orthologous relationships have been assigned primarily on the basis of DNA sequence matches outside the genes (Hardison and Gelinas 1986; Sawada and Schmid 1986; Wernke and Lingrel 1986; Flint et al. 1988), even though such matches are considerably more limited than in the mammalian beta-globin gene clusters.  Since a variant of the arrangement 5'-zeta-zeta-alpha-alpha-theta-3' is found in all contemporary mammals examined, it is likely that these genes were present in this order in the gene cluster of the ancestral eutherian mammal.  The timing of expression is well-conserved among these mammals.  The zeta-globin genes are expressed only in embryonic erythroid cells, whereas the alpha-globin genes are expressed in all erythroid cells, albeit at lower levels at the embryonic stage (Rohrbaugh and Hardison 1983; Leder et al. 1985; Peschle et al. 1985).  The theta-globin genes are still not well understood.  The human theta-globin gene is transcribed at low levels but does not encode any known polypeptide found in human hemoglobins (Hsu et al. 1988; Kim et al. 1989; Leung et al. 1989).  It is a feature of every mammalian alpha-like globin gene cluster examined (Fig. 5), and given that gene deletions can occur in this locus (see below), one would anticipate the loss of nonfunctional genes in at least some mammalian lineages.  The retention of the theta-globin genes is suggestive of some functional importance, but perhaps not for encoding a globin polypeptide. 

            Although no examples of recruitment for expression at different developmental stages are seen in the alpha-like globin gene clusters, some genes have lost their function during evolution.  In particular, based on upstream sequence matches, the human psi-alpha1-globin pseudogene appears to be orthologous to an active alpha-globin gene in goats and horses (Fig. 5).  The inactivation of the psi-alpha1 gene is accompanied by the loss of a CpG island that encompasses its homologs (Bird et al. 1987).  The orthologous relationships in Fig. 5 indicate that the alpha2- and alpha1-globin genes in humans result from a duplication only in primates, i.e. separate from the duplication proposed to generate the pair of alpha-globin genes in the ancestral eutherian mammal.  The more recent duplication in primates has left a long region of sequence similarity surrounding the alpha-globin genes (Hess et al. 1984), and unequal cross-overs within that region of homology are the cause of some forms of alpha-thalassemia.  Not all mammals have retained a pair of active alpha-globin genes.  Rabbits are the exception, with only one alpha-globin gene (Cheng et al. 1986).  Curiously, this gene cluster has expanded by block duplications of a zeta-zeta-theta gene triad (Cheng et al. 1987), similar to the expansion of the beta-like globin gene cluster in goats (Townes et al. 1984) (Fig. 4).

            All the vertebrate globin gene clusters examined to date encode subunits of hemoglobins differentially expressed in embryonic and adult erythroid cells (Fig. 3).  Likewise, hemoglobin synthesis is developmentally regulated in some invertebrates (Terwilliger 1998) and different plant leghemoglobins are made at progressive stages of nodulation (Hyldig-Nielsen et al. 1982; Lee et al. 1983).  Even species as distant from human as Chlamydomonas (Couture et al. 1994) and Paramecium (Yamauchi et al. 1995) have multiple hemoglobin genes.  Thus the ability to express different hemoglobins at particular developmental stages, i.e. hemoglobin switching, is very old, predating the plant-animal divergence, and possibly being much older.  In Fig. 3, multiple globin genes are shown in the ancestral gene clusters.  It is likely that they were differentially expressed during development in these ancestral species.

 

5.4  Differences in genomic context and regulation of the mammalian alpha-globin and beta-globin gene clusters

 

            The separation of alpha-globin and beta-globin gene clusters to different chromosomes has allowed them to diverge into strikingly different genomic contexts, with paradoxical consequences for our understanding of their regulation.  Given that all contemporary vertebrates have developmentally regulated hemoglobin genes encoding proteins used for oxygen transport in erythrocytes, it would have been reasonable to expect that the molecular mechanisms of globin gene regulation would be conserved in vertebrates.  Certainly, the coordinated and balanced expression of alpha- and beta-globin genes to produce the heterotypic tetramer alpha2beta2 in erythrocytes should be a particularly easy aspect of regulation to explain.  Since the two genes would have been identical after the initial duplication in the ancestral vertebrate, with identical regulatory elements, it is parsimonious to expect  selection to keep the regulatory elements very similar. 

            However, much has changed between the alpha-like and beta-like globin gene clusters since their duplication.  Not only are they now on separate chromosomes in birds and mammals, but in mammals they are in radically different genomic contexts (Fig. 6).  The beta-globin gene clusters are A+T rich, with no CpG islands (reviewed in Collins and Weissman 1984), whereas the alpha-like globin gene clusters are highly G+C rich, with multiple CpG islands (Fischel-Ghodsian et al. 1987).  Tissue-specific gene expression is frequently correlated with an increased accessibility of the chromatin only in expressing cells, and hence "opening" of a chromatin domain is a key step in activation of many tissue-specific genes.  This is the case for beta-like globin genes of mammals (Groudine et al. 1983; Forrester et al. 1990), but not the alpha-like globin genes, which are in constitutively open chromatin (Craddock et al. 1995).  Thus the mammalian alpha-globin genes have several characteristics associated with constitutively expressed, "housekeeping" genes.  Additionally, the alpha-globin genes are replicated early in S phase in all cells (a time when most expressed genes are replicated), whereas beta-globin genes are replicated early in S phase only in cells expressing them (Calza et al. 1984; Dhar et al. 1988).  In keeping with the presence of CpG islands, the alpha-globin gene cluster is not methylated in any cell types (Bird et al. 1987), whereas the beta-globin gene cluster is subject to tissue-specific DNA methylation (van der Ploeg and Flavell 1980).  Thus the strikingly different genomic contexts of the two gene clusters affect several aspects of DNA and chromatin metabolism, including timing of replication, extent of methylation, and the type of chromatin into which the loci are packaged.  Rather than selecting for similarities to ensure coordinate and balanced expression, the processes of evolution at these two loci have made them quite different.
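The contrast between an A+T-rich locus with no CpG islands and a G+C-rich locus with several can be expressed numerically.  The sketch below computes the G+C fraction and the observed/expected CpG ratio of a DNA sequence, and applies widely used threshold values (length of at least 200 bp, G+C above 50%, observed/expected CpG above 0.6).  These thresholds and the function names are assumptions introduced for illustration;  they are not taken from this chapter, and the studies cited above may have used different criteria.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G).

    The expected count assumes C and G occur independently at their
    observed frequencies.  Returns 0.0 when C or G is absent.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so this is exact
    return cpg_count * len(seq) / (c_count * g_count)


def looks_like_cpg_island(seq: str,
                          min_len: int = 200,
                          min_gc: float = 0.5,
                          min_ratio: float = 0.6) -> bool:
    """Apply the assumed thresholds to one candidate sequence window."""
    return (len(seq) >= min_len
            and gc_fraction(seq) > min_gc
            and cpg_obs_exp(seq) > min_ratio)
```

In practice such a test is applied to sliding windows along a locus rather than to a single sequence, so that islands can be located as well as counted.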

 

Fig. 6.  Differences in chromatin structure between alpha- and beta-globin gene clusters of humans.  Globin genes and distal control regions are shown as filled boxes.  HS-40 is located within an intron (white box) of the -14 gene (exons of this gene are shown as black boxes), located upstream of the zeta-globin gene and transcribed in the opposite orientation.  Developmentally stable DNase hypersensitive sites (HSs) are shown as filled arrows, and those that occur at specific developmental stages (when the associated gene is expressed) are shown as white triangles.  CpG islands are shown as boxes with horizontal lines.  None are in the beta-globin gene cluster.  HBB = beta-globin gene, HBA = alpha-globin gene.  References are in the text.  a=alpha, b=beta, z=zeta, e=epsilon, g=gamma, d=delta, r=rho, q=theta.

 

            The alpha- and beta-globin gene clusters also have important differences in their cis-regulatory elements.  Both have distal control elements, called the locus control region (or LCR) for beta-globin genes (reviewed in Grosveld et al. 1993; Hardison et al. 1997b) and HS-40 for the alpha-globin genes (Higgs et al. 1990), that are required for high-level expression of the respective globin genes in transgenic mice, independently of the position of chromosomal integration.  However, they differ in the range of functions associated with them.  The beta-globin LCR is required for tissue-specific chromosomal domain opening (Forrester et al. 1987; Forrester et al. 1990), whereas no such function has been implicated for HS-40, as expected for a regulator in constitutively open chromatin.  The distal regulatory elements also differ dramatically in size, with the beta-globin LCR containing 17 kb with 5 DNase hypersensitive sites in chromatin (Tuan et al. 1985; Forrester et al. 1987; Grosveld et al. 1987; Dhar et al. 1990), compared to about 0.4 kb and a single DNase hypersensitive site in chromatin for the alpha-globin HS-40 (Jarman et al. 1991).

            Indeed, the alpha-globin HS-40 is most similar to a single hypersensitive site, HS2, from the beta-globin LCR.  Both will confer inducible, high-level expression on reporter genes in transfected cells (e.g. Tuan et al. 1985; Ney et al. 1990; Pondel et al. 1992; Ren et al. 1993), in addition to their effects in transgenic mice (Fraser et al. 1990; Higgs et al. 1990; Morley et al. 1992).  As illustrated in Fig. 7, these two enhancers share binding sites for some, but not all, transcription factors (e.g. Talbot et al. 1990; Jarman et al. 1991; Strauss et al. 1992; Reddy et al. 1994).  Both contain maf-response elements, or MAREs (Motohashi et al. 1997), to which transcriptional activator proteins of the basic leucine zipper class can bind.  A particular subfamily of proteins related to AP1, such as NFE2, LCRF1/Nrf1, and Bach1, bind to this site (reviewed in Orkin 1995; Baron 1997).  All are heterodimers containing a Maf protein as one subunit, which is the basis for the name.  Other binding sites in common are GATA, to which GATA1 and its relatives bind (Evans et al. 1990), and the CACC motif, to which a family of Zn-finger proteins including EKLF can bind (Miller and Bieker 1993).  These binding sites are occupied in vivo (Strauss et al. 1992; Reddy et al. 1994), and all have been shown to contribute to the function of the enhancers (Strauss et al. 1992; Caterina et al. 1994; Reddy et al. 1994; Rombel et al. 1995).  Other functional sites, such as the E boxes in HS2 (Lam and Bresnick 1996; Elnitski et al. 1997), are not found in common.  Each of these binding sites is conserved in homologous regulatory elements in mammals (reviewed in Gourdon et al. 1995; Hardison et al. 1997b).  In general, binding sites for many of the proteins implicated in activation of globin gene expression are present in both HS-40 and HS2 of the beta-globin LCR, but their number and arrangement differ in the two enhancers (Fig. 7).  It is currently not possible to assess whether this limited similarity occurs via divergence from a common ancestral regulatory element or by convergence from different ancestral elements.
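
            Comparing the number and arrangement of binding sites in two enhancers amounts to locating short motifs on both DNA strands.  The sketch below is one simple way to do this with regular expressions.  It is an illustration under stated assumptions and not the procedure used in the studies cited above: the consensus patterns are deliberately simplified (a GATA site, a CACC core, and the AP1-like core found within MAREs), and the names MOTIFS and find_motifs, as well as the toy sequence, are hypothetical.

```python
import re

# Simplified consensus patterns, assumed here for illustration only.
MOTIFS = {
    "GATA": "[AT]GATA[AG]",    # GATA1-type site
    "CACC": "CACCC",           # core of the CACC box
    "MARE-like": "TGA[CG]TCA", # AP1-like core shared by MAREs
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motifs(seq: str) -> dict:
    """Map each motif name to a sorted list of (start, strand) hits.

    Starts are 0-based positions on the forward strand, so a site
    found on the reverse strand is reported by its leftmost base.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    n = len(seq)
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead lets overlapping sites all be reported.
        regex = re.compile(f"(?=({pattern}))")
        found = {(m.start(), "+") for m in regex.finditer(seq)}
        for m in regex.finditer(rc):
            length = len(m.group(1))
            found.add((n - m.start() - length, "-"))
        hits[name] = sorted(found)
    return hits


# Hypothetical toy sequence, not a real enhancer.
toy = "CCAGATAAGGTGACTCATTGGGTGCC"
print(find_motifs(toy))
```

            Because the AP1-like core is nearly palindromic, the same site is reported on both strands.  Real analyses usually use position weight matrices rather than exact consensus strings, and a sequence match alone does not show that a site is occupied or functional.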

 

Fig. 7.  Conserved motifs in distal regulatory elements.  Similar protein binding sites have the same fill, and proteins implicated in acting at a given site are listed below that motif in the beta-LCR HS2 line; the same proteins have been implicated at similar motifs in the other regulatory elements.  Boxes without labels are conserved sequences of untested function.

 

            The proximal regulatory elements also differ in important ways between alpha-globin and beta-globin genes (Fig. 8).  The promoters do contain two binding sites in common - the TATA motif, to which the general transcription factor TFIID binds, and the CCAAT motif, to which several families of trans-activators, such as CP1, can bind (Efstratiadis et al. 1980).  However, the other protein binding sites are completely different between the human alpha- and beta-globin genes (deBoer et al. 1988; Rombel et al. 1995).  In addition, the CpG island encompassing the 5' flanking region and much of the gene is a key component of the cis-regulatory elements for the alpha-globin gene of rabbits and humans, possibly through its effects on chromatin structure (Pondel et al. 1995; Shewchuk and Hardison 1997);  again, no CpG island is found at any of the beta-like globin genes.

 

Fig. 8.  Conserved features of globin gene promoters.  Binding sites for the human globin genes are shown;  similar protein binding sites have the same fill.  Those sites conserved in other mammals have a dark outline, those not conserved have a grey outline.  The figure is not drawn to scale, but relative positions are indicated;  the genes themselves are truncated.  The chicken beta-globin gene promoter is shown for comparison;  no information about evolutionary conservation is presented for this gene.  This figure summarizes work from a large number of papers (e.g. Efstratiadis et al. 1980; Lacy and Maniatis 1980; Hardison 1983; Antoniou et al. 1988; Barnhart et al. 1988; deBoer et al. 1988; Wall et al. 1988; Martin et al. 1989; Puruker et al. 1990; Stuve and Myers 1990; Macleod and Plumb 1991; McDonagh et al. 1991; Yu et al. 1991; Gong and Dean 1993; Motamed et al. 1993; Peters et al. 1993; Yost et al. 1993; Lloyd et al. 1994; Rombel et al. 1995). a=alpha, b=beta, z=zeta, e=epsilon, g=gamma, d=delta, r=rho, q=theta.

 

            The differences in genomic context between alpha-globin and beta-globin genes have been seen for non-human mammals as well (Bernardi et al. 1985), indicating an origin prior to the divergence of eutherian mammals.  The high G+C content and presence of CpG islands are characteristic of alpha-globin gene clusters from goats (Wernke and Lingrel 1986), horse (Flint et al. 1988) and rabbit (Hardison et al. 1991), whereas the beta-globin gene clusters from non-human mammals are rich in A+T (e.g. Margot et al. 1989; Shehee et al. 1989).  The only apparent exception is the mouse alpha-globin gene cluster, which to date has not been completely characterized.  However, the sequenced mouse alpha-globin gene is not in a CpG island (Nishioka and Leder 1979).  Indeed, the mouse genome shows a general depletion in CpG islands (Antequera 1993; Matsuo et al. 1993).  The loss of the CpG island is correlated with a chromosomal rearrangement that moved the alpha-globin locus from the terminus of a chromosome, where it is in human (Flint et al. 1997) and rabbit (Xu and Hardison 1991), to an internal location in mouse (Tan and Whitney 1993).  It is possible that the position of the alpha-globin locus close to the end of the chromosome, in a large region high in G+C content (Flint et al. 1997), is critical to maintenance of the CpG islands.

            Despite these many differences between alpha-globin and beta-globin gene clusters in mammals, the appropriate genes are still expressed coordinately between the two loci, resulting in balanced production of alpha-like and beta-like globins needed for the synthesis of normal hemoglobins.  The full mechanism that accomplishes this task still eludes our understanding.

 

5.5. Origin and location of the LCR in birds and mammals

 

            In contrast to the differences between the alpha- and beta-globin gene clusters of mammals, tissue-specific opening of a chromatin domain occurs in beta-globin gene clusters of both mammals and birds.  In fact, the association between accessible chromatin and gene activation was first made for the chicken globin genes (Weintraub and Groudine 1976).  Further studies of the avian gene clusters carefully mapped the limits of the open domain in chromatin to a region of 33 kb, extending about 10 kb on each side of the set of 4 globin genes (Clark et al. 1993; Hebbes et al. 1994).  The limits of the open domain for the human beta-globin gene cluster have not been determined.  The open domain at the 5' end encompasses at least the LCR, which extends 20 kb 5' to the epsilon-globin gene, and an erythroid HS has been mapped as far as 70 kb 3' to the beta-globin gene (Elder et al. 1990), indicating an open domain of at least 150 kb.

            LCRs have been mapped in beta-globin gene clusters both in chickens and in all mammals examined to date.  The LCRs in mammals are homologous (Moon and Ley 1990; Li et al. 1991; Hug et al. 1992; Jimenez et al. 1992; Hardison et al. 1993; Hardison et al. 1997b; Slightom et al. 1997), with long segments of high sequence similarity found both inside and outside the cores of the LCR HSs (Fig. 9).  All are located 5' to the epsilon-globin gene (Fig. 4).  Sequences similar to HS2 and HS3 of the beta-globin LCR are also found in marsupials and monotremes (R. Hope, personal communication), indicating that the LCR predates the divergence of placental and nonplacental mammals, about 173 Myr ago (Kumar and Hedges 1998).  In contrast, the major LCR activity in chickens maps to an enhancer located between the betaA and epsilon-globin genes (Reitman et al. 1990).  Four additional HSs are 5' to the rho-globin gene, and together they produce a modest enhancement in expression (Reitman et al. 1995).  Despite their analogous location at the 5' ends of the gene cluster, they do not have the pronounced effects associated with the HSs in the mammalian LCR.  In fact, comparison of the DNA sequences of the beta-globin gene clusters between chicken and human fails to reveal any statistically significant alignments outside the coding regions of some of the exons (Reitman et al. 1993; Hardison 1998).  Thus no clear homologies are present in either the distal regulatory elements (LCRs and enhancers) or in the promoters.