This chapter was published in Disorders of Hemoglobin: Genetics, Pathophysiology, Clinical Management in 2001, Cambridge University Press, Cambridge, UK
Editors: Martin H. Steinberg, Bernard G. Forget, Douglas R. Higgs, and Ronald L. Nagel
Copyright: Cambridge University Press and Ross Hardison
Chapter 5: Organization, evolution and regulation of the globin genes
Original version completed: June 17, 1998
Updated May 09, 2000
Published in 2001.
The revised version of this book (completed in 2007) leaves out most of the material in this chapter, and thus I am posting it on the internet to keep the material available. Please reference the chapter and book if you use this information.
Author: Ross C. Hardison
Department of Biochemistry and Molecular Biology, Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, 304 Wartik Laboratory, University Park, PA, 16802 USA
Phone: 814-863-0113
E-mail: rch8@psu.edu
5.1 Introduction
Hemoglobin genes are ancient, dating back perhaps as far as the origins of cellular life. The familiar class of hemoglobins used for oxygen transport illustrates only one function of hemoglobins. This chapter will review some of the principal events in the evolution of vertebrate globin gene clusters within the context of their long history. This evolutionary framework provides some insights into important issues such as the origin and function of the locus control regions, the contrasting chromatin structure of alpha- and beta-like globin gene clusters, and the prospects for targeting delta- or gamma-globin gene expression in therapies for beta-globin gene defects.
5.2 Broad distribution of hemoglobins in the biosphere
Hemoglobins similar to human Hb A are found in erythrocytes of all vertebrates (Dickerson and Geis 1983). Each is a heterotetramer with two subunits related to the alpha-globin subfamily (referred to here as alpha-like globins) and two subunits related to the beta-globin subfamily (beta-like globins). The globin polypeptides bind heme, which in turn allows the hemoglobin in erythrocytes to bind oxygen reversibly and transport it from the lungs to respiring tissues. In all species studied, different alpha-like and beta-like globins are synthesized at progressive stages of development to produce hemoglobins characteristic of primitive (embryonic) and definitive (fetal and adult) erythroid cells (Bunn and Forget 1986). However, the vertebrate hemoglobins comprise only a small part of the hemoglobin family (Fig. 1). A close relative, the monomeric myoglobin, stores oxygen in tissues such as muscle (Wittenberg and Wittenberg 1987). As illustrated by the summary phylogenetic tree in Fig. 1, the amino acid sequences of the alpha- and beta-globins and myoglobin are related to each other, indicating a common ancestor in early vertebrates approximately 500 million years ago (Goodman et al. 1987). The three-dimensional structure of myoglobin was one of the first protein structures solved, revealing a series of alpha-helices that form the heme-binding pocket. This structure, the globin fold, is seen in myoglobin, alpha-globin, and beta-globin; it is characteristic of all members of the hemoglobin family of proteins (Dickerson and Geis 1983).

Fig. 1. Broad distribution and diverse functions of hemoglobins. The phylogenetic tree on the left is a summary of trees generated by aligning amino acid sequences of hemoglobins from species representative of each taxon (using CLUSTAL W) and computing trees based on parsimony (PAUP) and on distance measures analyzed by neighbor joining and UPGMA. The latter two analyses used the MEGA suite of programs (Kumar et al. 1993). Trees of the same topology were generated by all three methods. The summary tree shows that topology but is not drawn to scale. This and subsequent trees indicate the relative times of the divergences; nodes farther to the left indicate relatively earlier times. Functions and induction agents are also listed. More details and references have been reviewed (Hardison 1998).
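The distance-based tree building named in the caption can be illustrated with a small sketch. The Python below implements plain UPGMA (average-linkage clustering) over a symmetric distance matrix; it is not the MEGA implementation, the function name `upgma` is hypothetical, and the distances in the usage example are made-up toy values, not measurements from real globin sequences. UPGMA assumes a constant rate of change across lineages, which is one reason it is worth cross-checking against neighbor joining and parsimony, as was done for Fig. 1.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a rooted tree by UPGMA (unweighted pair-group average linkage).

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> distance between taxa a and b.
    Returns the tree as a Newick-style string (no branch lengths).
    """
    # Each active cluster is keyed by its tree string and records its size.
    sizes = {name: 1 for name in names}
    d = dict(dist)  # copy so the caller's matrix is not modified
    while len(sizes) > 1:
        # Merge the pair of clusters with the smallest distance.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = f"({a},{b})"
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the distances from its two parts.
        for c in sizes:
            d[frozenset((merged, c))] = (
                d[frozenset((a, c))] * size_a + d[frozenset((b, c))] * size_b
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Hypothetical toy distances (not real data).
toy = {
    ("alpha", "beta"): 0.5, ("alpha", "Mb"): 0.7, ("alpha", "plant"): 0.9,
    ("beta", "Mb"): 0.7, ("beta", "plant"): 0.9, ("Mb", "plant"): 0.9,
}
dist = {frozenset(pair): value for pair, value in toy.items()}
print(upgma(["alpha", "beta", "Mb", "plant"], dist))  # (plant,(Mb,(alpha,beta)))
```

With these toy values the two most similar subunits join first, then the myoglobin-like sequence, and the most distant sequence forms the outgroup.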
Hemoglobins are also used for oxygen transport in invertebrates (Riggs 1991; Dixon et al. 1992; Sherman et al. 1992). Many non-vertebrates have gigantic extracellular hemoglobins, in some species formed by as many as 200 monodomain subunits in a multimeric protein, and in others by covalent linkage of globin domains into very long polypeptide chains (reviewed in Terwilliger 1998). The invertebrate hemoglobins are homologous to the vertebrate hemoglobins, and they form a distinct branch in a phylogenetic tree of hemoglobins (Fig. 1). Hemoglobins are also present in plants, including both the leghemoglobins, with specialized functions in root nodules (Brisson and Verma 1982; Appleby 1984), and the broadly distributed nonsymbiotic hemoglobins (Andersson et al. 1996). Interestingly, the genes for plant and invertebrate hemoglobins have a similar structure. Both groups of genes have three introns separating four exons, with at least two introns in identical positions (Fig. 2). The similarities in gene structure and in the amino acid sequences of the encoded proteins strongly support the hypothesis of a common ancestor to both groups of hemoglobins, showing that the evolutionary history of hemoglobin genes predates the divergence of plants and animals, roughly 1.3 billion years ago (Feng et al. 1997). It is likely that the middle intron was lost prior to the divergence of the vertebrate globin genes, all of which have only two introns.

Fig. 2. Intron/exon structure during evolution of hemoglobin genes. The structures of illustrative contemporary hemoglobin genes are shown on the right, with exons denoted by dark boxes and introns by white boxes. The position, in the hemoglobin alpha-helical structure, of the amino acid encoded at the site of interruption is indicated over each intron, and the loss of the central intron in the ancestor to vertebrates is marked by a vertical arrow. The evolutionary pathway is indicated by the other arrows. This "tree" is a gene tree, and the grouping of a yeast hemoglobin gene with bacterial hemoglobin genes may reflect a horizontal gene transfer. Estimated times of divergence in millions of years (Myr) are given at selected nodes.
Given that the hemoglobins in the major groups of multicellular organisms (plants, invertebrates and vertebrate animals) are used for storage and transport of oxygen, one might have expected hemoglobins to be absent from unicellular organisms. It was thought that simple diffusion was sufficient to provide adequate oxygen inside the cells of unicellular, free-living organisms. However, hemoglobins have now been characterized in several species of eubacteria, the fungus Saccharomyces cerevisiae, and protists such as the alga Chlamydomonas and the protozoan Paramecium (reviewed in Hardison 1996; Hardison 1998). These hemoglobins from unicellular organisms appear to play roles distinctly different from those of vertebrate hemoglobins. The familiar functions in oxygen transport and storage require reversible binding of oxygen, which occurs only when the iron in the heme stays in the reduced (+2) oxidation state, i.e. when it is a ferrous ion. Biochemical analysis has shown that hemoglobins from Chlamydomonas (Couture and Guertin 1996), Saccharomyces (Zhu and Riggs 1992) and the bacterium Alcaligenes (Cramm et al. 1994) can participate in electron transfer reactions in vitro, with the heme-bound iron changing cyclically between the +2 and +3 oxidation states. The latter two hemoglobins are actually two-domain proteins, one domain binding heme and the other binding flavin cofactors, which usually play a role in redox reactions. Also, the hemoglobin from Vitreoscilla can serve as a terminal electron acceptor during respiration in vivo (Dikshit et al. 1992).
Recent studies clearly show that the hemoglobins in unicellular organisms have enzymatic functions and are not oxygen-transporting proteins. The flavohemoglobins from the enteric bacteria Escherichia coli and Salmonella typhimurium (Crawford and Goldberg, 1998; Gardner et al., 1998; Hausladen et al., 1998) and from yeast (Liu et al., 2000) are enzymes protecting these microorganisms from the highly reactive free radical compound nitric oxide. Each of these flavohemoglobins is a nitric oxide dioxygenase, catalyzing the conversion of nitric oxide to nitrate.
Hemoglobins involved in catalytic conversions of nitric oxide and oxygen are not limited to microorganisms. A hemoglobin found in the perienteric fluid of the parasitic worm Ascaris lumbricoides also catalyzes reactions between oxygen and nitric oxide, producing nitrate (Minning et al., 1999). However, the chemical mechanism is different from that of the microbial flavohemoglobins, and Minning et al. (1999) propose that this hemoglobin functions to remove oxygen from the perienteric fluid via a series of reactions driven by nitric oxide.
In mammals, hemoglobins not only transport oxygen but also help regulate nitric oxide levels. Gow et al. (1998) show that at physiological concentrations, nitric oxide will bind to a cysteine in hemoglobin to form S-nitrosohemoglobin. This binding is favored in oxyhemoglobin and retains the bioactivity of nitric oxide. Nitric oxide can subsequently be released from deoxyhemoglobin (Jia et al., 1996; Stamler et al., 1997). Since nitric oxide is a major regulator of blood pressure, these new findings indicate that hemoglobin is involved in the control of blood pressure in ways that may facilitate efficient delivery of oxygen to tissues. Furthermore, the interplay between the binding of oxygen and nitric oxide to hemoglobin and its effects on vasodilation and vasoconstriction may have therapeutic applications (e.g., Bonaventura et al., 1999; Gladwin et al., 1999; Nagel, 1999).
The variety of functions now found for hemoglobins raises the issue of whether the microbial proteins are truly homologous to the hemoglobins from plants and animals. The amino acid sequence comparisons certainly support a common ancestor to all these sequences, as illustrated in the summary phylogenetic tree (Fig. 1). Despite the low percent identity (e.g. 25%) between the more dissimilar members of the family, different types of phylogenetic analysis generate trees of the same topology. The three-dimensional structures strongly support the conclusion that all these hemoglobins share a common ancestor. The structures of the bacterial hemoglobins from Vitreoscilla (Tarricone et al. 1997) and Alcaligenes (Ermler et al. 1995) both have the globin fold first characterized in vertebrate myoglobin. Indeed, hemoglobins may be part of a larger family of hemoproteins. For instance, the light-harvesting biliprotein C-phycocyanin from the cyanobacterium Mastigocladus laminosus has a three-dimensional structure very similar to that of a globin (Schirmer et al. 1985). Although this is not a heme-binding protein per se, it does bind a linear tetrapyrrole pigment derived from heme. The structural comparisons indicate that genes for at least some other hemoproteins share a common ancestor with hemoglobin genes (Fig. 1).
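Percent identity of the kind quoted above (e.g. 25%) is computed from a pairwise alignment. The sketch below assumes the two sequences are already aligned (gaps written as `-`) and uses one common convention, excluding gapped columns from the denominator; other conventions divide by the full alignment length or by the length of the shorter sequence and give different numbers. The function name and the example fragments are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of ungapped alignment columns in which the two residues match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap.
    columns = [(x, y) for x, y in zip(seq_a, seq_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments: 8 ungapped columns, 7 identical.
print(percent_identity("MKT-AYIAK", "MKTQAYLAK"))  # 87.5
```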
These observations all indicate that the gene encoding hemoglobin is truly ancient, i.e. it appears to have been present in the ancestor to eubacteria and eukaryotes, which is the earliest proposed divergence since the origin of cellular organisms. This divergence has been dated at approximately 3.9 billion years ago (Feng et al. 1997). At this early time, very little oxygen was present in the Earth's atmosphere. Hence the primordial function of hemoglobin may have had little to do with molecular oxygen (Hardison, 1998). The enzymatic functions of hemoglobins found in contemporary microorganisms and nematodes, involving nitric oxide metabolism, provide some insight into the early functions of hemoglobins (Durner et al., 1999). Minning et al. (1999) describe a scenario in which the enzymatic detoxification of nitric oxide catalyzed by hemoglobins in contemporary bacteria represents an ancestral function. These ancestral hemoglobins may have evolved into enzymes that catalyze the nitric oxide-mediated consumption of oxygen, as now observed for the "deoxygenase" in Ascaris. This "deoxygenase" may in turn have evolved into contemporary mammalian hemoglobins, with their limited enzymatic function but their ability to bind and transport both oxygen and nitric oxide.
Because the separation between archaebacteria and eukaryotes appears to have occurred after the divergence of eubacteria from eukaryotes (both of which have hemoglobins), one may anticipate finding homologs to hemoglobins in archaea as well. An automated analysis of genome sequences has included an archaebacterial gene (from Methanococcus jannaschii) in an orthologous group (Tatusov et al. 1997) containing proteins related to hemoglobins (see http://www.ncbi.nlm.nih.gov/COG/). Further investigation of this and other archaebacterial genes related to hemoglobins, revealed by whole-genome sequencing, should provide even more insights into the origin and range of functions of hemoglobins.
An issue that has received much attention is the age of introns, and whether they serve to separate genes into exons that encode distinct protein domains (Gilbert 1978). The three introns of globin genes in plants and invertebrates, dating back approximately 1.3 billion years, lie between the segments of the gene encoding measurable domains of protein structure (Go 1981). This has lent support to the model that introns are old and are the remnants of a process that combined exons to generate genes with new structures and functions (Gilbert 1978). However, the hemoglobin genes from protists have introns in positions unique to many of the species, and those of eubacteria have no introns (Fig. 2). Attempts to explain this degree of heterogeneity as the result of differential loss of introns require a very large number of introns to be proposed in the ancestral gene. An alternative explanation is that at least some of the introns in the protist hemoglobin genes arose by insertion of new introns in each lineage, consistent with the "introns late" model (Stoltzfus et al. 1994). Thus it seems unlikely that all the introns in contemporary hemoglobin genes were present in the ancestral gene (i.e. preceding the divergence of eubacteria, archaebacteria and eukaryotes), and hence the "introns early" hypothesis is not adequate to explain all of the introns. However, this does not rule out the possibility that some introns, perhaps those still present in the hemoglobin genes of multicellular organisms, were in the ancestral gene.
5.3 Evolution of alpha- and beta-globin gene clusters in vertebrates
Human hemoglobins are encoded at two separate loci, the beta-like globin gene cluster on chromosome 11p15.5 (Deisseroth et al. 1978) and the alpha-like globin gene cluster close to the terminus of chromosome 16p (Deisseroth et al. 1977). As shown in Fig. 3, the genes in each cluster are in the same transcriptional orientation and are arranged in the order of their expression during development, with the active beta-like globin genes arranged 5'-epsilon (embryonic)-Ggamma (fetal)-Agamma (fetal)-delta (minor adult)-beta (major adult)-3' (Fritsch et al. 1980) and the active alpha-like globin genes arranged 5'-zeta (embryonic)-alpha2 (fetal and adult)-alpha1 (fetal and adult)-3' (Lauer et al. 1980). [Lower levels of the gamma- and alpha-globins are also produced in embryonic red cells.] This section reviews some of the key events in the evolution of this arrangement of globin genes.

Fig. 3. Evolutionary pathways for alpha- and beta-globin gene clusters in vertebrates. Each gene is indicated simply by a Greek letter. Contemporary gene clusters are on the right (Hardison 1991 and references therein), and the deduced course of evolution to them is shown by a series of arrows. The ancestor to alpha- and beta-globin genes is indicated as pro-alpha/beta. LCR = locus control region, HS-40 = the distal major control region of mammalian alpha-globin genes, HSs = DNase hypersensitive sites, En = enhancer.
The effective transport of oxygen between tissues by hemoglobin is accomplished by highly cooperative binding of oxygen when its concentration is high (e.g. in the lungs), followed by cooperative dissociation when its concentration is low (e.g. in respiring tissues in the periphery). In vertebrates, this cooperativity is accomplished by the interactions between the alpha- and beta-globin subunits of hemoglobin (see chapter 8 by M. Perutz). Vertebrate hemoglobins are kept at high concentrations inside erythrocytes, specialized cells devoted to the task of oxygen transport. Thus the divergence of the ancestral globin gene into the alpha-globin and beta-globin genes (Fig. 3), and the expression of these genes at a high level only in erythroid cells, were key steps in the evolution of cooperativity in hemoglobin and efficient oxygen transport in vertebrates. These goals have been accomplished by different mechanisms in other evolutionary lineages. For instance, the basis for cooperativity in non-vertebrate hemoglobins is quite different, in many cases involving reversible dissociation of hemoglobin subunits upon oxygenation (Riggs 1998).
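Cooperative binding is often summarized with the Hill equation, Y = p^n / (P50^n + p^n), where Y is fractional saturation, p is the oxygen partial pressure, P50 is the pressure giving half-saturation and n is the Hill coefficient (n = 1 for non-cooperative, myoglobin-like binding; n > 1 for cooperative binding). The sketch below is an idealized curve fit rather than a mechanism, and the numbers in the example (P50 = 26 mmHg, n = 2.8, 100 mmHg in the lungs, 20 mmHg in working tissue) are illustrative assumptions in the range usually quoted for human Hb A. It compares how much oxygen a cooperative and a non-cooperative carrier with the same P50 would release between the two pressures.

```python
def fractional_saturation(p_o2: float, p50: float, n: float) -> float:
    """Hill equation: fraction of binding sites occupied at partial pressure p_o2."""
    ratio = (p_o2 / p50) ** n
    return ratio / (1.0 + ratio)


def oxygen_released(p_lung: float, p_tissue: float, p50: float, n: float) -> float:
    """Drop in saturation between the loading (lung) and unloading (tissue) pressures."""
    return fractional_saturation(p_lung, p50, n) - fractional_saturation(p_tissue, p50, n)


# Illustrative values in mmHg; same P50, different Hill coefficients.
cooperative = oxygen_released(100.0, 20.0, p50=26.0, n=2.8)
non_cooperative = oxygen_released(100.0, 20.0, p50=26.0, n=1.0)
print(f"cooperative: {cooperative:.2f}, non-cooperative: {non_cooperative:.2f}")
# prints: cooperative: 0.65, non-cooperative: 0.36
```

Under these assumptions the cooperative carrier loads almost fully in the lungs yet unloads most of its oxygen in tissue, roughly 0.65 of capacity against about 0.36 for the hyperbolic curve.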
Vertebrate alpha- and beta-globin genes likely arose by the duplication and subsequent divergence of an ancestral globin gene in early vertebrates. This would have generated a linked set of alpha- and beta-globin genes (Fig. 3), which is the arrangement seen in contemporary globin gene clusters of the teleost zebrafish Danio rerio (Chan et al. 1997) and in the amphibian Xenopus (Hosbach et al. 1983). The alpha-globin gene cluster is thought to have separated from the beta-globin gene cluster prior to the divergence of birds and mammals, since these gene clusters are on separate chromosomes in both groups of animals (Deisseroth et al. 1976; Hughes et al. 1979). Gene duplication and divergence continued independently in each of these lineages to generate the contemporary gene clusters. This is illustrated by the avian and mammalian beta-globin gene clusters, which contain multiple genes expressed differentially in development (Fig. 3). In both chickens and humans, the epsilon-globin gene is expressed in embryos and the beta-globin gene is expressed in adults. However, the sequence of each chicken beta-like globin gene is equally similar to each human gene (Goodman et al. 1987; Reitman et al. 1993), so that, for example, chicken epsilon-globin is no more similar to human epsilon-globin than to human beta-globin. This indicates that the gene duplications generating these beta-globin gene clusters occurred after the two lineages diverged.
Much is now known about the organization of alpha- and beta-globin gene clusters in contemporary mammals, which can be understood in terms of descent from common gene clusters in an ancestral eutherian mammal. Fig. 4 shows beta-globin gene clusters in species from five orders of eutherian mammals and a marsupial. Analysis of DNA sequences showed that a given globin gene in one species is usually more closely related to a gene in another mammal than to other globin genes in the same species (Hardison 1983; Goodman et al. 1984; Hardies et al. 1984; Hardison 1984; Townes et al. 1984). This indicated that these genes are orthologous, i.e., they are similar because of descent from the same gene in the last common ancestor of the two species. The exceptions to this observation could be explained by gene duplications within a single mammalian lineage, e.g. the duplication of gamma-globin genes in the ancestor to simian primates to produce paralogous genes (similar because of duplication of an ancestral gene), such as the Ggamma- and Agamma-globin genes in humans (Shen et al. 1981; Fitch et al. 1991). Finding orthologs to epsilon-, gamma-, eta-, delta- and beta-globin genes, in that order, in virtually all eutherian mammals suggested that the ancestral eutherian had at least this set of genes (Fig. 4). This hypothesis was strongly supported by the observation of substantial regions of sequence similarity outside the coding regions of the genes, in the introns and flanking regions (Hardies et al. 1984; Hardison 1984; Margot et al. 1989; Shehee et al. 1989; Hardison and Miller 1993). As will be discussed more extensively below, some but not all of these matching sequences are strong candidates for regulatory function. The long regions of matching sequences outside functional regions, and thus not subject to any obvious selection, were key observations in establishing this model for evolution of the mammalian globin gene clusters. Deletions, conversions and duplications of both single genes and blocks of genes have occurred in each mammalian order to generate the current gene clusters (reviewed in Collins and Weissman 1984; Hardison 1991).
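The orthology argument above (a gene matching a gene in another species better than it matches the other genes in its own genome) can be caricatured with a reciprocal-best-hit rule. This is a common computational heuristic swapped in for illustration, not the procedure of the studies cited, which relied on sequence alignments that included introns and flanking regions. The function name, gene labels and similarity scores below are hypothetical.

```python
def reciprocal_best_hits(scores):
    """Return (x, y) pairs where x's best match in species Y is y and vice versa.

    scores: dict mapping (gene_in_X, gene_in_Y) -> similarity (higher is closer).
    """
    best_in_y = {}  # gene in X -> its highest-scoring partner in Y
    best_in_x = {}  # gene in Y -> its highest-scoring partner in X
    for (x, y), score in scores.items():
        if x not in best_in_y or score > scores[(x, best_in_y[x])]:
            best_in_y[x] = y
        if y not in best_in_x or score > scores[(best_in_x[y], y)]:
            best_in_x[y] = x
    return sorted((x, y) for x, y in best_in_y.items() if best_in_x.get(y) == x)


# Hypothetical similarity scores between two species' beta-like genes.
scores = {
    ("sp1_epsilon", "sp2_epsilon"): 0.80, ("sp1_epsilon", "sp2_beta"): 0.60,
    ("sp1_beta", "sp2_epsilon"): 0.58, ("sp1_beta", "sp2_beta"): 0.85,
}
print(reciprocal_best_hits(scores))
# [('sp1_beta', 'sp2_beta'), ('sp1_epsilon', 'sp2_epsilon')]
```

Paralogs produced by a duplication within one lineage, such as the human Ggamma- and Agamma-globin genes, break the one-to-one assumption of this rule: both would match the same gene in another mammal, and only one of them can be its reciprocal best hit.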

Fig. 4. Pathways to contemporary mammalian beta-globin gene clusters. Genes are indicated by boxes, and orthologous genes have the same type of fill. The presumptive presence of an LCR in marsupials is indicated by the grey outline (R. Hope, personal communication). The stage of expression is indicated as E = embryonic, F = fetal, and A = adult. References are in the text and in reviews (Collins and Weissman 1984; Hardison 1991). a=alpha, b=beta, z=zeta, e=epsilon, g=gamma, d=delta, r=rho, q=theta
The proposed epsilon-gamma-eta-delta-beta globin gene cluster in the ancestral eutherian mammal was generated by earlier gene duplications. Estimates based on rates of divergence indicated that the epsilon-, gamma- and eta-globin genes arose from duplications of one ancestral gene, whereas the delta- and beta-globin genes arose by duplication of a different gene, perhaps prior to the divergence of eutherian and metatherian (marsupial) mammals (Goodman et al. 1984; Hardison 1984). This prediction was verified by genomic analysis of marsupials (Koop and Goodman 1988; Cooper et al. 1996), which have two genes in their beta-globin gene clusters, one most closely related to eutherian epsilon-globin genes and the other most closely related to beta-globin genes. Thus the model shown in Fig. 4 is robust, in that it has been supported not only by deductions from analysis of contemporary species, but also by tests of predictions made by the model.
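Dating duplications from "rates of divergence" amounts to a molecular-clock calculation. A minimal sketch, assuming the Jukes-Cantor substitution model and a constant rate on both lineages (the cited studies may have used other corrections and calibrations): the observed fraction p of differing sites is corrected for multiple hits to d = -(3/4) ln(1 - 4p/3), and the time since divergence is t = d / (2r), where r is the substitution rate per site per year and the factor 2 accounts for change accumulating along both descendant lineages. The function names, rate and p in the example are hypothetical.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Substitutions per site corrected for multiple hits (Jukes-Cantor model)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def divergence_time_years(p: float, rate_per_site_per_year: float) -> float:
    """Time since two sequences diverged, assuming a constant rate on both branches."""
    return jukes_cantor_distance(p) / (2.0 * rate_per_site_per_year)


# Hypothetical inputs: 20% of aligned sites differ; rate of 1e-9 per site per year.
t = divergence_time_years(0.20, 1e-9)
print(f"{t / 1e6:.0f} million years")  # 116 million years
```

Because d grows faster than p as sequences saturate, the correction matters most for the old duplications discussed here; without it, the observed 20% difference would suggest only 100 million years.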
One important ramification of this model is that orthologous genes have not retained the same time of expression during development in all mammalian orders. The gamma-globin gene in most mammals is expressed in embryonic erythroid cells, but in simian primates, including humans, it is expressed predominantly in fetal erythroid cells. Concomitantly with the fetal recruitment of the gamma-globin gene, expression of the beta-globin gene has been delayed in higher primates so that in humans it is expressed primarily in postnatal life. In other mammals, the beta-globin gene is expressed in both fetal and adult erythroid cells. In goats, the gamma-globin gene has been deleted, and subsequent expansion of the gene cluster by triplication of a four-gene set (Townes et al. 1984) allowed expression of the resulting paralogous beta-globin genes in fetal life (betaF), adult life (betaA) or under conditions of erythropoietic stress (betaC). The delta-globin gene is expressed at low levels in adult humans, but it is silent in some mammals and expressed at high levels in others. In contrast, the epsilon-globin gene in each mammalian species is expressed only in embryonic erythroid cells derived from the yolk sac. Within the beta-globin gene clusters of mammals, conservation of stage-specific expression is seen only for this gene, which is located closest to the distal locus control region (LCR; see below and chapter 6 by B. Forget). Perhaps the embryonic restriction of epsilon-globin gene expression is related to this spatial relationship, with active expression in the embryonic lineage due to its proximity to the LCR, followed by silencing in the fetal and adult (definitive) lineages of erythroid cells (see chapter 14 by G. Stamatoyannopoulos). Both the proximity to the LCR and the embryonic restriction of expression are conserved in all mammalian epsilon-globin genes examined.
The beta-globin gene clusters of humans and mice are embedded within a large cluster of olfactory receptor genes, or ORGs (Bulger et al., 1999). This arrangement suggests that the beta-globin genes transposed into a pre-existing array of ORGs. A related ORG is found on the 3' side of the chicken beta-globin gene cluster (Bulger et al., 1999), but an erythroid-specific folate receptor gene is located on the 5' side, separated from the beta-globin gene cluster by an insulator (Prioleau et al., 1999). The 3' breakpoints of at least two deletions causing hereditary persistence of fetal hemoglobin (HPFH) in humans are close to ORGs located 3' to the beta-globin genes (Kosteas et al., 1997; Feingold et al., 1999). The ORG close to the HPFH-1 breakpoint is in an open chromatin domain in human erythroid cells (Elder et al., 1990). Enhancer sequences from this ORG are brought into proximity with the gamma-globin genes by the HPFH-1 deletion, and this may play a role in the increased expression of gamma-globin genes in adults carrying this deletion (Feingold and Forget, 1989).

Fig. 5. Pathways to contemporary mammalian alpha-globin gene clusters. Genes are indicated by boxes, and orthologous genes have the same type of fill. The presumptive presence of a homolog to HS-40 (the distal major control region) in mammals besides human and mouse is indicated by the grey outline. References are in the text and in reviews (Collins and Weissman 1984; Hardison 1991; Hardison and Miller 1993). a=alpha, b=beta, z=zeta, e=epsilon, g=gamma, d=delta, r=rho, q=theta
The evolution of alpha-globin gene clusters in contemporary mammals is not as well understood, in part because less information is available on gene organization and sequence in non-human mammals, and in part because the rate of sequence change in this gene cluster appears to be higher than in the beta-globin gene cluster (Hardison et al. 1991). Fig. 5 summarizes the arrangement of alpha-like globin gene clusters in representatives of five orders of eutherian mammals. Orthologous relationships have been assigned primarily on the basis of DNA sequence matches outside the genes (Hardison and Gelinas 1986; Sawada and Schmid 1986; Wernke and Lingrel 1986; Flint et al. 1988), even though such matches are considerably more limited than in the mammalian beta-globin gene clusters. Since a variant of the arrangement 5'-zeta-zeta-alpha-alpha-theta-3' is found in all contemporary mammals examined, it is likely that these genes were present in this order in the gene cluster of the ancestral eutherian mammal. The timing of expression is well conserved among these mammals. The zeta-globin genes are expressed only in embryonic erythroid cells, whereas the alpha-globin genes are expressed in all erythroid cells, albeit at lower levels at the embryonic stage (Rohrbaugh and Hardison 1983; Leder et al. 1985; Peschle et al. 1985). The theta-globin genes are still not well understood. The human theta-globin gene is transcribed at low levels but does not encode any known polypeptide found in human hemoglobins (Hsu et al. 1988; Kim et al. 1989; Leung et al. 1989). A theta-globin gene is a feature of every mammalian alpha-like globin gene cluster examined (Fig. 5), and given that gene deletions can occur in this locus (see below), one would anticipate the loss of nonfunctional genes in at least some mammalian lineages. The retention of the theta-globin genes is suggestive of some functional importance, but perhaps not for encoding a globin polypeptide.
Although no examples of recruitment for expression at different developmental stages are seen in the alpha-like globin gene clusters, some genes have lost their function during evolution. In particular, based on upstream sequence matches, the human psi-alpha1-globin pseudogene appears to be orthologous to an active alpha-globin gene in goats and horses (Fig. 5). The inactivation of the psi-alpha1-globin gene is accompanied by the loss of a CpG island that encompasses its homologs (Bird et al. 1987). The orthologous relationships in Fig. 5 indicate that the alpha2- and alpha1-globin genes in humans result from a duplication only in primates, i.e. separate from the duplication proposed to generate the pair of alpha-globin genes in the ancestral eutherian mammal. The more recent duplication in primates has left a long region of sequence similarity surrounding the alpha-globin genes (Hess et al. 1984), and unequal crossovers within that region of homology are the cause of some forms of alpha-thalassemia. Not all mammals have retained a pair of active alpha-globin genes. Rabbits are one exception, with only one alpha-globin gene (Cheng et al. 1986). Curiously, this gene cluster has expanded by block duplications of a zeta-zeta-theta gene triad (Cheng et al. 1987), similar to the expansion of the beta-like globin gene cluster in goats (Townes et al. 1984) (Fig. 4).
All the vertebrate globin gene clusters examined to date encode subunits of hemoglobins differentially expressed in embryonic and adult erythroid cells (Fig. 3). Likewise, hemoglobin synthesis is developmentally regulated in some invertebrates (Terwilliger 1998), and different plant leghemoglobins are made at progressive stages of nodulation (Hyldig-Nielsen et al. 1982; Lee et al. 1983). Even species as distant from humans as Chlamydomonas (Couture et al. 1994) and Paramecium (Yamauchi et al. 1995) have multiple hemoglobin genes. Thus the ability to express different hemoglobins at particular developmental stages, i.e. hemoglobin switching, is very old, predating the plant-animal divergence and possibly much older. In Fig. 3, multiple globin genes are shown in the ancestral gene clusters. It is likely that they were differentially expressed during development in these ancestral species.
5.4 Differences in genomic context and regulation of the mammalian alpha-globin and beta-globin gene clusters
The separation of the alpha-globin and beta-globin gene clusters onto different chromosomes has allowed them to diverge into strikingly different genomic contexts, with paradoxical consequences for our understanding of their regulation. Given that all contemporary vertebrates have developmentally regulated hemoglobin genes encoding proteins used for oxygen transport in erythrocytes, it would have been reasonable to expect that the molecular mechanisms of globin gene regulation would be conserved in vertebrates. Certainly, the coordinated and balanced expression of alpha- and beta-globin genes to produce the heterotypic tetramer alpha2beta2 in erythrocytes should be a particularly easy aspect of regulation to explain. Since the two genes would have been identical after the initial duplication in the ancestral vertebrate, with identical regulatory elements, it is parsimonious to expect selection to keep the regulatory elements very similar.
However,
much has changed between the alpha-like and beta-like globin gene clusters
since their duplication. Not only
are they now on separate chromosomes in birds and mammals, but in mammals they are in radically
different genomic contexts (Fig. 6).
The beta-globin gene clusters are A+T rich, with no CpG islands
(reviewed in Collins and Weissman 1984), whereas the alpha-like globin gene
clusters are highly G+C rich, with multiple CpG islands (Fischel-Ghodsian et
al. 1987). Tissue-specific gene expression
is frequently correlated with an increased accessibility of the chromatin only
in expressing cells, and hence "opening" of a chromatin domain is a
key step in activation of many tissue-specific genes. This is the case for beta-like globin genes of mammals
(Groudine et al. 1983; Forrester et al. 1990), but not the alpha-like globin
genes, which are in constitutively open chromatin (Craddock et al. 1995). Thus the mammalian alpha-globin genes
have several characteristics associated with constitutively expressed,
"housekeeping" genes.
Additionally, the alpha-globin genes are replicated early in S phase in
all cells (a time when most expressed genes are replicated), whereas
beta-globin genes are replicated early in S phase only in cells expressing them
(Calza et al. 1984; Dhar et al. 1988).
In keeping with the presence of CpG islands, the alpha-globin gene
cluster is not methylated in any cell types (Bird et al. 1987), whereas the
beta-globin gene cluster is subject to tissue-specific DNA methylation (van der
Ploeg and Flavell 1980). Thus the
strikingly different genomic contexts of the two gene clusters affect several
aspects of DNA and chromatin metabolism, including timing of replication,
extent of methylation, and the type of chromatin into which the loci are
packaged. Rather than selecting
for similarities to ensure coordinate and balanced expression, the processes of
evolution at these two loci have made them quite different.
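The contrast drawn above between A+T-rich beta-globin clusters and G+C-rich alpha-globin clusters with CpG islands rests on simple sequence statistics. The sketch below is only an illustration and assumes the widely used Gardiner-Garden and Frommer (1987) thresholds (length of at least 200 bp, G+C fraction above 0.5, observed/expected CpG ratio above 0.6). It is not the procedure used in the studies cited in this chapter. The function names (`gc_fraction`, `cpg_obs_exp`, `is_cpg_island`, `find_cpg_windows`) and the window and step sizes are hypothetical choices.

```python
# Minimal sketch: flag CpG-island-like windows in a DNA sequence.
# ASSUMPTION: thresholds follow Gardiner-Garden & Frommer (1987); window/step
# sizes and all function names are hypothetical, chosen for illustration.

def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases in seq."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    cpg = sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "CG")
    return cpg * len(s) / (c * g)


def is_cpg_island(seq: str, min_len: int = 200,
                  min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    """True if seq meets all three (assumed) thresholds."""
    return (len(seq) >= min_len
            and gc_fraction(seq) > min_gc
            and cpg_obs_exp(seq) > min_oe)


def find_cpg_windows(seq: str, window: int = 200, step: int = 50):
    """Slide a fixed window along seq and merge overlapping qualifying windows.

    Returns (start, end) intervals, 0-based and end-exclusive.
    """
    s = seq.upper()
    hits = []
    for start in range(0, len(s) - window + 1, step):
        end = start + window
        if is_cpg_island(s[start:end]):
            if hits and start <= hits[-1][1]:
                hits[-1] = (hits[-1][0], end)  # extend the current interval
            else:
                hits.append((start, end))      # start a new interval
    return hits


if __name__ == "__main__":
    # Toy sequence: A+T-rich flank, G+C-rich CpG-dense core (400-700), flank.
    toy = "AT" * 200 + "CG" * 150 + "AT" * 200
    print(find_cpg_windows(toy))  # [(350, 750)]
```

On the toy sequence the merged interval extends 50 bp beyond each side of the CpG-dense core, because windows that only partly overlap the core still pass the thresholds. Real annotations usually refine the boundaries afterwards; this sketch does not.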

Fig. 6. Differences in chromatin structure
between alpha- and beta-globin gene clusters of humans. Globin genes and distal
control regions are shown as filled boxes. HS-40 is located within an intron (white box) of the -14
gene (exons of this gene are shown as black boxes), located upstream of the
zeta-globin gene and transcribed in the opposite orientation. Developmentally stable DNase
hypersensitive sites (HSs) are shown as filled arrows, and those that occur at
specific developmental stages (when the associated gene is expressed) are shown
as white triangles. CpG islands are shown as boxes with horizontal lines; none are in the beta-globin gene cluster. HBB = beta-globin gene, HBA = alpha-globin gene. References are in the text. a=alpha, b=beta, z=zeta, e=epsilon, g=gamma, d=delta, r=rho, q=theta.
The
alpha- and beta-globin gene clusters also have important differences in their cis-regulatory elements. Both have distal control elements,
called the locus control region (or LCR) for beta-globin genes (reviewed in
Grosveld et al. 1993; Hardison et al. 1997b) and HS-40 for the alpha-globin genes
(Higgs et al. 1990), which are required for high-level expression of the
respective globin genes in transgenic mice, independently of the position of
chromosomal integration. However,
they differ in the range of functions associated with them. The beta-globin LCR
is required for tissue-specific chromosomal domain opening (Forrester et al.
1987; Forrester et al. 1990), whereas no such function has been implicated for
HS-40, as expected for a regulator in constitutively open chromatin. The distal regulatory elements also
differ dramatically in size, with the beta-globin LCR spanning 17 kb and containing 5
DNase hypersensitive sites in chromatin (Tuan et al. 1985; Forrester et al.
1987; Grosveld et al. 1987; Dhar et al. 1990), compared to about 0.4 kb and a
single DNase hypersensitive site in chromatin for the alpha-globin HS-40
(Jarman et al. 1991).
Indeed,
the alpha-globin HS-40 is most similar to a single hypersensitive site, HS2,
from the beta-globin LCR. Both
will confer inducible, high-level expression on reporter genes in transfected
cells (e.g. Tuan et al. 1985; Ney et al. 1990; Pondel et al. 1992; Ren et al.
1993), in addition to their effects in transgenic mice (Fraser et al. 1990;
Higgs et al. 1990; Morley et al. 1992).
As illustrated in Fig. 7, these two enhancers share binding sites for
some, but not all, transcription factors (e.g. Talbot et al. 1990; Jarman et
al. 1991; Strauss et al. 1992; Reddy et al. 1994). Both contain maf-response elements, or
MAREs (Motohashi et al. 1997), to which transcriptional activator proteins of
the basic leucine zipper class can bind.
A particular subfamily of proteins related to AP1, such as NFE2,
LCRF1/Nrf1, and Bach1, bind to this site (reviewed in Orkin 1995; Baron 1997). All are heterodimers containing a Maf
protein as one subunit, which is the basis for the name. Other binding sites in common are GATA,
to which GATA1 and its relatives bind (Evans et al. 1990), and the CACC motif,
to which a family of Zn-finger proteins including EKLF can bind (Miller and
Bieker 1993). These binding sites
are occupied in vivo
(Strauss et al. 1992; Reddy et al. 1994), and all have been shown to contribute
to the function of the enhancers (Strauss et al. 1992; Caterina et al. 1994;
Reddy et al. 1994; Rombel et al. 1995).
Other functional sites, such as the E boxes in HS2 (Lam and Bresnick
1996; Elnitski et al. 1997), are not found in common. Each of these binding sites is conserved in homologous
regulatory elements in mammals (reviewed in Gourdon et al. 1995; Hardison et
al. 1997b). In general, binding
sites for many of the proteins implicated in activation of globin gene
expression are present in both HS-40 and HS2 of the beta-globin LCR, but their
number and arrangement differ between the two enhancers (Fig. 7). It is currently not possible to assess whether
this limited similarity occurs via divergence from a common ancestral
regulatory element or by convergence from different ancestral elements.
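Comparing HS-40 and HS2 in this way amounts to locating short binding-site motifs (GATA, MARE, CACC) on both DNA strands and noting their number and arrangement. The scanner below is a minimal sketch of that bookkeeping. The consensus patterns are simplified assumptions chosen for illustration, not the motif definitions used in the studies cited above, and the names `MOTIFS`, `revcomp` and `scan_motifs` are hypothetical. Real analyses typically use position weight matrices rather than exact regular expressions.

```python
import re

# ASSUMPTION: simplified, hypothetical consensus patterns for illustration only.
MOTIFS = {
    "GATA": "[AT]GATA[AG]",           # WGATAR
    "MARE": "[TC]GCTGA[CG]TCA[TC]",   # NF-E2/MARE-like site
    "CACC": "CCACACCC",               # CACC box
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (non-ACGT characters are kept)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def scan_motifs(seq: str, motifs=MOTIFS):
    """Return sorted (start, end, strand, name) hits on both strands.

    Coordinates are 0-based, end-exclusive, on the forward strand.
    A zero-width lookahead lets overlapping matches be reported.
    """
    s = seq.upper()
    rc = revcomp(s)
    n = len(s)
    hits = []
    for name, pattern in motifs.items():
        finder = re.compile(f"(?=({pattern}))")
        for m in finder.finditer(s):
            hits.append((m.start(1), m.end(1), "+", name))
        for m in finder.finditer(rc):
            # Map reverse-strand coordinates back onto the forward strand.
            hits.append((n - m.end(1), n - m.start(1), "-", name))
    return sorted(hits)


if __name__ == "__main__":
    print(scan_motifs("TTAGATAAGG"))  # [(2, 8, '+', 'GATA')]
```

Reporting hits in forward-strand coordinates with a strand label makes it straightforward to compare the order and spacing of sites between two elements, which is the kind of comparison the preceding paragraph describes.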

Fig. 7. Conserved motifs in distal regulatory
elements. Similar protein binding
sites have the same fill, and proteins implicated in acting at a given site are
listed below that motif in the beta-LCR HS2 line; the same proteins have been
implicated at similar motifs in the other regulatory elements. Boxes without labels are conserved
sequences of untested function.
The
proximal regulatory elements also differ in important ways between alpha-globin
and beta-globin genes (Fig. 8).
The promoters do contain two binding sites in common: the TATA motif,
to which the general transcription factor TFIID binds, and the CCAAT motif, to
which several families of trans-activators,
such as CP1, can bind
(Efstratiadis et al. 1980).
However, the other protein binding sites are completely different
between the human alpha and beta-globin genes (deBoer et al. 1988; Rombel et
al. 1995). In addition, the CpG
island encompassing the 5' flanking region and much of the gene is a key
component of the cis-regulatory
elements for the alpha-globin gene of rabbits and humans, possibly through its
effects on chromatin structure (Pondel et al. 1995; Shewchuk and Hardison
1997); again, no CpG island is
found at any of the beta-like globin genes.

Fig. 8. Conserved features of globin gene
promoters. Binding sites for the
human globin genes are shown;
similar protein binding sites have the same fill. Those sites conserved in other mammals
have a dark outline, those not conserved have a grey outline. The figure is not drawn to scale, but relative
positions are indicated; the genes
themselves are truncated. The
chicken beta-globin gene promoter is shown for comparison; no information about evolutionary
conservation is presented for this gene.
This figure summarizes work from a large number of papers (e.g.
Efstratiadis et al. 1980; Lacy and Maniatis 1980; Hardison 1983; Antoniou et
al. 1988; Barnhart et al. 1988; deBoer et al. 1988; Wall et al. 1988; Martin et
al. 1989; Puruker et al. 1990; Stuve and Myers 1990; Macleod and Plumb 1991;
McDonagh et al. 1991; Yu et al. 1991; Gong and Dean 1993; Motamed et al. 1993;
Peters et al. 1993; Yost et al. 1993; Lloyd et al. 1994; Rombel et al. 1995). a=alpha,
b=beta, z=zeta, e=epsilon, g=gamma, d=delta, r=rho, q=theta.
The
differences in genomic context between alpha-globin and beta-globin genes have
been seen for non-human mammals as well (Bernardi et al. 1985), indicating an
origin prior to the divergence of eutherian mammals. The high G+C content and presence of CpG islands are
characteristic of alpha-globin gene clusters from goats (Wernke and Lingrel
1986), horse (Flint et al. 1988) and rabbit (Hardison et al. 1991), whereas the
beta-globin gene clusters from non-human mammals are rich in A+T (e.g. Margot
et al. 1989; Shehee et al. 1989).
The only apparent exception is the mouse alpha-globin gene cluster,
which to date has not been completely characterized. However, the sequenced mouse alpha-globin gene is not in a
CpG island (Nishioka and Leder 1979).
Indeed, the mouse genome shows a general depletion in CpG islands
(Antequera 1993; Matsuo et al. 1993).
The loss of the CpG island is correlated with a chromosomal
rearrangement that moved the alpha-globin locus from the terminus of a
chromosome, where it is in human (Flint et al. 1997) and rabbit (Xu and
Hardison 1991), to an internal location in mouse (Tan and Whitney 1993). It is possible that the position of the
alpha-globin locus close to the end of the chromosome, in a large region high
in G+C content (Flint et al. 1997), is critical to maintenance of the CpG
islands.
Despite
these many differences between alpha-globin and beta-globin gene clusters in
mammals, the appropriate genes are still expressed coordinately between the two
loci, resulting in balanced production of alpha-like and beta-like globins
needed for the synthesis of normal hemoglobins. The full mechanism that accomplishes this task still eludes
our understanding.
5.5.
Origin and location of the LCR in birds and mammals
In
contrast to the differences between the alpha and beta-globin gene clusters of
mammals, tissue-specific opening of a chromatin domain occurs in beta-globin
gene clusters of both mammals and birds.
In fact, the association between accessible chromatin and gene
activation was first made for the chicken globin genes (Weintraub and Groudine
1976). Further studies of the
avian gene clusters carefully mapped the limits of the open domain in chromatin
to a region of 33 kb, extending about 10 kb on each side of the set of 4 globin
genes (Clark et al. 1993; Hebbes et al. 1994). The limits of the open domain for the human beta-globin gene
cluster have not been determined.
The open domain at the 5' end encompasses at least the LCR, which
extends 20 kb 5' to the epsilon-globin gene, and an erythroid HS has been
mapped as far as 70 kb 3' to the beta-globin gene (Elder et al. 1990),
indicating an open domain of at least 150 kb.
LCRs
have been mapped in beta-globin gene clusters both in chickens and in all
mammals examined to date. The LCRs
in mammals are homologous (Moon and Ley 1990; Li et al. 1991; Hug et al. 1992;
Jimenez et al. 1992; Hardison et al. 1993; Hardison et al. 1997b; Slightom et
al. 1997), with long segments of high sequence similarity found both inside and
outside the cores of the LCR HSs (Fig. 9). All are located 5' to the epsilon-globin gene (Fig.
4). Sequences similar to HS2 and
HS3 of the beta-globin LCR are also found in marsupials and monotremes (R. Hope,
personal communication), indicating that the LCR predates the divergence of
placental and nonplacental mammals, about 173 Myr ago (Kumar and Hedges
1998). In contrast, the major LCR
activity in chickens maps to an enhancer located between the betaA and
epsilon-globin genes (Reitman et al. 1990). Four additional HSs are 5' to the rho-globin gene, and
together they produce a modest enhancement in expression (Reitman et al. 1995). Despite their analogous location at the
5' ends of the gene cluster, they do not have the pronounced effects associated
with the HSs in the mammalian LCR.
In fact, comparison of the DNA sequences of the chicken and human beta-globin gene
clusters fails to reveal any statistically
significant alignments outside the coding regions of some exons (Reitman
et al. 1993; Hardison 1998). Thus
no clear homologies are present either in the distal regulatory elements (LCRs
and enhancers) or in the promoters.
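Whether an alignment between two sequences is "statistically significant" can be judged in several ways. One simple, generic approach is sketched below. It is not necessarily the method used in the chicken-human comparisons cited above. The approach scores the best local alignment (Smith-Waterman with a linear gap penalty), then asks how often shuffled versions of one sequence score at least as well. The scoring values (match = 2, mismatch = -1, gap = -2), the number of shuffles, and the names `sw_score` and `shuffle_pvalue` are hypothetical assumptions.

```python
import random

# ASSUMPTION: illustrative scoring scheme and shuffle count; not taken from
# the cited studies. Function names are hypothetical.

def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1,
             gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def shuffle_pvalue(a: str, b: str, n_shuffles: int = 200, seed: int = 0,
                   **scoring):
    """Return (observed score, empirical p-value) from shuffling b.

    The p-value counts shuffles scoring >= the observed score, with a +1
    correction so it is never exactly zero.
    """
    rng = random.Random(seed)
    observed = sw_score(a, b, **scoring)
    letters = list(b)
    at_least = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if sw_score(a, "".join(letters), **scoring) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    seq = "ACGTTGCAAGGCTTACCGATGCATCGGA"
    print(shuffle_pvalue(seq, seq, n_shuffles=50))
```

A mononucleotide shuffle preserves base composition but destroys dinucleotide structure such as CpG depletion. For sequences as different in G+C and CpG content as the alpha- and beta-globin clusters, a composition-aware shuffle would give a more conservative null distribution.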