7. Gene Regulation – Essentials of Molecular Biology


Gene Regulation

  • Introduction
  • Regulation of Gene Expression in Prokaryotes
  • Induction and Repression
  • Types of Operons
  • Lac Operon
  • Regulation of Lactose Operon
    • Negative regulation of lactose operon


  • Catabolite Repression
  • Positive Regulation of Lac Operon
    • Positive regulation by cAMP
    • Gratuitous inducers
  • Tryptophan Operon and Attenuation
    • Structural genes
    • Regulatory gene
    • Operator
    • Co-repressor
    • Attenuation
  • Regulation of Eukaryotic Gene Expression
  • Regulation of the Chromatin Structure
    • Chromatin remodelling
    • Nucleosome-remodelling factors
    • Histone modifications
    • DNA methylation
  • Regulation of Gene Expression at Transcription Level
    • Cis-acting regulatory sequences: promoters and enhancers
    • Transcriptional regulatory proteins
    • Eukaryotic repressors
    • RNA interference
    • MicroRNAs (miRNAs)
  • The DNA-Binding Domains of the Regulatory Proteins Involved in Transcriptional Control
    • Helix-turn-helix motif
    • Zinc-finger motif
    • Homeodomain
  • Regulatory Proteins also have Protein–Protein Interaction Domain
    • Leucine zipper
    • HLH domain
  • Regulation of Gene Expression by Hormone Action
    • At the level of post-transcriptional modification
    • By controlling the stability of mRNA molecule
    • At translational level
    • At post-translational level
    • By hormone action
  • Summary
  • References

Prokaryotes live in diverse environments, ranging from the human intestine to polluted ponds and rivers, and are thus exposed to different metabolites and molecules. Prokaryotes can survive in such a wide range of ecological conditions because of their ability to ‘switch on’ and ‘switch off’ the expression of specific sets of genes in response to the specific demands of the environment. Prokaryotes regulate the expression of their genes based on their cellular requirements.

Multicellular eukaryotes exhibit differential regulation of gene expression. For example, the B-cells of the immune system produce antibodies, while the pancreatic cells produce insulin. Thus, eukaryotes express a subset of genes in one cell and a different subset in another cell type. Several molecular mechanisms activate the specific portions of the genome and repress the expression of other genes. The inappropriate expression of wrong genes in the wrong place, at the wrong time, in the wrong cell type may lead to deleterious consequences and even death. Thus, regulating the gene expression is of prime importance for the cell and consequently for the organism.


Some gene products are required at all times, and their genes are expressed at a more or less constant level in all cells of a species or organism. The genes for the enzymes that catalyse the steps of a central metabolic pathway, such as the citric acid cycle, fall into this category. These genes are called housekeeping genes, and their unvarying expression is called ‘constitutive gene expression’. The amounts of other gene products rise and fall in response to molecular signals; this is called ‘regulated gene expression’.

Gene expression can be controlled at any one of several stages, namely transcription, processing and translation. Transcription is often controlled at the stage of initiation. It is not usually controlled at elongation, but may be controlled at the stage of termination. A prokaryotic cell can shut off the supply of an unwanted protein either at the transcription level, by blocking mRNA synthesis (transcriptional control mediated by repressor proteins), or by hydrolysing the mRNA once it is formed (post-transcriptional control). In bacteria, an mRNA is typically available for translation while it is being synthesized; this is called coupled transcription/translation. Translation in bacteria is regulated, but only passively. Gene activity is regulated by the specific interactions of ‘trans-acting products’ (proteins) with sites on the DNA (‘cis-acting DNA sequences’).

The sequences that mark the beginning and end of the transcription unit, the promoter and the terminator, are examples of cis-acting sequences. A promoter serves to initiate transcription only of the genes that are present on the same stretch of DNA. Likewise, a terminator can terminate transcription only by an RNA polymerase that has traversed the preceding genes. That is, the promoter and the terminator are both cis-acting sequences that are recognized by the same trans-acting protein, i.e., RNA polymerase.

The bacterial genome is organized so that genes with a common function, such as the synthesis of the enzymes of a biosynthetic or degradative pathway, are often grouped together and regulated together in units called ‘operons’. The genes of an operon are transcribed into a polycistronic mRNA, and this transcription can be activated or repressed. Operons involve two kinds of genes, namely the ‘structural genes’ and the ‘regulatory genes’. A structural gene is simply any gene that codes for a protein or an RNA product; structural genes encode proteins that perform a wide range of functions. A regulatory gene codes for a protein that controls gene expression by binding to particular DNA sequences. An operon consists of several structural genes, a promoter, a terminator and a control site (the operator), and is controlled by a separate regulator gene that codes for the protein that binds to the operator.


Some substances called ‘inducers’ are capable of inducing or activating the expression of genes. In the presence of such inducers, the genes are transcribed and translated to synthesize the enzymes, which then metabolize the inducer. This phenomenon is called ‘induction’ and the enzymes synthesized are called ‘inducible enzymes’. For example, when E. coli is grown in the presence of lactose as a carbon source, the lactose acts as inducer and activates the genes to synthesize the enzyme β-galactosidase (β-gal), which metabolizes the lactose to glucose and galactose.

On the other hand, certain substances are capable of reducing the expression of structural genes; such substances are called ‘repressors’, and the process of reducing gene expression mediated by a repressor is called ‘enzyme repression’. Those metabolites whose introduction into a growth medium specifically decreases the amount of specific enzymes are called ‘co-repressors’. For example, the presence of a large amount of an amino acid in the medium leads to the repression of the genes for the enzymes that catalyse the biosynthesis of that amino acid in the bacteria.


There are four different patterns of gene regulation, namely negatively inducible, negatively repressible, positively inducible and positively repressible regulation (Figure 7.1).

  1. ‘Negatively controlled inducible operons’ are not normally transcribed because a specific repressor protein is bound to the operator region of the operon. Induction occurs when an inducer, a small molecule, binds to the repressor and alters its conformation, so that the repressor can no longer bind to the operator, allowing the transcription of the structural genes to proceed.
  2. ‘Negatively controlled repressible operons’ are normally transcribed, but when a co-repressor binds to the apo-repressor (protein), this complex binds to the operator, which results in the inhibition of transcription.
  3. ‘Positively controlled inducible operons’ become active when a co-activator, a small molecule, binds to an apo-activator and alters its conformation, so that the complex can bind to a site near the operon and allow transcription to be initiated.
  4. ‘Positively controlled repressible operons’ are normally transcribed because an activator protein is bound to the DNA. However, when a co-repressor binds to the activator, the activator is prevented from binding the DNA. This stops the activation and transcription of the system (Table 7.1).
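The four patterns listed above differ in only two respects: whether the regulatory protein is a repressor or an activator, and whether the small molecule makes that protein bind the DNA or release it. The following Python sketch encodes this as a simple truth table. The class and function names (OperonType, is_transcribed) are illustrative assumptions, and each input is reduced to a yes/no value, which ignores the graded responses of real operons.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperonType:
    name: str
    regulator_is_repressor: bool      # True = negative control, False = positive control
    ligand_enables_dna_binding: bool  # does the small molecule make the regulator bind DNA?


# The four patterns, as described in the list above.
NEGATIVE_INDUCIBLE = OperonType("negatively inducible", True, False)
NEGATIVE_REPRESSIBLE = OperonType("negatively repressible", True, True)
POSITIVE_INDUCIBLE = OperonType("positively inducible", False, True)
POSITIVE_REPRESSIBLE = OperonType("positively repressible", False, False)


def is_transcribed(operon: OperonType, ligand_present: bool) -> bool:
    """Return True if the structural genes are transcribed (simplified on/off model)."""
    # The regulator sits on the DNA when the ligand state matches what it needs.
    regulator_bound = ligand_present == operon.ligand_enables_dna_binding
    if operon.regulator_is_repressor:
        return not regulator_bound  # a bound repressor blocks transcription
    return regulator_bound          # a bound activator is required for transcription


if __name__ == "__main__":
    for op in (NEGATIVE_INDUCIBLE, NEGATIVE_REPRESSIBLE,
               POSITIVE_INDUCIBLE, POSITIVE_REPRESSIBLE):
        print(op.name, "| ligand absent:", is_transcribed(op, False),
              "| ligand present:", is_transcribed(op, True))
```

In this sketch, the two inducible types are off without the ligand and on with it, while the two repressible types show the opposite behaviour.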


Figure 7.1 Types of gene regulation


Table 7.1 Types of operon



Francis Jacob and Jacques Monod described the operon model in 1961, based on the regulation of lactose metabolism by the intestinal bacterium E. coli. Lactose operon is an ‘inducible operon’, because lactose induces the transcription of the operon. The lac operon is normally off, but when a molecule called an ‘inducer’ is present, the operon turns on. The lac operon consists of one regulatory gene (the i gene) and three structural genes (z, y and a). The i gene codes for the repressor of the lac operon.

The lactose operon


The repressor

The repressor is a tetramer of identical subunits of 38 kD each. A wild-type E. coli cell contains approximately 10 tetramers. The repressor gene is an unregulated gene. It is transcribed into a monocistronic mRNA. The repressor functions by binding to an operator.

The operator

The operator lies between the promoter and the structural genes. The sequence of the operator includes an inverted repeat. When the repressor binds to the operator, it prevents RNA polymerase from initiating transcription at the promoter. The operator extends from −5, just upstream of the transcription start point, to +21 within the transcription unit. Thus, it overlaps the right (3′) end of the promoter. Operator mutations that prevent the repressor from binding result in the constitutive expression of the lac operon.

The structural genes

The Z gene codes for β-galactosidase, which is primarily responsible for the hydrolysis of the disaccharide, lactose into its monomeric units, galactose and glucose (Figure 7.2). The Y gene codes for permease, which increases the permeability of the cell to β-galactosides. The A gene codes for transacetylase, which participates in detoxification. During normal growth on a glucose-based medium, the lac repressor is bound to the operator region of the lac operon, preventing transcription.

The lac mRNA is extremely unstable and decays with a half-life of only approximately three minutes. This allows induction to be reversed rapidly by repressing transcription as soon as the inducer is removed.
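The effect of a three-minute half-life can be made concrete with a small calculation. The sketch below assumes simple first-order (exponential) decay and no new transcription after the inducer is removed; both are simplifying assumptions, and the function name fraction_remaining is illustrative.

```python
def fraction_remaining(minutes: float, half_life_min: float = 3.0) -> float:
    """Fraction of lac mRNA left after `minutes`, assuming first-order decay."""
    # Each half-life halves the amount that is left.
    return 0.5 ** (minutes / half_life_min)


if __name__ == "__main__":
    for t in (0, 3, 6, 9, 12):
        print(f"{t:>2} min after transcription stops: "
              f"{fraction_remaining(t):.3f} of the mRNA remains")
```

Under these assumptions, only one-sixteenth of the message is left 12 minutes (four half-lives) after transcription stops, which is why repression takes effect quickly.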


Figure 7.2 Action of β-galactosidase



Negative Regulation of Lactose Operon

A repressor protein encoded by the i gene prevents the structural genes from being expressed. Close to the promoter is another cis-acting site called operator. The repressor binds to the operator and prevents RNA polymerase from initiating transcription (Figure 7.3). The gene expression is, therefore, turned off. The expression of lac i gene is constitutive, i.e., it is always expressed. The lac repressor protein is an allosteric protein. Negative regulation takes place in the absence of lactose and the repressor protein is strongly bound to the operator.

In Figure 7.3, L denotes lactose/allolactose.


Figure 7.3 Negative regulation of Lac operon



When E. coli cells are exposed to both glucose and lactose as carbon sources, the bacterium metabolizes glucose first. Although lactose is present from the beginning of the bacterial growth phase, the lac enzymes are not produced until the glucose in the medium is exhausted. This repression of the lac operon by glucose is termed ‘catabolite repression’ and is mediated by cAMP together with the cAMP receptor protein (CRP), which is also called catabolite gene activator protein (CAP).

RNA polymerase has a low affinity for the promoter of the lac operon unless it is helped by the regulatory protein, CRP. CRP becomes activated only if the concentration of cyclic AMP (cAMP) is high. Glucose inhibits the formation of cAMP. Therefore, if the concentration of glucose is high, the concentration of cAMP is low; RNA polymerase is then unable to bind well to the promoter, and the operon operates at a very low level (i.e., it is almost switched off) (Figure 7.4b).


Figure 7.4 Regulation of Lac operon by cAMP (a) Positive regulation by cAMP (b) Catabolite repression



When lactose is present in the medium, small amounts of lactose enter the bacterial cell via the basal level of permease expression and are converted by the basal level of β-galactosidase into the lactose isomer called allolactose, which acts as an inducer. Allolactose binds to the repressor protein at the inducer-binding site. This binding causes a conformational change in the repressor protein, and consequently the repressor loses its hold on the operator site. RNA polymerase binds to the promoter sequences and initiates the transcription of the structural genes. The mRNA is translated to produce the lac operon enzymes, which permit the entry of more lactose into the cell and convert it into glucose and galactose. Thus, in the presence of an inducer, the lac operon is in a de-repressed state (Figure 7.5).

Positive Regulation by cAMP

If the concentration of glucose is low and that of lactose is high, the concentration of cAMP will be high; CRP (also called CAP) will be activated and will bind to the DNA, which promotes the binding of RNA polymerase and the initiation of transcription.

CRP is a dimer of two identical subunits of 22.5 kD, which can be activated by a single molecule of cAMP. A CRP monomer contains a DNA-binding domain and a transcription-activation domain. A CRP dimer binds to a site of approximately 22 base pairs (bp) at the promoter region. CRP induces a large bend in the DNA when it binds. This possibly allows CRP to contact RNA polymerase at the promoter and allows positive regulation.
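The combined effect of the lac repressor (negative control) and cAMP-CRP (positive control) can be summarized as a small decision rule. The sketch below is a deliberately simplified model in which the presence of lactose and glucose are yes/no inputs; the function name lac_operon_state and the three output labels are illustrative assumptions, not standard terminology.

```python
def lac_operon_state(lactose: bool, glucose: bool) -> str:
    """Return 'off', 'low' or 'high' for lac operon transcription (simplified model)."""
    # Negative control: without lactose there is no allolactose,
    # so the repressor stays on the operator.
    repressor_on_operator = not lactose
    # Positive control: glucose inhibits cAMP formation,
    # and CRP binds the DNA only when cAMP is high.
    camp_high = not glucose
    crp_bound = camp_high

    if repressor_on_operator:
        return "off"   # RNA polymerase cannot initiate transcription
    if crp_bound:
        return "high"  # de-repressed and helped by cAMP-CRP
    return "low"       # de-repressed, but RNA polymerase binds the promoter poorly


if __name__ == "__main__":
    for lactose in (False, True):
        for glucose in (False, True):
            print(f"lactose={lactose!s:<5} glucose={glucose!s:<5} -> "
                  f"{lac_operon_state(lactose, glucose)}")
```

Only the combination of lactose present and glucose absent gives a high level of expression in this model, which matches the observation that the lac enzymes appear only after the glucose in the medium is exhausted.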


Figure 7.5 Positive regulation of Lac operon


Gratuitous Inducers

Lactose is rarely used in experiments to study the expression of lac genes, because the β-galactosidase produced acts on lactose and reduces its level, which may complicate the analysis of the lac operon. Instead, two sulphur-containing analogues of lactose, namely isopropylthiogalactoside (IPTG) and thiomethyl galactoside (TMG), are used; they are effective inducers without being substrates for β-galactosidase. These are called ‘gratuitous inducers’, that is, substances that resemble natural inducers but cannot be metabolized by the enzymes they induce.




The tryptophan operon is required for the synthesis of the amino acid tryptophan. The trp operon is an example of a repressible operon: it is normally on; however, when a molecule called a ‘co-repressor’ (here, tryptophan) is present, the repressor is activated and the operon turns off.

Structural Genes

The tryptophan operon contains five structural genes, namely trpE, trpD, trpC, trpB and trpA, that code for the enzymes involved in the synthesis of tryptophan. These genes are transcribed from a common promoter into a polycistronic mRNA, which is translated to yield the enzymes, namely anthranilate synthetase, indole-3-glycerol phosphate synthetase and tryptophan synthetase.

Regulatory Gene

The expression of the structural genes is controlled by a specific regulatory gene. The regulatory gene codes for a specific protein product called a repressor (sometimes called an apo-repressor). When the repressor is synthesized, it is an inactive negative regulator. However, it can be activated by complexing with the co-repressor, i.e., the amino acid tryptophan. When the levels of tryptophan are high, two molecules bind to the dimeric trp repressor, changing its conformation to the active DNA-binding conformation, and the repressor thus binds to the operator. This prevents RNA polymerase from binding to the overlapping promoter sequence. Up to three trp repressor dimers can bind to the operator. The trp repressor protein also acts to reduce its own synthesis, i.e., it is ‘autoregulated’.


The active repressor/co-repressor complex acts by binding to a specific region of the DNA called the operator that is adjacent to the structural genes being regulated. Thus, in the presence of the co-repressor, the repressor is active and binds to the operator, resulting in the repression of the transcription of the structural genes. In contrast, in the absence of co-repressor, the repressor is inactive and does not bind to the operator, resulting in the transcription of the structural genes.


The transcription of the tryptophan genes is influenced by the presence or the absence of a co-repressor (tryptophan).

The regulatory sites lie next to trpE and consist of a promoter site trpP, an overlapping operator site, trpO and a leader region trpL that codes for a leader peptide.

The enzymes that synthesize tryptophan are produced only when tryptophan is not available in the bacterial cell’s environment (medium).



trpE and trpD: They code for anthranilate synthetase.

trpC: It codes for indole-3-glycerol phosphate synthetase.

trpB and trpA: They code for tryptophan synthetase B chain and tryptophan synthetase A chain, respectively.

A complex system of repression and attenuation is used in the E. coli trp operon. The trp operon is a repressible operon. The first level of control is that the operon is negatively repressible, in that the initiation of transcription is prevented by its end product, namely tryptophan. Attenuation is the second level of control, in which mRNA synthesis stops at a point about 140 nucleotides along the transcript. There is a region in the 5′ leader of the mRNA called the attenuator. An attenuator is an intrinsic terminator that is located between the promoter and the structural genes. Transcription is controlled by the rate of translation of the leader region that contains the attenuator. High levels of charged tryptophan tRNA will attenuate, or terminate, transcription; on the other hand, low levels of tryptophan allow the trpEDCBA operon to be transcribed. This is made possible by changes in the secondary structure of the attenuator RNA, which are determined by the position of the ribosome on the mRNA. The trpL gene codes for the ‘leader peptide’, which comprises 14 amino acids. The leader region contains a ribosome-binding site whose AUG codon is followed by a short coding region that contains two successive codons for tryptophan.

When the cell has low levels of tryptophan, or in the absence of tryptophan, the repressor is inactive and does not bind to the operator. Transcription is initiated and is not terminated. Therefore, the structural genes are transcribed. As a result, a polycistronic mRNA is produced and this is translated to give the enzymes involved in tryptophan biosynthesis. Further, in the absence of charged trp tRNA, the ribosome stalls at the tryptophan codons, and the formation of an alternative secondary structure prevents the formation of the terminating hairpin structure; hence, transcription continues (Figure 7.6 (a)).

On the other hand, when a cell has high levels of tryptophan, two molecules bind to the dimeric trp repressor, changing its conformation to the active DNA-binding conformation, and the repressor thus binds to the operator. This prevents RNA polymerase from binding to the overlapping promoter sequence, inhibiting transcription. Further, in the presence of charged trp tRNA, the leader peptide is translated and the attenuator is able to form the hairpin that causes termination (Figure 7.6 (b)).


Figure 7.6 The tryptophan operon (a) Tryptophan absent, repressor inactive and operon on (b) Tryptophan present, repressor active and operon off



The trp operon attenuation mechanism uses signals encoded in four sequences namely sequences 1, 2, 3 and 4, within the leader region. The attenuator that is present at the end of the leader sequence is made up of sequences 3 and 4, which can base pair to form a G-C-rich stem and loop structure followed by a series of uridylate residues that resembles a transcription terminator. Transcription will halt here when this structure is formed (Figure 7.7).

The translation of the leader peptide begins immediately after it is transcribed and the bound ribosome follows closely behind the RNA polymerase as transcription proceeds.

When tryptophan levels are high, the concentration of charged trp tRNA is also high. The ribosome quickly translates sequence 1 and blocks sequence 2 before sequence 3 is transcribed by RNA polymerase. In this case, sequence 2 is covered by the ribosome and is unavailable for pairing with sequence 3 when it is synthesized. The transcribed sequences 3 and 4 then base pair, forming the terminator structure that halts the transcription of the structural genes.

When tryptophan levels are low, the ribosome pauses at the trp codons in sequence 1. Sequences 2 and 3 base pair with each other and this prevents the base pairing of the sequences 3 and 4 and consequently prevents attenuation. Thus, sequence 2 is an alternative complement for sequence 3. If sequences 2 and 3 base pair, the attenuator structure derived from the interaction of sequences 3 and 4 cannot form and transcription continues into the tryptophan biosynthetic genes.

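This mechanism resembles a ‘riboswitch’, in which the interaction of a molecule with an mRNA changes the secondary structure of the mRNA and thereby controls transcription or translation. In attenuation, however, the signal is sensed by the translating ribosome rather than by the direct binding of a small molecule to the RNA.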
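The pairing rules for sequences 1 to 4 of the leader can be written as a short decision procedure. The following Python sketch is a simplified model that treats the level of charged trp tRNA as a yes/no input; the function name attenuation_outcome and the keys of the returned dictionary are illustrative assumptions.

```python
def attenuation_outcome(charged_trp_trna_high: bool) -> dict:
    """Predict which leader hairpin forms and whether transcription continues."""
    # With little charged trp tRNA, the ribosome stalls at the two
    # tryptophan codons in sequence 1.
    ribosome_stalls_in_seq1 = not charged_trp_trna_high

    if ribosome_stalls_in_seq1:
        # Sequence 2 is left free and pairs with sequence 3 as soon as it is made.
        hairpin = (2, 3)
    else:
        # The ribosome moves on and covers sequence 2, so sequence 3 pairs with 4.
        hairpin = (3, 4)

    # Only the 3-4 hairpin, followed by the run of U residues, acts as a terminator.
    terminated = hairpin == (3, 4)
    return {"hairpin": hairpin, "transcription_continues": not terminated}


if __name__ == "__main__":
    print("tryptophan high:", attenuation_outcome(charged_trp_trna_high=True))
    print("tryptophan low: ", attenuation_outcome(charged_trp_trna_high=False))
```

The model makes the key point explicit: sequence 3 can pair with either sequence 2 or sequence 4, and the position of the ribosome decides which of the two structures forms.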


Figure 7.7 Attenuation and anti termination of tryptophan operon



The molecular mechanisms that operate to regulate gene expression in eukaryotes differ greatly from those in prokaryotes. This is basically because of the following reasons:

  • The eukaryotic genome is highly complex and more condensed than that of prokaryotes.
  • Eukaryotes have more chromosomes than the prokaryotic systems.
  • Eukaryotes exhibit more cellular compartmentalization than the prokaryotes.
  • Eukaryotic mRNAs have longer half-life than prokaryotic mRNAs.
  • A large portion of eukaryotic genome is non-coding.

There are six control points of gene regulation in eukaryotes. They are:

  1. Regulation at transcription level,
  2. Regulation at post-transcriptional modification level,
  3. Regulation of the nuclear export of mRNA,
  4. Stability of the mRNA,
  5. Regulation at translational level and
  6. Regulation of the chromatin structure.

Chromosomal regions that are activated for transcription are marked by a variety of structural changes. The packaging of eukaryotic DNA into chromatin limits its availability as a template for transcription. Thus, modifying chromatin structure plays a key role in controlling gene expression in eukaryotes. The transcription of a eukaryotic gene is strongly repressed when it is condensed as chromatin. The presence of histone proteins and nucleosome complexes limits the availability of the DNA sequences for the binding of transcription factors and RNA polymerase. Therefore, the decondensation of the chromatin and the disruption of the nucleosome structure are key control points of gene regulation. A series of transcription-associated changes takes place in the chromatin, and this is referred to as ‘chromatin remodelling’. Once established, such changes in chromatin can persist through cell divisions, creating an epigenetic state in which the properties of the gene are determined by the self-perpetuating structure of the chromatin.

The DNA of all eukaryotic cells is tightly bound to histone proteins. The basic structural unit of chromatin is the nucleosome, which consists of 146 bp of DNA wrapped around two molecules each of histone H2A, H2B, H3 and H4, with one molecule of histone H1 bound to the DNA, as it enters the nucleosome core particle. The chromatin is then condensed by being coiled into higher-order structures (refer Chapter 2).

Chromatin Remodelling

Chromatin remodelling accompanying transcription initiation involves the modifications of histones, the rearrangements of nucleosomes and the association of various non-histone proteins such as HMGN proteins (high mobility group nucleosome-binding protein). The binding site of HMGN proteins on the nucleosome overlaps the binding site of histone H1 (Figure 7.8).

HMGN proteins, therefore, compete with H1 for shared binding sites on the nucleosome, decrease the H1 residence time at selected chromatin loci and promote the unfolding of the chromatin fibre.

Chromatin remodelling is performed by ATP-dependent chromatin-remodelling complexes, which use the energy of ATP hydrolysis for remodelling. The core of a chromatin-remodelling complex is its ATPase subunit. The remodelling complexes are classified according to the subfamily of ATPase that they contain as their catalytic subunit. There are many subfamilies; the four major ones are SWI/SNF (switch/sucrose non-fermentable, pronounced ‘switch sniff’), ISWI, CHD and INO80/SWR1. SWI/SNF can remodel nucleosomes in vitro without the loss of histones, or it can displace histone octamers. The structure of the target nucleosome is altered, leading either to a remodelled nucleosome on the original DNA or to the displacement of the histone octamer to a different position on the DNA. The SWI/SNF complexes are generally involved in transcription activation.

The ISWI family primarily affects nucleosome positioning without displacing octamers, in a sliding reaction in which the octamer moves along the DNA. The ISWI complexes generally act as repressors of transcription.


Figure 7.8 Chromatin remodelling by HMGN proteins


Nucleosome-Remodelling Factors

These are protein complexes that alter the arrangement of nucleosomes without removing or covalently modifying histones. One mechanism by which they act is to catalyse the sliding of histone octamers along the DNA molecule, thereby repositioning nucleosomes to change the accessibility of specific DNA sequences to transcriptional regulatory proteins.

Histone Modifications

All of the core histones are subject to multiple covalent modifications. Different modifications result in different functional outcomes. The modifications include acetylation, methylation and phosphorylation.

Histone acetylation

This is often correlated with transcriptionally active genes in a wide variety of cell types. The core histones H2A, H2B, H3 and H4 have two domains: a ‘histone fold domain’, which is involved in interactions with other histones and in wrapping DNA around the nucleosome core particle, and an ‘amino-terminal histone tail domain’, which extends outside of the nucleosome. The amino-terminal tail domain is rich in lysine residues, and all core histones are acetylated on lysine residues. Acetylation reduces the net positive charge of the histones, which may weaken their binding to DNA and alter their interactions with other proteins. In addition, the acetylation of histones facilitates the binding of transcription factors to nucleosomal DNA. Thus, histone acetylation increases the accessibility of chromatin to DNA-binding proteins. The enzymes that acetylate lysine residues are ‘lysine (K) acetyltransferases’ or KATs; when they specifically acetylate the lysine residues of histones, they are referred to as histone acetyltransferases (HATs). There are two classes of HATs, namely group A and group B. Group A acts on histones in chromatin and is involved in transcription control, while group B acts on newly synthesized histones in the cytosol. The acetylation reactions are reversible, and the acetyl groups are removed by histone deacetylases (HDACs). The deacetylation of histones is linked with transcription repression, and the absence of histone acetylation is a feature of heterochromatin. Active chromatin is acetylated on the tails of histones H3 and H4, whereas inactive chromatin is methylated on a specific lysine of histone H3 (Figure 7.9).

Histone methylation

Histone methylation can be linked with either active or inactive regions, depending on the specific sites of methylation. Histones H3 and H4 are methylated at lysine residues in the tail. In addition, three arginines in H3 and H4 are methylated. Trimethylated H3K4 occurs at the transcriptional start site of active genes, whereas H3 methylated at K9 or K27 is a characteristic feature of transcriptionally silent genes. Histone methylation is catalysed by lysine methyltransferases (KMTs or HMTs). Methylation is also reversible, and the methyl groups can be removed by lysine demethylases.
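The associations between histone marks and transcriptional state described in this section can be collected into a small lookup table. The sketch below is only a summary aid: the mark labels (for example "H3K9me") and the function name summarize_marks are illustrative assumptions, and real chromatin states depend on combinations of marks and on context that such a table cannot capture.

```python
# Mark -> usual association, as stated in the text of this section.
MARK_TO_STATE = {
    "H3ac": "active",     # acetylation of the H3 tail
    "H4ac": "active",     # acetylation of the H4 tail
    "H3K4me3": "active",  # trimethylated H3K4 at transcriptional start sites
    "H3K9me": "silent",   # H3 methylated at K9
    "H3K27me": "silent",  # H3 methylated at K27
}


def summarize_marks(marks):
    """Summarize a collection of marks as 'active', 'silent', 'mixed' or 'unknown'."""
    states = {MARK_TO_STATE[m] for m in marks if m in MARK_TO_STATE}
    if not states:
        return "unknown"          # none of the marks is in the table
    if len(states) == 1:
        return next(iter(states)) # all known marks agree
    return "mixed"                # both active and silent marks are present


if __name__ == "__main__":
    print(summarize_marks(["H3ac", "H3K4me3"]))
    print(summarize_marks(["H3K9me"]))
    print(summarize_marks(["H3K4me3", "H3K27me"]))
```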

Histone phosphorylation

All histones can be phosphorylated in different contexts inside the cell. Histone phosphorylation is commonly seen in three different circumstances:

  1. During the cell cycle,
  2. During chromatin remodelling for transcription and
  3. During DNA repair.


Figure 7.9 Histone acetylation


Histone H1 is phosphorylated at mitosis. The phosphorylation of serine 10 of histone H3 is linked to transcriptional activation (where it promotes the acetylation of K14 in the same tail), as well as to chromosome condensation and to mitotic progression. It is to be noted that histone phosphorylation during cell division favours chromatin condensation; however, during transcription and DNA repair, it favours chromatin decondensation (Figure 7.10).

Histone-modifying enzymes, in particular HATs and HDACs, have been shown to be involved in the generation of cancer and other diseases. Examples include Rubinstein–Taybi syndrome (RTS), a disorder of mental development accompanied by skeletal abnormalities, as well as acute myeloid leukaemia (AML) and certain gastric and colorectal cancers. The amplification and overexpression of another HAT correlates with breast cancer. HDACs are also implicated in cancer, for example, in promyelocytic leukaemias.


Figure 7.10 Histone modifications modulate chromatin structure


DNA Methylation

The methylation of DNA is another mechanism by which the control of transcription is linked to chromatin structure. Cytosine residues in vertebrate DNA can be modified by the addition of methyl groups at the fifth carbon position. DNA is methylated at the Cs that precede Gs in the DNA chain (CpG dinucleotides). This methylation is correlated with the reduced transcriptional activity of genes. The methylation of CpG-rich sequences near promoters reduces gene expression. Methylation inhibits the transcription of these genes by interfering with the binding of some transcriptional activators, as well as by recruiting repressors that specifically bind methylated DNA. The methylation pattern is inherited after cell division. Therefore, DNA methylation plays an important role in cell differentiation during development. ‘Epigenetics’ is the study of heritable changes in chromatin (e.g., DNA methylation) that do not involve a change in the DNA sequence.
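Because methylation occurs at the C of a CpG dinucleotide, the candidate methylation sites in a sequence can be listed by scanning for "CG" on the strand as written. The following Python sketch does only that; the function name cpg_sites and the example sequence are illustrative assumptions, and the sketch does not say whether any given site is actually methylated in a cell.

```python
from typing import List


def cpg_sites(seq: str) -> List[int]:
    """Return 0-based positions of the C in every CpG dinucleotide of `seq`."""
    s = seq.upper()
    # A CpG is a C immediately followed by a G on the same strand (5' to 3').
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


if __name__ == "__main__":
    example = "ACGTCGGA"  # made-up sequence for illustration only
    print(cpg_sites(example))
```

A region with many such sites near a promoter has more positions at which methylation could interfere with activator binding or recruit repressors.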




Various molecular mechanisms operate to control gene expression at the level of transcription.

cis-Acting Regulatory Sequences: Promoters and Enhancers

These are the sequences that control the transcription of adjacent genes. Genes transcribed by RNA polymerase II have core promoter elements including the TATA box and Inr sequences. These cis-acting sequences serve as the binding sites of various transcription factors. Other cis-acting sequences serve as the binding sites of various regulatory factors that control the expression of individual genes. The cis-acting sequences are frequently located upstream of the TATA box; for example, consensus sequences such as CCAAT and GGGCGG (GC box).

In addition to the sequences mentioned above, many genes in mammalian cells are controlled by regulatory sequences located farther away from the transcription start site. These sequences are called enhancers. The activity of enhancers depends neither on their distance nor on their orientation with respect to the transcription start site. They can stimulate transcription when placed either upstream or downstream of the promoter, in either forward or backward orientation. Enhancers, like promoters, function by binding transcription factors that regulate RNA polymerase. This is possible because of DNA looping, which allows a transcription factor bound to a distant enhancer to interact with proteins associated with RNA polymerase at the promoter.

The binding of specific transcriptional regulatory proteins to the enhancers is responsible for the control of gene expression during development and differentiation as well as during the response of cells to hormones and growth factors. An important feature of enhancers is that they usually contain multiple functional sequence elements that bind different transcriptional regulatory proteins. These proteins work together to regulate gene expression. The immunoglobulin heavy chain enhancer, for example, spans approximately 200 bp and contains at least nine distinct sequence elements that serve as protein-binding sites.

Although enhancers can act at a considerable distance from their promoters, the activity of any given enhancer is specific for the promoter of its appropriate target gene. This activation is limited by boundaries called ‘insulators’ or ‘boundary elements’. Insulators define transcriptionally independent domains. They are also required to prevent the heterochromatin at centromeres and telomeres from spreading into euchromatin. That is, insulators prevent the chromatin structure of one domain from spreading to its neighbours, thereby maintaining independently regulated regions of the genome.

Transcriptional Regulatory Proteins

A variety of regulatory proteins bind to promoter or enhancer sequences and regulate gene expression. Examples of transcription factors and their DNA-binding sites are listed in Table 7.2.


Table 7.2 Transcription factors and their DNA-binding sites

Transcription factor                          DNA-binding site
Specificity protein 1 (SP1)                   GGGCGG
CCAAT/enhancer-binding protein (C/EBP)        CCAAT
Activator protein 1 (AP1)                     TGACTCA
Octamer-binding protein (OCT-1 and OCT-2)     ATGCAAAT
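
As a rough illustration of how the recognition sequences in Table 7.2 could be located in a stretch of DNA, the following Python sketch searches both strands for exact matches. The function names and the example promoter sequence are hypothetical and were made up for this illustration; real binding sites tolerate some variation, which exact matching ignores.

```python
# Recognition sequences taken from Table 7.2.
BINDING_SITES = {
    "SP1": "GGGCGG",
    "C/EBP": "CCAAT",
    "AP1": "TGACTCA",
    "OCT-1/OCT-2": "ATGCAAAT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite DNA strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str) -> list[tuple[str, int, str]]:
    """Return (factor, 0-based start, strand) for each exact match on either strand."""
    seq = seq.upper()
    hits = []
    for factor, site in BINDING_SITES.items():
        for strand, pattern in (("+", site), ("-", reverse_complement(site))):
            start = seq.find(pattern)
            while start != -1:
                hits.append((factor, start, strand))
                start = seq.find(pattern, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical promoter fragment, invented for this example.
promoter = "TTGGGCGGAACCAATCGTGACTCATTATGCAAATGC"
for factor, start, strand in find_sites(promoter):
    print(factor, start, strand)
```

Searching the reverse complement as well reflects the fact that a site can lie on either strand of the double helix.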

Eukaryotic Repressors

These bind to specific DNA sequences and inhibit transcription. In some cases, they interfere with the binding of other transcription factors to DNA. For example, the binding of the repressor near the transcription start site can block the interaction of RNA polymerase or general transcription factors with the promoter, which is similar to the action of repressors in bacteria. Some repressors compete with activators for binding to specific regulatory sequences. Certain repressors contain the same DNA-binding domain as that of the activator but lack its activation domain. Therefore, their binding to a promoter or enhancer blocks the binding of the activator, thereby inhibiting transcription initiation (Figure 7.11).

The functional targets of repressors are also diverse. Repressors act by interacting with specific activator proteins, with mediator proteins, with transcription factors and with ‘co-repressors’. One important role of repressors is to bring about the tissue-specific expression of genes in appropriate cell types; for example, a repressor-binding site in the immunoglobulin enhancer is thought to contribute to its tissue-specific expression by suppressing transcription in non-lymphoid cell types. Other repressors play important roles in the control of cell proliferation and differentiation in response to hormones and growth factors.

RNA Interference

Short RNAs that have a sequence complementary to an mRNA can silence gene expression. A double-stranded RNA is cleaved by a ribonuclease (Dicer) into short fragments of 21–22 bp in length.


Figure 7.11 Different modes of transcription repression in eukaryotes by repressors (a) Competitive binding with activator (b) Interaction with activation domain of bound activator (c) Interaction with general transcription factors


These fragments are called short interfering RNAs (siRNAs). The siRNAs bind to the RNA-induced silencing complex (RISC). One of the strands of siRNAs is degraded. The remaining single-stranded siRNA that is complexed with the RISC can then bind to complementary mRNA and the paired mRNA is cleaved. Further, the RISC–siRNA complex can enter the nucleus, bind to the genomic sequence and initiate a DNA methylation-based chromatin condensation and thus cause the inactivation of the gene (Figure 7.12).
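
The siRNA pathway described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the sequences are invented, the dsRNA is chopped into consecutive 21-nucleotide pieces, only perfect pairing is considered, and the cut is placed at the middle of the paired region purely for illustration. The function names `dice` and `silence` are hypothetical.

```python
from typing import Optional

_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the complementary RNA strand, read 5' to 3'."""
    return rna.translate(_RNA_COMPLEMENT)[::-1]


def dice(sense_strand: str, size: int = 21) -> list[str]:
    """Chop one strand of a long dsRNA into consecutive fragments of `size` nucleotides."""
    return [sense_strand[i:i + size]
            for i in range(0, len(sense_strand) - size + 1, size)]


def silence(mrna: str, guide: str) -> Optional[tuple[str, str]]:
    """Return the two mRNA pieces if the guide strand pairs perfectly, else None."""
    site = mrna.find(reverse_complement(guide))
    if site == -1:
        return None
    cut = site + len(guide) // 2  # illustrative cut point inside the paired region
    return mrna[:cut], mrna[cut:]


# Invented sequences: the dsRNA sense strand matches part of the mRNA.
region = "GCAUCGAUUGCAAGCUUAGGCCAUAGCUAGGAUCCGAUACGGA"
mrna = "AUGGCU" + region + "UAAGCU"

fragments = dice(region)
guide = reverse_complement(fragments[0])  # the strand retained in the RISC
print(silence(mrna, guide))
```

A guide strand with no complementary site in the mRNA returns `None`, corresponding to an mRNA that escapes cleavage.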

microRNAs (miRNAs)

miRNAs are gene products that are 21–22 nucleotides in length. The primary miRNA transcripts form hairpin structures. They are cleaved to make precursor miRNAs (roughly 70 nucleotides in length). These are then exported to the cytoplasm, where they are further cleaved by enzymes into the 21–22 nucleotide mature miRNAs. The miRNAs form ribonucleoprotein complexes with mRNAs. If the match is exact, the mRNA is destroyed, similar to siRNA mechanisms (Figure 7.13).


Figure 7.12 Gene regulation by siRNAs



Regulatory proteins generally bind to specific DNA sequences. Their affinity for these target sequences is approximately 10⁴–10⁶ times higher than their affinity for any other DNA sequences. Most regulatory proteins have discrete DNA-binding domains containing substructures that interact closely and specifically with the DNA.
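
To give a feel for what a 10⁴- to 10⁶-fold preference means energetically, the sketch below converts an affinity ratio into a difference in binding free energy using ΔΔG = RT ln(ratio). It assumes that the affinity ratio can be treated as a ratio of equilibrium association constants and that the temperature is 25 °C; both are assumptions made for this illustration, and the function name is hypothetical.

```python
import math

GAS_CONSTANT = 8.314  # J/(mol*K)


def binding_energy_gap(affinity_ratio: float, temperature_k: float = 298.15) -> float:
    """Return the free-energy difference in kJ/mol: RT * ln(affinity ratio)."""
    return GAS_CONSTANT * temperature_k * math.log(affinity_ratio) / 1000.0


for ratio in (1e4, 1e6):
    print(f"{ratio:.0e}-fold preference: {binding_energy_gap(ratio):.1f} kJ/mol")
```

Under these assumptions the preference corresponds to roughly 23 to 34 kJ/mol, which is on the scale of a small number of hydrogen bonds rather than a wholesale change in the protein–DNA interface.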


Figure 7.13 Gene regulation by miRNAs


Within the regulatory proteins, the amino acid side chains form hydrogen bonds to bases in the DNA. The amino acids most often involved in such interactions include Asn, Gln, Lys and Arg. These mostly interact with the major groove of the DNA. This interaction requires a relatively small structure that can protrude from the protein surface. The DNA-binding domains of regulatory proteins tend to be small, usually 60–90 amino acids in length. The DNA-binding sites for regulatory proteins are often inverted repeats of a short DNA sequence (a palindrome) at which multiple subunits of the regulatory protein bind co-operatively.
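
Whether a candidate site is an inverted repeat can be checked by comparing one half-site with the reverse complement of the other. The following Python sketch does this; the function name and the optional `spacer` parameter (central bases ignored in the comparison) are assumptions introduced for this example.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite DNA strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_inverted_repeat(site: str, spacer: int = 0) -> bool:
    """True if the left half-site equals the reverse complement of the right half-site.

    `spacer` is the number of central bases that are skipped in the comparison.
    """
    site = site.upper()
    half = (len(site) - spacer) // 2
    if half <= 0 or 2 * half + spacer != len(site):
        return False
    left, right = site[:half], site[len(site) - half:]
    return left == reverse_complement(right)


print(is_inverted_repeat("GAATTC"))             # a generic palindromic hexamer
print(is_inverted_repeat("TGACTCA", spacer=1))  # AP1 site, one central base skipped
print(is_inverted_repeat("GGGCGG"))             # GC box
```

A site that passes this test reads the same on both strands, which is what allows two identical protein subunits to bind it symmetrically.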

Several DNA-binding motifs have been described, of which two play prominent roles in the binding of DNA by regulatory proteins: the helix-turn-helix (HTH) and the zinc finger.

Helix-Turn-Helix Motif

A ‘motif’ is a short conserved sequence pattern associated with some distinct functions of a protein. Structurally, a motif is a simple combination of a few secondary structure elements.

The HTH motif, a DNA-binding motif, is crucial to the interaction of regulatory proteins with the DNA. The HTH motif comprises about 20 amino acids in two α-helical segments, each 7–9 amino acids long, separated by a β-turn. The two α-helical segments occupy the N-terminal and C-terminal ends of the motif and play a role in recognition and binding to DNA. This motif interacts with the major groove of the DNA through hydrogen bonds and van der Waals interactions. The lac repressor has this DNA-binding motif (Figure 7.15 (a)).

Zinc-Finger Motif

A zinc finger consists of about 30 amino acids forming an elongated loop held together at the base by a single Zn²⁺ ion, which is co-ordinated to four amino acid residues (four Cys or two Cys and two His). The Zn²⁺ ion does not itself interact with DNA; rather, the co-ordination of zinc with the amino acid residues stabilizes this small structural motif. Several hydrophobic side chains in the core of the structure also help to stabilize it (Figures 7.14 and 7.15 (b)). Many eukaryotic DNA-binding proteins contain zinc fingers. The interaction of a single zinc finger with DNA is typically weak. Many DNA-binding proteins such as Zif 268 have multiple zinc fingers that substantially enhance the binding by simultaneously interacting with the DNA. Zinc fingers also function as RNA-binding motifs; for example, in proteins that bind eukaryotic mRNAs, transcriptional repressors, etc.


Figure 7.14 DNA binding motifs


Figure 7.15 Protein-protein interaction domains


The Cys₂His₂-like fold group is by far the best-characterized class of zinc fingers and is extremely common in mammalian transcription factors; for example, transcription factor TFIIIA. Certain hormone receptors that bind DNA sequences also have a zinc-finger motif; for example, the glucocorticoid receptor.


Homeodomain

The homeodomain is made up of 60 amino acids. It is called the homeodomain because it was discovered in homeotic genes (the genes that regulate the development of body patterns). It is highly conserved and has been identified in a wide variety of proteins. The DNA-binding segment of the domain is related to the HTH motif. The DNA sequence that encodes this domain is known as the ‘homeobox’.


Figure 7.16 Homeo domain (a) Homeotic gene (b) Homeotic protein bound to DNA


The homeobox is 180 bp long, which corresponds to the 60 amino acids of the homeodomain. The homeodomain binds to DNA in a sequence-specific manner. It contains three well-defined alpha helices and a more flexible fourth helix. Helices 2 and 3 form an HTH motif. Helix 3 acts as the DNA recognition helix and interacts directly with the major groove of DNA. The flexible amino terminus of the homeodomain establishes contact with the minor groove of DNA (Figure 7.16).


Regulatory proteins contain domains not only for DNA binding but also for protein–protein interactions with RNA polymerase, with other regulatory proteins or with other subunits of the same regulatory protein. Examples include many eukaryotic transcription factors that function as gene activators. Some important examples of protein–protein interaction domains include the leucine zipper and the helix-loop-helix (HLH) domain.

Leucine Zipper

This domain contains four or five leucine residues spaced at intervals of seven amino acids, so that their hydrophobic side chains are exposed on one side of a helical region. This region serves as the dimerization domain for two protein subunits, which are held together by hydrophobic interactions between the leucine side chains. Adjacent to the leucine zipper is a region rich in positively charged amino acids, such as lysine and arginine, that binds the DNA (Figure 7.15 (c)).
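
The seven-residue spacing of the leucines lends itself to a simple sequence scan. The Python sketch below looks for a run of leucines repeated every seventh position; the function name, the `min_repeats` threshold of four (taken from the ‘four or five’ stated above) and the test peptide are assumptions for illustration, and the peptide is synthetic rather than a real protein.

```python
def find_leucine_zipper(protein: str, min_repeats: int = 4) -> int:
    """Return the 0-based index of the first leucine that starts a run of at least
    `min_repeats` leucines spaced seven residues apart, or -1 if there is none."""
    protein = protein.upper()
    for start, residue in enumerate(protein):
        if residue != "L":
            continue
        run = 0
        for pos in range(start, len(protein), 7):
            if protein[pos] != "L":
                break
            run += 1
        if run >= min_repeats:
            return start
    return -1


# Synthetic peptide: a made-up heptad "LEEKVAE" repeated four times.
peptide = "MKR" + "LEEKVAE" * 4 + "GKR"
print(find_leucine_zipper(peptide))
```

A scan like this only flags candidates; a heptad pattern of leucines does not by itself show that the region forms a helix or dimerizes.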

HLH Domain

This is a common structural motif occurring in some eukaryotic regulatory proteins implicated in the control of gene expression during the development of multicellular organisms. These proteins have a conserved region of about 50 amino acids that is important both for DNA binding and for protein dimerization. This region can form two short amphipathic α-helices linked by a loop of variable length. The HLH domains of two proteins interact to form dimers. In these proteins, DNA binding is mediated by a short amino acid sequence that is rich in basic residues (Figure 7.15 (d)).


At the Level of Post-transcriptional Modification

In eukaryotes, much of the gene regulation takes place during mRNA processing. Approximately 75 per cent of pre-mRNAs are degraded within the nucleus. This selective degradation acts as a control point of gene regulation. Alternative splicing of pre-mRNA permits the synthesis of different versions of a protein. Alternative splicing also affects the outcome of developmental processes.

By Controlling the Stability of mRNA Molecule

All mRNA molecules have a characteristic life span. They are degraded in the cytoplasm. Some are degraded within minutes of their synthesis, whereas others last hours or even months or years, as in the case of the mRNAs stored in oocytes. The poly(A) tail and the 5′ guanosine cap protect mRNAs from the action of nucleases and thereby increase their life span. In histone mRNAs, which lack a poly(A) tail, a stem-and-loop structure is formed at the 3′ end instead.

At Translational Level

Some genes can be regulated at the translational level. For example, in some anucleate cells, such as reticulocytes, transcriptional control is unavailable and hence gene expression is regulated at the translational level.

The immunoglobulin IgM exists in two forms, namely a membrane-bound form and a secreted form. Both forms of the protein are encoded by the same gene and differ only in the carboxyl ends of their heavy chains. The IgM gene has two poly(A) addition sites and a number of exons that can produce the two alternative forms. The transmembrane anchor of the membrane-bound form is encoded by exons 5 and 6. The secreted product is produced when the exon 4 splice is not made and the transcripts are terminated just after exon 4.

At Post-translational Level

In some cases, regulation occurs after translation. The expression of the tubulin gene is regulated post-translationally. The first four amino acids of the tubulin gene product, i.e., Met, Arg, Glu and Ile, constitute a recognition element to which regulatory factors bind, and these factors degrade the mRNA depending on the tubulin concentration in the cytoplasm.

By Hormone Action

Just like prokaryotes, eukaryotes have many systems for responding to specific molecules with which they come in contact. For example, the oestrogens, which are released by the ovaries, are cholesterol-derived steroid hormones. They are required for the development of female secondary sexual characteristics. As oestrogens are hydrophobic, they can cross the cell membrane. Inside the cell, they bind to soluble receptor proteins. Oestrogen receptor proteins are members of a large family of proteins that act as receptors for a wide range of hydrophobic molecules, including other steroid hormones, thyroid hormones and retinoids.

On binding the signal molecule (ligand), the ligand–receptor complex modifies the expression of specific genes by binding to control elements in the DNA called ‘hormone response elements’. The hormone-binding receptors usually have two domains: the DNA-binding domain and the ligand-binding domain. The DNA-binding domain has a zinc-finger motif.

Oestrogen receptors bind to specific DNA sites referred to as oestrogen response elements or EREs that contain the consensus sequence 5′AGGTCANNNTGACCT3′. Oestrogen receptor binds to such sites as a dimer.
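
A consensus sequence of this kind can be turned into a search pattern in which each N matches any base. The Python sketch below does so with the standard `re` module; the function names and the example upstream sequence are hypothetical, and only exact matches to the consensus half-sites are found.

```python
import re

ERE_CONSENSUS = "AGGTCANNNTGACCT"  # consensus quoted in the text, N = any base


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Compile a consensus sequence, treating each N as 'any of A, C, G or T'."""
    return re.compile(consensus.upper().replace("N", "[ACGT]"))


def find_eres(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched sequence) for each ERE-like site in `seq`."""
    pattern = consensus_to_regex(ERE_CONSENSUS)
    return [(match.start(), match.group()) for match in pattern.finditer(seq.upper())]


# Invented upstream sequence containing one site with the spacer CAG.
upstream = "ccgtAGGTCAcagTGACCTgatt"
print(find_eres(upstream))
```

The two half-sites, AGGTCA and TGACCT, are reverse complements of each other, which fits with the receptor binding as a dimer.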

The binding of the ligand–receptor complex to the hormone response elements results in gene regulation, i.e., it either activates or inactivates gene expression depending on whether the complex is associated with an activator or a repressor protein. For example, oestrogen binding to the receptor stimulates co-activator binding and upregulates gene expression.

Receptors for thyroid hormone and retinoic acid repress transcription in the absence of the ligand. In their unbound form, the ligand-binding domains of these receptors bind to co-repressors. Examples of such co-repressor proteins include silencing mediator for retinoid and thyroid hormone receptors (SMRT) and nuclear receptor co-repressor (N-CoR). Ligand binding triggers the release of the co-repressor and frees the ligand-binding domain for binding to a co-activator.

Ligand binding to membrane receptors can regulate transcription through phosphorylation cascades. For example, the binding of epinephrine to 7TM receptors results in the activation of a G protein. The activated G protein binds to and activates adenylate cyclase, increasing the concentration of cAMP. cAMP binds to the regulatory subunits of protein kinase A (PKA), activating the enzyme. PKA phosphorylates the cAMP response element-binding protein (CREB), a transcription factor that binds specific DNA sequences as a dimer. Phosphorylated CREB binds to a co-activator protein termed CREB-binding protein (CBP).

Thus, the pathway initiated by epinephrine binding includes the phosphorylation of a transcription factor, the recruitment of co-activator and the assembly of complexes that participate in chromatin remodelling and transcription initiation.



  • Prokaryotes can survive in a wide range of ecological conditions because of their ability to ‘switch on’ and ‘switch off’ the expression of specific sets of genes in response to the specific demand of the environment.
  • The expression of the wrong genes in the wrong place, at the wrong time or in the wrong cell type may lead to deleterious consequences and even death. Thus, regulating gene expression is of prime importance for the cell and consequently for the organism.
  • Some gene products are required at all times and their genes are expressed at a more or less constant level in all cells of a species or organism. These genes are called housekeeping genes, and their continuous expression is called ‘constitutive gene expression’.
  • In bacteria, an mRNA is typically available for translation while it is being synthesized; this is called ‘coupled transcription/translation’.
  • The bacterial genome is organized so that genes with a common function are often grouped together and are regulated together in units called ‘operons’.
  • A ‘riboswitch’ is a mechanism that can control transcription and translation through the interactions of molecules with an mRNA.
  • A series of transcription-associated changes takes place in the chromatin of eukaryotes and this is referred to as ‘chromatin remodelling’.
  • ‘Epigenetics’ is the study of heritable changes in chromatin (e.g., DNA methylation) without involving the change in DNA sequences.
  • The ligand (hormone)–receptor complex modifies the expression of specific genes by binding to control elements in DNA called ‘hormone response elements’.
  1. What is constitutive gene expression?

  2. What is meant by the term operon? Describe the types of operons.

  3. Explain in detail the positive and negative regulation of Lac operon with illustrations.

  4. Briefly explain histone methylation and histone phosphorylation.

  5. What is known as RNA interference? Define siRNA.

  6. What is a motif? Explain the structure of Zinc finger motif.

  7. Describe the role of protein–protein interaction domains in regulatory proteins.

  8. Write short notes on leucine zipper.

  9. Illustrate the positive and negative regulation of Trp operon.

  10. Differentiate between gene regulation in prokaryotes and eukaryotes.

  1. Genes with a common function that are grouped together and are regulated together are called

    1. operons
    2. regulons
    3. cistron
    4. none of the given options
  2. Operons that are normally transcribed, but in which transcription is inhibited when a co-repressor binds to the aporepressor, are called

    1. negatively inducible operons
    2. positively inducible operons
    3. negatively repressible operons
    4. positively repressible operons
  3. Catabolite repression is mediated by

    1. cAMP
    2. CAP
    3. CRP
    4. All of the given options
  4. Analogues of lactose that are effective inducers without being substrates for β-galactosidase are called

    1. gratuitous inducers
    2. activators
    3. inducers
    4. co-activators
  5. The intrinsic terminator that is located in the regulatory gene is

    1. operator
    2. attenuator
    3. promoter
    4. leader
  6. ——— is a protein associated with chromatin remodelling

    1. HMGN
    2. SNF
    3. SWI
    4. all of the given options
  7. Histone acetylation results in

    1. transcription activation
    2. transcription inactivation
    3. both
    4. none
  8. Short RNA sequences that have complementary sequence to the mRNA and can silence gene expression are called

    1. siRNAs
    2. miRNAs
    3. mRNAs
    4. all of the given options
  9. ——— define transcriptionally independent domains and prevent the chromatin structure of one region from spreading to another

    1. insulators
    2. promoters
    3. enhancers
    4. all of the given options
  10. The control elements in DNA that regulate transcription are

    1. HRE
    2. ERE
    3. CRE
    4. all of the given options
