2. Gene Organization – Essentials of Molecular Biology

2

Gene Organization

CONCEPT OUTLINE
  • Introduction
  • Genes
  • Non-coding DNA
  • Repetitive DNA sequences
  • Satellite DNAs
  • Cot value – Renaturation kinetics of DNA
  • Chromatin
  • Chromosomes
  • Prokaryotic gene organization
  • Gene organization in eukaryotes
INTRODUCTION

The hereditary nature of every living organism is defined by its ‘genome’. The genome consists of a long sequence of nucleic acid that provides the information needed to construct an organism. A genome consists of the entire set of chromosomes for any particular organism. It, therefore, comprises a series of DNA molecules (one for each chromosome), each of which contains many genes. The genome itself does not perform any function; rather, it is the sequence of nucleic acids that determines the hereditary features. Through a series of interactions, the DNA sequence directs the production of all the proteins of an organism at the appropriate time and place. Physically, the genome may be divided into a number of different DNA molecules or chromosomes, and it may be defined as the sequence of DNA of each chromosome. Functionally, the genome is divided into genes.

Genes

Genes are the functional units of the genome. They are hereditary units consisting of DNA sequences that occupy a specific location on a chromosome, which is called a ‘genetic locus’, and determine a particular characteristic in an organism. In the simplest case, a gene is a sequence within the nucleic acid that codes for a single protein. Each of the discrete nucleic acid molecules of the genome may contain a large number of genes. The number of genes varies among organisms, from about 500 genes (for mycoplasma) to roughly 20,000 protein-coding genes (for human beings; earlier estimates ran as high as 40,000). Genes hold the information to build and maintain an organism’s cells. All organisms have many genes corresponding to various biological traits such as eye colour, blood type, increased susceptibility to diseases and the thousands of basic biochemical processes that comprise life.

 

Figure 2.1 Genes

 

A gene is a sequence that produces a single strand of another nucleic acid, RNA, whose sequence is identical to that of one of the two polynucleotide strands of the DNA (the coding strand, with uracil in place of thymine) and complementary to the other (the template strand). In many cases, the RNA is subsequently used to direct the production of a protein; in other cases, the RNA itself is the functional end product, as with rRNA and tRNA genes. Thus, a gene is a sequence of DNA that codes for RNA; in protein-coding regions or structural genes, the RNA in turn codes for a protein (Figure 2.1).
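
The relationship between the two DNA strands and the RNA transcript can be sketched in a few lines of Python. This is only an illustration: the nine-base template sequence and the function names are hypothetical and do not come from the text.

```python
# Base-pairing rules (a minimal sketch; the sequence below is a made-up example).
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def coding_strand(template: str) -> str:
    """DNA strand complementary to the template, base by base."""
    return "".join(DNA_COMPLEMENT[base] for base in template)


def transcribe(template: str) -> str:
    """RNA copied from the template strand, base by base (U pairs with A)."""
    return "".join(RNA_COMPLEMENT[base] for base in template)


template = "TACGGATTC"            # template strand, written 3'->5'
rna = transcribe(template)        # RNA, read 5'->3'
coding = coding_strand(template)  # coding strand, read 5'->3'

# The RNA matches the coding strand except that U replaces T.
print(coding, rna, rna == coding.replace("T", "U"))
```

The final comparison shows why the transcript is said to be identical to one DNA strand: it differs from the coding strand only in carrying uracil where the DNA carries thymine.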

A gene may exist in alternative forms. An ‘allele’ is one of two or more versions of a gene. Every diploid individual inherits two alleles for each gene, one of paternal origin and the other of maternal origin. The alleles of a gene are the different forms that are found at its locus. The term ‘cistron’ is essentially synonymous with gene: it describes a stretch of DNA that functions as a unit to give rise to an RNA or protein. Many bacterial mRNAs are ‘polycistronic’, i.e., a single mRNA molecule includes the coding regions for several proteins that function together in a biological process; the mRNA encoded by the lac operon is an example. In contrast, most eukaryotic mRNAs are ‘monocistronic’, i.e., each mRNA molecule encodes a single protein.

Non-coding DNA

Non-coding DNA describes the DNA sequences that do not encode protein sequences. In eukaryotes, a large percentage of the genome is non-coding DNA. The proportion of non-coding DNA varies greatly among species.

Much non-coding DNA has no known biological function and has therefore been referred to as ‘junk DNA’. However, many types of non-coding DNA sequences are involved in regulating the transcription and translation of protein-coding sequences, and the functions of many others remain undetermined.

Approximately 98 per cent of the human genome is non-coding, that is, it does not encode protein sequences. An organism’s evolutionary complexity is broadly related to the overall size of its genome and the amount of non-coding DNA; however, there are many exceptions.

For example, the genome of the pufferfish (Takifugu rubripes) is only one-eighth of the size of the human genome; however, it has a comparable number of genes. Most of the variations in genome size are seen in the non-coding DNA. These extensive variations in nuclear genome size among eukaryotes are known as ‘C-value enigma’ or ‘C-value paradox’. The total amount of DNA contained in a haploid genome is sometimes referred to as the ‘C-value’ for the species.

Types of non-coding DNA

Non-coding functional RNA genes

Non-coding functional RNA genes are the gene sequences of RNA molecules that are not translated into proteins; for example, rRNA, tRNA and microRNAs.

Cis and Trans regulatory elements

Cis and Trans regulatory elements are the sequences that control gene expression; for example, promoter and enhancer sequences in eukaryotes, and operator and repressor protein-binding sequences in prokaryotes.

Introns

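Introns are the non-coding regions within a gene. They are intervening sequences that are transcribed into the primary RNA transcript (pre-mRNA) but removed by splicing, so they are absent from the mature mRNA and are not translated into protein sequences.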

Pseudogenes

Pseudogenes are DNA sequences related to known genes that have lost their protein-coding ability or are no longer expressed in the cell. Pseudogenes, which are also called ‘genomic fossils’, are non-functional because of accumulated mutations.

Repetitive DNA sequences and telomeres (the repetitive DNA at the ends of a chromosome) are also non-coding sequences.

Functions of non-coding DNA sequences

Many non-coding DNA sequences have important biological functions.

  • Many disease-associated genetic variants lie in non-coding DNA. Short, specific sequences of non-coding DNA have been discovered that give rise to non-coding RNAs such as microRNA and siRNA. These RNAs function as regulatory elements and have also been linked to diseases in humans.
  • Non-coding DNA is also associated with chromosome structure, centromere function and homologue recognition in meiosis.
  • Non-coding DNA sequences act as gene regulators, controlling the expression of genes.

Repetitive DNA Sequences

One major difference between the genomes of prokaryotes and the genomes of eukaryotes is that most eukaryotes contain repetitive DNA, i.e., certain DNA sequences repeat several times in the total DNA of the cell. These sequences do not code for polypeptides.

 

Figure 2.2 Classification of repetitive DNA sequences

 

Repetitive sequences are classified as:

  • Highly repetitive DNA sequences and
  • Moderately repetitive DNA sequences (Figure 2.2).

Highly repetitive sequences

These are short nucleotide sequences that are repeated thousands of times all over the genome. Over 50 per cent of mammalian DNA consists of ‘highly repetitive DNA sequences’, some of which are present in about 10^5–10^6 copies per genome. These are also called simple sequence repeats (SSRs) or short tandem repeats (STRs). These sequences are mostly located in the heterochromatin regions around centromeres and telomeres. Telomeres also consist of G-rich SSRs. SSRs comprise about 3 per cent of the human genome, with the greatest single contribution (0.5 per cent) coming from dinucleotide repeats, most frequently (CA)n and (TA)n. The postulated functions of these sequences include structural or organizational roles, a role in chromosome pairing and involvement in crossing over or recombination; alternatively, they may have no function (‘junk’).

Moderately repetitive DNA sequences

These DNA sequences are present in about 10–10^5 copies per genome. They are found throughout the euchromatin. Their average size is about 300 base pairs. They may be classified as:

  1. Microsatellites/minisatellites (VNTRs (variable number tandem repeats), DNA ‘fingerprints’)
  2. Dispersed-repetitive DNA, mainly transposable elements (‘LINEs’ (long interspersed nucleotide elements) and ‘SINEs’ (short interspersed nucleotide elements)).

The ‘redundant’ genes for histones, rRNA and some proteins are also moderately repetitive sequences. Many moderately repeated sequences are thought to function as gene regulators, a view supported by their location adjacent to structural genes. Much of the moderately repeated DNA consists of ‘transposable elements’ or ‘mobile DNA elements’. The process by which these sequences are copied and inserted into a new site in the genome is called ‘transposition’. The two major families, the LINEs and the SINEs, are represented in humans mainly by LINE-1 (L1) and Alu elements, respectively (Alu elements are so called because most of their approximately 300-base-pair segments contain a cleavage site for the restriction endonuclease AluI). Both types of element are retrotransposons (i.e., they replicate through an RNA intermediate by reverse transcription). They have a significant role in genomic function and evolution. In addition, DNA transposons make up about 2.8 per cent of the human genome. In total, around 45 per cent of the human genome consists of transposable elements, most of which are inactive.

Long interspersed nucleotide elements

The human genome contains around 868,000 LINEs, nearly 17 per cent of the genome. Most of these belong to a family called LINE-1 (L1). These L1 elements are DNA sequences that range in length from a few hundred to as many as 9,000 base pairs. Only about 50 L1 elements are functional. These are about 6,500 base pairs in length and encode two proteins: an endonuclease that cuts DNA and a reverse transcriptase, an RNA-directed DNA polymerase. ‘DNA fingerprinting’ exploits the diversity of LINEs among individual human genomes, which makes them useful markers.

Short interspersed nucleotide elements

These are short DNA sequences of about 100–400 base pairs. SINEs are reverse-transcribed copies of RNA molecules originally transcribed by RNA polymerase III, which include tRNA, 5S rRNA and some other small nuclear RNAs. An example is the ‘Alu elements’, which represent about 10.6 per cent of our total DNA. Alu elements consist of a sequence of about 300 base pairs that contains a site recognized by the restriction enzyme AluI. Most SINEs do not encode any functional molecules and depend on the machinery of active L1 elements to be transposed, i.e., to be copied and pasted into new locations.

Insertional mutagenesis by SINEs and LINEs, most often L1, is being recognized increasingly in mammals. For example, diseases such as haemophilia and sporadic breast and colon cancers have been associated with insertional mutations of moderately repetitive DNA sequences.

Satellite DNAs

These are highly repetitive eukaryotic DNAs; they are primarily clustered in the heterochromatic regions of chromosomes, near centromeres and telomeres. Satellite DNA usually has a buoyant density different from that of the rest of the cell’s DNA. It forms a separate band in a buoyant density gradient because of its different nucleotide composition (A–T-rich DNAs are less dense than G–C-rich DNAs). The lengths of the repeat units vary from one to several thousand base pairs. Satellite DNAs are found abundantly on the Y chromosome (Figure 2.3).
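
The link between base composition and buoyant density can be made concrete with a widely used empirical relation for duplex DNA in caesium chloride gradients, ρ ≈ 1.660 + 0.098 × (G–C fraction) g/cm^3. This relation is not given in the chapter and is quoted here as an assumption; the two G–C fractions below are hypothetical values chosen only for illustration.

```python
def buoyant_density(gc_fraction: float) -> float:
    """Approximate buoyant density (g/cm^3) of duplex DNA in CsCl.

    Uses the empirical linear relation rho = 1.660 + 0.098 * GC,
    where GC is the G-C fraction between 0 and 1 (an assumption,
    not taken from the chapter).
    """
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must lie between 0 and 1")
    return 1.660 + 0.098 * gc_fraction


bulk = buoyant_density(0.42)       # hypothetical main-band DNA
satellite = buoyant_density(0.30)  # hypothetical A-T-rich satellite

# The A-T-rich satellite is less dense, so it bands separately.
print(round(bulk, 4), round(satellite, 4), satellite < bulk)
```

Under this approximation a difference of a few per cent in G–C content shifts the density enough for a satellite to separate from the main band.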

‘Minisatellites’, which are also called ‘variable number tandem repeats’ (VNTRs), are moderately repetitive, tandemly repeated arrays of moderately sized DNA sequences. The repeat unit is 9–100 base pairs long, usually about 15 base pairs. They are found in the euchromatin regions of the genomes of vertebrates, fungi and plants.

‘Microsatellites’ are moderately repetitive DNA sequences. They are composed of arrays of short repeats of 2–6 base pairs and are found in the genomes of vertebrates, insects and plants. The human genome contains at least 30,000 microsatellites located in euchromatin. Copy numbers vary within a population, with mean array sizes of about 10–100 repeats.
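
The definition above (tandem arrays of a 2–6 base pair unit) translates directly into a simple search. The following is a minimal sketch using a regular expression; the function name, the minimum of five copies and the test sequence are illustrative assumptions, not values from the text.

```python
import re


def find_microsatellites(seq, min_unit=2, max_unit=6, min_copies=5):
    """Return (start, repeat_unit, copies) for each tandem array found.

    The lazy quantifier makes the regex try the shortest repeat unit first,
    so (CA)8 is reported with unit 'CA' rather than 'CACA'.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:  # skip runs of a single base
            continue
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Hypothetical sequence: a (CA)8 array flanked by unique sequence.
sequence = "GGTC" + "CA" * 8 + "TTGAAAGAAGAAGCCT"
print(find_microsatellites(sequence))
```

The short AAG run in the test sequence is not reported because it has only three copies, below the chosen threshold.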

Microsatellites occasionally occur within transcription units. Some individuals are born with a larger number of repeats in specific genes than is observed in the general population. This is thought to result from daughter-strand slippage during DNA replication in a germ cell from which they developed. Such expanded microsatellites are associated with at least 14 different types of neuromuscular diseases. Common examples of diseases associated with microsatellite repeats are myotonic dystrophy and spinocerebellar ataxia. In these cases, the expanded repeats interfere with RNA processing in the neurons.

In situ hybridization studies with metaphase chromosomes have localized satellite DNAs to specific chromosomal regions. In mammals, much of this satellite DNA lies near the centromeres and at the telomeres.

The DNA of Drosophila virilis exhibits three satellite bands; each consists of a closely related repeating heptanucleotide sequence (Figure 2.3):

 

Figure 2.3 Satellite DNA (Satellite DNA sequences of Drosophila virilis)

Repeat sequences and DNA fingerprinting

Within a species, the nucleotide sequences of the repeat units composing simple-sequence DNA tandem arrays are highly conserved among individuals, whereas the number of repeats, and hence the total length of an array, commonly differs. Slight differences in the total lengths of various minisatellites from different individuals can be detected by Southern blotting of cellular DNA treated with a restriction enzyme that cuts outside the repeat sequence. The polymerase chain reaction, using primers that hybridize to the unique sequences flanking each minisatellite, can also detect these differences in length among individuals. These DNA polymorphisms form the basis of DNA fingerprinting.
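
The length differences that fingerprinting detects follow from simple arithmetic: a PCR product spanning a minisatellite is as long as its two flanking regions plus the repeat unit multiplied by the copy number. The sketch below illustrates this; the flank lengths, the 15-base-pair unit and the copy numbers are all hypothetical.

```python
def amplicon_length(left_flank_bp, right_flank_bp, unit_bp, copies):
    """Length of a PCR product spanning one minisatellite array."""
    return left_flank_bp + unit_bp * copies + right_flank_bp


# Hypothetical locus: 15-bp repeat unit, 60-bp and 40-bp flanking regions.
# Each individual carries two alleles with different copy numbers.
individuals = {
    "person_1": (12, 19),
    "person_2": (12, 27),
}

for name, alleles in individuals.items():
    lengths = [amplicon_length(60, 40, 15, n) for n in alleles]
    print(name, lengths)
```

Because the flanking sequences are the same in everyone, any difference in product length reflects a difference in copy number, which is what appears as a band shift on a gel.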

Unique sequences

The eukaryotic genome also contains DNA that is unique in sequence. The human genome encodes about 100,000 proteins. The average coding region of a gene (‘the exons’) consists of about 2,000 base pairs of DNA that is unique in sequence. Less than 14 per cent of that DNA is unique. Most of the coding sequences are interrupted by non-coding sequences or ‘introns’. The total length of the introns is generally much greater than the total length of the exons. Sequences that regulate gene expression also account for some of the unique sequences. The actual amount of DNA coding for functional gene products is probably less than 3 per cent of the total genomic DNA. The genes that are represented only once in the haploid genome are called solitary genes; for example, the chicken lysozyme gene.

Cot Value – Renaturation Kinetics of DNA

Studies of the kinetics of reassociation of thermally denatured DNA have revealed that eukaryotic DNA, in contrast with prokaryotic DNA, contains many repeated base sequences. The measurement of the rate of renaturation can give valuable information about the complexity of a DNA molecule. The complexity of a molecule will be less than its total length if some of its sequences are repeated; on the other hand, the complexity will be equal to the total length if all sequences are unique, that is, appearing only once in the genome.

DNA can be sheared into small fragments and then denatured by heating the solution above the melting temperature (Tm) of the DNA. This solution of single-stranded DNA can be made to renature by cooling it to about 25°C below Tm. The kinetics of renaturation can be measured in a variety of ways. One method is to follow the absorbance of the solution at 260 nm. At this wavelength, the absorption coefficient of double-stranded DNA is about 40 per cent less than that of single-stranded DNA. This decrease in absorption is called ‘hypochromicity’. Another technique is based on the fact that double-stranded DNA binds to hydroxyapatite columns, whereas single-stranded DNA passes through.

The rate at which the heat-denatured DNA sequences in solution will renature is dependent on DNA concentration, reassociation temperature, cation concentration and viscosity (usually not a factor if DNA is free of contaminants).

The observed kinetics of reassociation of DNA is that of a bimolecular reaction:

A + B → D     (rate constant k)

where A and B are complementary single-stranded DNAs, D is the reassociated double helix and k is the second-order rate constant for association.

The rate of renaturation of the denatured DNA is, therefore, expressed as:

d[A]/dt = −k[A][B]

Since [A] = [B] for duplex DNA,

d[A]/dt = −k[A]^2

By integrating both sides,

1/[A] = 1/[A]0 + kt,          (1)

where [A]0 is the initial concentration of A. It is convenient to measure the fraction f of unpaired strands.

f = [A]/[A]0          (2)

From Equations (1) and (2),

f = 1/(1 + [A]0 kt)          (3)

The concentration terms in these equations refer to unique sequences, since the collision of non-complementary sequences does not lead to renaturation. Hence, if C0 is the initial total concentration of DNA, then

[A]0 = C0/x,

where x is the ‘complexity of DNA’, that is, the number of base pairs in the unique (non-repeating) sequence. By substituting the value of [A]0 in Equation (3),

f = 1/(1 + C0 kt/x)

When half of the molecules in the sample have renatured, f = 1/2 and therefore C0 kt/x = 1 at t = t1/2.

 

Figure 2.4 C0t curves

 

Hence, C0t1/2 = x/k, where t1/2 is the time taken for half of the DNA to renature. For a given set of conditions, the value of C0t1/2 depends only on the complexity x of the DNA. It is convenient to depict the kinetics of reassociation by plotting f versus the logarithm of C0t. Such a ‘C0t curve’ has a sigmoidal shape (Figure 2.4).

C0t = DNA concentration (mol/L) × renaturation time in seconds × buffer factor that accounts for the effect of cations on the speed of renaturation.

C0t1/2 of E. coli DNA is about 15 M·s; for T4 phage, it is 0.3 M·s; and for the mammalian genome, it is about 10^4 M·s.
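
Combining f = 1/(1 + C0 kt/x) with C0t1/2 = x/k gives f = 1/(1 + C0t/C0t1/2), which is convenient for tabulating a C0t curve. The following minimal sketch does this for the three C0t1/2 values quoted above; the function name and the choice of decade steps are illustrative assumptions.

```python
def fraction_single_stranded(c0t: float, c0t_half: float) -> float:
    """Fraction f of strands still unpaired: f = 1 / (1 + C0t / C0t_half)."""
    return 1.0 / (1.0 + c0t / c0t_half)


# C0t(1/2) values quoted in the text, in mol·s/L.
C0T_HALF = {"T4 phage": 0.3, "E. coli": 15.0, "mammalian": 1e4}

# Tabulate f at decade steps of C0t; plotted against log10(C0t),
# each column traces a sigmoidal curve centred on its own C0t(1/2).
print("columns:", list(C0T_HALF))
for exponent in range(-2, 7):
    c0t = 10.0 ** exponent
    row = [round(fraction_single_stranded(c0t, h), 3) for h in C0T_HALF.values()]
    print(f"log10(C0t) = {exponent:>2}", row)
```

The more complex the DNA, the further to the right its curve lies, which is why the position of the midpoint can be used to estimate genome complexity.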

Uses of C0t analysis

  • It is used for the determination of genome size and the fraction of a single copy or unique DNA sequences in a genome.
  • It is used to determine the number, size, complexity and relative arrangement of repetitive DNA sequences.
  • It allows genome comparisons among genetically similar or dissimilar organisms.
  • It paved the way for the development of the principles of nucleic acid hybridization.
  • The information generated in C0t studies remains central to the current knowledge of genome structure.

Chromatin

Eukaryotic genomes are organized into nucleoprotein complexes known as chromatin, whose structure affects most processes that occur on DNA. Chromatin is the single, very long duplex DNA, folded with its associated proteins into a fibre that runs continuously throughout the chromosome. Throughout the life of a cell, chromatin fibres take on different forms inside the nucleus.

Eukaryotic chromosomes are visible only for a brief period during cell division. Once mitosis has been completed, most of the chromatin that composes the highly compacted chromosomes becomes dispersed. However, in most cells, about 10 per cent of the chromatin remains condensed throughout interphase and is visible under the light microscope. Such chromatin stains deeply and is called ‘heterochromatin’. It is typically found at centromeres and occurs at other locations as well. It passes through the cell cycle with relatively little change in its degree of condensation. Often, various heterochromatin regions, especially those associated with centromeres, aggregate into a densely staining chromocentre. The common form of heterochromatin that always remains heterochromatic is called ‘constitutive heterochromatin’. The chromatin regions that are less densely packed in the mitotic chromosome are called ‘euchromatin’. It has a relatively dispersed appearance in the nucleus and occupies most of the nuclear region. The regions of euchromatin that are converted to a heterochromatic state are called ‘facultative heterochromatin’. The chromatin undergoes a 10,000-fold compaction into a specialized structure called the ‘chromosome’ (Figure 2.5).

 

Figure 2.5 Heterochromatin and Euchromatin

Chromosomes

In diploid eukaryotic cells, chromosomes occur in pairs, with one member of each pair derived from each parent. Each such pair is referred to as ‘homologous chromosomes’. During replication, each member of the homologous pair undergoes DNA synthesis and this results in the formation of two ‘chromatids’. Each chromosome has two arms: a long ‘q arm’ and a short ‘p arm’ (Figure 2.6). A chromosome has a constricted region called the ‘centromere’. The centromeric chromatin comprises DNA sequences, specialized centromeric histone variants and a group of specific proteins that are responsible for establishing the structure that attaches the chromosome to microtubules. This structure is called the ‘kinetochore’. A chromosome is said to be ‘metacentric’ if the centromere is in or very near the middle. A chromosome having a centromere between the middle and the end is referred to as ‘submetacentric’ or ‘acrocentric’. If the centromere is at or very near the tip, it is known as ‘telocentric’.

 

Figure 2.6 Chromosome

Lampbrush chromosomes

These are meiotic bivalents, in which the two pairs of sister chromatids are held together at chiasmata. The sister chromatids remain connected along their lengths and each homologue thus appears as a single fibre. The lampbrush chromosomes take their name from the lateral loops that extrude from the chromomeres at certain positions. The arrangement of these loops around the chromosome axis resembles the cleaning fibres of a lampbrush. The loops exist in pairs, one from each sister chromatid.

Polytene chromosomes

These are greatly enlarged chromosomes. They are produced by the successive replication of a synapsed diploid pair of chromosomes. These chromosomes replicate but their replicas do not separate; they remain attached together in their extended state. At the start of the process, each synapsed pair has a DNA content of 2C (C represents the DNA content of an individual chromosome). This amount then doubles up to nine times, at its maximum giving a content of 1024C. For example, the interphase nuclei of some tissues of the larvae of dipteran flies contain polytene chromosomes.
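
The figure of 1024C follows from repeated doubling: starting from 2C, nine doublings give 2 × 2^9 = 1024C. A short sketch makes the progression explicit (the function name is an illustrative assumption).

```python
def dna_content_after_doublings(initial_c: int, doublings: int) -> int:
    """DNA content, in units of C, after a number of replication rounds."""
    return initial_c * 2 ** doublings


# A synapsed pair starts at 2C and replicates up to nine times.
for rounds in range(10):
    print(rounds, f"{dna_content_after_doublings(2, rounds)}C")
```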

Chromosome parameters

Various parameters characterize a chromosome. They are:

d-value: The difference in length between the long arm and the short arm of a chromosome, i.e., d = q − p, where q and p are the lengths of the long arm and the short arm of the chromosome, respectively.

r-value: It is the ratio of the lengths of the two chromosome arms, i.e., r = q/p.

Centromeric index: It is the distance from the centromere to the tip of the short arm, which is expressed as a percentage of the total length of the chromosome, i.e., i = 100 × p/(p + q).

Monocentric chromosome: Chromosome with a single defined centromere.

Holocentric chromosome: Chromosome with a diffused centromere.

Telocentric chromosome: A monocentric chromosome with a terminal centromere (for a chromosome length of 10 units, p = 0, q = 10, d = 10, r = ∞ and i = 0). It is referred to as a T-chromosome or mono-brachial chromosome.

Atelocentric chromosome: A monocentric chromosome with a non-terminal centromere. This type of chromosome has the following types.

  • Metacentric: A monocentric chromosome with a central centromere. When the centromere is exactly at the median point, it is referred to as M-chromosome.
  • Submetacentric: A monocentric chromosome with a centromere in the submedian region.
  • Subacrocentric: A monocentric chromosome with the centromere in the subterminal region.
  • Acrocentric: A monocentric chromosome with the centromere very close to the terminal region.
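
The three parameters defined above can be computed directly from the two arm lengths. The sketch below is a minimal illustration; the function name is an assumption, and no cut-off values for the named chromosome types are implied because the text does not give any.

```python
import math


def chromosome_parameters(p: float, q: float):
    """Return (d, r, i) for short-arm length p and long-arm length q."""
    if p < 0 or q <= 0 or p > q:
        raise ValueError("expected 0 <= p <= q and q > 0")
    d = q - p                          # d-value: difference in arm length
    r = math.inf if p == 0 else q / p  # r-value: arm ratio
    i = 100 * p / (p + q)              # centromeric index, in per cent
    return d, r, i


# Telocentric example from the text: total length 10 units, p = 0.
print(chromosome_parameters(0, 10))
# A hypothetical metacentric chromosome with equal arms.
print(chromosome_parameters(5, 5))
```

For the telocentric case the arm ratio is infinite because the short arm has zero length, which matches the values listed for the T-chromosome.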

Chromosomes can be divided into two types: autosomes and sex chromosomes (allosomes). Genetic traits related to a person’s sex are passed on through the sex chromosomes. The autosomes, or body chromosomes, contain the rest of the hereditary information. Human cells are diploid: they have 23 pairs of large linear nuclear chromosomes, of which 22 pairs are autosomes (22 different types, each present as two copies) and one pair is sex chromosomes, giving a total of 46 chromosomes per cell.

Karyotype

The number, sizes and shapes of the metaphase chromosomes constitute the ‘karyotype’, which is distinctive for each species. In most organisms, all cells have the same karyotype; however, species that appear quite similar can have very different karyotypes, indicating that similar genes can be organized on chromosomes in different ways. For example, two species of small deer, the Indian muntjac and Reeves’ muntjac, contain about the same total amount of genomic DNA. In Reeves’ muntjac, this DNA is organized into 22 pairs of autosomes and two sex chromosomes. In contrast, the Indian muntjac contains only three pairs of autosomes; one sex chromosome is physically separate, whereas the other is joined to the end of an autosome (Figure 2.7).

 

Figure 2.7 Karyotype

Chromosome banding

Certain dyes selectively stain some regions of metaphase chromosomes more intensely than other regions, producing characteristic banding patterns that are specific for individual chromosomes. Chromosome bands serve as visible landmarks along the length of each chromosome and can help to distinguish chromosomes of similar size and shape. The molecular basis for the regularity of chromosomal bands remains unknown.

‘G bands’ are produced when metaphase chromosomes are subjected briefly to mild heat or proteolysis and then stained with Giemsa stain. G bands correspond to large regions of the human genome that have an unusually low G–C content.

Treatment of chromosomes with a hot alkaline solution before staining with Giemsa stain produces ‘R bands’. The R-band pattern is approximately the reverse of the G-band pattern. The distinctiveness of these banding patterns helps to identify specific parts of a chromosome and to locate the sites of chromosomal breaks and translocations (Figure 2.8).

Prokaryotic Gene Organization

The bacterial genome is organized into a definite body. The genetic material can be seen as a fairly compact clump or series of clumps that occupies about a third of the volume of the cell and is called the ‘nucleoid’. Protein HU is a dimer that condenses DNA, possibly wrapping it into a bead-like structure. It is related to IHF (integration host factor), another dimer, which has a structural role in building a protein complex in specialized recombination reactions. Another protein, H1 (also known as H-NS), binds DNA, interacting preferentially with sequences that are bent. The nucleoid consists of approximately 80 per cent DNA by mass.

Bacterial DNA has independently coiled domains. Each domain consists of a loop of DNA, the ends of which are secured in some unknown way. There are nearly 100 such domains per genome. Each domain consists of about 40 kb (13 μm) of DNA, which is organized into a compact fibre. The existence of separate domains could permit different degrees of supercoiling to be maintained in different regions of the genome. The 1-mm-long DNA molecule of the E. coli chromosome is contained within cells that are only about 2 μm long and about 0.5–1 μm wide. This is possible because several mechanisms operate to compact the E. coli chromosome so that it can fit inside the bacterial cell. For example, the large volume filled by free DNA is due to charge repulsion between negatively charged phosphate groups. In the cell, this effect is reduced by associating DNA with positively charged polyamines, such as spermine or spermidine, which shield the negative charges of DNA (Figure 2.9).
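
The lengths quoted above can be checked with a short calculation. The sketch assumes an axial rise of 0.34 nm per base pair for B-form DNA, a standard value that is not stated in the text; the domain size, domain count and cell length are taken from the paragraph above.

```python
RISE_PER_BP_NM = 0.34   # assumed axial rise per base pair of B-form DNA
DOMAIN_BP = 40_000      # about 40 kb per domain (from the text)
DOMAINS = 100           # about 100 domains per genome (from the text)
CELL_LENGTH_UM = 2.0    # approximate length of an E. coli cell (from the text)


def contour_length_um(base_pairs: int) -> float:
    """Contour length of linear duplex DNA in micrometres."""
    return base_pairs * RISE_PER_BP_NM / 1000.0


domain_um = contour_length_um(DOMAIN_BP)
genome_um = contour_length_um(DOMAIN_BP * DOMAINS)

print(f"one domain:   {domain_um:.1f} um")
print(f"whole genome: {genome_um / 1000:.2f} mm")
print(f"genome length / cell length: about {genome_um / CELL_LENGTH_UM:.0f}-fold")
```

The result for one domain agrees with the 13 μm quoted above, and 100 such domains give a contour length of the order of 1 mm, several hundred times the length of the cell.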

 

Figure 2.8 Chromosome banding

 

Figure 2.9 Gene organization in prokaryotes

Gene Organization in Eukaryotes

Cells must accommodate DNA molecules that are many times longer than the cells themselves. DNA packaging must be very effective while still allowing DNA replication and transcription to occur. Eukaryotes package DNA in the nucleus into chromatin and chromosomes. Chromatin consists of fibres that contain protein, DNA and a small amount of RNA. The DNA in the chromatin is very tightly associated with proteins called ‘histones’. These have molecular weights between 11,000 and 21,000 and are very rich in the basic amino acids arginine and lysine. There are five major classes of histones found in all eukaryotic cells, namely H1, H2A, H2B, H3 and H4.

The histones H3 and H4 are nearly identical in amino acid sequence in all eukaryotic cells; however, H1, H2A and H2B show a lesser degree of homology between eukaryotes.

Nucleosomes

The DNA in the chromatin is very tightly associated with histones, which package and order the DNA into structural units called nucleosomes. These are the fundamental units of organization upon which the higher-order packaging of chromatin is built. Each nucleosome contains eight histone molecules, two copies each of H2A, H2B, H3 and H4. The spacing of the nucleosome beads along the DNA defines a repeating unit typically of about 200 base pairs, of which 146 base pairs are bound tightly around the histone core and the remaining base pairs serve as linker DNA between nucleosomes. The histone H1 is not a part of the nucleosome core; however, it is generally bound to the linker DNA. The histone core does not bind randomly to the DNA; rather, nucleosomes tend to position themselves at certain specific locations. It appears that nucleosomes form preferentially where A=T base pairs are abundant at the points where the minor groove of the DNA helix contacts the histone core (Figure 2.10).
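
The numbers in this description lend themselves to a quick calculation. The sketch below takes the repeat length as about 200 base pairs, the commonly cited figure, and the 146-base-pair core from the text; the function names and the 100,000-base-pair example are illustrative assumptions.

```python
CORE_BP = 146            # DNA wrapped around the histone octamer (from the text)
REPEAT_BP = 200          # assumed repeat length: core plus linker
HISTONES_PER_CORE = 8    # two each of H2A, H2B, H3 and H4


def linker_bp(repeat_bp: int = REPEAT_BP, core_bp: int = CORE_BP) -> int:
    """Average linker DNA per nucleosome repeat."""
    return repeat_bp - core_bp


def nucleosome_count(dna_bp: int, repeat_bp: int = REPEAT_BP) -> int:
    """Approximate number of nucleosomes on a stretch of DNA."""
    return dna_bp // repeat_bp


stretch_bp = 100_000  # hypothetical stretch of chromatin
n = nucleosome_count(stretch_bp)
print("linker per repeat:", linker_bp(), "bp")
print("nucleosomes:", n)
print("core histone molecules:", n * HISTONES_PER_CORE)
```

With these values each repeat leaves about 54 base pairs of linker, and a 100,000-base-pair stretch carries roughly 500 nucleosomes.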

Histones contain a flexible amino terminus of 11–37 residues extending from the fixed structure of the nucleosome. These termini are called histone tails. The histone tails are required for condensing the DNA from the ‘beads on a string’ conformation into the 30-nm fibre. Several positively charged lysine residues in the histone tails may interact with linker DNA, and the tails also interact with neighbouring nucleosomes. The histone-tail lysines, especially those in H3 and H4, undergo reversible acetylation and deacetylation by enzymes. When a lysine is acetylated, the positive charge of its ε-amino group is neutralized, thereby eliminating its interaction with a DNA phosphate group. Thus, the greater the acetylation of histone N-termini, the lesser is the chromatin condensation. The extent of histone acetylation is correlated with the relative resistance of chromatin DNA to digestion by nucleases, and acetylation also plays a role in eukaryotic gene regulation.

The 30-nm fibre

Wrapping DNA about a nucleosome core compacts it several fold. The total compaction in a chromosome, however, is greater than 10,000-fold, which points to higher orders of structural organization. Nucleosomes are organized to form a structure called the 30-nm fibre. This packaging requires one molecule of the histone H1 per nucleosome. Organization into the 30-nm fibre does not extend over the entire chromosome but is punctuated by regions that are bound by sequence-specific non-histone DNA-binding proteins. The structure observed also depends on the transcriptional activity of the particular region of DNA. The regions that are being transcribed are usually in a less ordered state that contains little histone H1. The 30-nm fibre provides approximately 100-fold compaction of the DNA (Figure 2.10).

Nuclear scaffold

Certain regions of the DNA associate with a nuclear scaffold. The scaffold-associated regions are separated by loops of DNA of 20,000 to 100,000 base pairs. The DNA in these loops may contain a set of related genes. The scaffold itself contains several proteins, notably large amounts of H1 and topoisomerase II, which is essential for maintaining the chromatin structure (Figure 2.10).

 

Figure 2.10 Compaction of DNA in a eukaryotic chromosome

Higher organization

Evidence exists for additional layers of organization in eukaryotic chromosomes, each enhancing the compaction multiplicatively. DNA compaction in eukaryotic chromosomes is likely to involve coils upon coils, resulting in the highly condensed chromatin of the chromosome.

  • A ‘genome’ consists of the entire set of chromosomes for any particular organism. It, therefore, comprises a series of DNA molecules (one for each chromosome) each of which contains many genes. The genome itself does not perform any function, rather it is the sequence of nucleic acids that determines the hereditary features.
  • ‘Genes’ are the functional units of the genome. They are the hereditary units consisting of DNA sequences that occupy a specific location on a chromosome, which is called a ‘genetic locus’, and determine a particular characteristic in an organism.
  • An ‘allele’ is one of two or more versions of a gene. Every individual inherits two alleles for each gene, one of paternal origin and the other of maternal origin.
  • ‘Non-coding DNA’ describes the DNA sequences that do not encode protein sequences. In eukaryotes, a large percentage of the genome is non-coding. The proportion of non-coding DNA varies greatly among species. Much of the non-coding DNA has no known biological function and is, therefore, referred to as ‘junk DNA’.
  • Extensive variation in nuclear genome size among eukaryotes is known as the ‘C-value enigma’ or ‘C-value paradox’. The total amount of DNA contained in a haploid genome is sometimes referred to as the ‘C value’ for the species.
  • ‘Highly repetitive sequences’ are short nucleotide sequences that are repeated thousands of times all over the genome. These are also called SSRs (simple sequence repeats) or STRs (short tandem repeats). These sequences are mostly located in the heterochromatin regions around the centromeres and telomeres.
  • ‘Moderately repetitive DNA sequences’ are DNA sequences that consist of about 10–10⁵ copies per genome. They are found throughout the euchromatin. Their average size is about 300 base pairs. They may be classified as (a) microsatellites/minisatellites (VNTRs, DNA ‘fingerprints’) and (b) dispersed repetitive DNA, mainly transposable elements (‘LINEs’ and ‘SINEs’).
  • ‘Satellite DNAs’ are highly repetitive eukaryotic DNAs; they are primarily clustered in the heterochromatic regions of chromosomes, near centromeres and telomeres. Satellite DNA usually has a different buoyant density from the rest of the cell’s DNA. It forms a separate band in a buoyant density gradient because of its different nucleotide composition.
  • t₁/₂ is the time taken for half of the DNA to renature. For a given set of conditions, the value of C₀t₁/₂ depends only on the complexity x of the DNA. It is convenient to depict the kinetics of reassociation by plotting a ‘C₀t curve’, which has a sigmoidal shape. C₀t = DNA concentration (mol/L) × renaturation time in seconds × buffer factor that accounts for the effect of cations on the speed of renaturation.
  • Eukaryotic genomes are packaged into a nucleoprotein complex known as chromatin. In most cells, about 10 per cent of the chromatin remains condensed throughout interphase and is visible under the light microscope. Such chromatin stains deeply and is called ‘heterochromatin’. Heterochromatin that always remains heterochromatic is called ‘constitutive heterochromatin’. The chromatin regions that are less densely packed in the mitotic chromosome are called ‘euchromatin’. The regions of euchromatin that are converted to the heterochromatic state are called ‘facultative heterochromatin’.
  • The bacterial genomes are organized into definite bodies. The genetic material can be seen as a fairly compact clump or series of clumps that occupies about a third of the volume of the cell and is called the ‘nucleoid’.
  • The DNA in the chromatin is very tightly associated with histones, which package and order the DNA into structural units called ‘nucleosomes’. These are the fundamental units of organization upon which the higher-order packaging of chromatin is built. Nucleosomes are organized to form a structure called the ‘30-nm fibre’. Certain regions of the DNA associate with a ‘nuclear scaffold’. DNA compaction in eukaryotic chromosomes is likely to involve coils upon coils, resulting in the ‘chromatin’.
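
The renaturation kinetics summarized above can be sketched numerically. The following minimal example assumes the second-order form f = 1/(1 + (C₀/x)kt), where f is the fraction of DNA that is still single stranded, and takes the buffer factor as 1. The rate constant, concentration and complexities used here are hypothetical values chosen only for illustration.

```python
# Sketch of C0t (renaturation) kinetics; all numerical values are hypothetical.


def fraction_single_stranded(c0: float, t: float, k: float, x: float) -> float:
    """Fraction of DNA still single stranded after time t.

    c0: total DNA concentration (mol of nucleotides per litre)
    t:  renaturation time (seconds)
    k:  second-order rate constant (assumed value, fixed by the conditions)
    x:  complexity of the DNA (length of unique sequence, in base pairs)
    """
    return 1.0 / (1.0 + (c0 / x) * k * t)


def cot_half(k: float, x: float) -> float:
    """C0t value at which half of the DNA has renatured (f = 0.5)."""
    return x / k


K = 1.0e6          # hypothetical rate constant
C0 = 1.0e-4        # hypothetical DNA concentration, mol/L
SIMPLE = 1.0e3     # hypothetical low-complexity DNA, in bp
COMPLEX = 1.0e6    # hypothetical high-complexity DNA, in bp

for x in (SIMPLE, COMPLEX):
    t_half = cot_half(K, x) / C0  # time needed to reach half renaturation
    f = fraction_single_stranded(C0, t_half, K, x)
    print(f"complexity {x:.0e} bp: C0t(1/2) = {cot_half(K, x):.3g}, f = {f:.2f}")
```

With the conditions held fixed, C₀t₁/₂ scales in direct proportion to the complexity x, which is why a C₀t curve can be used to compare the complexities of different DNAs.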
REVIEW QUESTIONS
  1. Define gene, genetic locus and genome.

  2. What is meant by the term junk DNA?

  3. Briefly describe the types and functions of non-coding DNA.

  4. Define LINEs and SINEs and their importance.

  5. What are minisatellites and microsatellites?

  6. Explain the renaturation kinetics of DNA.

  7. What is Cot value? Mention its uses.

  8. Differentiate between heterochromatin and euchromatin.

  9. Describe the lampbrush chromosome and the polytene chromosome along with their unique features.

  10. What are d and r values of a chromosome?

  11. What is centromeric index? Explain the different types of chromosomes with respect to the position of the centromere.

  12. What is the mechanism of chromosome banding? Mention the concept of G bands and R bands.

  13. Differentiate between gene organization of prokaryotes and eukaryotes.

MULTIPLE-CHOICE QUESTIONS
  1. What is meant by C value?

    1. The total amount of mRNA contained in a haploid genome
    2. The total amount of protein contained in a haploid genome
    3. The total amount of DNA contained in a haploid genome
    4. The total amount of DNA contained in a diploid genome
  2. Non-coding regions of a gene are called ———.

    1. recons
    2. introns
    3. cistrons
    4. codons
  3. SSRs comprise about ——% of the human genome.

    1. 0.5
    2. 2.5
    3. 3
    4. 5
  4. Which one of the following is a characteristic feature of satellite DNAs?

    1. A:T rich DNAs and are less dense than G:C rich DNAs
    2. C:G rich DNAs and are less dense than A:T rich DNAs
    3. C:G and A:T densities are almost the same
    4. None of the above
  5. f = 1/(1 + (C₀/x)kt). In this equation, x refers to ———.

    1. Density of DNA
    2. Complexity of DNA
    3. Density of RNA
    4. Complexity of RNA
  6. What is the common form of heterochromatin that always remains heterochromatic?

    1. Constitutive heterochromatin
    2. Euchromatin
    3. Facultative heterochromatin
    4. Kinetochore
  7. The r value is the ——— of the lengths of the two chromosome arms.

    1. sum
    2. ratio
    3. square
    4. cube
  8. What type of chromosome has a diffused centromere?

    1. Atelocentric chromosome
    2. Telocentric chromosome
    3. Holocentric chromosome
    4. Monocentric chromosome
  9. G bands correspond to large regions of the human genome that have an ———.

    1. unusually low G + C content
    2. unusually high G + C content
    3. unusually high A + T content
    4. unusually low A + T content
  10. What is the formula to calculate centromeric index?

    1. i = 100 × q/(q + p)
    2. i = 100 × p/(p + q)
    3. i = 100 × q/(p + q)
    4. i = 100 + p/(p + q)
