Transcription is the synthesis of RNA from a DNA template. It is the first step of gene expression, in which a particular segment of DNA is copied into RNA by the enzyme RNA polymerase. The transcription of DNA into RNA is the primary level at which gene expression is regulated in both prokaryotic and eukaryotic cells. The stretch of DNA transcribed into an RNA molecule is called a transcription unit and encodes at least one gene. RNA polymerase catalyzes the polymerization of ribonucleoside 5'-triphosphates as directed by a DNA template. Transcription in eukaryotic cells is more complex than in prokaryotic cells, though it proceeds by the same fundamental mechanisms in both. Transcription in eukaryotes requires distinct initiation factors that are not associated with the polymerase. Eukaryotic genes often have a promoter region upstream of the gene, or enhancer regions upstream or downstream of the gene, with specific motifs that are recognized by the various types of transcription factors. The transcription factors bind, attract other transcription factors, and create a complex that eventually facilitates binding by RNA polymerase, thus beginning the process of transcription.
Transcription factors are specific proteins that are required for RNA polymerase II to initiate transcription. They contain one or more DNA-binding domains (DBDs), which attach to specific sequences of DNA adjacent to the genes that they regulate. Transcription factors are found in all living organisms. The number of transcription factors within an organism increases with genome size, and larger genomes tend to have more transcription factors per gene. It is estimated that about 5% of the genes in the human genome encode transcription factors, underscoring the importance of these proteins. Transcription factors regulate gene expression alone or with other proteins in a complex, by promoting (as an activator) or blocking (as a repressor) the recruitment of RNA polymerase to specific genes. The regulation of transcription initiation is mediated by the interplay between two classes of promoter elements: the basal promoter elements, which can be defined as those promoter elements sufficient to direct basal levels of transcription in vitro, and the regulatory elements, which modulate the levels of transcription. The basal elements are recognized by basal transcription factors, whereas the regulatory elements are recognized by either transcriptional activators or repressors. Eukaryotic activators are often modular, consisting of a DNA-binding domain, which targets the activator to the correct promoter, and of activation domains, whose role is to enhance transcription. Two types of transcription factors have been defined: general transcription factors and gene-specific transcription factors. General transcription factors are involved in transcription from all polymerase II promoters and therefore constitute part of the basic transcription machinery.
General transcription factors
General transcription factors (GTFs) are required for initiation of transcription by RNA polymerase II in eukaryotes. Many of these general transcription factors don't actually bind DNA but are part of the large transcription pre-initiation complex that interacts with RNA polymerase directly. The most common GTFs are TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH.
Transcription factors are modular in structure and contain three domains: a DNA-binding domain, a trans-activating domain, and an optional signal-sensing domain. The DNA-binding domain (DBD) attaches to specific sequences of DNA (enhancer or promoter); DNA sequences that bind transcription factors are often referred to as response elements. The trans-activating domain (TAD) contains binding sites for other proteins such as transcription coregulators; these binding sites are frequently referred to as activation functions (AFs). The optional signal-sensing domain (SSD), for example a ligand-binding domain, senses external signals and, in response, transmits these signals to the rest of the transcription complex, resulting in up- or down-regulation of gene expression.
The promoters of many genes transcribed by polymerase II contain a sequence similar to TATAA, 25 to 30 nucleotides upstream of the transcription start site. This sequence is known as the TATA box, and the first step in formation of a transcription complex is the binding of the general transcription factor TFIID to it. TFIID is a multiprotein complex in which only one polypeptide, the TATA-binding protein (TBP), binds to the TATA box. TFIID is thus composed of TBP, which binds specifically to the TATAA consensus sequence, and approximately 10 other polypeptides, called TBP-associated factors (TAFs).
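The positional rule described above (a TATAA-like sequence 25 to 30 nucleotides upstream of the transcription start site) can be illustrated with a minimal Python sketch. The function name, the toy promoter, and the convention of measuring distance from the first base of the motif to the start site are assumptions made for illustration only; real TATA boxes vary around the consensus, so an exact-match scan like this one would miss many of them.

```python
import re


def find_tata_boxes(promoter, tss_index, window=(25, 30)):
    """Return upstream distances (in nt) of exact TATAA matches.

    Distance is counted from the first base of the motif to the
    transcription start site (tss_index); this convention is an
    assumption of the sketch, not something fixed by the text.
    """
    hits = []
    # The lookahead lets overlapping occurrences be reported as well.
    for match in re.finditer(r"(?=TATAA)", promoter.upper()):
        distance = tss_index - match.start()
        if window[0] <= distance <= window[1]:
            hits.append(distance)
    return hits


# Hypothetical toy promoter: the TATAA motif starts at index 20 and the
# transcription start site is placed at index 48.
toy_promoter = "GCGC" * 5 + "TATAAA" + "GC" * 11 + "ACGT"
toy_tss_index = 48

print(find_tata_boxes(toy_promoter, toy_tss_index))
```

For the toy promoter the motif starts 28 nucleotides upstream of the chosen start site, which falls inside the 25 to 30 nucleotide window.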
TATA binding protein is a monomeric protein and plays a major role in transcription initiation. All eukaryotic TBPs analyzed have very highly conserved C-terminal domains of 180 residues and this conserved domain functions as well as the full-length protein in in vivo transcription. When TBP binds to a TATA box within the DNA, it distorts the DNA by inserting amino acid side-chains between base pairs, partially unwinding the helix, and doubly kinking it. TBP binds with the negatively charged phosphates in the DNA backbone through positively charged lysine and arginine amino acid residues. The strain imposed on the DNA through this interaction initiates melting, or separation, of the strands.
TFIIA is one of the general transcription factors. It binds to the TBP subunit of TFIID and enhances TFIID binding to the TATA box, stabilizing the TFIID-DNA complex. Interaction of TFIIA with TBP facilitates formation of the pre-initiation complex and stabilizes it, and also results in the exclusion of negative (repressive) factors that might otherwise bind to TBP and interfere with pre-initiation complex formation. It seems that binding of TFIIA to TFIID prevents binding of these inhibitory factors, so that formation of the transcription complex can continue with the binding of other transcription factors.
TFIIA is encoded by two separate genes, one of which encodes a large subunit (TFIIAL, TOA1; gene name GTF2A1) and another which encodes a small subunit (TFIIAS, TOA2; gene name GTF2A2). Both genes are present in species ranging from humans to yeast, and their protein products interact to form a complex composed of a beta barrel domain and an alpha helical bundle domain. It is the N-terminal and C-terminal regions of the large subunit that participate in interactions with the small subunit.
The binding of TFIID with the TATA box is followed by recruitment of another transcription factor TFIIB, which binds to the TATA-binding protein (TBP) as well as to DNA sequences that are present upstream of the TATA box in some promoters. TFIIB makes protein-protein interactions with the TBP subunit of TFIID, and the RPB1 subunit of RNA polymerase II. TFIIB serves as a bridge to RNA polymerase II, which binds to the TBP-TFIIB complex in association with a third factor, TFIIF.
TFIIF is one of several transcription factors required to form the RNA polymerase II preinitiation complex. TFIIF binds to RNA polymerase II when the enzyme is not bound to any other transcription factor, thus preventing it from contacting DNA outside the promoter. Furthermore, TFIIF stabilizes RNA polymerase II while it is contacting TBP and TFIIB.
TFIIE is another transcription factor required to form the RNA polymerase II preinitiation complex. Following recruitment of RNA polymerase II to the promoter, the binding of two additional factors, TFIIE and TFIIH, is required for transcription initiation. TFIIE is thought to be involved in DNA melting at the promoter: it contains a zinc ribbon motif that can bind single-stranded DNA.
TFIIH is a multisubunit factor that plays two important roles in the formation of the RNA polymerase II transcription complex. It contains both helicase and kinase activity. Two subunits of TFIIH (the XPB and XPD proteins) are helicases, which unwind DNA around the initiation site. Another subunit of TFIIH has protein kinase activity and phosphorylates repeated sequences present in the C-terminal domain of the largest subunit of RNA polymerase II. The polymerase II C-terminal domain (CTD) consists of tandem repeats (27 repeats in yeast and 52 repeats in humans) of 7 amino acids with the consensus sequence Tyr-Ser-Pro-Thr-Ser-Pro-Ser. Phosphorylation of these amino acids releases the polymerase from its association with the preinitiation complex and leads to the recruitment of other proteins that allow the polymerase to initiate transcription and begin synthesis of a growing mRNA chain. TFIIH therefore seems to have a very important function in control of transcription elongation. Components of TFIIH (the XPB and XPD proteins) are also required for DNA repair (nucleotide excision repair) and in phosphorylation of the cyclin-dependent kinase complexes regulating the cell cycle.
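The repeat structure of the CTD described above can be made concrete with a short Python sketch. It assumes an idealized CTD in which every heptad matches the consensus exactly; real CTDs contain degenerate repeats, so the sequences built here are hypothetical illustrations, not actual protein sequences.

```python
# One-letter code for the consensus Tyr-Ser-Pro-Thr-Ser-Pro-Ser.
CTD_CONSENSUS = "YSPTSPS"


def count_consensus_heptads(ctd):
    """Count in-frame, non-overlapping heptads that match the consensus exactly."""
    heptads = [ctd[i:i + 7] for i in range(0, len(ctd) - 6, 7)]
    return sum(1 for heptad in heptads if heptad == CTD_CONSENSUS)


# Idealized (hypothetical) CTDs built from perfect repeats only.
idealized_yeast_ctd = CTD_CONSENSUS * 27
idealized_human_ctd = CTD_CONSENSUS * 52

print(count_consensus_heptads(idealized_yeast_ctd), len(idealized_yeast_ctd))
print(count_consensus_heptads(idealized_human_ctd), len(idealized_human_ctd))
```

Under this idealization, 27 repeats span 189 residues and 52 repeats span 364 residues, which gives a sense of how much of the CTD is available as a phosphorylation target.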
In addition to the TATA box, the promoters of many genes transcribed by RNA polymerase II contain another sequence element called an initiator sequence (Inr sequence). Some promoters contain only an Inr element and no TATA box. Many promoters that lack a TATA box but contain an Inr element also contain an additional downstream promoter element (DPE), located approximately 30 base pairs downstream of the transcription start site, which functions cooperatively with the Inr sequence. Initiation of transcription at promoters that have no TATA box still requires the transcription factor TFIID, which binds to the Inr and DPE sequences through its other subunits (TAFs). The binding of TAFs to these elements recruits TBP to the promoter, and TFIIB, polymerase II, and additional transcription factors are then recruited step by step in a manner similar to that which occurs at TATA box promoters.
Transcription factors involved with RNA polymerase I and III
Transcription of ribosomal RNA genes, which are present in tandem repeats, is carried out by RNA polymerase I. The promoter of ribosomal RNA genes spans about 150 base pairs just upstream of the transcription initiation site. These promoter sequences are recognized by two transcription factors, UBF and SL1, which bind cooperatively to the promoter and then recruit polymerase I to form an initiation complex.
UBF is a nucleolar phosphoprotein with both DNA-binding and transactivation domains. The DNA-binding and transactivation domains of UBF overlap, and dimerization through the amino-terminus is essential for the activation function of UBF. UBF activation requires the upstream control element (UCE; −156 to −107) of the rDNA promoter, whereas SL1 functions through the essential core element (−45 to +18), which overlaps the start site (+1) of transcription. UBF can interact with SL1 via its highly acidic carboxy-terminal domain, as well as with polymerase I (Pol I). UBF recruits SL1 and Pol I to the rDNA promoter, activating transcription by facilitating pre-initiation complex (PIC) assembly.
The SL1 transcription factor is composed of four protein subunits, one of which is TBP. TBP is a common transcription factor required by all three classes of RNA polymerases. The association of TBP with ribosomal RNA genes is mediated by the binding of other proteins in the SL1 complex to the promoter, a case similar to the association of TBP with the Inr sequences of polymerase II genes that lack TATA boxes.
Promoters of the genes transcribed by RNA polymerase III that encode small nuclear RNAs are located upstream of the transcription start site. These promoters contain a TATA box along with a proximal sequence element (PSE), which is recognized by a multisubunit complex called the SNAP complex. Since the SNAP complex can bind to the PSE on its own, it corresponds to a sequence-specific DNA-binding basal transcription factor. On a basal RNA polymerase III promoter containing both a PSE and a TATA box, the SNAP complex binds cooperatively with TBP, and this effect is dependent on the amino-terminal domain of TBP.
Eukaryotic transcription factors
Eukaryotic transcription is more complex than prokaryotic transcription. Eukaryotic transcription factors contain a variety of structural motifs that interact with specific DNA sequences. Transcription factors (transcriptional activators) have a modular structure consisting of DNA binding and transcription activating domains. DNA-binding domains mediate association with specific regulatory sequences, and activation domains stimulate transcription by interacting with mediator proteins and general transcription factors as well as with co-activators that modify chromatin structure. In addition, many transcription factors occur as homo-dimers or hetero-dimers, held together by dimerization domains.
A DNA-binding domain (DBD) is an independently folded protein domain that contains at least one motif that recognizes double- or single-stranded DNA. A DBD can recognize a specific DNA sequence (a recognition sequence) or have a general affinity for DNA. Many different transcription factors in eukaryotic cells possess DNA-binding domains that are related to one another. Different types of DNA-binding domains are discussed below.
The helix-turn-helix domain
The helix-turn-helix domain is characteristic of DNA-binding proteins. It is composed of two α-helices joined by a short strand of amino acids and is found in many proteins that regulate gene expression. The helix-turn-helix motif was first recognized in prokaryotic DNA-binding proteins, including the E. coli catabolite activator protein (CAP). In these proteins, one helix (helix 3) makes most of the contacts with DNA, while the other helices (helices 1 and 2) lie across the complex to stabilize the interaction (Figure-1).
Figure 1: The helix-turn-helix domain
In eukaryotic cells, helix-turn-helix proteins include the homeodomain proteins, which play critical roles in the regulation of gene expression during embryonic development. The genes encoding these proteins were first discovered as developmental mutants in Drosophila. One example is the homeotic mutant of Drosophila called Antennapedia, in which legs rather than antennae grow out of the head of the fly. According to the genetic analysis of these mutants by Ed Lewis in the 1940s, Drosophila contains nine homeotic genes, each of which specifies the identity of a different body segment. Molecular cloning and analysis of these genes indicated that they contain conserved sequences of 180 base pairs, called homeoboxes, that encode the DNA-binding domains (homeodomains) of transcription factors. In the Antennapedia transcription factor of Drosophila, the helix-turn-helix domain consists of four α helices in which helices II and III are at right angles to each other and are separated by a characteristic β turn.
Vertebrate homeobox genes are strikingly similar to their Drosophila counterparts in both structure and function, demonstrating the highly conserved roles of these transcription factors in animal development.
The zinc finger domain
The zinc finger domains contain repeats of cysteine and histidine residues that bind zinc ions and fold into looped structures that bind DNA. These domains were initially identified in the polymerase III transcription factor TFIIIA but are also commonly found among transcription factors that regulate polymerase II promoters, including Sp1. This domain generally exists in two forms. The C2H2 zinc finger (Figure-2) is one of the most common DNA-binding motifs in eukaryotic transcription factors. The motif was first described as a repeating unit in the DNA-binding domain of transcription factor IIIA (TFIIIA), which is required for transcription of 5S rRNA genes by RNA polymerase III. Each repeating unit has the consensus sequence (Tyr/Phe) X Cys X2-4 Cys X3 (Phe/Tyr) X5 Leu X2 His X3-4 His, where X is any amino acid. Each repeating unit binds one zinc ion through the two cysteine (C) and two histidine (H) side chains, which gives the motif its name. Usually, three or more C2H2 zinc fingers are required for DNA binding.
Figure 2: The C2H2 zinc finger motif
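The C2H2 consensus given above translates directly into a regular expression, with each X written as a single-character wildcard. The Python sketch below is illustrative only: the function names, the toy finger (with alanine standing in for every X position), and the linker are hypothetical, and a strict consensus match will not recognize every real zinc finger.

```python
import re

# (Tyr/Phe) X Cys X2-4 Cys X3 (Phe/Tyr) X5 Leu X2 His X3-4 His
C2H2_PATTERN = re.compile(r"[YF].C.{2,4}C.{3}[FY].{5}L.{2}H.{3,4}H")


def count_c2h2_fingers(protein):
    """Count non-overlapping matches to the C2H2 consensus."""
    return len(C2H2_PATTERN.findall(protein))


def meets_finger_threshold(protein, minimum=3):
    """Apply the rule of thumb that three or more fingers are usually needed."""
    return count_c2h2_fingers(protein) >= minimum


# Hypothetical toy finger: alanine fills every X position of the consensus.
toy_finger = (
    "Y" + "A" + "C" + "AA" + "C" + "AAA" + "F"
    + "AAAAA" + "L" + "AA" + "H" + "AAA" + "H"
)
toy_linker = "GGSGG"  # arbitrary spacer between fingers, chosen for illustration
toy_protein = toy_linker.join([toy_finger] * 3)

print(count_c2h2_fingers(toy_protein), meets_finger_threshold(toy_protein))
```

The toy protein contains three consensus fingers and therefore meets the three-finger threshold, whereas a single toy finger on its own does not.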
The steroid hormone receptors, which regulate gene transcription in response to hormones such as estrogen and testosterone, also contain zinc finger domains. These factors consist of homodimers or heterodimers, in which each monomer contains two C4 zinc finger (Figure-3) motifs. The two motifs are now known to fold together into a more complex conformation stabilized by zinc, which binds to DNA by the insertion of one α-helix from each monomer into successive major grooves. C2H2 zinc-finger proteins generally contain three or more repeating finger units and bind as monomers, whereas C4 zinc-finger proteins generally contain only two finger units and bind to DNA as homodimers or heterodimers.
Figure 3: The C4 zinc finger motif
The DNA-binding domain in the yeast Gal4 protein exhibits a third type of zinc-finger motif, known as the C6 zinc finger. Proteins of this class have the consensus sequence Cys-X2-Cys-X6-Cys-X5-6-Cys-X2-Cys-X6-Cys. The six cysteines bind two Zn2+ ions, folding the region into a compact globular domain. The Gal4 protein binds DNA as a homodimer in which the monomers associate through hydrophobic interactions along one face of their α-helical regions.
The DNA-binding proteins, leucine zipper and helix-loop-helix proteins, contain DNA-binding domains formed by dimerization of two polypeptide chains.
The basic leucine zipper domain (bZIP domain) is found in many DNA-binding eukaryotic proteins (transcription factors). One part of the domain is a region that mediates sequence-specific DNA binding, and the other is the leucine zipper, which is required to hold together (dimerize) two DNA-binding regions (Figure-4). The DNA-binding region comprises a number of basic amino acids such as arginine and lysine.
Figure 4: The leucine zipper with basic domain dimer of a bZIP protein
Leucine zippers are found in both eukaryotic and prokaryotic regulatory proteins, but are mainly a feature of eukaryotes. The leucine zipper is the dimerization domain of the B-ZIP (basic-region leucine zipper) class of eukaryotic transcription factors. The B-ZIP family of transcription factors consists of a basic region that interacts with the major groove of a DNA molecule through hydrogen bonding, and a hydrophobic leucine zipper region that is responsible for dimerization. The leucine zipper is a left-handed parallel dimeric coiled-coil (Figure-5), a structure proposed independently by Pauling and Corey, and by Crick, in 1953.
The leucine zipper contains four or five leucine residues spaced at intervals of seven amino acids, in a region that is often at the C-terminal part of the DNA-binding domain. These leucines lie in an α-helical region (Figure-4) and the regular repeat of these residues forms a hydrophobic surface on one side of the α-helix with a leucine every second turn of the helix. These leucines are responsible for dimerization through interactions between the hydrophobic faces of the α-helices and a coiled coil structure results from this interaction (Figure-5).
Figure 5: Leucine zipper (leucine residues are coloured red in the zipper) bound to DNA
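The heptad spacing described above (four or five leucines at intervals of seven amino acids) lends itself to a simple scan. Because an α-helix has about 3.6 residues per turn, a seven-residue spacing places a leucine roughly every second turn on the same face of the helix. The following Python sketch is a hypothetical illustration: the function names and toy sequences are invented, and a leucine run alone does not establish that a real protein forms a zipper.

```python
def longest_leucine_heptad_run(seq):
    """Return the longest run of leucines found at positions i, i+7, i+14, ..."""
    best = 0
    for start, residue in enumerate(seq):
        if residue != "L":
            continue
        run = 0
        pos = start
        # Step along the sequence seven residues at a time while leucines continue.
        while pos < len(seq) and seq[pos] == "L":
            run += 1
            pos += 7
        best = max(best, run)
    return best


def looks_like_leucine_zipper(seq, minimum=4):
    """Flag sequences with at least four heptad-spaced leucines (a crude heuristic)."""
    return longest_leucine_heptad_run(seq) >= minimum


# Hypothetical toy sequences; alanine fills the positions between leucines.
toy_zipper = "LAAAAAA" * 4 + "L"
toy_non_zipper = "LAAAAAA" * 2

print(longest_leucine_heptad_run(toy_zipper), looks_like_leucine_zipper(toy_zipper))
print(longest_leucine_heptad_run(toy_non_zipper), looks_like_leucine_zipper(toy_non_zipper))
```

The first toy sequence carries five heptad-spaced leucines and is flagged; the second carries only two and is not.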
This structure is common in proteins containing amphipathic alpha helices in which hydrophobic amino acid residues are regularly spaced alternately three or four positions apart in the sequence. As a result of this characteristic spacing, the hydrophobic side chains form a stripe down one side of the alpha helix. The hydrophobic stripes make up the interacting surfaces between the alpha-helical monomers in a coiled-coil dimer. Leucine zipper regulatory proteins include c-fos and c-jun (the AP1 transcription factor), important regulators of normal development, as well as myc family members including myc, max, and mxd1.
The helix-loop-helix proteins are similar in structure to the leucine zipper proteins, except that their dimerization domains are each formed by two helical regions separated by a loop. Hydrophobic residues on one side of the C-terminal α-helix allow dimerization. In general, transcription factors containing this domain are dimeric, each with one helix containing basic amino acid residues that facilitate DNA binding. One helix is typically smaller and, due to the flexibility of the loop, allows dimerization by folding and packing against another helix. The larger helix typically contains the DNA-binding regions (Figure-6). This structure is found in the MyoD family of proteins. MyoD was identified as a gene that regulates gene expression in cell determination, commanding cells to form muscle. The MyoD protein has been shown to activate muscle-specific gene expression directly. Four genes, myoD, myogenin, myf5, and mrf4, have been shown to have the ability to convert fibroblasts into muscle. The encoded proteins are all members of the helix-loop-helix transcription factor family.
Figure 6: The helix-loop-helix domain
Both leucine zipper and helix-loop-helix proteins play important roles in regulating tissue-specific and inducible gene expression, and the formation of dimers between different members of these families is a critical aspect of the control of their function.
Transcription activation domains
Transcriptional activation domains (TADs) are regions of a transcription factor which in conjunction with a DNA binding domain can activate transcription from a promoter by contacting transcriptional machinery (general transcription factors + RNA Polymerase) either directly or through other proteins known as co-activators.
To understand how transcription activation domains function, certain transcription factors are used as model proteins. These include GAL4, GCN4, and HAP1 in yeast cells; steroid hormone receptors, heat shock transcription factors, and NF-κB in mammalian cells; and viral proteins such as the herpes virus activator VP16 and HIV TAT.
Acidic activation domains
Comparison of the transactivation domains of yeast Gcn4 and Gal4, the mammalian glucocorticoid receptor, and the herpes virus activator VP16 shows that they have a very high proportion of acidic amino acids. These have been called acidic activation domains and are characteristic of many transcription activation domains.
Glutamine-rich domains were first identified in two activation regions of the transcription factor Sp1. These glutamine-rich motifs are essential for the activation of transcription mediated by these domains, since their deletion abolishes the ability to activate transcription. However, transcriptional activation can be restored by substituting the glutamine-rich regions of Sp1 with a glutamine-rich region from the Drosophila homeobox transcription factor Antennapedia, which has no sequence homology to the Sp1 sequence. So, as with the acidic activation domains, the activating ability of a glutamine-rich domain is not defined by its primary sequence but rather by its overall nature in being glutamine-rich.
Besides the transcription factors Sp1 and Antennapedia, similar glutamine-rich regions have been defined in the N-terminal activation domains of the octamer-binding proteins Oct-1 and Oct-2, the Drosophila homeobox proteins Ultrabithorax and zeste, and the yeast HAP1 and HAP2 transcription factors.
Proline-rich domains have been identified in several transcription factors. As with the other classes of activation domains, this region is capable of activating transcription when linked to the DNA binding domains of other transcription factors. As with glutamine, a continuous run of proline residues can mediate activation, indicating that the function of this type of domain depends primarily on its richness in proline.
A proline rich domain is seen in the activator CTF-1. It has a domain of 84 amino acids, of which 19 amino acids are prolines. CTF-1 is a member of a class of transcription factors that bind to an extended promoter element called a CCAAT box. The N-terminal domain has been shown to regulate transcription of certain genes. The C-terminal end is a transcription regulator and is known to bind to histone proteins via the proline repeats. The proline-rich domain is also found in other transcription factors such as the oncogene product Jun, AP2, and the C-terminal activation domain of Oct-2. Thus, as with the glutamine-rich domains, proline-rich domains are not confined to a single factor, while a single factor such as Oct-2 can contain two activation domains of different types.
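The figures quoted above for CTF-1 (19 prolines in a domain of 84 amino acids) correspond to a proline content of roughly 23%. A minimal Python sketch of this composition check follows. The helper name and the toy sequence are hypothetical; the toy sequence only reproduces the stated counts and is not the actual CTF-1 sequence.

```python
def residue_fraction(seq, residue):
    """Return the fraction of positions in seq occupied by the given residue."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return seq.count(residue) / len(seq)


# Hypothetical stand-in with the stated composition: 19 prolines out of 84 residues.
toy_domain = "P" * 19 + "A" * 65

proline_percent = round(residue_fraction(toy_domain, "P") * 100, 1)
print(proline_percent)
```

The same helper can be applied to glutamine ("Q") or to acidic residues to compare the other classes of activation domain by composition rather than by primary sequence.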
Gene expression in eukaryotic cells is regulated by repressors as well as by transcriptional activators. Repressors bind to specific DNA sequences and inhibit transcription, or may simply interfere with the binding of other transcription factors to DNA. Like activators, many eukaryotic repressors have two functional domains: a DNA-binding domain and a repression domain. As is true for activation domains, a variety of amino acid sequences can function as repression domains. Many of these are relatively short (≈20 amino acids) and contain high proportions of hydrophobic residues. Other repression domains contain a high proportion of basic residues. In some cases, repression domains are larger, well-structured protein domains. Some repressors contain the same DNA-binding domain as the activator but lack its activation domain. As a result, their binding to a promoter or enhancer blocks the binding of the activator, thereby inhibiting transcription.
Some repressors contain specific functional domains that inhibit transcription via protein-protein interactions. Molecular analysis of the gene called Kruppel, involved in embryonic development in Drosophila, demonstrated that it contains a discrete repression domain, which is linked to a zinc finger DNA-binding domain. The Kruppel repression domain could be interchanged with distinct DNA-binding domains of other transcription factors. Many active repressors serve as critical regulators of cell growth and differentiation. The repression domain of Kruppel is rich in alanine residues, whereas other repression domains are rich in proline or acidic residues. The functional targets of repressors are also diverse: repressors can inhibit transcription by interacting with specific activator proteins, with mediator proteins or general transcription factors, and with corepressors that act by modifying chromatin structure. Geneticists have identified mutations in yeast that result in constitutive expression of certain genes, indicating that these genes normally are regulated by a repressor. While mutation of an activator-binding site leads to decreased expression of the linked reporter gene, mutation of a repressor-binding site leads to increased expression of a reporter gene. Repressor proteins that bind such sites have been purified and characterized using sequence-specific DNA affinity chromatography, as for activator proteins.
The protein encoded by the Wilms’ tumor (WT1) gene is a repressor that is expressed preferentially in the developing kidney. Children who inherit mutations in both the maternal and paternal WT1 genes, so that they produce no functional WT1 protein, invariably develop kidney tumors early in life. The WT1 protein, which has a C2H2 zinc-finger DNA-binding domain, binds to the control region of the gene encoding a transcription activator called EGR-1. This gene, like many other eukaryotic genes, is subject to both repression and activation. Binding of WT1 represses transcription of the EGR-1 gene without inhibiting binding of the two activators that normally stimulate expression of this gene.
The regulation of transcription by repressors as well as by activators considerably extends the range of mechanisms that control the expression of eukaryotic genes.