Bioinformatic approaches to finding cis-acting regulatory motifs in eukaryotic mRNAs - focusing on human 3' UTR mRNA analysis

This is a brief introduction related to the TransTerm databases. If you find this useful in your research, please cite our publication in the database issue of Nucleic Acids Research. Other methods are described in the references (1-5) and related web sites (6-12). We provide a selection of tools via the web interface to access the TransTerm data (13).

A list of related tools and resources for analyzing mRNA and UTRs is also available here.

Classification based on location in the mRNA

Many motifs are located in the 5' or 3' UTRs of mRNA sequences. They have been found less commonly in coding sequences.

What do motifs do? Classification based on function

Motifs in particular mRNAs and translated viral RNAs have been shown to be involved in mediating many functions and post-transcriptional controls in cells. These include (with recent selected references):

Detailed examples of some of these motifs can be found by choosing a pattern under "Describe Transterm motifs" in the pull-down menu.

A sequence and structural classification

Motifs can be classified into three broad classes based on structure:

  1. Sequence alone
  2. Structure alone
  3. A combination of sequence and structure

How can I find these types of motifs computationally?

These classes of motifs reflect the different ways in which they interact with other RNAs, RNA-binding proteins or ribosomes. The different classes require different methods for computational recognition. Two types of questions are often asked: "How can I find known elements in my sequence?" or, "Given a group of related sequences how can I find common elements?"

1. Sequence alone

Motifs can vary greatly in size, although many are small (~4-8 bases long) and may repeat in the sequence, e.g. ARE stability elements.

A single mRNA. Sequence motifs may be recognised by RNA-binding proteins or by other RNAs. Known motifs can be identified computationally using consensus sequences, consensus matrices and statistical models of motifs. The first two are provided at this site. More sophisticated methods are available, but these will usually require implementing the programs at your site. Examples are the common AU Rich Elements (ARE, repeating core motifs of AUUUA), or the rare Nanos Response Elements (NRE, repeating motifs of UUGU). Although superficially similar, these motifs are recognised by different classes of proteins. Furthermore, the function of such motifs may be determined by the binding of secondary ligand(s). Thus an element that destabilises an mRNA in one cell may stabilise it in another.
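As a rough illustration of the consensus matrix approach mentioned above, the following Python sketch builds a log-odds matrix from a few aligned sites and scans a sequence with it. The function names, the example sites, the pseudocount of 0.5, the uniform background and the score threshold are all illustrative assumptions; they are not the matrices or parameters used by TransTerm.

```python
import math

ALPHABET = "ACGU"

def build_log_odds(sites, background=None, pseudocount=0.5):
    """Build a log-odds matrix (one dict per column) from equal-length aligned sites."""
    background = background or {b: 0.25 for b in ALPHABET}  # assumed uniform background
    total = len(sites) + pseudocount * len(ALPHABET)
    matrix = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        matrix.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background[b])
            for b in ALPHABET
        })
    return matrix

def scan(sequence, matrix, threshold):
    """Return (position, score) for every window scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(b not in ALPHABET for b in window):
            continue  # skip windows containing ambiguous bases
        score = sum(matrix[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, round(score, 2)))
    return hits

# Hypothetical aligned sites resembling the ARE core motif.
example_sites = ["AUUUA", "AUUUA", "AUUUA", "UUUUA"]
example_matrix = build_log_odds(example_sites)
example_hits = scan("GGCAUUUACC", example_matrix, threshold=5.0)
```

Unlike a plain consensus, a matrix tolerates variation at each position in proportion to how often it was seen in the known sites, so the choice of threshold controls the trade-off between missed sites and false positives.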

Finding motifs using regular expressions. Scan for matches (Patscan) is provided here to search the Transterm datasets for known or putative motifs (60).
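For simple consensus sequences, this kind of pattern search can be approximated with ordinary regular expressions. The sketch below is not Patscan syntax and is not the code behind this site; the function names are hypothetical. It converts an IUPAC consensus into a regular expression and reports overlapping matches, which matters for repeated motifs such as AUUUA.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes (RNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]",
    "K": "[GU]", "M": "[AC]", "B": "[CGU]", "D": "[AGU]",
    "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}

def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    rna = consensus.upper().replace("T", "U")
    return "".join(IUPAC[base] for base in rna)

def find_motif(sequence, consensus):
    """Return 0-based start positions of all matches, including overlapping ones."""
    rna = sequence.upper().replace("T", "U")
    # Wrapping the pattern in a lookahead makes finditer report overlaps.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [m.start() for m in pattern.finditer(rna)]

print(find_motif("GGAUUUAUUUAUUUACC", "AUUUA"))  # prints [2, 6, 10]
```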

Finding motifs using BLAST. BLAST is not well suited to finding these motifs (61). However, we provide an online BLAST search with parameters chosen so that it is able to find longer degenerate motifs (e.g. AUUUAUUUAUUUA).

Aligned or unaligned related sequences. Methods involving local alignments are usually utilised to find small motifs or structures. These have been reviewed for DNA motifs, particularly transcription factor binding sites (56,62-65). Methods that attempt to find global alignments, e.g. ClustalW or pileup, are not so successful, although they will find longer motifs.
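To make the "common elements in a group of related sequences" question concrete, here is a deliberately naive Python sketch that lists the k-mers shared by a set of unaligned sequences. It is a simple stand-in, not one of the local alignment or motif discovery methods reviewed in the references; the function name and parameters are hypothetical.

```python
from collections import Counter

def shared_kmers(sequences, k=5, min_fraction=1.0):
    """Return k-mers present in at least min_fraction of the sequences.

    Each sequence contributes at most once per k-mer, so repeats within a
    single sequence do not inflate the support.
    """
    support = Counter()
    for seq in sequences:
        seq = seq.upper().replace("T", "U")
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(kmers)
    needed = min_fraction * len(sequences)
    return sorted(kmer for kmer, n in support.items() if n >= needed)

# Three made-up sequences that share only the AUUUA pentamer.
example_seqs = ["GGCAUUUACG", "CCAUUUAGGG", "UAUUUACCCA"]
print(shared_kmers(example_seqs, k=5))  # prints ['AUUUA']
```

Exact k-mer matching misses degenerate motifs and says nothing about significance, which is why the dedicated methods cited above are preferred for real analyses.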

2. Structure alone

A single mRNA. Known motifs may be described and searched for at this site using user-defined base pairing rules. Methods involving energy minimisation, utilising thermodynamic parameters, are available (66,67). However, the theoretically most stable structure may not be the physiological motif, as other proteins, RNAs and complexes binding the mRNA will affect structure. Induced fit has been demonstrated in RNA-protein recognition (59,68). In some cases simplification of the structure may assist analysis (69,70).
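A search based on user-defined base pairing rules can be sketched in Python as follows. This is a minimal illustration that only finds perfect stem-loops of a fixed stem length; it is not the descriptor language used at this site and it performs no energy minimisation. The function name, the default stem and loop sizes, and the decision to allow G-U wobble pairs by default are assumptions.

```python
# Allowed base pairs; G-U wobble pairs are included here as an assumption.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
DEFAULT_PAIRS = WATSON_CRICK | {("G", "U"), ("U", "G")}

def find_hairpins(sequence, stem_len=4, loop_min=3, loop_max=8, pairs=DEFAULT_PAIRS):
    """Return (start, loop_length) for every perfect stem-loop in the sequence."""
    seq = sequence.upper().replace("T", "U")
    hits = []
    for start in range(len(seq)):
        for loop_len in range(loop_min, loop_max + 1):
            end = start + 2 * stem_len + loop_len  # exclusive end of the hairpin
            if end > len(seq):
                break
            five_prime = seq[start:start + stem_len]
            three_prime = seq[end - stem_len:end]
            # The first base of the 5' arm pairs with the last base of the 3' arm.
            if all((a, b) in pairs for a, b in zip(five_prime, reversed(three_prime))):
                hits.append((start, loop_len))
    return hits

print(find_hairpins("AAGGCCAAAAGGCCAA"))  # prints [(2, 4)]
```

Because the pairing rules are passed in as a set, unusual pairs can be permitted or wobble pairs excluded simply by supplying a different `pairs` argument.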

It should also be recognised that unusual base-pairing may, in some cases, contribute to unusual structures, for example A-G base pairs in the SECIS element (71). These unusual base pairs, and the more common U-U and G-G base pairs, will not be favoured by thermodynamic computational approaches. Unusual base pairs or pinched out bases may provide discrimination between similar structural motifs.

Aligned or unaligned related sequences. By definition, it is difficult to make a multiple alignment of sequences with only conservation in structure. However, new methods for recognition of structural motifs in unaligned sequences have recently become available (Bradley, 2008; Xu, 2007).

3. A combination of sequence and structure

A single mRNA. Known motifs may be described and searched for at this site using user-defined base pairing rules and consensus methods. An example is the well-characterised Iron Responsive Element (IRE) (Leipuviene, 2007; Pavesi, 2004).
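Combining a sequence consensus with a pairing rule can be sketched in Python as below: a regular expression locates a candidate loop, and the flanking bases are then checked for their ability to form a stem. The loop pattern CAGUG[ACGU] and the 5-base stem are a simplified, assumed description loosely modelled on the IRE hairpin; the sketch ignores the bulged base and lower stem of real IREs and is not the TransTerm descriptor. The function name is hypothetical.

```python
import re

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def find_loop_with_stem(sequence, loop_regex="CAGUG[ACGU]", stem_len=5, pairs=PAIRS):
    """Return start positions of hairpins whose loop matches loop_regex and
    whose stem_len flanking bases on each side can all pair."""
    seq = sequence.upper().replace("T", "U")
    hits = []
    for m in re.finditer("(?=(" + loop_regex + "))", seq):
        loop_start, loop_end = m.start(1), m.end(1)
        if loop_start < stem_len or loop_end + stem_len > len(seq):
            continue  # not enough flanking sequence for a full stem
        five_prime = seq[loop_start - stem_len:loop_start]
        three_prime = seq[loop_end:loop_end + stem_len]
        # The base next to the loop on one side pairs with the base next to it on the other.
        if all((a, b) in pairs for a, b in zip(five_prime, reversed(three_prime))):
            hits.append(loop_start - stem_len)
    return hits

# Made-up sequence: stem GGACU / AGUCC around the loop CAGUGC.
print(find_loop_with_stem("AAGGACUCAGUGCAGUCCAA"))  # prints [2]
```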

Aligned or unaligned related sequences. Few methods currently exist to combine the approaches described above. Utilisation of both sequence and structural recognition elements may allow the discovery of such motifs (72,73).

How do I know if this match is significant?

This is perhaps the most difficult question. It is possible to apply statistical methods to determine how often a sequence motif is expected to occur by chance in a particular database. Small motifs will give many false positives. When ascertaining significance it is essential to take into account the expected composition of the bases in similar regions of the genome in question. Usually at least dinucleotide bias is taken into account.
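The effect of base composition and dinucleotide bias on the expected number of chance matches can be illustrated with a short Python sketch. It estimates the expected count of an exact motif under a zero-order model (base composition only) or a first-order Markov model (dinucleotide bias), with frequencies taken from a background sequence. The function name is hypothetical, and using a single concatenated background string is a simplifying assumption.

```python
from collections import Counter

def expected_hits(motif, background, order=1):
    """Expected number of exact motif occurrences in a sequence the length of
    `background`, under a Markov model of order 0 or 1 trained on it."""
    total = len(background)
    mono = Counter(background)
    prob = mono[motif[0]] / total  # probability of the first base
    if order == 0:
        for base in motif[1:]:
            prob *= mono[base] / total
    else:
        di = Counter(background[i:i + 2] for i in range(total - 1))
        for prev, cur in zip(motif, motif[1:]):
            # Transition probability P(cur | prev) from dinucleotide counts.
            from_prev = sum(n for pair, n in di.items() if pair[0] == prev)
            prob *= di[prev + cur] / from_prev if from_prev else 0.0
    windows = total - len(motif) + 1
    return prob * windows

# Toy background in which A is always followed by C.
toy_background = "ACGU" * 25
print(expected_hits("AC", toy_background, order=0))  # prints 6.1875
print(expected_hits("AC", toy_background, order=1))  # prints 24.75
```

In this toy background the dinucleotide AC actually occurs 25 times, so the first-order estimate is far closer than the composition-only estimate. In a real analysis the background would be a set of comparable regions, such as 3' UTRs from the same genome, and joining them into one string introduces a few artificial dinucleotides at the junctions.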

In addition, searching for motifs in regions of similar composition where they are known not to function can give an estimate of the false positive rate. For most patterns described in TransTerm we give an estimate of the number of hits in a typical mRNA database.
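A control-region comparison of this kind can be sketched in Python as follows. The code counts overlapping pattern matches per kilobase in a test set and in a control set of similar composition, then reports their ratio. The function names and example sequences are hypothetical, and this is not how the hit estimates given in TransTerm were calculated.

```python
import re

def hits_per_kb(sequences, pattern):
    """Overlapping regex matches per 1000 bases across a set of sequences."""
    regex = re.compile("(?=(" + pattern + "))")
    total_hits = sum(len(regex.findall(seq)) for seq in sequences)
    total_len = sum(len(seq) for seq in sequences)
    return 1000.0 * total_hits / total_len if total_len else 0.0

def enrichment(test_seqs, control_seqs, pattern):
    """Ratio of hit density in test sequences to that in control sequences.

    Returns None when the control set has no hits, since the ratio is then undefined.
    """
    control_rate = hits_per_kb(control_seqs, pattern)
    if control_rate == 0:
        return None
    return hits_per_kb(test_seqs, pattern) / control_rate

# Made-up sequences: two overlapping hits in 20 bases versus one hit in 40 bases.
test_set = ["AUUUAUUUA" + "C" * 11]
control_set = ["AUUUA" + "C" * 35]
print(enrichment(test_set, control_set, "AUUUA"))  # prints 4.0
```

The control hit density is a direct estimate of how often the pattern matches where it is not expected to function, so a ratio close to 1 suggests that most matches in the test set are false positives.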

Motifs in coding sequences

Much of an mRNA sequence encodes protein and is thus constrained (59), so motifs in the 5' or 3' UTRs have been easier to identify (74,75). However, coding region motifs have previously been discovered experimentally (76,77). Computational methods to discover regulatory elements within coding regions are now becoming feasible, utilising comparative genomics (42,45,78-80).

References and further reading

1. Hendrickson, D.G., Hogan, D.J., Herschlag, D., Ferrell, J.E. and Brown, P.O. (2008) Systematic identification of mRNAs recruited to argonaute 2 by specific microRNAs and corresponding changes in transcript abundance. PLoS ONE, 3, e2126.

2. Elemento, O., Slonim, N. and Tavazoie, S. (2007) A universal framework for regulatory element discovery across all genomes and data types. Mol Cell, 28, 337-350.

3. Gardner, P.P., Wilm, A. and Washietl, S. (2005) A benchmark of multiple sequence alignment programs upon structural RNAs. Nucleic Acids Res, 33, 2433-2439.

4. Bartel, D.P. (2009) MicroRNAs: target recognition and regulatory functions. Cell, 136, 215-233.

5. Gruber, A.R., Bernhart, S.H., Hofacker, I.L. and Washietl, S. (2008) Strategies for measuring evolutionary conservation of RNA secondary structures. BMC Bioinformatics, 9, 122.

6. Mignone, F., Grillo, G., Licciulli, F., Iacono, M., Liuni, S., Kersey, P.J., Duarte, J., Saccone, C. and Pesole, G. (2005) UTRdb and UTRsite: a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs. Nucleic Acids Res, 33, D141-146.

7. Gardner, P.P., Daub, J., Tate, J.G., Nawrocki, E.P., Kolbe, D.L., Lindgreen, S., Wilkinson, A.C., Finn, R.D., Griffiths-Jones, S., Eddy, S.R. et al. (2009) Rfam: updates to the RNA families database. Nucleic Acids Res, 37, D136-140.

8. Siebert, S. and Backofen, R. (2005) MARNA: multiple alignment and consensus structure prediction of RNAs based on sequence structure comparisons. Bioinformatics, 21, 3352-3359.

9. Kin, T., Yamada, K., Terai, G., Okida, H., Yoshinari, Y., Ono, Y., Kojima, A., Kimura, Y., Komori, T. and Asai, K. (2007) fRNAdb: a platform for mining/annotating functional RNA candidates from non-coding RNA sequences. Nucleic Acids Res, 35, D145-148.

10. Huang, H.Y., Chien, C.H., Jen, K.H. and Huang, H.D. (2006) RegRNA: an integrated web server for identifying regulatory RNA motifs and elements. Nucleic Acids Res, 34, W429-434.

11. Griffiths-Jones, S., Grocock, R.J., van Dongen, S., Bateman, A. and Enright, A.J. (2006) miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res, 34, D140-144.

12. Gruber, A.R., Lorenz, R., Bernhart, S.H., Neubock, R. and Hofacker, I.L. (2008) The Vienna RNA Websuite. Nucleic Acids Res.

13. Jacobs, G.H., Chen, A., Stevens, S.G., Stockwell, P.A., Black, M.A., Tate, W.P. and Brown, C.M. (2009) Transterm: a database to aid the analysis of regulatory sequences in mRNAs. Nucleic Acids Res, 37, D72-76.

14. Mili, S. and Macara, I.G. (2009) RNA localization and polarity: from A(PC) to Z(BP). Trends Cell Biol, 19, 156-164.

15. Jambhekar, A. and Derisi, J.L. (2007) Cis-acting determinants of asymmetric, cytoplasmic RNA transport. RNA, 13, 625-642.

16. Martin, K.C. and Ephrussi, A. (2009) mRNA localization: gene expression in the spatial dimension. Cell, 136, 719-730.

17. Bakheet, T., Frevel, M., Williams, B.R., Greer, W. and Khabar, K.S. (2001) ARED: human AU-rich element-containing mRNA database reveals an unexpectedly diverse functional repertoire of encoded proteins. Nucleic Acids Res, 29, 246-254.

18. Wang, J., Pitarque, M. and Ingelman-Sundberg, M. (2006) 3'-UTR polymorphism in the human CYP2A6 gene affects mRNA stability and enzyme expression. Biochem Biophys Res Commun, 340, 491-497.

19. Xu, Y.Z., Di Marco, S., Gallouzi, I., Rola-Pleszczynski, M. and Radzioch, D. (2005) RNA-binding protein HuR is required for stabilization of SLC11A1 mRNA and SLC11A1 protein expression. Mol Cell Biol, 25, 8139-8149.

20. Barreau, C., Paillard, L. and Osborne, H.B. (2005) AU-rich elements and associated factors: are there unifying principles? Nucleic Acids Res, 33, 7138-7150.

21. Jing, Q., Huang, S., Guth, S., Zarubin, T., Motoyama, A., Chen, J., Di Padova, F., Lin, S.C., Gram, H. and Han, J. (2005) Involvement of microRNA in AU-rich element-mediated mRNA instability. Cell, 120, 623-634.

22. Coller, J. and Parker, R. (2005) General translational repression by activators of mRNA decapping. Cell, 122, 875-886.

23. Ostareck, D.H., Ostareck-Lederer, A., Shatsky, I.N. and Hentze, M.W. (2001) Lipoxygenase mRNA silencing in erythroid differentiation: The 3'UTR regulatory complex controls 60S ribosomal subunit joining. Cell, 104, 281-290.

24. Dean, K.A., Aggarwal, A.K. and Wharton, R.P. (2002) Translational repressors in Drosophila. Trends Genet, 18, 572-577.

25. Cok, S.J. and Morrison, A.R. (2001) The 3'-untranslated region of murine cyclooxygenase-2 contains multiple regulatory elements that alter message stability and translational efficiency. J Biol Chem, 9, 9.

26. Dreher, T.W. and Miller, W.A. (2006) Translational control in positive strand RNA plant viruses. Virology, 344, 185-197.

27. Oh, B., Hwang, S.Y., McLaughlin, J., Solter, D. and Knowles, B.B. (2000) Timely translation during the mouse oocyte-to-embryo transition. Development, 127, 3795-3803.

28. Crucs, S., Chatterjee, S. and Gavis, E.R. (2000) Overlapping but distinct RNA elements control repression and activation of nanos translation. Molecular Cell, 5, 457-467.

29. Milligan, L., Torchet, C., Allmang, C., Shipman, T. and Tollervey, D. (2005) A nuclear surveillance pathway for mRNAs with defective polyadenylation. Mol Cell Biol, 25, 9996-10004.

30. Prasanth, K.V., Prasanth, S.G., Xuan, Z., Hearn, S., Freier, S.M., Bennett, C.F., Zhang, M.Q. and Spector, D.L. (2005) Regulating gene expression through RNA nuclear retention. Cell, 123, 249-263.

31. Jackson, R.J. (2005) Alternative mechanisms of initiating translation of mammalian mRNAs. Biochem Soc Trans, 33, 1231-1241.

32. Pilipenko, E.V., Viktorova, E.G., Guest, S.T., Agol, V.I. and Roos, R.P. (2001) Cell-specific proteins regulate viral RNA translation and virus-induced disease. Embo J, 20, 6899-6908.

33. Kwok, L.W., Shcherbakova, I., Lamb, J.S., Park, H.Y., Andresen, K., Smith, H., Brenowitz, M. and Pollack, L. (2006) Concordant exploration of the kinetics of RNA folding from global and local perspectives. J Mol Biol, 355, 282-293.

34. Mokrejs, M., Vopalensky, V., Kolenaty, O., Masek, T., Feketova, Z., Sekyrova, P., Skaloudova, B., Kriz, V. and Pospisek, M. (2006) IRESite: the database of experimentally verified IRES structures. Nucleic Acids Res, 34, D125-130.

35. Herr, A.J., Atkins, J.F. and Gesteland, R.F. (2000) Coupling of open reading frames by translational bypassing. Annu Rev Biochem, 69, 343-372.

36. Yu, E.T., Zhang, Q. and Fabris, D. (2005) Untying the FIV frameshifting pseudoknot structure by MS3D. J Mol Biol, 345, 69-80.

37. Baranov, P.V., Gurvich, O.L., Hammer, A.W., Gesteland, R.F. and Atkins, J.F. (2003) Recode 2003. Nucleic Acids Res, 31, 87-89.

38. Baranov, P.V., Gurvich, O.L., Fayet, O., Prere, M.F., Miller, W.A., Gesteland, R.F., Atkins, J.F. and Giddings, M.C. (2001) RECODE: a database of frameshifting, bypassing and codon redefinition utilized for gene expression. Nucleic Acids Res, 29, 264-267.

39. Baranov, P.V., Fayet, O., Hendrix, R.W. and Atkins, J.F. (2006) Recoding in bacteriophages and bacterial IS elements. Trends Genet, 22, 174-181.

40. Ivanov, I.P., Gesteland, R.F. and Atkins, J.F. (2006) Evolutionary specialization of recoding: Frameshifting in the expression of S. cerevisiae antizyme mRNA is via an atypical antizyme shift site but is still +1. RNA, 12, 332-337.

41. Tork, S., Hatin, I., Rousset, J.P. and Fabret, C. (2004) The major 5' determinant in stop codon read-through involves two adjacent adenines. Nucleic Acids Res, 32, 415-421.

42. Firth, A.E. and Brown, C.M. (2006) Detecting overlapping coding sequences in virus genomes. BMC Bioinformatics, 7, 75.

43. Harrell, L., Melcher, U. and Atkins, J.F. (2002) Predominance of six different hexanucleotide recoding signals 3' of read-through stop codons. Nucleic Acids Res, 30, 2011-2017.

44. Castellano, S., Gladyshev, V.N., Guigo, R. and Berry, M.J. (2008) SelenoDB 1.0 : a database of selenoprotein genes, proteins and SECIS elements. Nucleic Acids Res, 36, D332-338.

45. Panjaworayan, N., Roessner, S., Firth, A. and Brown, C. (2007) HBVRegDB: Annotation, comparison, detection and visualization of regulatory elements in hepatitis B virus sequences. Virology J, 4, 136.

46. Cartegni, L., Wang, J., Zhu, Z., Zhang, M.Q. and Krainer, A.R. (2003) ESEfinder: A web resource to identify exonic splicing enhancers. Nucleic Acids Res, 31, 3568-3571.

47. Xu, D.Q. and Mattox, W. (2006) Identification of a splicing enhancer in MLH1 using COMPARE, a new assay for determination of relative RNA splicing efficiencies. Human molecular genetics, 15, 329-336.

48. Wu, M., Reuter, M., Lilie, H., Liu, Y., Wahle, E. and Song, H. (2005) Structural insight into poly(A) binding and catalytic mechanism of human PARN. Embo J, 24, 4082-4093.

49. Zatkova, A., Messiaen, L., Vandenbroucke, I., Wieser, R., Fonatsch, C., Krainer, A.R. and Wimmer, K. (2004) Disruption of exonic splicing enhancer elements is the principal cause of exon skipping associated with seven nonsense or missense alleles of NF1. Hum Mutat, 24, 491-501.

50. Wang, J., Smith, P.J., Krainer, A.R. and Zhang, M.Q. (2005) Distribution of SR protein exonic splicing enhancer motifs in human protein-coding genes. Nucleic Acids Res, 33, 5053-5062.

51. Chung, B.Y., Simons, C., Firth, A.E., Brown, C.M. and Hellens, R.P. (2006) Effect of 5'UTR introns on gene expression in Arabidopsis thaliana. BMC Genomics, 7, 120.

52. Perkins, D.O., Jeffries, C. and Sullivan, P. (2005) Expanding the 'central dogma': the regulatory role of nonprotein coding genes and implications for the genetic liability to schizophrenia. Mol Psychiatry, 10, 69-78.

53. Rusinov, V., Baev, V., Minkov, I.N. and Tabler, M. (2005) MicroInspector: a web tool for detection of miRNA binding sites in an RNA sequence. Nucleic Acids Res, 33, W696-700.

54. Zhang, Y. (2005) miRU: an automated plant miRNA target prediction server. Nucleic Acids Res, 33, W701-704.

55. Hsu, P.W., Huang, H.D., Hsu, S.D., Lin, L.Z., Tsou, A.P., Tseng, C.P., Stadler, P.F., Washietl, S. and Hofacker, I.L. (2006) miRNAMap: genomic maps of microRNA genes and their target genes in mammalian genomes. Nucleic Acids Res, 34, D135-139.

56. Xie, X., Lu, J., Kulbokas, E.J., Golub, T.R., Mootha, V., Lindblad-Toh, K., Lander, E.S. and Kellis, M. (2005) Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature, 434, 338-345.

57. Chuzhanova, N., Cooper, D.N., Ferec, C. and Chen, J.-M. (2007) Searching for potential microRNA-binding site mutations amongst known disease-associated 3' UTR variants. Genomic Med, 1, 29-33.

58. Georges, M., Coppieters, W. and Charlier, C. (2007) Polymorphic miRNA-mediated gene regulation: contribution to phenotypic variation and disease. Curr Opin Genet Dev, 17, 166-176.

59. Meyer, I.M. and Miklos, I. (2005) Statistical evidence for conserved, local secondary structure in the coding regions of eukaryotic mRNAs and pre-mRNAs. Nucleic Acids Res, 33, 6338-6348.

60. Overbeek, R. (1997) PatScan., [5th May 1997].

61. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 25, 3389-3402.

62. Elnitski, L., Jin, V.X., Farnham, P.J. and Jones, S.J. (2006) Locating mammalian transcription factor binding sites: a survey of computational and experimental techniques. Genome Res, 16, 1455-1464.

63. Tompa, M., Li, N., Bailey, T.L., Church, G.M., De Moor, B., Eskin, E., Favorov, A.V., Frith, M.C., Fu, Y., Kent, W.J. et al. (2005) Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol, 23, 137-144.

64. Abnizova, I., Walter, K., Te Boekhorst, R., Elgar, G. and Gilks, W.R. (2007) Statistical information characterization of conserved non-coding elements in vertebrates. J Bioinform Comput Biol, 5, 533-547.

65. Li, N. and Tompa, M. (2006) Analysis of computational approaches for motif discovery. Algorithms Mol Biol, 1, 8.

66. Zuker, M. and Jacobson, A.B. (1998) Using reliability information to annotate RNA secondary structures. RNA, 4, 669-679.

67. Hofacker, I.L. (2003) Vienna RNA secondary structure server. Nucleic Acids Res, 31, 3429-3431.

68. Williamson, J.R. (2001) Proteins that bind RNA and the labs who love them. Nat Struct Biol, 8, 390-391.

69. Reeder, J. and Giegerich, R. (2005) Consensus shapes: an alternative to the Sankoff algorithm for RNA consensus structure prediction. Bioinformatics, 21, 3516-3523.

70. Janssen, S., Reeder, J. and Giegerich, R. (2008) Shape based indexing for faster search of RNA family databases. BMC Bioinformatics, 9, 131.

71. Busch, A., Will, S. and Backofen, R. (2005) SECISDesign: a server to design SECIS-elements within the coding sequence. Bioinformatics, 21, 3312-3313.

72. Washietl, S., Hofacker, I.L., Lukasser, M., Huttenhofer, A. and Stadler, P.F. (2005) Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat Biotechnol, 23, 1383-1390.

73. Hofacker, I.L., Fekete, M. and Stadler, P.F. (2002) Secondary structure prediction for aligned RNA sequences. J Mol Biol, 319, 1059-1066.

74. Winkler, W.C. and Breaker, R.R. (2005) Regulation of bacterial gene expression by riboswitches. Annu Rev Microbiol, 59, 487-517.

75. Shabalina, S.A., Ogurtsov, A.Y., Rogozin, I.B., Koonin, E.V. and Lipman, D.J. (2004) Comparative analysis of orthologous eukaryotic mRNAs: potential hidden functional signals. Nucleic Acids Res, 32, 1774-1782.

76. Lemm, I. and Ross, J. (2002) Regulation of c-myc mRNA decay by translational pausing in a coding region instability determinant. Mol Cell Biol, 22, 3959-3969.

77. Ioannidis, P., Mahaira, L., Papadopoulou, A., Teixeira, M.R., Heim, S., Andersen, J.A., Evangelou, E., Dafni, U., Pandis, N. and Trangas, T. (2003) CRD-BP: a c-Myc mRNA stabilizing protein with an oncofetal pattern of expression. Anticancer Res, 23, 2179-2183.

78. Firth, A.E. and Brown, C.M. (2005) Detecting overlapping coding sequences with pairwise alignments. Bioinformatics, 21, 282-292.

79. Seemann, S.E., Gorodkin, J. and Backofen, R. (2008) Unifying evolutionary and thermodynamic information for RNA folding of multiple alignments. Nucleic Acids Res, 36, 6355-6362.

80. Satija, R., Pachter, L. and Hein, J. (2008) Combining Statistical Alignment and Phylogenetic Footprinting to Detect Regulatory Elements. Bioinformatics.