Mihaela Zavolan

Research Focus

Our group studies several aspects of RNA biology by computational means. To enable analyses of splice variation, we have developed a fully automated software tool that starts from large-scale sequence data sets (such as UniGene) and performs all the steps necessary to generate a web-accessible database of all the splice forms observed in the sequence data.

An essential component of this tool is a novel algorithm that we developed for mapping cDNA and EST sequences to their corresponding genome [7]. The algorithm integrates information about gene structure, splice sites and sequencing errors in a Bayesian probabilistic framework to infer the most likely mapping of a cDNA sequence to the genome.
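
To make the idea of a Bayesian mapping concrete, here is a minimal sketch in Python. It is an illustration only, not the published algorithm: the candidate names, the priors, the per-base error rate and the assumption that errors are independent substitutions are all hypothetical. Each candidate mapping gets a prior (for example, reflecting how plausible its splice sites are) and a likelihood derived from the number of matching and mismatching bases; the posterior is their normalized product.

```python
import math

def log_likelihood(matches, mismatches, error_rate=0.01):
    # Assumed error model: each base is read correctly with probability
    # 1 - error_rate, otherwise replaced by one of the 3 other bases.
    return (matches * math.log(1.0 - error_rate)
            + mismatches * math.log(error_rate / 3.0))

def posterior(candidates, error_rate=0.01):
    # candidates: name -> dict with hypothetical "prior", "matches", "mismatches".
    log_scores = {
        name: math.log(c["prior"])
        + log_likelihood(c["matches"], c["mismatches"], error_rate)
        for name, c in candidates.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    weights = {name: math.exp(s - top) for name, s in log_scores.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Two hypothetical mappings of the same 500-base cDNA.
candidates = {
    "canonical_GT_AG": {"prior": 0.98, "matches": 498, "mismatches": 2},
    "noncanonical": {"prior": 0.02, "matches": 499, "mismatches": 1},
}
post = posterior(candidates)
best = max(post, key=post.get)
```

With these made-up numbers, the single extra mismatch costs the canonical mapping more than its higher prior gains, so the sketch shows how splice-site knowledge and sequencing-error evidence trade off within one score.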

We are using the resulting data about splice variants to study the mechanisms that regulate alternative splicing. We found, for example, that the exons that are included in some transcripts and skipped in others differ from constitutive exons in several respects. Most notably, their length distribution is wider, their splice sites are “weaker”, and they have a lower frequency of several known splice enhancer motifs [10].

We also found that a simple model that describes the binding specificity of the spliceosome to the splice sites can explain the frequent occurrence of small changes in the location of splice sites [6]. This indicates that noise, manifested in the stochastic binding of the spliceosome to neighbouring, competing splice sites, can explain much of the splice variation inferred from sequence databases. On the other hand, we found that, for many exons that can be either included or skipped, inclusion is correlated with the choice of transcription start site [2].
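
The notion of stochastic binding to competing splice sites can be illustrated with a small Python sketch. It assumes, purely for illustration, that each site has a binding score and that a site is used with probability proportional to the exponential of its score (a Boltzmann-like competition); the site names and scores are hypothetical, and this is not the published model.

```python
import math

def site_usage(scores):
    # scores: site name -> assumed binding score (higher = stronger site).
    # Usage probability is proportional to exp(score), normalized over
    # all competing sites.
    top = max(scores.values())
    weights = {site: math.exp(s - top) for site, s in scores.items()}
    total = sum(weights.values())
    return {site: w / total for site, w in weights.items()}

# A strong acceptor site and a weaker competitor 3 nucleotides downstream.
usage = site_usage({"main_acceptor": 8.0, "acceptor_plus_3nt": 6.0})
```

Under these assumptions, the weaker neighbouring site is still used in a predictable minority of transcripts, 1 / (1 + e^2) of them for a score difference of 2, without any dedicated regulation. This is the sense in which binding noise alone could produce small shifts in exon boundaries.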

We are currently developing computational methods for the analysis of high-throughput short RNA sequencing data. We have applied these methods mostly to the identification of novel regulatory RNAs [3,4,5 and www.mirz.unibas.ch/smiRNA-annotation], while other groups have used short RNA sequencing techniques to identify binding sites for transcription factors and RNA-binding proteins, promoter regions and alternative splice forms.


  1. Poy, M.N., Hausser, J., Trajkovski, M., Braun, M., Collins, S., Rorsman, P., Zavolan, M., Stoffel, M. (2009). miR-375 maintains normal pancreatic α- and β-cell mass. Proceedings of the National Academy of Sciences USA 2009 Mar 16. [epub ahead of print]
  2. Chern, T.M., Paul, N., van Nimwegen, E., Zavolan, M. (2008). Computational analysis of full-length cDNAs reveals frequent coupling between transcriptional and splicing programs. DNA Research (doi:10.1093/dnares/dsm036).
  3. Berninger, P., Gaidatzis, D., van Nimwegen, E., Zavolan, M. (2008). Computational analysis of small RNA cloning data. Methods 44, 13-21.
  4. Landgraf, P., Rusu, M., Sheridan, R., Sewer, A., et al. (2007). A mammalian microRNA expression atlas based on small RNA library sequencing. Cell 129, 1401-1414.
  5. Aravin, A., Gaidatzis, D., Pfeffer, S., Lagos-Quintana, M., Landgraf, P., Iovino, N., Morris, P., Brownstein, M.J., Kuramochi-Miyagawa, S., Nakano, T., Chien, M., Russo, J.J., Ju, J., Sheridan, R., Sander, C., Zavolan, M., Tuschl, T. (2006). A novel class of small RNAs bind to MILI protein in mouse testes. Nature 442, 203-207.
  6. Chern, T.M., van Nimwegen, E., Kai, C., Kawai, J., Carninci, P., Hayashizaki, Y., Zavolan, M. (2006). A simple physical model predicts small exon length variations. PLoS Genetics 2, e45.
  7. van Nimwegen, E., Paul, N., Sheridan, R., Zavolan, M. (2006). Spa: a probabilistic algorithm for spliced alignment. PLoS Genetics 2(4), e24.
  8. Carninci, P., Kasukawa, T., Katayama, S., et al. (2005). The transcriptional landscape of the mammalian genome. Science 309, 1559-1563.
  9. Ravasi, T., Huber, T., Zavolan, M., et al. (2003). Systematic characterization of the zinc-finger-containing proteins in the mouse transcriptome. Genome Research 13, 1430-1432.
  10. Zavolan, M., Kondo, S., Schoenbach, C., et al. (2003). Impact of alternative initiation, splicing, and termination on the diversity of the mRNA transcripts encoded by the mouse transcriptome. Genome Research 13, 1290-1300.
  11. Zavolan, M., van Nimwegen, E., Gaasterland, T. (2002). Splice variation of mouse full-length cDNAs identified by mapping to the mouse genome. Genome Research 12, 1377-1385.

Key lab techniques: algorithms for spliced alignment and inference of splice variants from sequencing data, analysis of high-throughput short RNA sequencing data, non-coding RNA gene prediction, discovery of regulatory motifs, target prediction for small regulatory RNAs.

Interest in alternative splicing: Alternative splicing is one of the main mechanisms that contribute to the complexity of eukaryotes. While much is known about the mechanism of splicing, the factors that contribute to the expression of a specific splice form at a particular time, in a particular cell, are largely unknown. The availability of large data sets of cDNA and EST sequences, as well as of complete genomes from various eukaryotes, has enabled us and others to uncover signals that contribute to the regulation of alternative splicing through computational analyses.

Lab contact: Yvonne Steger: Yvonne.steger@unibas.ch

Lab website: www.biozentrum.unibas.ch/zavolan/index.html