ssHMM: Extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data

Item Type:Preprint
Title:ssHMM: Extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data
Creators Name:Heller, D. and Krestel, R. and Ohler, U. and Vingron, M. and Marsico, A.
Abstract:RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. To what extent RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account, or produce models which are not directly interpretable as sequence-structure motifs. Thus, a tool which produces informative motifs and at the same time captures the relationship between RNA primary sequence and secondary structure is missing. We developed ssHMM, an RNA motif finder that combines a hidden Markov model (HMM) with Gibbs sampling to learn the joint sequence and structure binding preferences of RBPs from high-throughput data, such as CLIP-Seq sequences, and intuitively visualizes them as a graph. Evaluations on synthetic data showed that ssHMM reliably recovers fuzzy sequence motifs in 80 to 100% of the cases, outperforming state-of-the-art methods designed for a similar task. On real data, it produces motifs with higher information content than existing tools. Additionally, ssHMM is considerably faster than other methods on large data sets. We also discuss examples of novel sequence-structure motifs for uncharacterized RBPs which could be identified by ssHMM. ssHMM is freely available on GitHub at https://github.molgen.mpg.de/heller/ssHMM.
Publisher:Cold Spring Harbor Laboratory Press
Article Number:076034
Date:11 October 2016
Official Publication:https://doi.org/10.1101/076034
Related to:https://edoc.mdc-berlin.de/16805/ (Final version)
