
ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data

Item Type: Article
Title: ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data
Creators Name: Heller, D. and Krestel, R. and Ohler, U. and Vingron, M. and Marsico, A.
Abstract: RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account or employ models that are not directly interpretable as sequence-structure motifs. We developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and Gibbs sampling, which fully captures the relationship between RNA sequence and secondary structure preference of a given RBP. Unlike previous methods, which output separate logos for sequence and structure, it directly produces a combined sequence-structure motif when trained on a large set of sequences. ssHMM's model is visualized intuitively as a graph and facilitates biological interpretation. ssHMM can be used to find novel bona fide sequence-structure motifs of uncharacterized RBPs, such as the one presented here for the YY1 protein. ssHMM reaches a high motif recovery rate on synthetic data, recovers known RBP motifs from CLIP-Seq data, and scales linearly with the input size; it is considerably faster than MEMERIS and RNAcontext on large datasets and on par with GraphProt. It is freely available on GitHub and as a Docker image.
Keywords: Algorithms; Base Sequence; Computational Biology; Models, Molecular; Nucleic Acid Conformation; Nucleotide Motifs; Protein Binding; Protein Domains; RNA; RNA-Binding Proteins; Reproducibility of Results; Sequence Analysis, RNA
Source: Nucleic Acids Research
ISSN: 0305-1048
Publisher: Oxford University Press
Volume: 45
Number: 19
Page Range: 11004-11018
Date: 2 November 2017
Official Publication: https://doi.org/10.1093/nar/gkx756
Related to: https://edoc.mdc-berlin.de/16394/ (Preprint version)
