Benchmarking the next generation of homology inference tools

Item Type:	Article
Title:	Benchmarking the next generation of homology inference tools
Creators Name:	Saripella, G.V., Sonnhammer, E.L.L. and Forslund, K.
Abstract:	Motivation: Over the last decades, vast numbers of sequences were deposited in public databases. Bioinformatics tools allow homology and consequently functional inference for these sequences. New profile-based homology search tools have been introduced, allowing reliable detection of remote homologs, but have not been systematically benchmarked. To provide such a comparison, which can guide bioinformatics workflows, we extend and apply our previously developed benchmark approach to evaluate the 'next generation' of profile-based approaches, including CS-BLAST, HHSEARCH and PHMMER, in comparison with the non-profile-based search tools NCBI-BLAST, USEARCH, UBLAST and FASTA.
Method: We generated challenging benchmark datasets based on protein domain architectures within either the PFAM + Clan, SCOP/Superfamily or CATH/Gene3D domain definition schemes. From each dataset, homologous and non-homologous protein pairs were aligned using each tool, and standard performance metrics were calculated. We further measured congruence of domain architecture assignments in the three domain databases.
Results: CS-BLAST and PHMMER had the highest overall accuracy. FASTA, UBLAST and USEARCH showed large trade-offs of accuracy for speed optimization.
Conclusion: Profile methods are superior at inferring remote homologs, but the difference in accuracy between methods is relatively small. PHMMER and CS-BLAST stand out with the highest accuracy, yet still at a reasonable computational cost. Additionally, we show that less than 0.1% of Swiss-Prot protein pairs considered homologous by one database are considered non-homologous by another, implying that these classifications represent equivalent underlying biological phenomena, differing mostly in coverage and granularity.
Availability and Implementation: Benchmark datasets and all scripts are available at http://sonnhammer.org/download/Homology_benchmark.
Contact: forslund@embl.de
Supplementary Information: Supplementary data are available at Bioinformatics online.
Keywords:	Amino Acid Sequence Homology, Benchmarking, High-Throughput Nucleotide Sequencing, Protein Databases, Proteins, Sequence Homology
Source:	Bioinformatics
ISSN:	1367-4803
Volume:	32
Number:	17
Page Range:	2636-2641
Date:	1 September 2016
Official Publication:	https://doi.org/10.1093/bioinformatics/btw305
