Density-based hierarchical clustering of pyro-sequences on a large scale--the case of fungal ITS1.
Details
Download: BIB_AA89A39C702D.P001.pdf (541.58 KB)
Status: Public
Version: Final published version
Serval ID
serval:BIB_AA89A39C702D
Type
Article: article from a journal or magazine.
Collection
Publications
Institution
Title
Density-based hierarchical clustering of pyro-sequences on a large scale--the case of fungal ITS1.
Journal
Bioinformatics
ISSN
1367-4811 (Electronic)
ISSN-L
1367-4803
Publication status
Published
Publication date
2013
Peer-reviewed
Yes
Volume
29
Issue
10
Pages
1268-1274
Language
English
Abstract
MOTIVATION: Analysis of millions of pyro-sequences is currently playing a crucial role in the advance of environmental microbiology. Taxonomy-independent, i.e. unsupervised, clustering of these sequences is essential for the definition of Operational Taxonomic Units. For this application, reproducibility and robustness should be the most sought after qualities, but have thus far largely been overlooked.
RESULTS: More than 1 million hyper-variable internal transcribed spacer 1 (ITS1) sequences of fungal origin have been analyzed. The ITS1 sequences were first properly extracted from 454 reads using generalized profiles. Then, otupipe, cd-hit-454, ESPRIT-Tree and DBC454, a new algorithm presented here, were used to analyze the sequences. A numerical assay was developed to measure the reproducibility and robustness of these algorithms. DBC454 was the most robust, closely followed by ESPRIT-Tree. DBC454 features density-based hierarchical clustering, which complements the other methods by providing insights into the structure of the data.
AVAILABILITY: An executable is freely available for non-commercial users at ftp://ftp.vital-it.ch/tools/dbc454. It is designed to run under MPI on a cluster of 64-bit Linux machines running Red Hat 4.x, or on a multi-core OSX system.
CONTACT: dbc454@vital-it.ch or nicolas.guex@isb-sib.ch.
Keywords
Algorithms; Cluster Analysis; DNA, Fungal; DNA, Ribosomal Spacer; Fungi; Reproducibility of Results; Soil Microbiology
Record created
18/03/2013 13:47
Record last modified
20/08/2019 15:14