Probabilistic base calling of Solexa sequencing data.
Details
Download: BIB_54200D66E5BC.P001.pdf (503.55 KB)
Status: Public
Version: Author's version
Serval ID
serval:BIB_54200D66E5BC
Type
Article: article in a journal or magazine.
Collection
Publications
Title
Probabilistic base calling of Solexa sequencing data.
Journal
BMC Bioinformatics
ISSN
1471-2105 (Electronic)
ISSN-L
1471-2105
Publication status
Published
Publication date
2008
Volume
9
Pages
431-
Language
English
Abstract
BACKGROUND: Solexa/Illumina short-read, ultra-high-throughput DNA sequencing technology produces millions of short tags (up to 36 bases) by parallel sequencing-by-synthesis of DNA colonies. The processing and statistical analysis of such high-throughput data pose new challenges; currently, a substantial proportion of the tags are routinely discarded because they cannot be matched to a reference sequence, which reduces the effective throughput of the technology.
RESULTS: We propose a novel base calling algorithm using model-based clustering and probability theory to identify ambiguous bases and code them with IUPAC symbols. We also select optimal sub-tags using a score based on information content to remove uncertain bases towards the ends of the reads.
CONCLUSION: We show that, compared with Solexa's data processing pipeline, the method increases genome coverage and the number of usable tags by an average of 15%. An R package is provided for fast and accurate base calling of Solexa's fluorescence intensity files and for producing informative diagnostic plots.
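The two ideas in RESULTS (coding ambiguous bases with IUPAC symbols and trimming uncertain ends using an information-content score) can be illustrated with a short Python sketch. It is not the authors' algorithm and not the API of their R package. It assumes that posterior probabilities for A, C, G and T are already available at every cycle (for example from a clustering model of the four fluorescence intensities, which is not implemented here). The coverage threshold `threshold`, the per-base cost `penalty`, the helper names and the maximum-scoring-window rule used to pick the sub-tag are all hypothetical choices made for illustration.

```python
import math

# IUPAC nucleotide codes, keyed by the set of bases each symbol stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}
BASES = "ACGT"


def call_base(probs, threshold=0.95):
    """Code one cycle with an IUPAC symbol.

    probs: posterior probabilities in A, C, G, T order (assumed given).
    The smallest set of bases whose total probability reaches `threshold`
    (a hypothetical cut-off) is mapped to its IUPAC symbol.
    """
    ranked = sorted(zip(probs, BASES), reverse=True)
    chosen, total = set(), 0.0
    for p, base in ranked:
        chosen.add(base)
        total += p
        if total >= threshold:
            break
    return IUPAC[frozenset(chosen)]


def information_content(probs):
    """Information at one cycle in bits: 2 minus the Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 - entropy


def best_subtag(prob_rows, penalty=1.0):
    """Return (start, end), end exclusive, of the highest-scoring non-empty
    window, scoring each cycle as information_content - penalty.

    This maximum-subarray (Kadane) rule is a stand-in score, not the
    paper's; it drops low-information cycles at either end of the read.
    """
    best, best_range = float("-inf"), (0, 0)
    running, start = 0.0, 0
    for i, probs in enumerate(prob_rows):
        gain = information_content(probs) - penalty
        if running <= 0:
            running, start = gain, i  # restart the window at cycle i
        else:
            running += gain
        if running > best:
            best, best_range = running, (start, i + 1)
    return best_range


def call_read(prob_rows, threshold=0.95, penalty=1.0):
    """Trim the read to its best sub-tag, then code each cycle."""
    start, end = best_subtag(prob_rows, penalty)
    return "".join(call_base(p, threshold) for p in prob_rows[start:end])


if __name__ == "__main__":
    rows = [
        [0.97, 0.01, 0.01, 0.01],  # confident A
        [0.01, 0.01, 0.97, 0.01],  # confident G
        [0.50, 0.01, 0.48, 0.01],  # A or G: coded R
        [0.01, 0.97, 0.01, 0.01],  # confident C
        [0.25, 0.25, 0.25, 0.25],  # uninformative tail
        [0.30, 0.20, 0.30, 0.20],  # uninformative tail
    ]
    print(call_read(rows))  # AGRC
```

With these made-up probabilities the ambiguous third cycle is kept as `R` rather than forced to a single base, while the two near-uniform cycles at the end contribute almost no information and are trimmed off. Raising `penalty` trims more aggressively; lowering `threshold` makes ambiguity codes rarer.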
Keywords
Bacteriophage phi X 174/genetics; Base Sequence/genetics; Chromosome Mapping/methods; Cluster Analysis; DNA, Viral/analysis; Expressed Sequence Tags; Pattern Recognition, Automated/methods; Quality Control; Sequence Analysis, DNA/methods; Software; Spectrometry, Fluorescence/methods
Open Access
Yes
Record created
18/10/2012 9:10
Record last modified
20/08/2019 15:09