Towards building the tree of life: a simulation study for all angiosperm genera.

Details

Serval ID
serval:BIB_415B7A05529C
Type
Article: article from journal or magazine.
Collection
Publications
Title
Towards building the tree of life: a simulation study for all angiosperm genera.
Journal
Systematic Biology
Author(s)
Salamin N., Hodkinson T.R., Savolainen Coates V.
ISSN
1063-5157 (Print)
ISSN-L
1063-5157
Publication state
Published
Issued date
2005
Peer-reviewed
Yes
Volume
54
Number
2
Pages
183-196
Language
English
Abstract
Comprehensive phylogenetic trees are essential tools for understanding evolutionary processes. For many groups of organisms, and for projects aiming to build the Tree of Life, comprehensive phylogenetic analysis implies sampling hundreds to thousands of taxa. For the tree of all life this task rises to a highly conservative 13 million. Here, we assessed the performance of methods for reconstructing large trees using Monte Carlo simulations with parameters inferred from four large angiosperm DNA matrices containing between 141 and 567 taxa. For each data set, parameters of the HKY85+G model were estimated and used to simulate 20 new matrices for sequence lengths from 100 to 10,000 base pairs. Maximum parsimony and neighbor joining were used to analyze each simulated matrix. In our simulations, accuracy was measured by counting the number of nodes in the model tree that were correctly inferred. The accuracy of the two methods increased very quickly with the addition of characters before reaching a plateau at around 1000 nucleotides for all tree sizes simulated. An increase in the number of taxa from 141 to 567 did not significantly decrease the accuracy of the methods used, despite the increase in the complexity of tree space. Moreover, the distribution of branch lengths, rather than the rate of evolution, was found to be the most important factor for accurately inferring these large trees. Finally, a tree containing 13,000 taxa was created to represent a hypothetical tree of all angiosperm genera, and the efficiency of phylogenetic reconstruction was tested with simulated matrices containing an increasing number of nucleotides, up to a maximum of 30,000. Even with such a large tree, our simulations suggested that simple heuristic searches were able to infer up to 80% of the nodes correctly.
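The accuracy measure described in the abstract (the number of model-tree nodes that are correctly inferred) can be illustrated with a minimal sketch. This is not the authors' code: the nested-tuple tree representation, the function names `clades` and `node_accuracy`, and the choice to compare rooted clades rather than unrooted bipartitions are all assumptions made for illustration.

```python
def clades(tree):
    """Collect the non-trivial clades of a rooted tree.

    A tree is a nested tuple whose leaves are taxon-name strings
    (an assumed, illustrative representation). Each internal node is
    summarised by the frozenset of leaf names below it.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by any tree on these taxa
    return found


def node_accuracy(model_tree, inferred_tree):
    """Fraction of model-tree nodes (clades) also present in the inferred tree."""
    model = clades(model_tree)
    if not model:
        return 1.0  # nothing to recover in a tree with no internal structure
    return len(model & clades(inferred_tree)) / len(model)


# Hypothetical five-taxon example: the inferred tree misplaces taxon B.
model = ((("A", "B"), "C"), ("D", "E"))
inferred = ((("A", "C"), "B"), ("D", "E"))
print(node_accuracy(model, inferred))
```

In this toy case the model tree has three non-trivial clades, of which the inferred tree recovers two. A study at the scale described would normally rely on a dedicated phylogenetics library rather than hand-rolled set comparisons.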
Keywords
Angiosperms/genetics, Base Sequence, Classification/methods, Cluster Analysis, Computer Simulation, Models, Genetic, Monte Carlo Method, Phylogeny, Reproducibility of Results, Sample Size
Open Access
Yes
Create date
24/01/2008 19:41
Last modification date
20/08/2019 14:41