Comparative performance of supertree algorithms in large data sets using the soapberry family (Sapindaceae) as a case study.
Details
Download: REF.pdf (390.50 KB)
State: Public
Version: Final published version
License: Not specified
It was possible to publish this article open access thanks to a Swiss National Licence with the publisher.
Serval ID
serval:BIB_AD8F3F9AC659
Type
Article: article from journal or magazine.
Collection
Publications
Institution
Title
Comparative performance of supertree algorithms in large data sets using the soapberry family (Sapindaceae) as a case study.
Journal
Systematic Biology
ISSN
1076-836X (Electronic)
ISSN-L
1063-5157
Publication state
Published
Issued date
2011
Peer-reviewed
Yes
Volume
60
Number
1
Pages
32-44
Language
English
Abstract
For the last 2 decades, supertree reconstruction has been an active field of research and has seen the development of a large number of major algorithms. Because of the growing popularity of the supertree methods, it has become necessary to evaluate the performance of these algorithms to determine which are the best options (especially with regard to the supermatrix approach that is widely used). In this study, seven of the most commonly used supertree methods are investigated by using a large empirical data set (in terms of number of taxa and molecular markers) from the worldwide flowering plant family Sapindaceae. Supertree methods were evaluated using several criteria: similarity of the supertrees with the input trees, similarity between the supertrees and the total evidence tree, level of resolution of the supertree and computational time required by the algorithm. Additional analyses were also conducted on a reduced data set to test if the performance levels were affected by the heuristic searches rather than the algorithms themselves. Based on our results, two main groups of supertree methods were identified: on one hand, the matrix representation with parsimony (MRP), MinFlip, and MinCut methods performed well according to our criteria, whereas the average consensus, split fit, and most similar supertree methods showed a poorer performance or at least did not behave the same way as the total evidence tree. Results for the super distance matrix, that is, the most recent approach tested here, were promising with at least one derived method performing as well as MRP, MinFlip, and MinCut. The output of each method was only slightly improved when applied to the reduced data set, suggesting a correct behavior of the heuristic searches and a relatively low sensitivity of the algorithms to data set sizes and missing data. 
Results also showed that the MRP analyses could reach a high level of quality even when using a simple heuristic search strategy, with the exception of MRP with Purvis coding scheme and reversible parsimony. The future of supertrees lies in the implementation of a standardized heuristic search for all methods and the increase in computing power to handle large data sets. The latter would prove to be particularly useful for promising approaches such as the maximum quartet fit method that yet requires substantial computing power.
Keywords
Algorithms, Angiosperms/classification, Angiosperms/genetics, Classification/methods, DNA, Plant/genetics, Phylogeny, Sapindaceae/classification, Sapindaceae/genetics
PubMed
Web of Science
Open Access
Yes
Create date
09/06/2010 14:05
Last modification date
14/02/2022 7:56