Comparative performance of supertree algorithms in large data sets using the soapberry family (Sapindaceae) as a case study.

Buerki, S.; Forest, F.; Salamin, N.; Alvarez, N.

doi:10.1093/sysbio/syq057


Details

Download: REF.pdf (390.50 KB)
Status: Public
Version: Final published version
License: Not specified
It was possible to publish this article open access thanks to a Swiss National Licence with the publisher.

Serval ID

serval:BIB_AD8F3F9AC659

Type

Article: journal or magazine article.

Collection

Publications

Institution

UNIL/CHUV

Journal

Systematic Biology

Authors

Buerki S., Forest F., Salamin N., Alvarez N.

ISSN

1076-836X (Electronic)

ISSN-L

1063-5157

Editorial status

Published

Publication date

2011

Peer-reviewed

Yes

Volume

Issue

Pages

32-44

Language

English

Abstract

For the last two decades, supertree reconstruction has been an active field of research and has seen the development of a large number of major algorithms. Because of the growing popularity of supertree methods, it has become necessary to evaluate the performance of these algorithms to determine which are the best options (especially in relation to the widely used supermatrix approach). In this study, seven of the most commonly used supertree methods are investigated using a large empirical data set (in terms of number of taxa and molecular markers) from the worldwide flowering plant family Sapindaceae. Supertree methods were evaluated using several criteria: similarity of the supertrees to the input trees, similarity between the supertrees and the total evidence tree, level of resolution of the supertree, and computational time required by the algorithm. Additional analyses were also conducted on a reduced data set to test whether the performance levels were affected by the heuristic searches rather than by the algorithms themselves. Based on our results, two main groups of supertree methods were identified: on the one hand, the matrix representation with parsimony (MRP), MinFlip, and MinCut methods performed well according to our criteria, whereas the average consensus, split fit, and most similar supertree methods showed poorer performance or at least did not behave in the same way as the total evidence tree. Results for the super distance matrix, the most recent approach tested here, were promising, with at least one derived method performing as well as MRP, MinFlip, and MinCut. The output of each method was only slightly improved when applied to the reduced data set, suggesting correct behavior of the heuristic searches and relatively low sensitivity of the algorithms to data set size and missing data.
Results also showed that MRP analyses could reach a high level of quality even when using a simple heuristic search strategy, with the exception of MRP using the Purvis coding scheme and reversible parsimony. The future of supertrees lies in the implementation of a standardized heuristic search for all methods and in increased computing power to handle large data sets. The latter would prove particularly useful for promising approaches such as the maximum quartet fit method, which still requires substantial computing power.
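The abstract refers to matrix representation with parsimony (MRP) without describing its input. As a rough illustration only, the sketch below shows the standard Baum/Ragan-style coding commonly described for MRP: each non-trivial clade of each input tree becomes one binary character, scored 1 for taxa inside the clade, 0 for taxa sampled in that tree but outside the clade, and ? for taxa absent from that tree. This is not the authors' implementation; the tree representation, the function name `mrp_matrix`, and the toy data are hypothetical, and the Purvis coding variant mentioned in the abstract is not shown.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation (an assumption, not from the article): a rooted
# input tree is given as (leaf set, list of clades), each clade a set of taxa.
InputTree = Tuple[FrozenSet[str], List[FrozenSet[str]]]


def mrp_matrix(trees: List[InputTree]) -> Dict[str, str]:
    """Build a Baum/Ragan-style MRP matrix: one binary character per clade."""
    all_taxa = sorted(set().union(*(leaves for leaves, _ in trees)))
    columns: List[Dict[str, str]] = []
    for leaves, clades in trees:
        for clade in clades:
            # Single taxa and the full leaf set carry no grouping information.
            if not (1 < len(clade) < len(leaves)):
                continue
            column: Dict[str, str] = {}
            for taxon in all_taxa:
                if taxon not in leaves:
                    column[taxon] = "?"  # taxon not sampled in this input tree
                elif taxon in clade:
                    column[taxon] = "1"  # taxon belongs to the clade
                else:
                    column[taxon] = "0"  # sampled, but outside the clade
            columns.append(column)
    # One row of character states per taxon across all input trees.
    return {taxon: "".join(col[taxon] for col in columns) for taxon in all_taxa}


# Toy example: two overlapping input trees (invented data for illustration).
tree1 = (frozenset("ABCD"), [frozenset("AB"), frozenset("ABC")])
tree2 = (frozenset("CDE"), [frozenset("DE")])
matrix = mrp_matrix([tree1, tree2])
for taxon, row in matrix.items():
    print(taxon, row)
```

For the toy input this prints the rows `A 11?`, `B 11?`, `C 010`, `D 001`, and `E ??1`. In an MRP analysis, a matrix like this would then be analysed with a parsimony search (for example in PAUP* or TNT) to obtain the supertree; that search step is outside the scope of this sketch.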

Keywords

Algorithms, Angiosperms/classification, Angiosperms/genetics, Classification/methods, DNA, Plant/genetics, Phylogeny, Sapindaceae/classification, Sapindaceae/genetics

URN

urn:nbn:ch:serval-BIB_AD8F3F9AC6597

OAI-PMH

oai:serval.unil.ch:BIB_AD8F3F9AC659

DOI

10.1093/sysbio/syq057

PubMed

21068445

Web of Science

000285197100004

Open Access

Yes