Home
Welcome to the Clustering_Phylogenetic_trees wiki!
Background: Gene trees carry important information about the specific evolutionary patterns that characterize the evolution of the corresponding gene families. The Tree of Life project reconstructs species trees by comparing and merging all gene trees. However, this information can be lost when multiple gene trees are merged into a single species tree. A single species tree can be inferred from the gene trees either when they share the same set of taxa (the consensus tree problem) or when their sets of taxa differ (the supertree problem). In this study, we consider only the consensus tree problem.
Results: In this paper, we define a new fast heuristic method for building multiple consensus trees using k-medoids. We introduce two specific versions of the Silhouette and Calinski-Harabasz indices adapted for tree clustering. The objective function is used to relocate each element to its best cluster among the K clusters, and the cluster validity criterion is used to choose the optimal number of clusters K in the dataset.
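The idea described above can be illustrated with a minimal Python sketch. It is not the C/C++ program from this repository, and it makes several assumptions: each tree is given as a set of non-trivial splits (each split encoded as the set of taxa on the side that does not contain taxon A), the Robinson and Foulds distance is the number of splits found in exactly one of the two trees, k-medoids uses a simple deterministic farthest-first initialization, and the standard Silhouette index is used in place of the tree-adapted version. The function names and the six toy trees are hypothetical.

```python
def rf_distance(splits_a, splits_b):
    """Robinson and Foulds distance: splits present in exactly one tree."""
    return len(splits_a ^ splits_b)


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix."""
    n = len(dist)
    # Assumed initialization: start at item 0, then repeatedly add the
    # item farthest from the medoids chosen so far.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n),
                           key=lambda i: min(dist[i][m] for m in medoids)))

    def assign(meds):
        # Each item goes to the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            # New medoid: member with the smallest total distance to its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
                if members else medoids[c])
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(medoids), medoids


def silhouette(dist, labels):
    """Mean standard Silhouette width; singletons contribute 0."""
    clusters = {}
    for i, c in enumerate(labels):
        clusters.setdefault(c, []).append(i)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i, c in enumerate(labels):
        own = clusters[c]
        if len(own) == 1:
            continue
        a = sum(dist[i][j] for j in own) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in other) / len(other)
                for d, other in clusters.items() if d != c)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


def choose_k(dist, k_values):
    """Return the K with the highest Silhouette score, plus all scores."""
    scores = {k: silhouette(dist, k_medoids(dist, k)[0]) for k in k_values}
    return max(scores, key=scores.get), scores


# Hypothetical toy data: six trees on taxa A..F, two groups of three topologies.
trees = [
    {frozenset("CDEF"), frozenset("DEF"), frozenset("EF")},
    {frozenset("CDEF"), frozenset("DEF"), frozenset("DF")},
    {frozenset("CDEF"), frozenset("DEF"), frozenset("DE")},
    {frozenset("BCDE"), frozenset("BCD"), frozenset("CD")},
    {frozenset("BCDE"), frozenset("BCD"), frozenset("BC")},
    {frozenset("BCDE"), frozenset("BCD"), frozenset("BD")},
]
dist = [[rf_distance(a, b) for b in trees] for a in trees]
labels, medoids = k_medoids(dist, 2)
best_k, scores = choose_k(dist, [2, 3, 4])
```

In this toy example, trees within a group differ by a single split, while trees from different groups share no splits, so the Silhouette score should favor two clusters. A consensus tree could then be built separately for each cluster.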
Conclusions: Clustering phylogenetic trees using k-medoids should provide a useful framework for classifying phylogenetic trees. We validated the method with simulated data (i.e. random trees) and with a real dataset (i.e. 47 gene trees of 14 organisms of Archaea). The results demonstrate the efficacy of our algorithm in terms of both clustering quality and running time, which makes it well suited to large genomic and phylogenetic datasets. The program, Clustering Phylogeny Trees (consensus tree version), written in C/C++, is freely available to the research community (it can be downloaded from https://github.com/TahiriNadia/Clustering_Phylogenetic_trees).
Keywords: Cluster validity index; consensus tree; k-medoids clustering; phylogenetic tree; Robinson and Foulds topological distance