Phylogenetic Trees

The most convenient way of presenting phylogenetic information is a phylogenetic tree. In a phylogenetic tree, every leaf represents a species. Nodes are labeled either with species names or with the values (also referred to as states) of their characters, and the edges represent the genetic connections. Note that there is usually an important difference between the leaf nodes, which represent real species, and the internal nodes, which in most cases represent hypothetical evolutionary ancestors of the species in the data.

Phylogenetic trees take several forms: they can be rooted or unrooted, binary or general, and may or may not show edge lengths.

A rooted tree is a tree in which one of the nodes is designated as the root, which determines the direction of ancestral relationships. An unrooted tree has no predetermined root and therefore induces no hierarchy. Rooting an unrooted tree involves inserting a new node, which will function as the root, between two existing adjacent nodes. Figures 9.1 and 9.2 show a rooted tree and its unrooted counterpart, respectively.

A binary, or bifurcating, tree is a tree in which a node may have only 0 to 2 subnodes; in an unrooted tree, this means each node has up to three neighbors. It is sometimes useful to allow more than 2 subnodes (multifurcation), but the discussion in this lecture will be limited to binary trees.

A tree can show edge lengths, indicating the genetic distance between the connected nodes. We sometimes assume the existence of a molecular clock, that is, a constant pace of the evolutionary processes. If this is the case, we could theoretically produce a distance-preserving phylogenetic tree that can be laid out along a time axis, assigning to each node the time at which it "occurred" in the history of evolution. In such a "perfect" tree, the length of each edge would be the difference in time between the parent node and the child node.
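
The rooting operation described above (inserting a new root node between two existing nodes) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the edge-list representation, the function name `root_on_edge`, and the node names are hypothetical choices and are not taken from the lecture.

```python
from collections import defaultdict

def root_on_edge(edges, u, v, root="root"):
    """Root an unrooted tree by inserting a new node `root` on the edge (u, v).

    `edges` is a list of undirected (a, b) pairs. Returns a dict mapping
    each node to the list of its children in the resulting rooted tree.
    """
    if (u, v) not in edges and (v, u) not in edges:
        raise ValueError(f"({u}, {v}) is not an edge of the tree")

    # Build adjacency lists, replacing the edge u-v by u-root and root-v.
    adj = defaultdict(list)
    for a, b in edges:
        if {a, b} == {u, v}:
            continue
        adj[a].append(b)
        adj[b].append(a)
    for x in (u, v):
        adj[root].append(x)
        adj[x].append(root)

    # Orient every edge away from the root with a depth-first traversal.
    children = {}
    stack = [(root, None)]
    while stack:
        node, parent = stack.pop()
        children[node] = [nb for nb in adj[node] if nb != parent]
        stack.extend((child, node) for child in children[node])
    return children

# Unrooted binary tree on four species; x and y are hypothetical ancestors.
unrooted = [("A", "x"), ("B", "x"), ("x", "y"), ("y", "C"), ("y", "D")]
rooted = root_on_edge(unrooted, "x", "y")
```

Choosing a different edge for the new root gives a different hierarchy over the same unrooted tree, which is why an unrooted tree by itself determines no direction of ancestry.
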
The problem we shall discuss in this lecture is this:

Problem 9.1 Optimal Phylogenetic Tree.
INPUT:

A set of n species;
A set of m characters pertaining to all of these species;
For each species, the values of each of the characters.

QUESTION: What is the fully-labeled phylogenetic tree that best explains the data, i.e., that maximizes some target function?

The process of solving this problem is called inferring the phylogeny. The input is usually given as an $n \times m$ matrix $M$, where $M_{i,j}$ represents the value of the $j$th character of the $i$th species. The state (value) of the $j$th character is taken from a known set $A_j$. The input may also include other relevant parameters, e.g., the distribution of changes (mutations) in each character, or weights representing the relative importance of characters. The goal is to maximize some score over the possible phylogenetic trees and to produce the best one. We will make the following assumptions in attempting to infer phylogenies:
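
As a rough illustration of the input format, the matrix and the per-character state sets can be held in plain Python lists and sets. The toy data, the species names, and the helper `validate_character_matrix` are assumptions made for this sketch; indices are 0-based here, whereas the text counts from 1.

```python
def validate_character_matrix(M, state_sets):
    """Check that M is an n x m matrix whose column j uses only states from state_sets[j]."""
    m = len(state_sets)
    for i, row in enumerate(M):
        if len(row) != m:
            raise ValueError(f"species {i} has {len(row)} characters, expected {m}")
        for j, state in enumerate(row):
            if state not in state_sets[j]:
                raise ValueError(f"M[{i}][{j}] = {state!r} is not an allowed state")
    return len(M), m

# Hypothetical toy data: n = 3 species, m = 2 characters.
species = ["human", "chimp", "mouse"]
state_sets = [{"A", "C", "G", "T"}, {0, 1}]  # one state set per character
M = [
    ["A", 1],  # human
    ["A", 1],  # chimp
    ["G", 0],  # mouse
]
n, m = validate_character_matrix(M, state_sets)
```
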

Characters are mutually independent - that is, change in one character has no effect on the distribution of another character.
After two species diverge in the tree, they continue to evolve independently.

Neither of these assumptions is necessarily (or even probably) correct, but they make our lives much easier and simplify the discussion considerably.
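
One practical consequence of character independence is that a score can be computed for each character separately and then combined. The sketch below assumes, purely for illustration, a score that counts state changes along edges (negated, so that higher is better); the lecture has not fixed a target function at this point, and the names `character_score` and `tree_score` and the example tree are hypothetical.

```python
def character_score(edges, labels, j):
    """Score of character j alone: minus the number of edges on which it changes."""
    return -sum(1 for u, v in edges if labels[u][j] != labels[v][j])

def tree_score(edges, labels, m):
    """With independent characters, the total is the sum of per-character scores."""
    return sum(character_score(edges, labels, j) for j in range(m))

# Hypothetical fully-labeled rooted tree: leaves A-D, internal nodes x, y, root.
# Each label is a tuple holding the states of m = 2 binary characters.
edges = [("root", "x"), ("root", "y"), ("x", "A"), ("x", "B"), ("y", "C"), ("y", "D")]
labels = {
    "A": (0, 1), "B": (0, 0), "C": (1, 1), "D": (1, 1),
    "x": (0, 0), "y": (1, 1), "root": (0, 1),
}
score = tree_score(edges, labels, m=2)  # -1 for character 0, -2 for character 1
```

Because each character contributes its own term, the labeling of internal nodes can be optimized one character at a time under this kind of additive score.
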

Itshack Pe'er
1999-02-18