# Phylogenetic trees

2 Aug

I just listened to a two-hour talk on phylogenetic trees, and they seem fun enough that I thought I’d share them with you!  Sorry, I literally forgot to post last week, and then I realized I did not take notes on the stuff I wanted to post about (more pictures by Aaron Fenyes).  Here’s a photo that I took and wanted to explain:

Something twisted representation something skein algebra something unit tangent bundle

Instead, you’ll read about the basics behind the research of my friend Gillian (no website), which is being supported somehow by people who want to know about cancer.  First, some background: the field of study of phylogenetic trees is inspired by and informs applications in evolutionary biology.  All of the life-forms today (or at some slice of time) trace back to a root life-form at time 0.  Each current life-form follows a series of branchings away from the root to get to where it is now.

Then we can represent this evolution via a labeled rooted binary tree: the root represents the root life-form at time 0, and each branching of the tree represents a different evolution.  The labels mark which life-form is which.  Of course this model isn’t perfect.  I can’t find the word for it, but there’s a thing where two different species evolve separately from the same ancestor, then meet again and make one species; if we were to represent this information in a graph, it’d make a cycle, so the graph would not be a tree.  Still, the tree model has been fruitful.

The rooted binary tree of the Wikipedia picture: node 0 is the root life-form, then 1-7 are the life-forms at our current time.

Now let’s mathify this.  We’d like to encode the evolutionary information into our tree.  A leaf in a tree is a vertex with only one neighbor, and we call the edge leading to that vertex a leaf edge.  Let’s call the non-leaf edges interior edges.  We’ve already decided that all life-forms will end at the same time (now), so if we just assign lengths to each of the interior edges, this will automatically determine the lengths of the leaf edges.  In the picture above, we have 5 interior edges in a tree with 7 leaves.  Using this exact configuration of labels and edges, we have five degrees of freedom: we can make those interior edges whatever lengths we want, as long as they are positive numbers.  So in math terms, the set of phylogenetic trees (aka rooted, binary, labeled trees) in this configuration forms the positive orthant of $\mathbb{R}^5$.  You can smoothly change any one of the edge lengths to be slightly longer or shorter, and still have a phylogenetic tree with the same combinatorial data.

This is from the paper I’m writing, but it does show that in 3D, there are 8 orthants cut by the three axes (red dot is the origin).  The pink box represents a single orthant.
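To make the "interior edges determine the leaf edges" claim concrete, here is a minimal Python sketch on a small four-leaf tree.  The nested-tuple encoding and the `total_time` parameter (time elapsed from the first branching until now) are my own assumptions for illustration, not notation from the talk or the paper.

```python
def leaf_edge_lengths(tree, total_time, elapsed=0.0):
    """Return {leaf label: leaf-edge length} when every leaf ends at the same time.

    tree is either an int (a leaf label) or a tuple (edge_length, left, right),
    where edge_length is the length of the interior edge leading into that
    branching (use 0.0 for the first branching, which has no interior edge above it).
    total_time is a hypothetical parameter: time from the first branching to now.
    """
    if isinstance(tree, int):
        # Whatever time is left over becomes the length of the leaf edge.
        return {tree: total_time - elapsed}
    edge_length, left, right = tree
    reached = elapsed + edge_length
    if reached >= total_time:
        raise ValueError("interior edges are too long for the chosen total_time")
    lengths = leaf_edge_lengths(left, total_time, reached)
    lengths.update(leaf_edge_lengths(right, total_time, reached))
    return lengths


# Leaf 1 branches off first, then 2, then 3 and 4 split from each other.
# The two interior edges have lengths 1.0 and 0.5.
example = (0.0, 1, (1.0, 2, (0.5, 3, 4)))
print(leaf_edge_lengths(example, total_time=3.0))
# {1: 3.0, 2: 2.0, 3: 1.5, 4: 1.5}
```

Changing either interior length (while keeping it positive) moves you around inside one orthant; the leaf edges just adjust to keep every leaf at "now".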

What about phylogenetic trees with different combinatorial data?  Say, with different labels or different branching, but the same number of leaves and the same number of interior edges?  First we need to figure out what we mean by ‘different’.  For instance, the following picture from the seminal paper in this field shows three trees that don’t immediately look the same, but that we don’t count as different:

Why aren’t they different?  Because they encode the same data for each life-form: reading from node 0 we see that first 1 branches off, then 2, then 3 and 4 in all three cases.  There’s some combinatorics here with partitions that you can do (one can label a tree with a set of partitions).  However, changing the labels so that first 2 branches off, then 1, then 3 and 4 will be a different phylogenetic tree.  In fact I can smoothly go from one to the other in the space that we’re creating: first I shrink the length of the green edge below to zero, which takes us to the middle tree (not binary!), and then extend the blue edge.

Shrink the colored edges to get the same tree in the middle (not a binary tree)
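One way to check "same tree or not" by computer is to record, for each interior edge, the set of leaves hanging below it, and compare those sets.  This is a rough sketch of that idea; the nested-tuple encoding and the function name `clades` are assumptions of mine, and I'm guessing which edges are green and blue from the description above.

```python
def clades(tree):
    """Return the set of leaf-sets sitting below each interior edge.

    tree is an int (leaf label) or a tuple of subtrees.  Tuples may have more
    than two entries, so non-binary trees are allowed.  The topmost branching
    is skipped because the edge above it leads to the root 0, not an interior edge.
    """
    found = set()

    def walk(node, is_top):
        if isinstance(node, int):
            return frozenset([node])
        below = frozenset()
        for child in node:
            below |= walk(child, False)
        if not is_top:
            found.add(below)
        return below

    walk(tree, True)
    return found


drawing_a = (1, (2, (3, 4)))   # first 1 branches off, then 2, then 3 and 4
drawing_b = (((4, 3), 2), 1)   # same branching, drawn in mirror image
swapped = (2, (1, (3, 4)))     # first 2 branches off, then 1
middle = (1, 2, (3, 4))        # the non-binary tree with one edge shrunk away

print(clades(drawing_a) == clades(drawing_b))                 # True
print(clades(drawing_a) == clades(swapped))                   # False
print(clades(middle) == clades(drawing_a) & clades(swapped))  # True
```

The last line says the middle tree keeps exactly the interior edge that the two binary trees have in common, which is why both of them can shrink down to it.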

We’re going to add these non-binary trees, which have one fewer interior edge, to our space.  Remember the tree on the left has an entire positive orthant, and the tree on the right has an entire positive orthant.  Shrinking the green length to zero means that we’re moving to one side of the left orthant, so we add this axis to our space: we use $\{(x,y)\in \mathbb{R}^2 : x,y\geq 0\}$ instead of requiring the coordinates to be strictly greater than 0.  We can glue the green and blue orthants together along this axis.  Here’s a picture from the paper:

Notice that they also have the origin filled in, with a tree with no interior edges.  This is the cone point of this space.  Now we’re finally ready to describe the space of phylogenetic trees: within each combinatorial structure/labeling, we have a Euclidean orthant in (n-2) dimensions.  Then these orthants are glued together along their axes in a specific way, and all of them are connected to the cone point.  This is called BHV(n), short for Billera-Holmes-Vogtmann space (in the paper they call it T(n) but that’s confusing to everyone else).  Here’s the picture of T(4):

Each triangle represents an infinite orthant
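Here is a small sketch of what "glued along an axis" means in coordinates, using the four-leaf example from before.  I'm storing a point of the space as a dictionary from interior edges (named by the leaves below them) to positive lengths; that encoding, the names `SHARED`, `GREEN`, `BLUE`, and the particular path are all my own illustration, not something from the paper.

```python
# Interior edges, named by the set of leaves below them (hypothetical labels).
SHARED = frozenset({3, 4})    # edge present in both binary trees
GREEN = frozenset({2, 3, 4})  # edge only in the left tree
BLUE = frozenset({1, 3, 4})   # edge only in the right tree

CONE_POINT = {}  # the tree with no interior edges at all


def drop_zero_edges(lengths):
    """An edge of length zero is not there: the point lives on a shared face."""
    return {edge: x for edge, x in lengths.items() if x > 0}


def path(t, green_start=1.0, blue_end=1.0, shared=1.0):
    """Walk from the left orthant to the right one as t goes from 0 to 1.

    First half: shrink the green edge to zero.  Second half: grow the blue edge.
    The shared edge keeps its length the whole time.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be between 0 and 1")
    if t <= 0.5:
        point = {SHARED: shared, GREEN: green_start * (1 - 2 * t)}
    else:
        point = {SHARED: shared, BLUE: blue_end * (2 * t - 1)}
    return drop_zero_edges(point)


print(sorted(len(path(t)) for t in (0.0, 0.25, 0.5, 0.75, 1.0)))
# [1, 2, 2, 2, 2]
```

The single point with only one coordinate is the moment at t = 0.5 when the path sits on the shared axis, i.e. at the non-binary tree.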

There are 15 different orthants glued together in this picture, because the number of labelled rooted binary trees with $n$ leaves is $(2n-3)!!$, and for $n=4$ that is $5\cdot 3\cdot 1 = 15$.  The double !! means you only multiply the odd numbers, a.k.a. $(2n-3)(2n-5)(2n-7)\cdots 3\cdot 1$.  This is also known as Schroeder’s fourth problem, which as far as I can tell was open for 100 years.  Pretty cool!
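If you want to convince yourself of the count, here is a short Python sketch that computes the double factorial and also builds the trees by brute force.  The brute-force idea (attach the new leaf on the edge above every node of every smaller tree) is a standard counting argument that I'm swapping in for illustration; the function names are made up.

```python
def odd_double_factorial(m):
    """Multiply m * (m - 2) * (m - 4) * ... down to 1, for odd m."""
    product = 1
    while m > 1:
        product *= m
        m -= 2
    return product


def count_trees(n):
    """Number of labelled rooted binary trees with n leaves, via (2n - 3)!!."""
    if n < 2:
        raise ValueError("need at least two leaves")
    return odd_double_factorial(2 * n - 3)


def insert_leaf(tree, new):
    """Yield each tree made by attaching leaf `new` on the edge above a node of `tree`."""
    yield (tree, new)
    if not isinstance(tree, int):
        left, right = tree
        for grown in insert_leaf(left, new):
            yield (grown, right)
        for grown in insert_leaf(right, new):
            yield (left, grown)


def all_trees(n):
    """List every labelled rooted binary tree on leaves 1..n as nested pairs."""
    trees = [1]
    for leaf in range(2, n + 1):
        trees = [grown for old in trees for grown in insert_leaf(old, leaf)]
    return trees


print(count_trees(4), len(all_trees(4)))  # 15 15
print(count_trees(7))                     # 10395
```

A tree with k leaves has 2k - 1 nodes, so each step multiplies the count by the next odd number, which is exactly where the double factorial comes from.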

If you truncate BHV(n) so it’s not infinite (just pick some compact bound), then it forms a nonpositively curved cube complex, and we love those!  CAT(0) cube complexes are great.  I haven’t blogged too much about them (first terrible post and then those truncated Haglund notes) but they are the basis of all that I do and the number one thing I talk about when I give math talks.  Whoops!  The gist is that you glue cubes together in not-terrible ways, and then the resulting complex has great and fun properties (like you can cut it in half the way you want to).

That’s about all I have to say about this!  Gillian is working on some stuff about putting a probability measure on BHV(n) [you can’t do it with certain conditions], embedding it into a small enough Euclidean space that still preserves some of its features, and finding an isometrically embedded copy of the phylogenetic tree inside BHV(n) instead of just the coordinate point.  Also, fun fact to prove to yourself (actually please don’t scoop my friend), find the automorphism group of BHV(n)!  It’s just the symmetric group on some number that has to do with n (n+1 or something like that; I can’t remember and didn’t take notes).

Again, the main reference for this is the seminal paper by Billera, Holmes, and Vogtmann, which should also be accessible as it’s meant for biologists and statisticians.