Wednesday, September 12, 2007

Rodriguez-Ezpeleta et al. 2007

Rodriguez-Ezpeleta, N., H. Brinkmann, B. Roure, N. Lartillot, B.F. Lang, and H. Philippe. 2007. Detecting and overcoming systematic errors in genome-scale phylogenies. Syst. Biol. 56(3): 389-399.

They confirmed that higher statistical support does not necessarily mean more accurate results: as data sets grow, the influence of systematic error grows with them, so a phylogeny can be strongly supported BUT incorrect. Hedtke et al. showed this as well but they don't reference this paper.

They list known causes of model violation:
1) across-site rate variation (I'm guessing this is the same as among-site rate variation?)
2) heterotachy (which they define as across-site rate variation through time)
-I haven't really read about this. I guess the across-site rate variation could be different at any given time in history.
3) site-interdependent evolution
-I need to read the references on this
4) compositional heterogeneity
-I think this means the proportion of A's, G's, C's and T's?
5) site-heterogeneous nucleotide/amino acid replacement
-not sure what this means, have to read the references as well.
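
To make point 4 concrete for myself: if compositional heterogeneity is about base proportions differing between taxa, then a quick check is to tabulate the proportion of A, C, G and T per sequence. This is only a minimal sketch under that assumption; the function name and the toy sequences are made up for illustration and are not from the paper.

```python
from collections import Counter


def base_composition(seq):
    """Return the proportion of A, C, G and T in one sequence.

    Gaps and ambiguity codes are ignored (an assumption of this sketch).
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return {base: 0.0 for base in "ACGT"}
    return {base: counts[base] / total for base in "ACGT"}


# Hypothetical toy sequences, only to show the kind of contrast to look for.
toy_alignment = {
    "taxon_AT_rich": "ATATTAATGCATTA",
    "taxon_GC_rich": "GCGGCCATGCGGCC",
}

for name, seq in toy_alignment.items():
    print(name, base_composition(seq))
```

If the proportions differ a lot between taxa, a model that assumes one shared composition for the whole tree is being violated.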

nonphylogenetic signal = the 'apparent' signal arising from model violations

The impact of model violations on phylogenetic accuracy is greatly exaggerated when multiple substitutions occur at given sites (mutational saturation).
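
A small illustration of why saturation hides signal, assuming the simplest nucleotide model (Jukes-Cantor 1969, which is my choice for the example and not the model used in the paper). Under that model the expected proportion of sites that look different between two sequences is p = 3/4 * (1 - exp(-4d/3)), where d is the true number of substitutions per site. The sketch below just evaluates that formula.

```python
import math


def jc69_observed_proportion(d):
    """Expected proportion of differing sites under Jukes-Cantor,
    given d true substitutions per site."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


# As d grows, the observed proportion flattens out below 0.75,
# so many different true distances end up looking almost the same.
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    p = jc69_observed_proportion(d)
    print(f"{d:4.1f} substitutions/site -> {p:.3f} observed differences/site")
```

The observed value can never exceed 0.75 under this model, which is the plateau that multiple substitutions at the same site produce.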

"Long branch attraction is a well-known case of systematic error that causes the clustering of fast-evolving species regardless of their true phylogenetic relationships." Fast-evolving can either mean the time it took to evolve or the amount of evolutionary change.

They list five ways to overcome LBA:

1) increase taxon sampling
2) improve models of sequence evolution, allowing a more efficient detection of multiple substitutions
3) remove fast-evolving species from the analyses
4) remove fast-evolving genes
5) remove fast-evolving sequence positions

I should check out the programs PhyloBayes and the MUST package.
(the MUST package calculates the slope of saturation curves! neat)
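
I don't know how the MUST package actually computes this, so the following is only my guess at the general idea: plot observed pairwise differences against the distances inferred from a tree, and take the slope of a straight line forced through the origin. The function name and the numbers are hypothetical.

```python
def saturation_slope(inferred, observed):
    """Least-squares slope through the origin of observed vs. inferred distances.

    A slope near 1 suggests little saturation; a slope well below 1 suggests
    that many substitutions are hidden (assumed interpretation for this sketch).
    """
    if len(inferred) != len(observed):
        raise ValueError("inferred and observed must have the same length")
    sum_xx = sum(x * x for x in inferred)
    if sum_xx == 0:
        raise ValueError("inferred distances are all zero")
    sum_xy = sum(x * y for x, y in zip(inferred, observed))
    return sum_xy / sum_xx


# Hypothetical pairwise distances for a handful of taxon pairs.
inferred_distances = [0.1, 0.4, 0.9, 1.5, 2.2]
observed_distances = [0.1, 0.3, 0.5, 0.6, 0.65]

print(round(saturation_slope(inferred_distances, observed_distances), 3))
```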

What are RELL bootstraps?
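
As far as I know, RELL stands for "resampling of estimated log-likelihoods": instead of re-optimizing every bootstrap replicate, you resample the per-site log-likelihoods that were already computed for each candidate tree and count how often each tree comes out best. Below is a minimal sketch of that idea; the function name, the input layout and the toy numbers are my own assumptions, not anything from the paper or from a particular program.

```python
import random


def rell_support(site_lnl, replicates=1000, seed=0):
    """Approximate bootstrap support by resampling per-site log-likelihoods.

    site_lnl maps a tree name to its list of per-site log-likelihoods;
    all lists must have the same length. Ties go to the first tree listed.
    """
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    if any(len(site_lnl[name]) != n_sites for name in names):
        raise ValueError("all trees need the same number of sites")
    rng = random.Random(seed)
    wins = dict.fromkeys(names, 0)
    for _ in range(replicates):
        # Draw sites with replacement, as in an ordinary bootstrap.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {name: sum(site_lnl[name][i] for i in sample) for name in names}
        best = max(names, key=totals.get)
        wins[best] += 1
    return {name: wins[name] / replicates for name in names}


# Hypothetical per-site log-likelihoods for two candidate trees.
toy_site_lnl = {
    "tree_A": [-1.2, -0.8, -2.0, -1.1, -0.9, -1.7],
    "tree_B": [-1.3, -0.7, -2.4, -1.0, -1.0, -1.6],
}
print(rell_support(toy_site_lnl))
```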

Monday, September 10, 2007

Marjanovic & Laurin 2007

Marjanovic, D. and M. Laurin. 2007. Fossils, molecules, divergence times, and the origin of Lissamphibians. Syst. Biol. 56(3): 369-388.

"..a literal interpretation of the fossil record always underestimates the date of appearance of taxa because it can only give a latest possible date of appearance, not an earliest possible date of appearance..."

At first this quote didn't make sense to me, but now I think it means this: when you find a fossil and date it, the species must already have existed at that time. So the speciation event could not have happened after that date, but it could have happened before it. The fossil record cannot give an earliest possible date of appearance because an older fossil can always turn up and push the first appearance further back.

Some divergence dating methods that the authors used:
1) Multidivtime (Thorne and Kishino 2002)
2) QDate 1.11 (Rambaut and Bromham 1998)
3) r8s 1.71 (Sanderson 2003, 2006) using the penalized likelihood method
4) PATHd8 (Anderson 2006)

They couldn't get Multidivtime to work. I've heard QDate is a bad method and r8s is okay. I've never heard of PATHd8, and they also said the results didn't make sense.

-neighbour-joining trees are phenograms, not cladograms

On page 383, they discuss something odd. They say that, according to Kolaczkowski and Thornton 2004, parsimony does better than ML and Bayesian methods because parsimony does not need an assumption about how many rate categories there are. For instance, in many real cases each nucleotide position evolves at its own speed, which can cause problems for approaches that include evolution models. I don't think I've heard this argument for parsimony before, and I will have to read the Kolaczkowski paper to get a better handle on it. They also state that the branch lengths of the parsimony tree fit the morphological data better than those of the likelihood tree. These seem like odd statements to justify using parsimony instead of ML or Bayesian methods for molecular data. I'm surprised that the reviewers didn't catch this.