Wednesday, September 12, 2007

Rodriguez-Ezpeleta et al. 2007

Rodriguez-Ezpeleta, N., H. Brinkmann, B. Roure, N. Lartillot, B.F. Lang, and H. Philippe. 2007. Detecting and overcoming systematic errors in genome-scale phylogenies. Syst. Biol. 56(3): 389-399.

They confirmed that higher statistical support does not necessarily lead to more accurate results: larger data sets also accumulate more systematic error, which can produce a strongly supported BUT incorrect phylogeny. Hedtke et al. showed this as well but they don't reference this paper.

They list known causes of model violation:
1) across-site rate variation (I'm guessing this is the same as among-site rate variation?)
2) heterotachy (which they define as across-site rate variation through time)
-I haven't really read about this. I guess the across-site rate variation (ASRV) could be different at different times in history.
3) site-interdependent evolution
-I need to read the references on this
4) compositional heterogeneity
-I think this means the proportion of A's, G's, C's and T's?
5) site-heterogeneous nucleotide/amino acid replacement
-not sure what this means, have to read the references as well.
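
To make item 4 concrete for myself, here is a minimal sketch of how base composition could be measured per taxon. The function name and the toy sequences are made up for illustration; nothing here comes from the paper.

```python
from collections import Counter

def base_composition(seq):
    """Proportion of A, C, G and T in a sequence (gaps and ambiguity codes ignored)."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    return {base: (counts[base] / total if total else 0.0) for base in "ACGT"}

# Hypothetical toy sequences, invented for this example.
toy = {"AT_rich": "ATATTAATGA", "GC_rich": "GCGGCCGCAT"}

for name, seq in toy.items():
    comp = base_composition(seq)
    gc_content = comp["G"] + comp["C"]
    print(name, comp, "GC content:", round(gc_content, 2))
```

If the proportions differ a lot between taxa, the data set is compositionally heterogeneous, and a model that assumes one shared composition for all taxa is being violated.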

nonphylogenetic signal = the 'apparent' signal arising from model violations

The impact of model violations on phylogenetic accuracy is greatly exaggerated when multiple substitutions occur at given sites (mutational saturation).
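
A small sketch of why saturation hides signal, using the Jukes-Cantor (JC69) model as a stand-in. JC69 is my choice for illustration, not the model used in the paper. Under JC69 the expected proportion of differing sites is p = 3/4 (1 - exp(-4d/3)), where d is the true number of substitutions per site, so p levels off near 0.75 no matter how large d gets.

```python
import math

def expected_p_distance(d):
    """Expected proportion of differing sites under JC69 for d substitutions per site."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

def jc69_distance(p):
    """Invert the JC69 formula: estimate substitutions per site from observed p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Observed differences grow more and more slowly as true change accumulates.
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"true d = {d:3.1f}   expected observed p = {expected_p_distance(d):.3f}")
```

Once p sits close to the plateau, very different amounts of true change give almost the same observed difference, so a small error in the model translates into a large error in the inferred distance.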

"Long branch attraction is a well-known case of systematic error that causes the clustering of fast-evolving species regardless of their true phylogenetic relationships." Fast-evolving can either mean the time it took to evolve or the amount of evolutionary change.

They list five ways to overcome LBA:

1) increase taxon sampling
2) improve models of sequence evolution, allowing a more efficient detection of multiple substitutions
3) remove fast-evolving species from the analyses
4) remove fast-evolving genes
5) remove fast-evolving sequence positions
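
As a rough illustration of strategy 5, here is a sketch that drops the most variable columns from a nucleotide alignment. Counting distinct states per column is only a crude proxy for evolutionary rate and is my own simplification, not the procedure used in the paper; the function name, the `max_states` threshold and the toy alignment are all hypothetical.

```python
def drop_variable_columns(alignment, max_states=2):
    """Keep only columns with at most max_states distinct nucleotides.

    alignment: dict mapping taxon name to an aligned sequence (all equal length).
    Gaps and the ambiguity symbols '?' and 'N' are ignored when counting states.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all sequences must have the same length")

    kept = []
    for i in range(length):
        states = {alignment[name][i].upper() for name in names} - set("-?N")
        if len(states) <= max_states:
            kept.append(i)

    return {name: "".join(alignment[name][i] for i in kept) for name in names}

# Hypothetical toy alignment: the fourth column has four different states.
toy = {"sp1": "ACGTA", "sp2": "ACGCA", "sp3": "ATGGA", "sp4": "ACGAA"}
filtered = drop_variable_columns(toy, max_states=2)
for name, seq in filtered.items():
    print(name, seq)
```

Strategies 3 and 4 would follow the same pattern, with whole taxa or whole genes ranked and removed instead of single columns.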

I should check out the programs PhyloBayes and the MUST package.
(The MUST package calculates the slope of saturation curves! Neat.)
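
I haven't looked at how MUST actually does this, so the following is only a generic sketch. It assumes a saturation curve plots observed pairwise differences (y) against distances inferred from the tree (x), and fits an ordinary least-squares line. The numbers are invented.

```python
def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical pairwise distances, made up for this example.
inferred = [0.1, 0.2, 0.4, 0.8]             # distances read off the tree
observed_saturated = [0.1, 0.18, 0.3, 0.4]  # observed differences lag behind

print("no saturation:", least_squares_slope(inferred, inferred))
print("saturated:    ", least_squares_slope(inferred, observed_saturated))
```

Under that assumption, a slope close to 1 means the observed differences track the inferred change, and a slope well below 1 means many substitutions are hidden by multiple hits.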

What are RELL bootstraps?

1 comment:

David Marjanović said...

4) compositional heterogeneity
-I think this means the proportion of A's, G's, C's and T's?


Yes. There are AT-rich organisms and GC-rich organisms; they can have drastically different probabilities for a substitution and its reversal.

5) site-heterogeneous nucleotide/amino acid replacement
-not sure what this means, have to read the references as well.


Sounds like some sites have a high probability of replacing a certain amino acid with a certain other -- presumably a chemically similar one.