The fixation of mutations in genes results from a balance of selection, mutation and drift. Codon models have proven very useful for distinguishing selection, including positive selection, from drift. The synonymous substitution rate (dS) is assumed to capture all variation that is not under selection, so the ratio of the nonsynonymous (dN) to the synonymous (dS) substitution rate should indicate selection. Many models allow selection (and thus dN) to vary across a gene, but they assume that dS is constant over all positions of that gene. Yet significant variation in dS has been observed within genes.
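To make the concern concrete, the toy sketch below shows how a site with an elevated local mutation rate can look positively selected when dS is assumed constant across the gene. It is an illustration only, not the model described here: the function names, the base rate of 1.0 and the example numbers are all hypothetical.

```python
def classify(omega: float) -> str:
    """Interpret a dN/dS ratio under the usual convention."""
    if omega > 1.0:
        return "positive selection"
    if omega < 1.0:
        return "purifying selection"
    return "neutral"


def apparent_omega(true_omega: float, site_rate: float, assumed_ds: float = 1.0) -> float:
    """dN/dS estimated at one site when dS is (wrongly) held constant.

    Assumption for illustration: the local rate multiplies both dN and dS,
    so the true dS at the site is site_rate * 1.0, but the estimator
    divides by the gene-wide value assumed_ds instead.
    """
    dn = true_omega * site_rate * 1.0  # hypothetical base rate of 1.0
    return dn / assumed_ds


# A site under purifying selection (true omega = 0.5) with a threefold
# local rate looks positively selected if dS is held at the gene-wide value.
w = apparent_omega(true_omega=0.5, site_rate=3.0)
print(w, classify(w))
```

Dividing by the site's own dS (here 3.0) would recover the true ratio of 0.5, which is the motivation for letting dS vary among sites.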
We have developed a simple new model that accounts for variation in mutation rate in addition to variation in selection. We combine it both with site variation in selection (the M8 model of PAML) and with branch-site variation in selection. Because our model is more computationally efficient than previous approaches to capturing dS variation, we can present the first integration of dS variation with episodic selection. We use these improved positive selection models to scan genome-scale data and show that the new model provides a better fit to real data. We also show that gene sequence constraints affect the performance of codon models differently in mammals and in bacteria.
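A common way to compare the fit of nested codon models is a likelihood ratio test; the sketch below assumes that approach, which this abstract does not itself specify. The log-likelihood values and the number of extra parameters are hypothetical, and the chi-square reference distribution is only an asymptotic approximation.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Twice the log-likelihood gain of the richer model over the nested one."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (closed forms for df 1 and 2)."""
    if x <= 0.0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only implements df = 1 or df = 2")


# Hypothetical log-likelihoods: a model with constant dS vs. one with variable dS.
stat = lrt_statistic(lnl_null=-1000.0, lnl_alt=-995.0)
p_value = chi2_sf(stat, df=2)  # df = 2 is an assumed number of extra parameters
print(stat, p_value)
```

A small p-value would favour the richer model; with many genes in a genome-scale scan, such p-values would normally be corrected for multiple testing.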