PAML

Introduction

"PAML (for Phylogenetic Analysis by Maximum Likelihood) contains a few programs for model fitting and phylogenetic tree reconstruction using nucleotide or amino-acid sequence data." (doc/pamlDOC.pdf)

Version

Version 4.4d is currently available at OSC.

Availability

PAML is available on the Glenn Cluster.

Usage

On the Glenn Cluster, PAML is accessed by executing the following commands:
module load biosoftw
module load paml
PAML is a collection of several programs that will be added to the user's PATH: baseml, basemlg, chi2, codeml, ds, evolver, mcmctree, pamp, and yn00. Each program has its own usage and options, though these are typically similar across programs.
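
A quick way to confirm that the modules loaded correctly is to check that each program listed above resolves on the PATH. The following is a minimal sketch; the helper name missing_programs is an illustrative assumption and is not part of PAML or the OSC module system.

```shell
#!/bin/sh
# missing_programs NAME...: print each NAME that is not found on PATH.
# (Hypothetical helper, for illustration only.)
missing_programs() {
    for prog in "$@"; do
        command -v "$prog" >/dev/null 2>&1 || echo "$prog"
    done
}

# Check the nine PAML programs named in this section.
missing=$(missing_programs baseml basemlg chi2 codeml ds evolver mcmctree pamp yn00)
if [ -z "$missing" ]; then
    echo "All PAML programs found on PATH"
else
    echo "Not on PATH (were the modules loaded?):" $missing
fi
```

If any names are reported, rerunning the two module load commands in the same shell is the first thing to try.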

Options

baseml / basemlg   baseml performs maximum likelihood analysis of nucleotide sequences using a faster discrete model; basemlg implements the (continuous) gamma model of Yang and is computationally intensive
Both baseml and basemlg require a baseml.ctl in the current directory with the following variables set: seqfile, outfile, treefile

The following are optional variables to set in baseml.ctl: noisy, verbose, runmode, model, Mgene, ndata, clock, fix_kappa, kappa, fix_alpha, alpha, Malpha, ncatG, fix_rho, nparK, nhomo, getSE, RateAncestor, Small_Diff, cleandata, icode, fix_blength, method
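
As an illustration of the three required variables, the sketch below writes a minimal baseml.ctl into the current directory. It assumes the "name = value" layout used by the control files distributed with PAML; the file names my_alignment.nuc, my_tree.trees, and baseml_results.txt are placeholders, not files provided at OSC.

```shell
#!/bin/sh
# Write a minimal baseml.ctl into the current directory.
# All file names below are hypothetical placeholders.
cat > baseml.ctl <<'EOF'
* input sequence alignment (placeholder name)
seqfile = my_alignment.nuc
* input tree file (placeholder name)
treefile = my_tree.trees
* main result file (placeholder name)
outfile = baseml_results.txt
EOF

# Run baseml only if it is on PATH and the placeholder input exists.
if command -v baseml >/dev/null 2>&1 && [ -f my_alignment.nuc ]; then
    baseml
else
    echo "baseml.ctl written; baseml not run"
fi
```

Optional variables from the list above can be appended to the same file in the same way.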

chi2   Calculates the χ² critical value and p-value for conducting the likelihood ratio test
chi2 [p | INTEGER DOUBLE]
chi2                   prints χ² critical values at set significance levels until 'q' + ENTER is typed
chi2 p                 interactively sets the degrees of freedom and χ² value
chi2 INTEGER DOUBLE    computes the probability for INTEGER degrees of freedom and a χ² value of DOUBLE
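
In a likelihood ratio test, the value passed to chi2 is twice the difference in log-likelihood between the alternative and null models, and the degrees of freedom equal the difference in the number of free parameters. A minimal sketch of that arithmetic follows; the function name lrt_stat and the two log-likelihood values are made up for illustration.

```shell
#!/bin/sh
# lrt_stat NULL_LNL ALT_LNL: print 2 * (ALT_LNL - NULL_LNL) to two decimals.
# (Hypothetical helper, not part of PAML.)
lrt_stat() {
    awk -v l0="$1" -v l1="$2" 'BEGIN { printf "%.2f\n", 2 * (l1 - l0) }'
}

# Illustrative log-likelihoods only; real values come from the PAML output files.
stat=$(lrt_stat -2399.50 -2397.58)
echo "LRT statistic: $stat"

# Assuming the alternative model has one extra free parameter, df = 1.
if command -v chi2 >/dev/null 2>&1; then
    chi2 1 "$stat"
fi
```

With these example numbers the statistic is 3.84, the same value used with one degree of freedom in the chi2 call in the Example section.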

codeml   Implements the codon substitution model of Goldman & Yang for DNA and amino acid sequences
codeml requires codeml.ctl to be located in the current directory with the following variables set: seqfile, outfile, treefile, aaRatefile
The following are optional variables to set in codeml.ctl: noisy, verbose, runmode, seqtype, CodonFreq, ndata, aaDist, model, NSsites, icode, Mgene, fix_kappa, kappa, fix_omega, omega, fix_alpha, alpha, Malpha, ncatG, getSE, RateAncestor, Small_Diff, cleandata, fix_blength, method

ds   Computes descriptive statistics from a baseml/basemlg analysis
ds filename.type

evolver   Simulates sequences under nucleotide, codon, and amino acid substitution models; generates random trees; and calculates the partition distances between trees
   EVOLVER in paml version 4.4d, March 2011
   Results for options 1-4 & 8 go into evolver.out
   Options
      (1) Get random UNROOTED trees?
      (2) Get random ROOTED trees?
      (3) List all UNROOTED trees?
      (4) List all ROOTED trees?
      (5) Simulate nucleotide data sets (use MCbase.dat)?
      (6) Simulate codon data sets      (use MCcodon.dat)?
      (7) Simulate amino acid data sets (use MCaa.dat)?
      (8) Calculate identical bi-partitions between trees?
      (9) Calculate clade support values (read 2 treefiles)?
      (11) Label clades?
      (0) Quit?

evolver’s option 5 requires MCbase.dat.  evolver’s option 6 requires MCcodon.dat.  evolver’s option 7 requires MCaa.dat and dat/mtmam.dat.  evolver’s option 9 requires truetree rst1 (formed from stewart.trees & codeml's output rst1).  evolver’s option 11 requires name.tress with user input.

mcmctree   Implements the Bayesian MCMC algorithm of Yang and Rannala for estimating species divergence times
mcmctree requires mcmctree.ctl to be located in the current directory with the following variables set: seqfile, treefile, outfile, RootAge, usedata
The following are optional variables to set in mcmctree.ctl: seed, ndata, clock, model, alpha, ncatG, cleandata, BDparas, kappa_gamma, alpha_gamma, rgene_gamma, sigma2_gamma, finetune, print, burnin, sampfreq, nsample

pamp   Implements the parsimony-based analysis of Yang and Kumar
pamp requires pamp.ctl to be located in the current directory with the following variables set: seqfile, treefile, outfile
The following are optional variables to set in pamp.ctl: seqtype, ncatG, nhomo

yn00   Implements the method of Yang and Nielsen for estimating synonymous and nonsynonymous substitution rates in pairwise comparisons of protein-coding DNA sequences
yn00 requires yn00.ctl to be located in the current directory with the following variables set: seqfile, outfile
The following are optional variables to set in yn00.ctl: verbose, icode, weighting, commonf3x4, ndata

Control Files

In all .ctl files (baseml.ctl, codeml.ctl, mcmctree.ctl, pamp.ctl, and yn00.ctl), lines starting with '*' are comments.
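
Because every program expects certain variables in its control file, a small pre-flight check can catch a missing or commented-out variable before a job is submitted. The sketch below is one way to do this. The helpers ctl_vars and ctl_missing are hypothetical, and the parsing assumes a "name = value" layout in which '*' never appears inside a value, so everything from '*' to the end of a line is discarded.

```shell
#!/bin/sh
# ctl_vars FILE: print the names of the variables set in a control file,
# dropping everything from '*' to the end of each line (assumed comment rule).
ctl_vars() {
    sed -e 's/\*.*$//' "$1" |
        awk -F'=' 'NF >= 2 { gsub(/[ \t]/, "", $1); if ($1 != "") print $1 }'
}

# ctl_missing FILE VAR...: print each required VAR that FILE does not set.
ctl_missing() {
    file=$1
    shift
    set_vars=$(ctl_vars "$file")
    for v in "$@"; do
        echo "$set_vars" | grep -qxF "$v" || echo "$v"
    done
}

# Demo on a throwaway control file (contents are hypothetical).
demo=$(mktemp)
cat > "$demo" <<'EOF'
* yn00-style control file
seqfile = my_alignment.nuc
* outfile = commented_out.txt
verbose = 0
EOF
ctl_missing "$demo" seqfile outfile   # reports outfile, which is commented out
rm -f "$demo"
```

The same call works for the other programs by passing their required variables, for example seqfile, treefile, and outfile for pamp.ctl.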

Example

#PBS -N paml_test
#PBS -l walltime=0:05:00
#PBS -l nodes=1:ppn=4

cd $PBS_O_WORKDIR
module load biosoftw
module load paml
export PAML_DIR=/usr/local/biosoftw/paml44
cp $PAML_DIR/*.* .
cp -r $PAML_DIR/dat .
cp -r $PAML_DIR/examples .
baseml
chi2 1 3.84
codeml
ds in.baseml
echo -e "1\n5\n5 5\n0\n2\n5\n5 5\n0\n3\n5\n4\n5\n5\n6\n7\n8\n" | evolver
mcmctree
pamp
yn00
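
The evolver line in the script above drives the interactive menu by piping a fixed sequence of answers to standard input. The sketch below writes the same answers one per argument, which may be easier to read and edit. The answers are copied from the script; what each one means depends on the prompts evolver prints, and the function name evolver_answers is an illustrative assumption.

```shell
#!/bin/sh
# Emit the answers from the example script, one per line.
# (Unlike the echo -e form, no extra blank line is added at the end.)
evolver_answers() {
    printf '%s\n' 1 5 '5 5' 0 2 5 '5 5' 0 3 5 4 5 5 6 7 8
}

if command -v evolver >/dev/null 2>&1; then
    evolver_answers | evolver
else
    # evolver is not on PATH: just show the answer stream.
    evolver_answers
fi
```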

Documentation

Four pdf documents are located in the following folder on Glenn:  /usr/local/biosoftw/paml44/doc/
An online discussion group for PAML users is located at the following website: http://www.rannala.org/phpBB2/