TreeBeST

"TreeBeST is an original tree builder for constrained neighbour-joining and tree merge, an efficient tool capable of duplication/loss/ortholog inference, and a versatile program facilitating many tree-building routines, such as tree rooting, alignment filtering and tree plotting. TreeBeST stands for ‘(gene) Tree Building guided by Species Tree’. It is previously known as NJTREE as the first piece of codes of this project aimed to build a neighbour-joining tree.

TreeBeST is the core engine of TreeFam (Tree Families Database) project initiated by Richard Durbin. The basic idea of this project is to build a full tree constrained by a manually verified seed tree. The tree builder must know how to utilize the prior knowledge provided by human experts. This demand disqualifies any existing softwares. Given this fact, we devised a new algorithm to control the joining step of traditional neighbour-joining. This is origin the constrained neighbour-joining.

When trees are built, they are only meaningful to biologists. Computers generate trees, but they do not understand them. To understand gene trees, a computer must be equipped with some biological knowledges, the species tree. It will teach a computer how to discriminate a speciation from a duplication event and how to find orthologs, provided a correct gene tree.

Unfortunately, gene trees are not always correct. Since the advent of UPGMA algorithm in 1958, we have tried to find a ideal model for nearly half a century. But we failed. Evolution is so complex a thing. A model best fits in one lineage might mean a disaster in another. A unified model is far from being discovered. TreeBeST aims at improving the accuracy of tree building, but it does not try to set up a new model in a traditional way. Instead, it integrates two existing models with the help of species tree, finding the subtree that best fits the models and merging them together to build a new tree incorporating the advantages of the both. This is the tree algorithm." (treebest.pdf)

Availability & Restrictions

TreeBeST is available to all OSC users without restriction.

The following versions of TreeBeST are available on OSC systems:

Version    Glenn    Oakley
1.9.2      X

Usage

Set up

On the Glenn Cluster, TreeBeST is accessed by executing the following commands:

module load biosoftw
module load treebest

TreeBeST will be added to the user's PATH and can then be run with the command:

treebest <command> [options]
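
A quick way to confirm the set-up worked is to check that the treebest executable is on the PATH before running anything. The sketch below assumes a POSIX shell; the helper name have_cmd is illustrative and is not part of TreeBeST or the module system.

```shell
# have_cmd is a hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd treebest; then
    echo "treebest found at: $(command -v treebest)"
else
    echo "treebest not found; load the biosoftw and treebest modules first" >&2
fi
```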

The commands of the treebest program and their summaries are listed below.

Command

nj           build neighbour-joining tree, SDI, rooting
best         build tree with the help of a species tree
phyml        build phyml tree
sdi          speciation vs. duplication inference
spec         print species tree
format       reformat a tree
filter       filter a multi-alignment
trans        translate coding nucleotide alignment
backtrans    translate aa alignment back to nt
leaf         get external nodes
mfa2aln      convert MFA to ALN format
ortho        ortholog/paralog inference
distmat      distance matrix
treedist     topological distance between two trees
pwalign      pairwise alignment
mmerge       merge a forest
export       export a tree to EPS format
subtree      extract the subtree
simulate     simulate a gene tree
sortleaf     sort leaf order
estlen       estimate branch length
trimpoor     trim out leaves that affect the quality of a tree
root         root a tree by minimizing height

Options

treebest nj [options] <input_file>
      -c FILE          constrained tree(s) in NH format [null]
      -m FILE          tree to be compared [null]
      -s FILE          species tree in NH format [default taxa tree]
      -l FILE          ingroup list file [null]
      -t TYPE          codon NT: ntmm, dn, ds, dm; AA: mm, jtt, kimura [mm]
                       ntmm          p-distance (codon alignment)
                       dn            non-synonymous distance
                       ds            synonymous distance
                       dm            dn-ds merge (tree merge)
                       mm            p-distance (amino acid alignment)
                       jtt           JTT model (maximum likelihood)
                       kimura        mm + Kimura's correction
      -T NUM           time limit in seconds [no limit]
      -b NUM           bootstrapping times [100]
      -F NUM           quality cut-off [15]
      -o STR           outgroup for tree cutting [Bilateria]
      -S               treat the first constrained tree as the original tree
      -C               use the leaves of constrained trees as ingroup
      -M               do not apply alignment mask
      -N               do not mask poorly aligned segments
      -g               collapse alternative splicing
      -R               do not apply leaf-reordering
      -p               the root node is a putative node
      -a               branch mode that is used by most tree-builder
      -A               the input alignment is stored in ALN format
      -W               wipe out root (SDI information will be lost!)
      -v               verbose output
      -h               help
treebest best [options] <CDS_alignment>        
            General Options:
            -P               skip PHYML
            -S               ignore the prob. of gene evolution (NOT recommended)
            -A               apply constraint to PHYML
            -C FILE          constraining tree [null]
            -f FILE          species tree [default]
            -r               discard species that do not appear at all
            Output Options:
            -D               output some debug information
            -q               suppress part of PHYML warnings
            -p STR           prefix of intermediate trees [null]
            -o FILE          output tree [null]
            Alignment Preprocessing Options:
            -s               only build tree for genes from sequenced species
            -g               collapse alternative splicing forms
            -N               do not mask low-scoring segments
            -F INT           quality cut-off [11]
            PHYML Related Options:
            -c INT           number of rate categories for PHYML-HKY [2]
            -k FLOAT|e       tv/ts ratio (kappa), 'e' for estimating [e]
            -a FLOAT|e       alpha parameter for Gamma distribution [1.0]
            -d FLOAT         duplication probability [0.15]
            -l FLOAT         probability of a loss following a speciation [0.10]
            -L FLOAT         probability of a loss following a duplication [0.20]
            -b FLOAT         prob. of the presence of an inconsistent branch [0.01]
treebest phyml <alignment> [<tree>]
            General Options:
            -t task          build | opt | loglk | dist [build]
            -n               the input is a nucleotide alignment
            -s               print out some statistics
            -N               do not mask low-scoring segments
            -g               collapse alternative splicing
            -b INT           number of bootstraps (slow) [0]
            -o FILE          write output to file [stdout]
            -F INT           quality cut-off [15]
            Model Related Options:
            -m model         nt: JC69 | K2P | F81 | HKY | F84 | TN93 | GTR [HKY]
                             aa: JTT | MtREV | Dayhoff | WAG [WAG]
            -c INT           number of relative substitution rate categories [1]
            -k FLOAT|e       transversion/transition ratio, 'e' for estimating [e]
            -a FLOAT|e       alpha parameter for Gamma distribution [1.0]
            -i FLOAT|e       proportion of invariable sites [0]
            Options for TreeFam Extensions:
            -S               use a species tree to guide tree building
            -f FILE          species tree [TreeFam species tree]
            -d FLOAT         duplication probability  [0.15]
            -l FLOAT         probability of a loss following a speciation [0.10]
            -L FLOAT         probability of a loss following a duplication [0.20]
            -C FILE          constraining tree [NULL]
            -p FLOAT         prob. of the presence of an inconsistent branch [0.01]
treebest sdi [-r|-H|-R|-m <tree0>|-l <spec_list>] <tree>
            Options:
            -r               reroot
            -c               use core species tree instead of the default one
            -H               reroot by minimizing tree height, instead of by minimizing the number of duplication events.
            -R               do not reorder the leaves.
            -s FILE          species tree [default taxa tree]
            -l FILE          cut a subtree that contains genes whose species exist in list [null]
            -m FILE          compare topology with FILE and re-order the leaves [null]
treebest spec
treebest format [-1] <tree>
treebest filter [options] <alignment> 
            Options:
            -n               nucleotide alignment
            -g               collapse alternative splicing
            -M               do not apply alignment mask
            -N               do not mask low-scoring segments
            -F NUM           quality cut-off [15]
treebest trans <nucl_alignment>
treebest backtrans [-t <thres>] <aa_aln> <nt_seq>
treebest leaf <nh_tree>
treebest mfa2aln [-n] <fasta_align>
treebest ortho <tree>
treebest distmat <dn|ds|dm|jtt|kimura|mm|dns> <alignment>
treebest treedist <tree1> <tree2>
treebest pwalign [options] <nt2nt|aa2aa|nt2aa|splice> <seq1> <seq2> 
            Options :
            -f               generate full alignment
            -a               do not apply matrix mean in local alignment
            -d               just calculate alignment boundaries
            -o NUM           gap open penalty
            -e NUM           gap extension penalty
            -n NUM           gap end penalty for nt2nt or aa2aa
            -s NUM           frame-shift penalty for aa2nt
            -g NUM           good splicing penalty
            -w NUM           band-width
            -b NUM           bad splicing penalty
            -m               output miscellaneous information
            -h               help
treebest mmerge [-r] <forest>
            Options:
            -r               reroot
treebest export [options] <tree>
            Options:
            -x NUM           width [640]
            -y NUM           height [480]
            -m NUM           margin [20]
            -f NUM           font size [11]
            -b FNUM          box size [4.0]
            -w FNUM          font width [font_size/2]
            -s FILE          species tree
            -B               suppress bootstrap value
            -M               black/white mode
            -S               show species name
            -d               speciation/duplication inference
            -p               pseudo-length
treebest subtree <tree> <list>
treebest simulate [options] 
            Options:
            -d FNUM          duplication probability [0.05]
            -l FNUM          loss probability [0.01]
            -p FNUM          loss probability after duplication [0.25]
            -m FNUM          max height [0.25]
            -n               not show internal name
            -h               help
treebest sortleaf <tree1> [<tree2>]
treebest estlen <tree> <matrix> <tag>
treebest trimpoor <tree> [<threshold>=0]
treebest root <tree> 
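
The -t option of treebest nj selects the distance measure, so comparing measures on the same alignment is a matter of repeating the nj command with a different TYPE. The following sketch prints one command per codon distance type listed under nj rather than running them. The function name nj_all_types and the output naming scheme are assumptions made for illustration; the alignment name is the one used in the Examples section.

```shell
# nj_all_types is a hypothetical helper. It prints (does not run) one
# 'treebest nj' command for each codon distance type accepted by -t.
nj_all_types() {
    aln=$1
    base=${aln%.mfa}          # strip the .mfa suffix for the output name
    for t in ntmm dn ds dm; do
        echo "treebest nj -t $t $aln > $base.$t.nhx"
    done
}

nj_all_types ex1.nucl.mfa
```

Once the printed lines look right, piping them to a shell (nj_all_types ex1.nucl.mfa | sh) would run them.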

Examples

#PBS -N treebest_test
#PBS -l walltime=00:05:00
#PBS -l nodes=1:ppn=4

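# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"
# Load the TreeBeST environment
module load biosoftw
module load treebest
# Copy the bundled example alignment and tree into the working directory
cp /usr/local/biosoftw/treebest-1.9.2/examples/ex1.nucl.* .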

treebest nj ex1.nucl.mfa > ex1.nucl.1.nhx
cp ex1.nucl.nhx ex1.nucl.1.forest
cat ex1.nucl.1.nhx >> ex1.nucl.1.forest
treebest nj -m ex1.nucl.nhx ex1.nucl.mfa > ex1.nucl.2.nhx
treebest nj -v ex1.nucl.mfa
treebest best ex1.nucl.mfa -o ex1.nucl.3.nhx
treebest best -c 1 -a 0.9 -d 0.14 -l 0.09 -L 0.19 -b 0.009 -o ex1.nucl.4.nhx ex1.nucl.mfa
treebest phyml -o ex1.nucl.1.nh ex1.nucl.mfa
treebest phyml -o ex1.nucl.2.nh ex1.nucl.mfa ex1.nucl.nhx
treebest phyml -s -C ex1.nucl.nhx -o ex1.nucl.4.nh ex1.nucl.mfa
treebest phyml -b 2 -o ex1.nucl.5.nh ex1.nucl.mfa
treebest sdi ex1.nucl.nhx > ex1.nucl.5.nhx
treebest sdi -r ex1.nucl.nhx > ex1.nucl.6.nhx
treebest sdi -r ex1.nucl.nhx > ex1.nucl.7.nhx
treebest spec > all_species.nh
treebest format ex1.nucl.nhx
treebest filter -n -M -N ex1.nucl.mfa > ex1.nucl.1.mfa
treebest trans ex1.nucl.mfa > ex1.aa.mfa
treebest backtrans ex1.aa.mfa ex1.nucl.mfa > ex1.nucl.2.mfa
treebest leaf ex1.nucl.nhx > ex1.nucl.1.leaf
head ex1.nucl.1.leaf | tail -7 > ex1.nucl.1.sublist
treebest mfa2aln -n ex1.nucl.mfa > ex1.nucl.1.aln
treebest ortho ex1.nucl.nhx > ex1.nucl.1.ortho
treebest distmat dn ex1.nucl.mfa > ex1.nucl.1.matrix.dn
treebest distmat ds ex1.nucl.mfa > ex1.nucl.1.matrix.ds
treebest distmat dm ex1.nucl.mfa > ex1.nucl.1.matrix.dm
treebest treedist ex1.nucl.nhx ex1.nucl.1.nhx > ex1.nucl.1.dist
treebest mmerge -r ex1.nucl.1.forest > ex1.nucl.8.nhx
treebest export ex1.nucl.nhx > ex1.nucl.1.eps
treebest subtree ex1.nucl.nhx ex1.nucl.1.sublist > ex1.nucl.9.nhx
treebest simulate > ex1.nucl.6.nh
treebest simulate -d 0.04 -l 0.02 -p 0.5 -m 0.1 > ex1.nucl.7.nh
treebest sortleaf ex1.nucl.nhx > ex1.nucl.sorted.nhx
treebest sortleaf ex1.nucl.nhx ex1.nucl.1.nhx > ex1.nucl.sorted.2.nhx
treebest estlen ex1.nucl.nhx ex1.nucl.1.matrix.ds ds_method > ex1.nucl.1.estlen.ds.nhx
treebest trimpoor ex1.nucl.nhx > ex1.nucl.10.nhx
treebest root ex1.nucl.nhx > ex1.nucl.11.nhx
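
The job above writes many output files, and a failed step can leave one of them missing or empty without stopping the script. A minimal sketch of a post-run check follows. check_outputs is a hypothetical helper, not a TreeBeST command, and the file names in the usage note come from the example script.

```shell
# check_outputs is a hypothetical helper: reports every named file that is
# missing or empty and returns non-zero if any such file was found.
check_outputs() {
    status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

Appending a line such as "check_outputs ex1.nucl.1.nhx ex1.nucl.3.nhx ex1.nucl.1.eps || exit 1" to the end of the job script makes the job finish with a non-zero status when an expected file was not produced.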

Further Reading
