TreeBeST

"TreeBeST is an original tree builder for constrained neighbour-joining and tree merge, an efficient tool capable of duplication/loss/ortholog inference, and a versatile program facilitating many tree-building routines, such as tree rooting, alignment filtering and tree plotting. TreeBeST stands for ‘(gene) Tree Building guided by Species Tree’. It is previously known as NJTREE as the first piece of codes of this project aimed to build a neighbour-joining tree.

TreeBeST is the core engine of TreeFam (Tree Families Database) project initiated by Richard Durbin. The basic idea of this project is to build a full tree constrained by a manually verified seed tree. The tree builder must know how to utilize the prior knowledge provided by human experts. This demand disqualifies any existing softwares. Given this fact, we devised a new algorithm to control the joining step of traditional neighbour-joining. This is the origin of the constrained neighbour-joining.

When trees are built, they are only meaningful to biologists. Computers generate trees, but they do not understand them. To understand gene trees, a computer must be equipped with some biological knowledges, the species tree. It will teach a computer how to discriminate a speciation from a duplication event and how to find orthologs, provided a correct gene tree.

Unfortunately, gene trees are not always correct. Since the advent of UPGMA algorithm in 1958, we have tried to find a ideal model for nearly half a century. But we failed. Evolution is so complex a thing. A model best fits in one lineage might mean a disaster in another. A unified model is far from being discovered. TreeBeST aims at improving the accuracy of tree building, but it does not try to set up a new model in a traditional way. Instead, it integrates two existing models with the help of species tree, finding the subtree that best fits the models and merging them together to build a new tree incorporating the advantages of the both. This is the tree algorithm." (treebest.pdf)

Availability & Restrictions

TreeBeST is available to all OSC users without restriction.

The following versions of TreeBeST are available on OSC systems:

Version    Glenn    Oakley
1.9.2      X

Usage

Set up

On the Glenn Cluster, TreeBeST is accessed by executing the following commands:

module load biosoftw
module load treebest

TreeBeST will be added to the user's PATH and can then be run with the command:

treebest <command> [options]
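
Before starting a longer job, it can help to confirm that the module load actually placed the binary on the PATH. The following is a minimal sketch; the function name require_cmd is hypothetical and is not part of TreeBeST or the module system.

```shell
# require_cmd: hypothetical helper (not part of TreeBeST).
# Succeeds and prints a short message if the named program is on PATH;
# otherwise prints a message to stderr and returns non-zero.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# On the cluster, call this after the two module load commands:
#   require_cmd treebest
```

If the check fails, the two module load commands above have most likely not been run in the current shell.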

The commands of the treebest program are summarized below.

Command

nj           build neighbour-joining tree, SDI, rooting
best         build tree with the help of a species tree
phyml        build phyml tree
sdi          speciation vs. duplication inference
spec         print species tree
format       reformat a tree
filter       filter a multi-alignment
trans        translate coding nucleotide alignment
backtrans    translate aa alignment back to nt
leaf         get external nodes
mfa2aln      convert MFA to ALN format
ortho        ortholog/paralog inference
distmat      distance matrix
treedist     topological distance between two trees
pwalign      pairwise alignment
mmerge       merge a forest
export       export a tree to EPS format
subtree      extract the subtree
simulate     simulate a gene tree
sortleaf     sort leaf order
estlen       estimate branch length
trimpoor     trim out leaves that affect the quality of a tree
root         root a tree by minimizing height

Options

treebest nj [options] <input_file>
      -c FILE          constrained tree(s) in NH format [null]
      -m FILE          tree to be compared [null]
      -s FILE          species tree in NH format [default taxa tree]
      -l FILE          ingroup list file [null]
      -t TYPE          codon NT: ntmm, dn, ds, dm; AA: mm, jtt, kimura [mm]
                       ntmm          p-distance (codon alignment)
                       dn            non-synonymous distance
                       ds            synonymous distance
                       dm            dn-ds merge (tree merge)
                       mm            p-distance (amino acid alignment)
                       jtt           JTT model (maximum likelihood)
                       kimura        mm + Kimura's correction
      -T NUM           time limit in seconds [no limit]
      -b NUM           bootstrapping times [100]
      -F NUM           quality cut-off [15]
      -o STR           outgroup for tree cutting [Bilateria]
      -S               treat the first constrained tree as the original tree
      -C               use the leaves of constrained trees as ingroup
      -M               do not apply alignment mask
      -N               do not mask poorly aligned segments
      -g               collapse alternative splicing
      -R               do not apply leaf-reordering
      -p               the root node is a putative node
      -a               branch mode that is used by most tree-builder
      -A               the input alignment is stored in ALN format
      -W               wipe out root (SDI information will be lost!)
      -v               verbose output
      -h               help
treebest best [options] <CDS_alignment>        
            General Options:
            -P               skip PHYML
            -S               ignore the prob. of gene evolution (NOT recommended)
            -A               apply constraint to PHYML
            -C FILE          constraining tree [null]
            -f FILE          species tree [default]
            -r               discard species that do not appear at all
            Output Options:
            -D               output some debug information
            -q               suppress part of PHYML warnings
            -p STR           prefix of intermediate trees [null]
            -o FILE          output tree [null]
            Alignment Preprocessing Options:
            -s               only build tree for genes from sequenced species
            -g               collapse alternative splicing forms
            -N               do not mask low-scoring segments
            -F INT           quality cut-off [11]
            PHYML Related Options:
            -c INT           number of rate categories for PHYML-HKY [2]
            -k FLOAT|e       tv/ts ratio (kappa), 'e' for estimating [e]
            -a FLOAT|e       alpha parameter for Gamma distribution [1.0]
            -d FLOAT         duplication probability [0.15]
            -l FLOAT         probability of a loss following a speciation [0.10]
            -L FLOAT         probability of a loss following a duplication [0.20]
            -b FLOAT         prob. of the presence of an inconsistent branch [0.01]
treebest phyml <alignment> [<tree>]
            General Options:
            -t task          build | opt | loglk | dist [build]
            -n               the input is a nucleotide alignment
            -s               print out some statistics
            -N               do not mask low-scoring segments
            -g               collapse alternative splicing
            -b INT           number of bootstraps (slow) [0]
            -o FILE          write output to file [stdout]
            -F INT           quality cut-off [15]
            Model Related Options:
            -m model         nt: JC69 | K2P | F81 | HKY | F84 | TN93 | GTR [HKY]
                             aa: JTT | MtREV | Dayhoff | WAG [WAG]
            -c INT           number of relative substitution rate categories [1]
            -k FLOAT|e       transversion/transition ratio, 'e' for estimating [e]
            -a FLOAT|e       alpha parameter for Gamma distribution [1.0]
            -i FLOAT|e       proportion of invariable sites [0]
            Options for TreeFam Extensions:
            -S               use a species tree to guide tree building
            -f FILE          species tree [TreeFam species tree]
            -d FLOAT         duplication probability  [0.15]
            -l FLOAT         probability of a loss following a speciation [0.10]
            -L FLOAT         probability of a loss following a duplication [0.20]
            -C FILE          constraining tree [NULL]
            -p FLOAT         prob. of the presence of an inconsistent branch [0.01]
treebest sdi [-r|-H|-R|-m <tree0>|-l <spec_list>] <tree>
            Options:
            -r               reroot
            -c               use core species tree instead of the default one
            -H               reroot by minimizing tree height, instead of by minimizing the number of duplication events.
            -R               do not reorder the leaves.
            -s FILE          species tree [default taxa tree]
            -l FILE          cut a subtree that contains genes whose species exist in list [null]
            -m FILE          compare topology with FILE and re-order the leaves [null]
treebest spec
treebest format [-1] <tree>
treebest filter [options] <alignment> 
            Options:
            -n               nucleotide alignment
            -g               collapse alternative splicing
            -M               do not apply alignment mask
            -N               do not mask low-scoring segments
            -F NUM           quality cut-off [15]
treebest trans <nucl_alignment>
treebest backtrans [-t <thres>] <aa_aln> <nt_seq>
treebest leaf <nh_tree>
treebest mfa2aln [-n] <fasta_align>
treebest ortho <tree>
treebest distmat <dn|ds|dm|jtt|kimura|mm|dns> <alignment>
treebest treedist <tree1> <tree2>
treebest pwalign [options] <nt2nt|aa2aa|nt2aa|splice> <seq1> <seq2> 
            Options :
            -f               generate full alignment
            -a               do not apply matrix mean in local alignment
            -d               just calculate alignment boundaries
            -o NUM           gap open penalty
            -e NUM           gap extension penalty
            -n NUM           gap end penalty for nt2nt or aa2aa
            -s NUM           frame-shift penalty for aa2nt
            -g NUM           good splicing penalty
            -w NUM           band-width
            -b NUM           bad splicing penalty
            -m               output miscellaneous information
            -h               help
treebest mmerge [-r] <forest>
            Options:
            -r               reroot
treebest export [options] <tree>
            Options:
            -x NUM           width [640]
            -y NUM           height [480]
            -m NUM           margin [20]
            -f NUM           font size [11]
            -b FNUM          box size [4.0]
            -w FNUM          font width [font_size/2]
            -s FILE          species tree
            -B               suppress bootstrap value
            -M               black/white mode
            -S               show species name
            -d               speciation/duplication inference
            -p               pseudo-length
treebest subtree <tree> <list>
treebest simulate [options] 
            Options:
            -d FNUM          duplication probability [0.05]
            -l FNUM          loss probability [0.01]
            -p FNUM          loss probability after duplication [0.25]
            -m FNUM          max height [0.25]
            -n               not show internal name
            -h               help
treebest sortleaf <tree1> [<tree2>]
treebest estlen <tree> <matrix> <tag>
treebest trimpoor <tree> [<threshold>=0]
treebest root <tree> 
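
Several of the commands above write their result to standard output, so they can be chained through intermediate files. The following is a minimal sketch that builds a neighbour-joining tree and renders it as EPS using the -x and -y options of export. The wrapper name nj_to_eps, the TREEBEST variable, and the 800 by 600 size are assumptions made for illustration only.

```shell
# TREEBEST may be overridden to point at a specific binary
# (an assumption for illustration, not a TreeBeST feature).
TREEBEST=${TREEBEST:-treebest}

# nj_to_eps: hypothetical wrapper. Builds a neighbour-joining tree from an
# alignment, then renders that tree as EPS at 800x600.
nj_to_eps() {
    aln=$1      # input alignment, e.g. ex1.nucl.mfa
    prefix=$2   # prefix for the two output files
    "$TREEBEST" nj "$aln" > "$prefix.nhx" &&
    "$TREEBEST" export -x 800 -y 600 "$prefix.nhx" > "$prefix.eps"
}

# Example call:
#   nj_to_eps ex1.nucl.mfa ex1.sketch
```

The two steps are joined with && so that export is skipped when the tree-building step fails.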

Examples

#PBS -N treebest_test
#PBS -l walltime=00:05:00
#PBS -l nodes=1:ppn=4

cd $PBS_O_WORKDIR
module load biosoftw
module load treebest
cp /usr/local/biosoftw/treebest-1.9.2/examples/ex1.nucl.* .

# Neighbour-joining trees
treebest nj ex1.nucl.mfa > ex1.nucl.1.nhx
# Collect the reference tree and the new tree into a forest for mmerge below
cp ex1.nucl.nhx ex1.nucl.1.forest
cat ex1.nucl.1.nhx >> ex1.nucl.1.forest
treebest nj -m ex1.nucl.nhx ex1.nucl.mfa > ex1.nucl.2.nhx
treebest nj -v ex1.nucl.mfa

# Trees built with the help of the species tree
treebest best -o ex1.nucl.3.nhx ex1.nucl.mfa
treebest best -c 1 -a 0.9 -d 0.14 -l 0.09 -L 0.19 -b 0.009 -o ex1.nucl.4.nhx ex1.nucl.mfa

# PhyML trees
treebest phyml -o ex1.nucl.1.nh ex1.nucl.mfa
treebest phyml -o ex1.nucl.2.nh ex1.nucl.mfa ex1.nucl.nhx
treebest phyml -s -C ex1.nucl.nhx -o ex1.nucl.4.nh ex1.nucl.mfa
treebest phyml -b 2 -o ex1.nucl.5.nh ex1.nucl.mfa

# Speciation vs. duplication inference
treebest sdi ex1.nucl.nhx > ex1.nucl.5.nhx
treebest sdi -r ex1.nucl.nhx > ex1.nucl.6.nhx
treebest sdi -r ex1.nucl.nhx > ex1.nucl.7.nhx

# Species tree and tree reformatting
treebest spec > all_species.nh
treebest format ex1.nucl.nhx

# Alignment filtering and translation
treebest filter -n -M -N ex1.nucl.mfa > ex1.nucl.1.mfa
treebest trans ex1.nucl.mfa > ex1.aa.mfa
treebest backtrans ex1.aa.mfa ex1.nucl.mfa > ex1.nucl.2.mfa

# Leaf names; keep up to seven of them for the subtree step below
treebest leaf ex1.nucl.nhx > ex1.nucl.1.leaf
head ex1.nucl.1.leaf | tail -7 > ex1.nucl.1.sublist

# Format conversion, orthologs, distance matrices and tree distance
treebest mfa2aln -n ex1.nucl.mfa > ex1.nucl.1.aln
treebest ortho ex1.nucl.nhx > ex1.nucl.1.ortho
treebest distmat dn ex1.nucl.mfa > ex1.nucl.1.matrix.dn
treebest distmat ds ex1.nucl.mfa > ex1.nucl.1.matrix.ds
treebest distmat dm ex1.nucl.mfa > ex1.nucl.1.matrix.dm
treebest treedist ex1.nucl.nhx ex1.nucl.1.nhx > ex1.nucl.1.dist

# Merge the forest, export to EPS, extract a subtree
treebest mmerge -r ex1.nucl.1.forest > ex1.nucl.8.nhx
treebest export ex1.nucl.nhx > ex1.nucl.1.eps
treebest subtree ex1.nucl.nhx ex1.nucl.1.sublist > ex1.nucl.9.nhx

# Simulated gene trees
treebest simulate > ex1.nucl.6.nh
treebest simulate -d 0.04 -l 0.02 -p 0.5 -m 0.1 > ex1.nucl.7.nh

# Leaf sorting, branch-length estimation, trimming and rooting
treebest sortleaf ex1.nucl.nhx > ex1.nucl.sorted.nhx
treebest sortleaf ex1.nucl.nhx ex1.nucl.1.nhx > ex1.nucl.sorted.2.nhx
treebest estlen ex1.nucl.nhx ex1.nucl.1.matrix.ds ds_method > ex1.nucl.1.estlen.ds.nhx
treebest trimpoor ex1.nucl.nhx > ex1.nucl.10.nhx
treebest root ex1.nucl.nhx > ex1.nucl.11.nhx
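
Because most of the commands above redirect standard output into a file, a failed step can leave an empty file behind. The following is a minimal sketch of a check that could be run at the end of the job; the function name check_outputs is hypothetical, and the file names in the example call are simply some of those written by the commands above.

```shell
# check_outputs: hypothetical helper (not part of TreeBeST).
# Prints one line for every listed file that is missing or empty,
# and returns non-zero if at least one such file was found.
check_outputs() {
    status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            status=1
        fi
    done
    return $status
}

# Example call, run in the job's working directory:
#   check_outputs ex1.nucl.1.nhx ex1.nucl.3.nhx ex1.nucl.1.eps
```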
