treebest
Introduction
"TreeBeST is an original tree builder for constrained neighbour-joining and tree merge, an efficient tool capable of duplication/loss/ortholog inference, and a versatile program facilitating many tree-building routines, such as tree rooting, alignment filtering and tree plotting. TreeBeST stands for ‘(gene) Tree Building guided by Species Tree’. It is previously known as NJTREE as the first piece of codes of this project aimed to build a neighbour-joining tree.
TreeBeST is the core engine of TreeFam (Tree Families Database) project initiated by Richard Durbin. The basic idea of this project is to build a full tree constrained by a manually verified seed tree. The tree builder must know how to utilize the prior knowledge provided by human experts. This demand disqualifies any existing softwares. Given this fact, we devised a new algorithm to control the joining step of traditional neighbour-joining. This is origin the constrained neighbour-joining.
When trees are built, they are only meaningful to biologists. Computers generate trees, but they do not understand them. To understand gene trees, a computer must be equipped with some biological knowledges, the species tree. It will teach a computer how to discriminate a speciation from a duplication event and how to find orthologs, provided a correct gene tree.
Unfortunately, gene trees are not always correct. Since the advent of UPGMA algorithm in 1958, we have tried to find a ideal model for nearly half a century. But we failed. Evolution is so complex a thing. A model best fits in one lineage might mean a disaster in another. A unified model is far from being discovered. TreeBeST aims at improving the accuracy of tree building, but it does not try to set up a new model in a traditional way. Instead, it integrates two existing models with the help of species tree, finding the subtree that best fits the models and merging them together to build a new tree incorporating the advantages of the both. This is the tree algorithm." (treebest.pdf)
Version
Version 1.9.2 is currently available at OSC.
Availability
treebest is available on the Glenn Cluster.
Usage
On the Glenn Cluster, treebest is accessed by executing the following commands:
module load biosoftw
module load treebest
treebest will be added to the user's PATH and can then be run with the command:
treebest <command> [options]
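Before starting a longer job, it can help to confirm that loading the module actually put treebest on the PATH. The following is a minimal sketch; the function name require_cmd is hypothetical and is not part of treebest or of the module system.

```shell
# Hypothetical helper: succeed only if the named command is on the PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: '$1' not found; was the module loaded?" >&2
    return 1
}

# Example use after "module load treebest":
#   require_cmd treebest || exit 1
```

Placing the check at the top of a job script stops the job early instead of failing on the first treebest call.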
The commands of the treebest program and their summaries are listed below.
Command     Description
nj          build neighbour-joining tree, SDI, rooting
best        build tree with the help of a species tree
phyml       build phyml tree
sdi         speciation vs. duplication inference
spec        print species tree
format      reformat a tree
filter      filter a multi-alignment
trans       translate coding nucleotide alignment
backtrans   translate aa alignment back to nt
leaf        get external nodes
mfa2aln     convert MFA to ALN format
ortho       ortholog/paralog inference
distmat     distance matrix
treedist    topological distance between two trees
pwalign     pairwise alignment
mmerge      merge a forest
export      export a tree to EPS format
subtree     extract the subtree
simulate    simulate a gene tree
sortleaf    sort leaf order
estlen      estimate branch length
trimpoor    trim out leaves that affect the quality of a tree
root        root a tree by minimizing height
Options
treebest nj [options] <input_file>
-c FILE constrained tree(s) in NH format [null]
-m FILE tree to be compared [null]
-s FILE species tree in NH format [default taxa tree]
-l FILE ingroup list file [null]
-t TYPE codon NT: ntmm, dn, ds, dm; AA: mm, jtt, kimura [mm]
ntmm p-distance (codon alignment)
dn non-synonymous distance
ds synonymous distance
dm dn-ds merge (tree merge)
mm p-distance (amino acid alignment)
jtt JTT model (maximum likelihood)
kimura mm + Kimura's correction
-T NUM time limit in seconds [no limit]
-b NUM bootstrapping times [100]
-F NUM quality cut-off [15]
-o STR outgroup for tree cutting [Bilateria]
-S treat the first constrained tree as the original tree
-C use the leaves of constrained trees as ingroup
-M do not apply alignment mask
-N do not mask poorly aligned segments
-g collapse alternative splicing
-R do not apply leaf-reordering
-p the root node is a putative node
-a branch mode that is used by most tree-builder
-A the input alignment is stored in ALN format
-W wipe out root (SDI information will be lost!)
-v verbose output
-h help
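As an illustration of combining the nj options above, the sketch below checks the -t value against the distance types listed for that option before calling treebest. The wrapper name nj_tree and its argument order are assumptions made for this example only; the -t and -b flags are the ones documented above.

```shell
# Hypothetical wrapper: run "treebest nj" with a checked -t value.
# Usage: nj_tree TYPE BOOTSTRAPS ALIGNMENT
nj_tree() {
    case "$1" in
        ntmm|dn|ds|dm|mm|jtt|kimura) ;;   # the types listed for -t
        *) echo "error: unknown distance type '$1'" >&2; return 2 ;;
    esac
    treebest nj -t "$1" -b "$2" "$3"
}

# Example: synonymous-distance tree with 50 bootstrap replicates
#   nj_tree ds 50 ex1.nucl.mfa > ex1.nucl.ds.nhx
```

The tree is written to standard output, as in the plain treebest nj call, so the result can be redirected to a file.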
treebest best [options] <CDS_alignment>
General Options:
-P skip PHYML
-S ignore the prob. of gene evolution (NOT recommended)
-A apply constraint to PHYML
-C FILE constraining tree [null]
-f FILE species tree [default]
-r discard species that do not appear at all
Output Options:
-D output some debug information
-q suppress part of PHYML warnings
-p STR prefix of intermediate trees [null]
-o FILE output tree [null]
Alignment Preprocessing Options:
-s only build tree for genes from sequenced species
-g collapse alternative splicing forms
-N do not mask low-scoring segments
-F INT quality cut-off [11]
PHYML Related Options:
-c INT number of rate categories for PHYML-HKY [2]
-k FLOAT|e tv/ts ratio (kappa), 'e' for estimating [e]
-a FLOAT|e alpha parameter for Gamma distribution [1.0]
-d FLOAT duplication probability [0.15]
-l FLOAT probability of a loss following a speciation [0.10]
-L FLOAT probability of a loss following a duplication [0.20]
-b FLOAT prob. of the presence of an inconsistent branch [0.01]
treebest phyml <alignment> [<tree>]
General Options:
-t task build | opt | loglk | dist [build]
-n the input is a nucleotide alignment
-s print out some statistics
-N do not mask low-scoring segments
-g collapse alternative splicing
-b INT number of bootstraps (slow) [0]
-o FILE write output to file [stdout]
-F INT quality cut-off [15]
Model Related Options:
-m model nt: JC69 | K2P | F81 | HKY | F84 | TN93 | GTR [HKY]
aa: JTT | MtREV | Dayhoff | WAG [WAG]
-c INT number of relative substitution rate categories [1]
-k FLOAT|e transversion/transition ratio, 'e' for estimating [e]
-a FLOAT|e alpha parameter for Gamma distribution [1.0]
-i FLOAT|e proportion of invariable sites [0]
Options for TreeFam Extensions:
-S use a species tree to guide tree building
-f FILE species tree [TreeFam species tree]
-d FLOAT duplication probability [0.15]
-l FLOAT probability of a loss following a speciation [0.10]
-L FLOAT probability of a loss following a duplication [0.20]
-C FILE constraining tree [NULL]
-p FLOAT prob. of the presence of an inconsistent branch [0.01]
treebest sdi [-r|-H|-R|-m <tree0>|-l <spec_list>] <tree>
Options:
-r reroot
-c use core species tree instead of the default one
-H reroot by minimizing tree height, instead of by minimizing the number of duplication events.
-R do not reorder the leaves.
-s FILE species tree [default taxa tree]
-l FILE cut a subtree that contains genes whose species exist in list [null]
-m FILE compare topology with FILE and re-order the leaves [null]
treebest spec
treebest format [-1] <tree>
treebest filter [options] <alignment>
Options:
-n nucleotide alignment
-g collapse alternative splicing
-M do not apply alignment mask
-N do not mask low-scoring segments
-F NUM quality cut-off [15]
treebest trans <nucl_alignment>
treebest backtrans [-t <thres>] <aa_aln> <nt_seq>
treebest leaf <nh_tree>
treebest mfa2aln [-n] <fasta_align>
treebest ortho <tree>
treebest distmat <dn|ds|dm|jtt|kimura|mm|dns> <alignment>
treebest treedist <tree1> <tree2>
treebest pwalign [options] <nt2nt|aa2aa|nt2aa|splice> <seq1> <seq2>
Options:
-f generate full alignment
-a do not apply matrix mean in local alignment
-d just calculate alignment boundaries
-o NUM gap open penalty
-e NUM gap extension penalty
-n NUM gap end penalty for nt2nt or aa2aa
-s NUM frame-shift penalty for aa2nt
-g NUM good splicing penalty
-w NUM band-width
-b NUM bad splicing penalty
-m output miscellaneous information
-h help
treebest mmerge [-r] <forest>
Options:
-r reroot
treebest export [options] <tree>
Options:
-x NUM width [640]
-y NUM height [480]
-m NUM margin [20]
-f NUM font size [11]
-b FNUM box size [4.0]
-w FNUM font width [font_size/2]
-s FILE species tree
-B suppress bootstrap value
-M black/white mode
-S show species name
-d speciation/duplication inference
-p pseudo-length
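The export options above can be combined to change the drawing dimensions from their defaults. The following is a small sketch, assuming only the -x and -y flags listed above; the function name export_eps and its argument order are hypothetical.

```shell
# Hypothetical wrapper: export TREE to an EPS file with a given width and height.
# Usage: export_eps TREE WIDTH HEIGHT OUTPUT
export_eps() {
    treebest export -x "$2" -y "$3" "$1" > "$4"
}

# Example: a larger drawing than the default 640 by 480
#   export_eps ex1.nucl.nhx 800 600 ex1.nucl.large.eps
```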
treebest subtree <tree> <list>
treebest simulate [options]
Options:
-d FNUM duplication probability [0.05]
-l FNUM loss probability [0.01]
-p FNUM loss probability after duplication [0.25]
-m FNUM max height [0.25]
-n not show internal name
-h help
treebest sortleaf <tree1> [<tree2>]
treebest estlen <tree> <matrix> <tag>
treebest trimpoor <tree> [<threshold>=0]
treebest root <tree>
Examples
#PBS -N treebest_test
#PBS -l walltime=00:05:00
#PBS -l nodes=1:ppn=4
cd $PBS_O_WORKDIR
module load biosoftw
module load treebest
cp /usr/local/biosoftw/treebest-1.9.2/examples/ex1.nucl.* .
treebest nj ex1.nucl.mfa > ex1.nucl.1.nhx
cp ex1.nucl.nhx ex1.nucl.1.forest
cat ex1.nucl.1.nhx >> ex1.nucl.1.forest
treebest nj -m ex1.nucl.nhx ex1.nucl.mfa > ex1.nucl.2.nhx
treebest nj -v ex1.nucl.mfa
treebest best -o ex1.nucl.3.nhx ex1.nucl.mfa
treebest best -c 1 -a 0.9 -d 0.14 -l 0.09 -L 0.19 -b 0.009 -o ex1.nucl.4.nhx ex1.nucl.mfa
treebest phyml -o ex1.nucl.1.nh ex1.nucl.mfa
treebest phyml -o ex1.nucl.2.nh ex1.nucl.mfa ex1.nucl.nhx
treebest phyml -s -C ex1.nucl.nhx -o ex1.nucl.4.nh ex1.nucl.mfa
treebest phyml -b 2 -o ex1.nucl.5.nh ex1.nucl.mfa
treebest sdi ex1.nucl.nhx > ex1.nucl.5.nhx
treebest sdi -r ex1.nucl.nhx > ex1.nucl.6.nhx
treebest sdi -r ex1.nucl.nhx > ex1.nucl.7.nhx
treebest spec > all_species.nh
treebest format ex1.nucl.nhx
treebest filter -n -M -N ex1.nucl.mfa > ex1.nucl.1.mfa
treebest trans ex1.nucl.mfa > ex1.aa.mfa
treebest backtrans ex1.aa.mfa ex1.nucl.mfa > ex1.nucl.2.mfa
treebest leaf ex1.nucl.nhx > ex1.nucl.1.leaf
head ex1.nucl.1.leaf | tail -7 > ex1.nucl.1.sublist
treebest mfa2aln -n ex1.nucl.mfa > ex1.nucl.1.aln
treebest ortho ex1.nucl.nhx > ex1.nucl.1.ortho
treebest distmat dn ex1.nucl.mfa > ex1.nucl.1.matrix.dn
treebest distmat ds ex1.nucl.mfa > ex1.nucl.1.matrix.ds
treebest distmat dm ex1.nucl.mfa > ex1.nucl.1.matrix.dm
treebest treedist ex1.nucl.nhx ex1.nucl.1.nhx > ex1.nucl.1.dist
treebest mmerge -r ex1.nucl.1.forest > ex1.nucl.8.nhx
treebest export ex1.nucl.nhx > ex1.nucl.1.eps
treebest subtree ex1.nucl.nhx ex1.nucl.1.sublist > ex1.nucl.9.nhx
treebest simulate > ex1.nucl.6.nh
treebest simulate -d 0.04 -l 0.02 -p 0.5 -m 0.1 > ex1.nucl.7.nh
treebest sortleaf ex1.nucl.nhx > ex1.nucl.sorted.nhx
treebest sortleaf ex1.nucl.nhx ex1.nucl.1.nhx > ex1.nucl.sorted.2.nhx
treebest estlen ex1.nucl.nhx ex1.nucl.1.matrix.ds ds_method > ex1.nucl.1.estlen.ds.nhx
treebest trimpoor ex1.nucl.nhx > ex1.nucl.10.nhx
treebest root ex1.nucl.nhx > ex1.nucl.11.nhx
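The three distmat calls in the script above follow one pattern and could be written as a loop. This is a sketch only; the function name make_distmats is hypothetical, and the output naming simply mirrors the file names used in the script.

```shell
# Hypothetical helper: write one distance matrix per method (dn, ds, dm).
# Usage: make_distmats ALIGNMENT PREFIX
make_distmats() {
    for method in dn ds dm; do
        # Stop at the first failing call so a partial result is noticed.
        treebest distmat "$method" "$1" > "$2.matrix.$method" || return 1
    done
}

# Example: produces ex1.nucl.1.matrix.dn, .ds and .dm
#   make_distmats ex1.nucl.mfa ex1.nucl.1
```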
Documentation
Additional documentation is available at the following website: http://treesoft.sourceforge.net/treebest.shtml and on the Glenn Cluster under /usr/local/biosoftw/treebest-1.9.2/treebest.pdf.