bowtie
Introduction
"Bowtie is an ultrafast, memory-efficient short read aligner geared toward quickly aligning large sets of short DNA sequences (reads) to large genomes. It aligns 35-base-pair reads to the human genome at a rate of 25 million reads per hour on a typical workstation. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: for the human genome, the index is typically about 2.2 GB (for unpaired alignment) or 2.9 GB (for paired-end or colorspace alignment). Multiple processors can be used simultaneously to achieve greater alignment speed. Bowtie can also output alignments in the standard SAM format, allowing Bowtie to interoperate with other tools supporting SAM, including the SAMtools consensus, SNP, and indel callers. Bowtie runs on the command line." (http://bowtie-bio.sourceforge.net/manual.shtml)
Version
Version 0.12.7 is currently available at OSC.
Availability
bowtie is available on the Glenn Cluster.
Usage
On the Glenn Cluster, load bowtie with the following commands:
module load biosoftw
module load bowtie
Loading the module adds bowtie to the user's PATH; bowtie can then be run with the command:
bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>]
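Below are definitions of the main positional arguments:
<ebwt> Basename of the index to be searched: the index file name up to, but not including, the final .1.ebwt / .rev.1.ebwt / etc.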
<m1> Comma-separated list of files containing upstream mates (or the sequences themselves, if -c is set) paired with mates in <m2>
<m2> Comma-separated list of files containing downstream mates (or the sequences themselves if -c is set) paired with mates in <m1>
<r> Comma-separated list of files containing Crossbow-style reads. Can be a mixture of paired and unpaired. Specify "-" for stdin.
<s> Comma-separated list of files containing unpaired reads, or the sequences themselves, if -c is set. Specify "-" for stdin.
<hit> File to write hits to (default: stdout)
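To make the argument forms above concrete, the following sketch writes tiny read files and shows how each form is passed to bowtie. The file names (reads.fq, reads_1.fq, reads_2.fq, hits.txt) and the sequences are made up for illustration. It assumes an index with basename e_coli has already been built, as in the example job script; the bowtie calls are skipped if bowtie or the index files are missing.

```shell
#!/bin/bash
# Sketch only: file names and sequences are hypothetical.
set -euo pipefail

# <s>: one unpaired read in FASTQ (4 lines per record: @name, bases, +, qualities)
printf '@r1\nATGCATCATGCGCCAT\n+\nIIIIIIIIIIIIIIII\n' > reads.fq

# <m1>/<m2>: mate files; the Nth record of each file belongs to the same pair
printf '@p1/1\nATGCATCATGCGCCAT\n+\nIIIIIIIIIIIIIIII\n' > reads_1.fq
printf '@p1/2\nATGGCGCATGATGCAT\n+\nIIIIIIIIIIIIIIII\n' > reads_2.fq

# Only run bowtie if it is on the PATH and the e_coli index files exist
if command -v bowtie >/dev/null 2>&1 && [ -e e_coli.1.ebwt ]; then
  bowtie e_coli reads.fq hits.txt              # unpaired reads, hits written to <hit>
  bowtie e_coli -1 reads_1.fq -2 reads_2.fq    # paired-end, hits written to stdout
  bowtie e_coli -c ATGCATCATGCGCCAT            # -c: the read itself is on the command line
fi
```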
Options
Input:
-q query input files are FASTQ .fq/.fastq (default)
-f query input files are (multi-)FASTA .fa/.mfa
-r query input files are raw one-sequence-per-line
-c query sequences given on cmd line (as <mates>, <singles>)
-C reads and index are in colorspace
-Q/--quals <file> QV file(s) corresponding to CSFASTA inputs; use with -f -C
--Q1/--Q2 <file> same as -Q, but for mate files 1 and 2 respectively
-s/--skip <int> skip the first <int> reads/pairs in the input
-u/--qupto <int> stop after first <int> reads/pairs (excl. skipped reads)
-5/--trim5 <int> trim <int> bases from 5' (left) end of reads
-3/--trim3 <int> trim <int> bases from 3' (right) end of reads
--phred33-quals input quals are Phred+33 (default)
--phred64-quals input quals are Phred+64 (same as --solexa1.3-quals)
--solexa-quals input quals are from GA Pipeline ver. < 1.3
--solexa1.3-quals input quals are from GA Pipeline ver. >= 1.3
--integer-quals qualities are given as space-separated integers (not ASCII)
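Choosing between --phred33-quals, --solexa-quals and --phred64-quals depends on the ASCII offset of the quality strings. The sketch below is a heuristic of our own, not a bowtie feature: it finds the lowest quality character in a FASTQ file and maps it to a flag, assuming 4-line FASTQ records (no wrapped sequences). The function name guess_qual_flag is hypothetical.

```shell
#!/bin/bash
# Heuristic sketch, not part of bowtie: guess the quality flag for a FASTQ
# file from its lowest quality character. Assumes 4-line records.
#   Phred+33 qualities start at ASCII 33 ('!')
#   Solexa (GA Pipeline < 1.3) starts at ASCII 59 (';')
#   Phred+64 (GA Pipeline >= 1.3) starts at ASCII 64 ('@')
guess_qual_flag() {
  local fq=$1 min=256 line i code
  # Every 4th line of a FASTQ file is a quality string
  while IFS= read -r line; do
    for (( i = 0; i < ${#line}; i++ )); do
      printf -v code '%d' "'${line:i:1}"   # ASCII code of one character
      if (( code < min )); then min=$code; fi
    done
  done < <(awk 'NR % 4 == 0' "$fq")

  if (( min == 256 )); then return 1; fi   # no quality strings found
  if (( min < 59 )); then
    echo "--phred33-quals"
  elif (( min < 64 )); then
    echo "--solexa-quals"
  else
    echo "--phred64-quals"
  fi
}
```

For example, bowtie $(guess_qual_flag reads.fq) e_coli reads.fq. A Phred+33 file in which every base has a quality character of '@' or higher would be misreported as Phred+64, so treat the result as a hint rather than a guarantee.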
Alignment:
-v <int> report end-to-end hits w/ <=v mismatches; ignore qualities
or
-n/--seedmms <int> max mismatches in seed (can be 0-3, default: -n 2)
-e/--maqerr <int> max sum of mismatch quals across alignment for -n (def: 70)
-l/--seedlen <int> seed length for -n (default: 28)
--nomaqround disable Maq-like quality rounding for -n (nearest 10 <= 30)
-I/--minins <int> minimum insert size for paired-end alignment (default: 0)
-X/--maxins <int> maximum insert size for paired-end alignment (default: 250)
--fr/--rf/--ff -1, -2 mates align fw/rev, rev/fw, fw/fw (default: --fr)
--nofw/--norc do not align to forward/reverse-complement reference strand
--maxbts <int> max # backtracks for -n 2/3 (default: 125, 800 for --best)
--pairtries <int> max # attempts to find mate for anchor hit (default: 100)
-y/--tryhard try hard to find valid alignments, at the expense of speed
--chunkmbs <int> max megabytes of RAM for best-first search frames (def: 64)
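-I/--minins and -X/--maxins bound the insert size of a pair. The arithmetic below is a minimal sketch, assuming insert size is measured from the outer (leftmost) end of the upstream mate to the outer (rightmost) end of the downstream mate, so two 20-bp mates separated by a 20-bp gap have an insert size of 60. The helpers insert_size and within_bounds are hypothetical and not part of bowtie.

```shell
#!/bin/bash
# Hypothetical helpers illustrating the -I/--minins and -X/--maxins check.

# insert_size UP_OFF DOWN_OFF DOWN_LEN
# Offsets are 0-based leftmost reference positions of each mate's alignment.
insert_size() {
  local up_off=$1 down_off=$2 down_len=$3
  echo $(( down_off + down_len - up_off ))
}

# within_bounds SIZE MININS MAXINS: exit status 0 if MININS <= SIZE <= MAXINS
within_bounds() {
  local size=$1 minins=$2 maxins=$3
  (( size >= minins && size <= maxins ))
}

# Two 20-bp mates at offsets 100 and 140 (a 20-bp gap between them)
size=$(insert_size 100 140 20)
echo "insert size: $size"                  # prints "insert size: 60"
if within_bounds "$size" 0 250; then       # bowtie defaults: -I 0, -X 250
  echo "inside the default -I/-X window"
fi
```

The same pair would fall outside the window if -X were set below 60 or -I above 60.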
Reporting:
-k <int> report up to <int> good alignments per read (default: 1)
-a/--all report all alignments per read (much slower than low -k)
-m <int> suppress all alignments if > <int> exist (def: no limit)
-M <int> like -m, but reports 1 random hit (MAPQ=0); requires --best
--best hits guaranteed best stratum; ties broken by quality
--strata hits in sub-optimal strata aren't reported (requires --best)
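With -k <int> greater than 1, or with -a, bowtie's default (non-SAM) output contains one tab-separated line per alignment, with the read name in column 1. The sketch below uses hypothetical helpers (not shipped with bowtie) to tally alignments per read. Under the same alignment settings, the reads listed more than once in -a output are roughly the ones that -m 1 would suppress.

```shell
#!/bin/bash
# Hypothetical helper: count alignments per read in default bowtie output
# (tab-separated; column 1 is the read name). Prints "name<TAB>count", sorted.
count_alignments_per_read() {
  awk -F'\t' '{ n[$1]++ } END { for (r in n) print r "\t" n[r] }' "$1" | sort
}

# Names of reads reported more than once, e.g. from a run with -a
multi_mapped_reads() {
  count_alignments_per_read "$1" | awk -F'\t' '$2 > 1 { print $1 }'
}
```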
Output:
-t/--time print wall-clock time taken by search phases
-B/--offbase <int> leftmost ref offset = <int> in bowtie output (default: 0)
--quiet print nothing but the alignments
--refout write alignments to files refXXXXX.map, 1 map per reference
--refidx refer to ref. seqs by 0-based index rather than name
--al <fname> write aligned reads/pairs to file(s) <fname>
--un <fname> write unaligned reads/pairs to file(s) <fname>
--max <fname> write reads/pairs over -m limit to file(s) <fname>
--suppress <cols> suppresses given columns (comma-delim'ed) in default output
--fullref write entire ref name (default: only up to 1st space)
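In bowtie's default output, column 4 holds the 0-based offset of the leftmost aligned reference character, and -B 1 makes that offset 1-based. For hits already written with the default -B 0, a minimal awk sketch (the helper name to_one_based is hypothetical) applies the same shift after the fact:

```shell
#!/bin/bash
# Hypothetical helper: add 1 to column 4 (the 0-based leftmost offset) of
# default bowtie output, matching what -B 1 would have printed.
to_one_based() {
  awk -F'\t' 'BEGIN { OFS = "\t" } { $4 = $4 + 1; print }' "$1"
}
```

When re-running the alignment is cheap, passing -B 1 directly is simpler.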
Colorspace:
--snpphred <int> Phred penalty for SNP when decoding colorspace (def: 30)
or
--snpfrac <dec> approx. fraction of SNP bases (e.g. 0.001); sets --snpphred
--col-cseq print aligned colorspace seqs as colors, not decoded bases
--col-cqual print original colorspace quals, not decoded quals
--col-keepends keep nucleotides at extreme ends of decoded alignment
SAM:
-S/--sam write hits in SAM format
--mapq <int> default mapping quality (MAPQ) to print for SAM alignments
--sam-nohead suppress header lines (starting with @) for SAM output
--sam-nosq suppress @SQ header lines for SAM output
--sam-RG <text> add <text> (usually "lab=value") to @RG line of SAM header
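In SAM records written with -S, bit 0x4 of the FLAG field (column 2) marks an unaligned read. If unaligned reads appear in the output, the minimal sketch below (the helper name mapped_only is hypothetical) keeps the header plus aligned records only. It extracts the bit with integer arithmetic rather than gawk's and(), so it should run under any POSIX awk; samtools view -F 4 applies the same filter.

```shell
#!/bin/bash
# Hypothetical helper: print SAM header lines and aligned records only.
# int($2 / 4) % 2 extracts bit 0x4 (unmapped) of the FLAG in column 2.
mapped_only() {
  awk -F'\t' '/^@/ { print; next } int($2 / 4) % 2 == 0' "$1"
}
```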
Performance:
-o/--offrate <int> override offrate of index; must be >= index's offrate
-p/--threads <int> number of alignment threads to launch (default: 1)
--mm use memory-mapped I/O for index; many 'bowtie's can share
--shmem use shared mem for index; many 'bowtie's can share
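The threads started by -p all run on one node, so in a batch job it is convenient to match -p to the cores requested with ppn. A minimal sketch, assuming a Torque/PBS environment where $PBS_NODEFILE lists one line per allocated core and the job requests a single node (as with nodes=1:ppn=4 in the example job script). The helper name pbs_threads is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: number of cores allocated by PBS, falling back to 1.
# Assumes a single-node job (bowtie's -p threads cannot span nodes).
pbs_threads() {
  local nodefile=${1:-${PBS_NODEFILE:-}}
  if [ -n "$nodefile" ] && [ -r "$nodefile" ]; then
    awk 'END { print (NR > 0 ? NR : 1) }' "$nodefile"
  else
    echo 1
  fi
}

# Example use inside a job script:
#   bowtie -p "$(pbs_threads)" e_coli -c ATGCATCATGCGCCAT
```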
Other:
--seed <int> seed for random number generator
--verbose verbose output (for debugging)
--version print version information and quit
-h/--help print this usage message
Example
#PBS -N bowtie_test
#PBS -l walltime=00:05:00
#PBS -l nodes=1:ppn=4
module load biosoftw
module load bowtie-0.12.7
cd $PBS_O_WORKDIR
cp /usr/local/biosoftw/bowtie-$BOWTIE_VERSION/genomes/NC_008253.fna .
bowtie-build NC_008253.fna e_coli
bowtie -p 4 e_coli -c ATGCATCATGCGCCAT
Errors
The following scripts fail because of an FTP error: make_e_coli.sh, make_a_thaliana_tair.sh, and make_c_elegans_ws200.sh. The following scripts fail because they do not obtain all of the FASTA-format files needed before conversion to a bowtie index: make_galGal3.sh, make_hg18.sh, make_h_sapiens_ncbi36.sh, make_h_sapiens_ncbi37.sh, make_mm9.sh, and make_m_musculus_ncbi37.sh. The following script does not work properly on the Glenn Cluster: gen_dnamasks2colormask.pl.
Documentation
http://bowtie-bio.sourceforge.net/manual.shtml