Darshan
Darshan is a lightweight, scalable HPC I/O characterization tool.
Availability and Restrictions
Versions
The following versions of Darshan are available on OSC clusters:
Apache Spark is an open-source cluster-computing framework originally developed in the AMPLab at the University of California, Berkeley, and later donated to the Apache Software Foundation, where it remains today. In contrast to Hadoop's disk-based analytics paradigm, Spark performs multi-stage in-memory analytics. Spark can run programs up to 100x faster than Hadoop's MapReduce in memory, or 10x faster on disk. Spark supports applications written in Python, Java, Scala and R.
From WARP3D's webpage:
R is a language and environment for statistical computing and graphics. It is an integrated suite of software facilities for data manipulation, calculation, and graphical display. It includes
More information can be found here.
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads from about 50 up to hundreds or thousands of characters long, and at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
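To illustrate the idea behind an FM Index, the following is a minimal Python sketch of exact-match counting by backward search over a Burrows-Wheeler transform. It is a teaching example only, not Bowtie 2's implementation: the function names (`bwt`, `build_fm_index`, `count_occurrences`) are hypothetical, the index is built naively by sorting all rotations, and the occurrence table is stored uncompressed, so it shows the search technique rather than the memory savings a real index achieves.

```python
from collections import Counter


def bwt(text):
    """Return the Burrows-Wheeler transform of text.

    A '$' end marker is appended; it is assumed not to occur in text
    and sorts before the letters A, C, G, T.
    """
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def build_fm_index(text):
    """Build a naive (uncompressed) FM index for text."""
    last = bwt(text)
    counts = Counter(last)

    # first_row[c]: number of characters in the text strictly smaller than c,
    # i.e. the first row of the sorted rotations that starts with c.
    first_row = {}
    total = 0
    for c in sorted(counts):
        first_row[c] = total
        total += counts[c]

    # occ[c][i]: number of occurrences of c in last[:i].
    occ = {c: [0] * (len(last) + 1) for c in counts}
    for i, ch in enumerate(last):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)

    return first_row, occ, len(last)


def count_occurrences(pattern, index):
    """Count exact occurrences of pattern using backward search."""
    first_row, occ, n = index
    lo, hi = 0, n  # half-open range of sorted rotations matching the suffix so far
    for ch in reversed(pattern):
        if ch not in first_row:
            return 0
        lo = first_row[ch] + occ[ch][lo]
        hi = first_row[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

For example, after `index = build_fm_index("GATTACAGATTACA")`, calling `count_occurrences("GATTACA", index)` walks the pattern from its last character to its first and narrows the matching range one character at a time, without ever scanning the text itself.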
Linaro HPC tools analyze how HPC software runs. The suite consists of three applications: Linaro DDT, Linaro Performance Reports, and Linaro MAP.
STAR: Spliced Transcripts Alignment to a Reference.
The following versions of STAR are available on OSC clusters:
Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-ended data. The selection of trimming steps and their associated parameters are supplied on the command line.
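As a rough illustration of what one trimming step does, the Python sketch below removes low-quality bases from the 3' end of a single read. It is not Trimmomatic's code or interface: the function names and the `min_quality` parameter are hypothetical, and the quality string is assumed to use Phred+33 encoding.

```python
def phred33_scores(quality):
    """Decode a Phred+33 quality string into a list of integer scores."""
    return [ord(ch) - 33 for ch in quality]


def trim_trailing(sequence, quality, min_quality):
    """Drop bases from the end of a read while their quality is below min_quality.

    Returns the trimmed (sequence, quality) pair; both may be empty if
    every base falls below the threshold.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    scores = phred33_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality[:end]
```

With a threshold of 20, a read `ACGTAC` with qualities `IIII##` would lose its last two bases, since `#` decodes to a score of 2 while `I` decodes to 40.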
The following versions of Trimmomatic are available on OSC clusters:
SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of variants on genes (such as amino acid changes).
The following versions of SnpEff are available on OSC clusters:
The Sequence Read Archive (SRA) stores raw sequence data from "next-generation" sequencing technologies including 454, IonTorrent, Illumina, SOLiD, Helicos and Complete Genomics. In addition to raw sequence data, SRA now stores alignment information in the form of read placements on a reference sequence. Use the SRA Toolkit tools to operate directly on SRA runs.
The following versions of SRA Toolkit are available on OSC clusters: