A hadoop cluster can be launched within the HPC environment, but managed by the PBS job scheduler using Myhadoop framework developed by San Diego Supercomputer Center. (Please see http://www.sdsc.edu/~allans/MyHadoop.pdf)
Mathematics & Statistics
Apache Spark is an open source cluster-computing framework originally developed in the AMPLab at University of California, Berkeley but was later donated to the Apache Software Foundation where it remains today. In contrast to Hadoop's disk-based analytics paradigm, Spark has multi-stage in-memory analytics. Spark can run programs upto 100x faster than Hadoop’s MapReduce in memory or 10x faster on disk. Spark support applications written in python, java, scala and R
R is a language and environment for statistical computing and graphics. It is similar to the S language and environment developed at Bell Laboratories (formerly AT&T, now Lucent Technologies). R provides a wide variety of statistical and graphical techniques, and is highly extensible.
Octave is a high-level language, primarily intended for numerical computations. It provides a convenient command line interface for solving linear and nonlinear problems numerically, and for performing other numerical experiments using a language that is mostly compatible with Matlab. It may also be used as a batch-oriented language.
Octave has extensive tools for solving common numerical linear algebra problems, finding the roots of nonlinear equations, integrating ordinary functions, manipulating polynomials, and integrating ordinary differential and differential-algebraic equations. It is easily extensible and customizable via user-defined functions written in Octave's own language, or using dynamically loaded modules written in C++, C, Fortran, or other languages.
SuiteSparse is a suite of sparse matrix algorithms, including: UMFPACK(multifrontal LU factorization), CHOLMOD(supernodal Cholesky, with CUDA acceleration), SPQR(multifrontal QR) and many other packages.
OSC is refreshing the software stack on Oakley on September 15, 2015 (during the scheduled downtime); something we have not done since Oakley entered service in 2012. During the software refresh, some default versions are updated to be more up-to-date and some older versions are removed. Information about the old and new default versions, as well as all available versions of each software package will be included on the corresponding OSC software webpage. See https://www.osc.edu/supercomputing/software-list.
SuperLU is a library for the direct solution of large, sparse, nonsymmetric systems of linear equations on high performance machines. It comes in two different flavors: SuperLU_MT (multithreaded) for shared memory parallel machines and SuperLU_DIST for distributed memory parallel machines.
Stata is a complete, integrated statistical package that provides everything needed for data analysis, data management, and graphics. Release 13 32-processor SMP is currently available at OSC.
Computational stochastic approaches (Monte Carlo methods) based on random sampling are becoming extremely important research tools not only in their "traditional" fields such as physics, chemistry or applied mathematics but also in social sciences and, recently, in various branches of industry. An indication of importance is, for example, the fact that Monte Carlo calculations consume about one half of the supercomputer cycles. One of the indispensable and important ingredients for reliable and statistically sound calculations is the source of pseudo random numbers. SPRNG provides a scalable package for parallel pseudo random number generation which will be easy to use on a variety of architectures, especially in large-scale parallel Monte Carlo applications.
SPRNG 1.0 provides the user the various SPRNG random number generators each in its own library. For most users this is acceptable, as one rarely uses more than one type of generator in a single program. However, if the user desires this added flexibility, SPRNG 2.0 provides it. In all other respects, SPRNG 1.0 and SPRNG 2.0 are identical.
ScaLAPACK is a library of high-performance linear algebra routines for clusters supporting MPI. It contains routines for solving systems of linear equations, least squares problems, and eigenvalue problems.
This page documents usage of the ScaLAPACK library installed by OSC from source. An optimized implementation of ScaLAPACK is included in MKL; see the software documentation page for Intel Math Kernel Library for usage information.