OSC provides hardware and software services to support the data analytics needs of our clients. Data-intensive workloads are supported by OSC's high-performance computing frameworks, which include many programming languages, interactive computing through OnDemand, and parallel computing.
OSC's data analytics environment comprises nodes with powerful CPU cores, large amounts of RAM, and local disk space to store and process large amounts of data. On Owens, the data analytics environment has 16 huge memory nodes (Dell PowerEdge R930 four-socket servers with Intel Xeon E5-4830 v3 (Haswell 12 core, 2.10GHz) processors, 1,536GB memory, 12 x 2TB drives). Pitzer has 4 huge memory nodes (Dell PowerEdge R940 four-socket servers with Intel Xeon 6148 (Skylake 20 core, 2.40GHz) processors, 3TB memory, 2 x 1TB drives mirrored - 1TB usable).
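To put the huge memory node figures above in perspective, the following is a minimal sketch that derives per-node core counts and aggregate memory from the listed specifications. The class and constant names are hypothetical, and the sketch assumes that all four sockets of each server are populated and that 3TB means 3,072GB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HugeMemoryNodes:
    """Specification of one cluster's huge memory nodes (hypothetical helper)."""
    cluster: str
    nodes: int
    sockets_per_node: int
    cores_per_socket: int
    memory_gb_per_node: int

    @property
    def cores_per_node(self) -> int:
        # Assumes every socket holds one processor of the listed model.
        return self.sockets_per_node * self.cores_per_socket

    @property
    def total_cores(self) -> int:
        return self.nodes * self.cores_per_node

    @property
    def total_memory_gb(self) -> int:
        return self.nodes * self.memory_gb_per_node


# Figures taken from the text; 3TB is assumed to mean 3,072GB.
OWENS = HugeMemoryNodes("Owens", 16, 4, 12, 1536)
PITZER = HugeMemoryNodes("Pitzer", 4, 4, 20, 3072)

if __name__ == "__main__":
    for spec in (OWENS, PITZER):
        print(
            f"{spec.cluster}: {spec.cores_per_node} cores/node, "
            f"{spec.total_cores} cores and {spec.total_memory_gb} GB in total"
        )
```

Under these assumptions, a single Owens huge memory node offers 48 cores and a single Pitzer huge memory node offers 80 cores.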
OSC offers GPU computing on all its systems. While GPUs can provide a significant performance boost for some applications, the computing model is very different from that of the CPU. This page discusses some of the ways you can use GPU computing at OSC.
Data Transfer and Storage
Ohio researchers have access to many file storage options at OSC. OSC has over 14 petabytes (PB) of disk storage capacity distributed over several file systems, plus more than 14 PB of available backup tape storage (with the ability to easily expand to over 23PB).
Using our web platform, OnDemand, users can transfer smaller files (<10 GB) using simple drag and drop. Other file transfer options include sftp from the command line or a third-party client (such as FileZilla).
Globus is a simple but powerful file transfer service that allows our users to share data with collaborators anywhere! Any remote research site that runs Globus can seamlessly connect to OSC’s many research storage systems. It also connects research systems to personal systems.
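The choice among the transfer options described above can be sketched as a small helper. This is only an illustration: the function name and the constant are hypothetical, "10 GB" is assumed to mean decimal gigabytes, and the text does not state any size limit for sftp or Globus, so both are always listed.

```python
from typing import List

# Assumption: the "<10 GB" guideline is read as 10 * 10**9 bytes.
ONDEMAND_LIMIT_BYTES = 10 * 10**9


def transfer_options(size_bytes: int) -> List[str]:
    """Return the transfer methods from the text that suit a file of this size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    # sftp and Globus are listed without a size restriction in the text.
    options = ["sftp", "Globus"]
    if size_bytes < ONDEMAND_LIMIT_BYTES:
        # Small files can also use drag and drop in OnDemand.
        options.insert(0, "OnDemand drag and drop")
    return options


if __name__ == "__main__":
    print(transfer_options(2 * 10**9))   # a 2 GB file
    print(transfer_options(50 * 10**9))  # a 50 GB file
```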
Here is a list of software that we offer related to data analytics and machine learning.
- Python: A popular general-purpose, high-level programming language with numerous mathematical and scientific packages available for data analytics and machine learning. The Python programming environment can also be accessed through the Jupyter App on OnDemand.
- R: A programming language for statistical and machine learning applications with very strong graphical capabilities.
- RStudio is a free and open-source integrated graphical environment for R. RStudio is available as an OnDemand App with various versions of R.
- MATLAB: A full-featured data analysis toolkit with many advanced algorithms readily available. MATLAB is also available as an OnDemand App.
- Spark: A big data framework based on in-memory processing with distributed storage. Spark is also available as an OnDemand App.
- A big data framework based on hard disk processing with distributed storage.
- TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library and is also used for machine learning applications such as neural networks.
- PyTorch is a Python machine learning library based on the Torch library, used for applications such as deep learning and natural language processing.
- Horovod is a distributed training framework for TensorFlow, PyTorch, and other deep learning frameworks.
- Intel Compilers:
- Compilers for generating optimized code for Intel CPUs.
- Intel MKL:
- The Math Kernel Library provides optimized subroutines for common computing tasks such as matrix-matrix calculations.
- Other statistical software:
- Octave, Stata, FFTW, ScaLAPACK, MINPACK, sprng2
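To make the idea of a distributed training framework such as Horovod more concrete, here is a minimal plain-Python sketch of data-parallel training: each worker computes gradients on its own share of the data, the gradients are averaged across workers, and every worker applies the same update. This does not use the Horovod API; the function names are hypothetical and the averaging is simulated in a single process.

```python
from typing import List, Sequence


def allreduce_average(worker_grads: Sequence[Sequence[float]]) -> List[float]:
    """Average per-parameter gradients across workers (simulated allreduce)."""
    if not worker_grads:
        raise ValueError("at least one worker is required")
    n_params = len(worker_grads[0])
    if any(len(g) != n_params for g in worker_grads):
        raise ValueError("all workers must report the same number of gradients")
    n_workers = len(worker_grads)
    return [sum(g[i] for g in worker_grads) / n_workers for i in range(n_params)]


def sgd_step(weights: Sequence[float], grad: Sequence[float], lr: float) -> List[float]:
    """Apply one plain gradient descent update with the averaged gradient."""
    return [w - lr * g for w, g in zip(weights, grad)]


if __name__ == "__main__":
    # Two simulated workers, each with gradients for two parameters.
    grads = [[1.0, 2.0], [3.0, 4.0]]
    averaged = allreduce_average(grads)
    print(averaged)
    print(sgd_step([1.0, 1.0], averaged, lr=0.5))
```

In a real multi-node job the averaging step is a collective communication among processes rather than a loop in one process; the sketch only shows the arithmetic that keeps all workers' model weights identical.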
Containers at OSC
OSC now supports containers for several applications. More information is provided here.
If you are new to supercomputing, new to OSC, or simply interested in getting an account (if you don't already have one), please see here for further information.