Data Analytics and Machine Learning


OSC provides hardware and software services to support our clients' data analytics needs. Data-intensive workloads are supported by OSC's high-performance computing environment, which offers many programming languages, interactive computing through OnDemand, and parallel computing.

Hardware support

OSC's data analytics environment consists of nodes with powerful CPU cores, large amounts of RAM, and local disk space to store and process large amounts of data. On Owens, the data analytics environment has 16 huge memory nodes (Dell PowerEdge R930 four-socket servers with Intel Xeon E5-4830 v3 (Haswell 12 core, 2.10GHz) processors, 1,536GB memory, 12 x 2TB drives). Pitzer has 4 huge memory nodes (Dell PowerEdge R940 four-socket servers with Intel Xeon 6148 (Skylake 20 core, 2.40GHz) processors, 3TB memory, 2 x 1TB drives mirrored - 1TB usable).


Infographic about Owens Cluster features.
The Owens Cluster features access via either a terminal (over ssh) or a web portal. The cluster has a peak performance of 706 TF CPU + 750 TF GPU. It uses a 100 Gb/sec InfiniBand network (EDR) connecting shared data storage for home, project, and IME/scratch directories to the compute nodes. There are 648 standard compute nodes, each using Intel Xeon E5-2680 V4 (Broadwell) CPUs with 28 cores, 128 GB memory, and 1.5 TB local disk space per node. There are also 6 debug nodes with a 1 hour walltime limit, 160 GPU nodes that use NVIDIA Pascal P100 GPUs, and 16 large memory nodes, each of which uses Intel Xeon E5-4830 V3 (Haswell) CPUs, 48 cores, with 1.5 TB memory and 24 TB local disk space.
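
To make the node figures above easier to compare, the following sketch works out cores per node, memory per core, and aggregate memory for three of the node types described. It assumes 1 TB = 1,024 GB (consistent with 1,536 GB being listed as 1.5 TB) and that a four-socket server offers sockets x cores-per-socket cores; the NodeType class and its field names are illustrative only and are not part of any OSC tool.

```python
from dataclasses import dataclass

GB_PER_TB = 1024  # assumption: binary convention, since 1,536 GB is listed as 1.5 TB


@dataclass(frozen=True)
class NodeType:
    """Illustrative container for the published node figures; not an OSC API."""
    name: str
    count: int       # number of nodes of this type
    cores: int       # cores per node
    memory_gb: int   # memory per node, in GB

    @property
    def memory_per_core_gb(self) -> float:
        return self.memory_gb / self.cores

    @property
    def total_memory_tb(self) -> float:
        return self.count * self.memory_gb / GB_PER_TB


NODE_TYPES = [
    NodeType("Owens standard", count=648, cores=28, memory_gb=128),
    NodeType("Owens huge memory", count=16, cores=4 * 12, memory_gb=1536),
    NodeType("Pitzer huge memory", count=4, cores=4 * 20, memory_gb=3 * GB_PER_TB),
]

for node in NODE_TYPES:
    print(f"{node.name}: {node.cores} cores/node, "
          f"{node.memory_per_core_gb:.1f} GB/core, "
          f"{node.total_memory_tb:.1f} TB across {node.count} nodes")
```

Under these assumptions, the huge memory nodes offer 32 GB per core on Owens and 38.4 GB per core on Pitzer, compared with roughly 4.6 GB per core on a standard Owens node.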

GPU Computing

OSC offers GPU computing on all its systems. While GPUs can provide a significant performance boost for some applications, the computing model is very different from that of the CPU. This page discusses some of the ways you can use GPU computing at OSC.
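
One quick way to confirm that a session actually sees a GPU is to query it from Python. The sketch below assumes PyTorch is installed in the active environment and simply returns an empty list if it is not, or if no CUDA device is visible; describe_gpus is a hypothetical helper name, not an OSC-provided command.

```python
def describe_gpus() -> list:
    """Return one description string per visible CUDA device (empty if none)."""
    try:
        import torch  # assumption: PyTorch is available in this environment
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [
        f"cuda:{index} {torch.cuda.get_device_name(index)}"
        for index in range(torch.cuda.device_count())
    ]


if __name__ == "__main__":
    gpus = describe_gpus()
    print("\n".join(gpus) if gpus else "No GPU visible; running on CPU only.")
```

Run on a node without a GPU, the helper reports none, which is a useful sanity check before launching a long training job.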

Data Transfer and Storage

Ohio researchers have access to many file storage options at OSC. OSC has over 14 petabytes (PB) of disk storage capacity distributed over several file systems, plus more than 5.5 PB of backup tape storage.

File Transfer

Using our web platform, OnDemand, users can transfer smaller files (<10 GB) using simple drag and drop. Other file transfer options include using sftp from a command line or a third-party interface (such as FileZilla).

Globus is a simple but powerful transfer service that allows our users to share data with collaborators anywhere. Any remote research site that runs Globus can connect seamlessly to OSC’s many research storage systems. It also connects research systems to personal systems.
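
The size guidance above can be expressed as a small helper that suggests a transfer option for a given file. This is only a sketch of one reading of the text, not OSC policy: it assumes "10 GB" means 10 x 1000^3 bytes and groups sftp and Globus together for anything larger. The function names are hypothetical.

```python
from pathlib import Path

# Assumption: "10 GB" is read as decimal gigabytes; the text does not say
# whether decimal or binary units are meant.
DRAG_AND_DROP_LIMIT_BYTES = 10 * 1000**3


def suggest_transfer_method(size_bytes: int) -> str:
    """Hypothetical helper: map a file size to a transfer option named in the text."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes < DRAG_AND_DROP_LIMIT_BYTES:
        return "OnDemand drag and drop"
    return "sftp or Globus"


def suggest_for_file(path: str) -> str:
    """Look up a local file's size and suggest a transfer option for it."""
    return suggest_transfer_method(Path(path).stat().st_size)
```

For example, suggest_transfer_method(2 * 1000**3) suggests drag and drop, while a 50 GB file is routed to sftp or Globus.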

Public Dataset

View more about public dataset availability at OSC.

Software Support

Here is a list of software that we offer related to data analytics and machine learning.

Python is a popular general-purpose, high-level programming language with numerous mathematical and scientific packages available for data analytics and machine learning. The Python programming environment can also be accessed through the Jupyter App on OnDemand.
R is a programming language for statistical and machine learning applications with very strong graphical capabilities.
RStudio is a free and open-source integrated graphical environment for R. RStudio is available as an OnDemand App with various versions of R.
MATLAB is a full-featured data analysis toolkit with many advanced algorithms readily available. MATLAB is also available as an OnDemand App.
Spark is a big data framework based on in-memory processing with distributed storage. Spark is also available as an OnDemand App.
A big data framework based on hard-disk processing with distributed storage.
TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library and is also used for machine learning applications such as neural networks.
PyTorch is a Python machine learning library based on the Torch library, used for applications such as deep learning and natural language processing.
Horovod is a distributed training framework for TensorFlow, PyTorch, and many more.
Intel Compilers:
Compilers for generating optimized code for Intel CPUs.
Intel MKL:
The Math Kernel Library provides optimized subroutines for common computing tasks such as matrix-matrix calculations.
Other statistical software:
Octave, Stata, FFTW, ScaLAPACK, MINPACK, sprng2
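
Which of the Python-based packages above are usable depends on the environment loaded in your session. The following sketch checks for them without importing them, using only the standard library. The module names (tensorflow, torch, horovod, pyspark) are the packages' usual import names; the PACKAGES table and the available_packages function are illustrative and not an OSC utility.

```python
from importlib.util import find_spec

# Display name -> usual Python import name. Whether each one is present
# depends on the environment active in the current session.
PACKAGES = {
    "TensorFlow": "tensorflow",
    "PyTorch": "torch",
    "Horovod": "horovod",
    "Spark (PySpark)": "pyspark",
}


def available_packages(packages=None):
    """Return {display name: bool} without importing the packages themselves."""
    if packages is None:
        packages = PACKAGES
    return {name: find_spec(module) is not None for name, module in packages.items()}


if __name__ == "__main__":
    for name, present in available_packages().items():
        print(f"{name}: {'available' if present else 'not found'}")
```

Because find_spec only locates a module rather than loading it, the check is fast even for large frameworks.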

Get a complete list of software available at OSC.

Containers at OSC

OSC now supports containers for several applications. More information is provided here.

Getting Started

If you are new to supercomputing, new to OSC, or simply interested in getting an account (if you don't already have one), please see here for further information.