Spark

Apache Spark is an open source cluster-computing framework originally developed in the AMPLab at University of California, Berkeley but was later donated to the Apache Software Foundation where it remains today. In contrast to Hadoop's disk-based analytics paradigm, Spark has multi-stage in-memory analytics. Spark can run programs up-to 100x faster than Hadoop’s MapReduce in memory or 10x faster on disk. Spark support applications written in python, java, scala and R

Availability and Restrictions

Versions

The following versions of Spark are available on OSC systems:

Version	Pitzer	Ascend	Cardinal
2.4.0	X*
2.4.5	X
3.5.1		X	X

* Current default version

You can use module spider spark to view available modules for a given machine. Feel free to contact OSC Help if you need other versions for your work.

Access

Spark is available to all OSC users. If you have any questions, please contact OSC Help.

Publisher/Vendor/Repository and License Type

The Apache Software Foundation, Open source

Usage

Set-up

In order to configure your environment for the usage of Spark, run the following command:

module load spark

A particular version of Spark can be loaded as follows

module load spark/2.4.0

Using Spark

In order to run Spark in batch, reference the example batch script below. This script requests 6 node on the Owens cluster for 1 hour of walltime. The script will submit the pyspark script called test.py using slurm-spark-submit (prevously pbs-spark-submit) command.

#!/bin/bash 
#SBATCH --job-name ExampleJob 
#SBATCH --nodes=2 --ntasks-per-node=48 
#SBATCH --time=01:00:00 
#SBTACH --account your_project_id

module load spark

cp test.py $TMPDIR
cd $TMPDIR 

slurm-spark-submit test.py  > test.log

cp * $SLURM_SUBMIT_DIR

slurm-spark-submit script is used for submitting Spark jobs. For more options, please run,

slurm-spark-submit --help

Running Spark interactively in batch

To run Spark interactively, but in batch on Owens please run the following command,

 sinteractive -N 2 -n 28 -t 01:00:00

When your interactive shell is ready, please launch spark cluster using the slrum-spark-submit script

slurm-spark-submit

You can then launch pyspark by connecting to Spark master node as follows.

pyspark --master spark://nodename.ten.osc.edu:7070

Launching Jupyter+Spark on OSC OnDemand

Instructions on how to launch Spark on OSC OnDemand web interface is here. https://www.osc.edu/content/launching_jupyter_spark_app

Custom Spark Property values

When launching a Spark application on Ondemand, users can provide a path to a custom property file that replaces Spark's default configuration settings. This allows for greater customization and optimization of Spark's behavior based on the specific needs of the application.

However, it's important to note that before setting the configuration using a custom property file, users should ensure that there are enough resources on the cluster to handle the requested configuration.

Example of custom property file:spark_custom.conf

spark.executor.instances 2 
spark.executor.cores 2 
spark.executor.memory 60g 
spark.driver.memory 2g

Users can check the default property values or the values after loading the custom property file as follows

spark.sparkContext.getConf().getAll()

Search form

Spark

Availability and Restrictions

Versions

Access

Publisher/Vendor/Repository and License Type

Usage

Set-up

Using Spark

Running Spark interactively in batch

Launching Jupyter+Spark on OSC OnDemand

Custom Spark Property values

Further Reading

See Also

Client Resources

Upcoming Events

Recent News

Translate

Ohio Department of Higher Education

State Government Links

Education Links

Search form

You are here

Spark

Availability and Restrictions

Versions

Access

Publisher/Vendor/Repository and License Type

Usage

Set-up

Using Spark

Running Spark interactively in batch

Launching Jupyter+Spark on OSC OnDemand

Custom Spark Property values

Further Reading

See Also

Client Resources

Upcoming Events

Recent News

Translate

Ohio Department of Higher Education

State Government Links

Education Links