Launching Jupyter+Spark App

This Knowledge Base article is based on an online workshop developed by the Ohio Supercomputer Center (OSC). It has been expanded with commentary and some helpful hints so that you can complete it on your own. If you already have an account at OSC, you can use it for this tutorial. This tutorial demonstrates how to use the PySpark interface to Spark through a Jupyter notebook on OSC OnDemand.

Launching Jupyter+Spark App

Log on to https://ondemand.osc.edu/ with your OSC credentials. Choose the Jupyter+Spark app from the Interactive Apps menu.

[Screenshot: OSC OnDemand dashboard]

Provide the job submission parameters and click Launch. Please make sure to check the Include access to OSC tutorial/workshop notebooks option.

[Screenshot: Jupyter+Spark launch form]

The next page shows the status of your job as Queued, Starting, or Running.

[Screenshot: job status card]

When the job is ready, please click the Open tutorial folder option.

[Screenshot: Open tutorial folder button]

You will see a file called pyspark_tutorials.ipynb. Select the checkbox next to the filename, then click Duplicate to make a copy of the file.

[Screenshot: Duplicate button]

You will see that a new file named pyspark_tutorials-Copy1.ipynb has been created.

[Screenshot: duplicated notebook file]

Double-clicking the pyspark_tutorials-Copy1.ipynb file will launch the Jupyter interface for Spark so you can proceed with the tutorials.

[Screenshot: PySpark tutorial notebook]

You can go through each cell and execute commands to see the results. Click on a cell, then either press Shift + Enter or use the Run option to execute it. While a cell is executing, you will see In [*] on its left side. When execution completes, the results appear below the cell.
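
For example, the first cell of a PySpark notebook usually looks something like the sketch below. This is a generic illustration rather than the exact contents of the OSC notebook, and the application name is arbitrary; on a Jupyter+Spark app a SparkSession may already be configured for you, in which case getOrCreate() simply reuses it.

    # Minimal sanity check that Spark is available in the notebook
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("osc-tutorial").getOrCreate()

    # Build a tiny DataFrame and display it below the cell
    df = spark.createDataFrame([("alice", 34), ("bob", 45)], ["name", "age"])
    df.show()
    print(df.count())  # prints 2

Running such a cell with Shift + Enter should print a two-row table followed by the count, confirming that the Spark backend is working.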

When you are done with the exercise, close the Jupyter tabs and delete the job by clicking Delete.
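
Optionally, you can also stop the Spark session from inside the notebook before closing it (assuming the spark variable from the sketch above, or the notebook's own session). Deleting the job tears down its processes anyway, so this is just a tidy habit.

    # Optional final cell: release Spark resources explicitly
    spark.stop()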

[Screenshot: Delete button]