Hadoop

A Hadoop cluster can be launched within the HPC environment and managed by the PBS/Slurm job scheduler using the myHadoop framework developed by the San Diego Supercomputer Center. (Please see https://www.grid.tuc.gr/fileadmin/users_data/grid/documents/hadoop/Krish...)

Availability and Restrictions

Versions

The following versions of Hadoop are available on OSC systems: 

Version Owens
3.0.0-alpha1 X*
* Current default version

You can use module spider hadoop to view available modules for a given machine. Feel free to contact OSC Help if you need other versions for your work.
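
As a minimal sketch, the check can be wrapped in a small shell function that prints only the version strings. It assumes the Lmod `module` command is available in your shell and that versions appear as `hadoop/<version>` in the `module spider` output. The function name `list_hadoop_versions` is hypothetical.

```shell
# Hypothetical helper: print the hadoop module versions reported by Lmod.
# Assumes `module spider` lists versions in the form "hadoop/<version>".
list_hadoop_versions() {
    # Lmod writes its report to stderr by default, so merge it into stdout
    # before filtering, then drop duplicate entries.
    module spider hadoop 2>&1 | grep -oE 'hadoop/[A-Za-z0-9._-]+' | sort -u
}
```

Based on the table above, calling `list_hadoop_versions` on Owens would be expected to include `hadoop/3.0.0-alpha1`.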

Access

Hadoop is available to all OSC users. If you have any questions, please contact OSC Help.

Publisher/Vendor/Repository and License Type

Apache Software Foundation, open source

Usage

Set-up

To configure your environment for Hadoop, run the following command:

module load hadoop

To access a particular version of Hadoop, run the following command:

module load hadoop/3.0.0-alpha1
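
To confirm that the module loaded as intended, one option is to check that the `hadoop` command is on the `PATH` and that `HADOOP_HOME` is set. This is a sketch under assumptions: the batch script on this page relies on `HADOOP_HOME` being defined after the module is loaded, and the function name `check_hadoop_env` is hypothetical.

```shell
# Hypothetical helper: verify that the Hadoop environment looks usable.
check_hadoop_env() {
    # The hadoop launcher should be reachable once the module is loaded.
    if ! command -v hadoop >/dev/null 2>&1; then
        echo "hadoop not found on PATH; run 'module load hadoop' first" >&2
        return 1
    fi
    # Assumption: the module defines HADOOP_HOME (the batch script uses it).
    if [ -z "${HADOOP_HOME:-}" ]; then
        echo "HADOOP_HOME is not set" >&2
        return 1
    fi
    echo "hadoop found at $(command -v hadoop), HADOOP_HOME=$HADOOP_HOME"
}
```

Running `check_hadoop_env` right after `module load hadoop` returns a non-zero status if either check fails, which makes it easy to stop a script early.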

Using Hadoop

To run Hadoop in batch, use the example batch script below. This script requests 6 nodes on the Owens cluster for 1 hour of walltime.

#!/bin/bash
#SBATCH --job-name hadoop-example
#SBATCH --nodes=6 --ntasks-per-node=28
#SBATCH --time=01:00:00
#SBATCH --account <account>

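# Keep the submission directory so the results can be copied back to it
export WORK=$SLURM_SUBMIT_DIR
module load hadoop/3.0.0-alpha1
module load myhadoop/v0.40
export HADOOP_CONF_DIR=$TMPDIR/mycluster-conf-$SLURM_JOBID

cd $TMPDIR

# Generate a per-job Hadoop configuration and start HDFS on the allocated nodes
myhadoop-configure.sh -c $HADOOP_CONF_DIR -s $TMPDIR
$HADOOP_HOME/sbin/start-dfs.sh
hdfs dfsadmin -report

# Stage the input file in HDFS (-p also creates the missing parent directories)
hdfs dfs -mkdir -p data
hdfs dfs -put $HADOOP_HOME/README.txt data/
hdfs dfs -ls data

# Run the bundled wordcount example and copy its output to the submission directory
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha1.jar wordcount data/README.txt wordcount-out
hdfs dfs -ls wordcount-out
hdfs dfs -copyToLocal -f wordcount-out $WORK

# Stop HDFS and clean up the per-job Hadoop files
$HADOOP_HOME/sbin/stop-dfs.sh
myhadoop-cleanup.sh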
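
Assuming the script above is saved as `hadoop-example.sh` (a hypothetical file name), it can be submitted with `sbatch`. After the job finishes, the `-copyToLocal` step should leave a `wordcount-out` directory in the submission directory. The wordcount example writes tab-separated `word<TAB>count` lines to `part-r-*` files. The helpers `submit_hadoop_example` and `top_words` below are hypothetical sketches for submitting the job and listing the most frequent words in its output.

```shell
# Submit the job. With --parsable, sbatch prints the job ID
# (optionally followed by ";cluster") instead of a full sentence.
# hadoop-example.sh is an assumed file name for the batch script above.
submit_hadoop_example() {
    sbatch --parsable hadoop-example.sh
}

# Hypothetical helper: print the N most frequent words from wordcount output.
# $1 = output directory (default: wordcount-out), $2 = N (default: 10)
top_words() {
    dir="${1:-wordcount-out}"
    n="${2:-10}"
    # Each line is "word<TAB>count"; sort numerically on the count, descending.
    cat "$dir"/part-r-* | sort -t "$(printf '\t')" -k2,2nr | head -n "$n"
}
```

For example, `top_words wordcount-out 5` run from the submission directory prints the five most frequent words together with their counts.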

Example Jobs

See the /usr/local/src/hadoop/3.0.0-alpha1/test.osc directory for more example Hadoop jobs.
