Hadoop

A Hadoop cluster can be launched within the HPC environment and managed by the PBS job scheduler using the MyHadoop framework developed by the San Diego Supercomputer Center (see http://www.sdsc.edu/~allans/MyHadoop.pdf).

Availability & Restrictions

Hadoop is available to all OSC users without restriction.

The following versions of Hadoop are available on OSC systems: 

VERSION   OAKLEY   OWENS
3.0.0*             X

NOTE: * marks the default version.

Set-up

To configure your environment for Hadoop, run the following command:

module load hadoop

To load a particular version of Hadoop, run the following command:

module load hadoop/3.0.0-alpha1

Using Hadoop

To run Hadoop in batch, reference the example batch script below. This script requests 6 nodes on the Owens cluster for 1 hour of walltime.

#PBS -N hadoop-example
#PBS -l nodes=6:ppn=12
#PBS -l walltime=01:00:00

setenv WORK $PBS_O_WORKDIR

module load hadoop/3.0.0-alpha1
module load myhadoop/v0.40

# Generate a per-job Hadoop configuration staged in $TMPDIR
setenv HADOOP_CONF_DIR $TMPDIR/mycluster-conf-$PBS_JOBID
cd $TMPDIR
myhadoop-configure.sh -c $HADOOP_CONF_DIR -s $TMPDIR

# Start HDFS and verify the datanodes are up
$HADOOP_HOME/sbin/start-dfs.sh
hdfs dfsadmin -report

# Copy the input into HDFS
hdfs dfs -mkdir data
hdfs dfs -put $HADOOP_HOME/README.txt data/
hdfs dfs -ls data

# Run the wordcount example and copy the results back to the submit directory
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha1.jar wordcount data/README.txt wordcount-out
hdfs dfs -ls wordcount-out
hdfs dfs -copyToLocal -f wordcount-out $WORK

# Shut down HDFS and tear down the per-job cluster
$HADOOP_HOME/sbin/stop-dfs.sh
myhadoop-cleanup.sh
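Conceptually, the wordcount job above splits the input across map tasks that emit a (word, 1) pair for every word, shuffles those pairs by key, and sums each group in reduce tasks. The following is a minimal local sketch of that map/shuffle/reduce flow in Python, for illustration only; the actual job runs as distributed JVM tasks from the examples jar invoked in the script.

```python
# Local illustration of the MapReduce flow behind the wordcount example.
# This is a single-process sketch, not the Hadoop API: the real framework
# distributes map and reduce tasks across the datanodes started by start-dfs.sh.
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group counts by word (Hadoop does this between phases)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_counts(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce phase: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


if __name__ == "__main__":
    lines = ["hadoop runs mapreduce", "mapreduce counts words"]
    print(reduce_counts(shuffle(map_words(lines))))
    # → {'hadoop': 1, 'runs': 1, 'mapreduce': 2, 'counts': 1, 'words': 1}
```

Each phase only sees a stream of key/value pairs, which is what lets Hadoop parallelize the map and reduce steps independently across nodes.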

Example Jobs

Please check the /usr/local/src/hadoop/3.0.0-alpha1/test.osc directory for more examples of Hadoop jobs.
