|
Using the MATLAB Parallel Computing Toolbox at OSCThe following contains a tutorial on use of the MATLAB® Parallel Computing Toolbox™. This tutorial assumes that you have the MATLAB Parallel Computing Toolbox installed on your desktop computer and have configured it for use at OSC. For instructions on configuring the MATLAB Parallel Computing Toolbox for use at OSC, click here. What is the MATLAB Parallel Computing Toolbox? When should I use the MATLAB Parallel Computing Toolbox? For task parallel problems, there are several constructs available. For example, parfor loops are often mentioned in the documentation as a way to run independent iterations of a for loop in parallel. However, the OSC interface does not currently support matlabpool jobs, which are required for parfor loops. Alternatives include for-drange() loops for parallel iterations of a loop and dfeval() and dfevalasync() for parallel iterations of a function. For data parallel problems, the MATLAB Parallel Computing Toolbox uses “codistributed arrays,” or arrays that are distributed across multiple labs. Many MATLAB functions are overloaded so that they also operate on codistributed arrays. For instance, one can add (+), multiply (*), subtract (-), or divide (\) codistributed arrays. Functions such as eig(), svd(), fft(), min(), max(), and find() are also available for codistributed arrays, as well as many others. See “Using MATLAB Functions on Codistributed Arrays” in the MATLAB documentation for a full list of functions that are compatible with codistributed arrays. The following shows how to use OSC’s interface to the MATLAB Parallel Computing Toolbox, with examples of simple task parallel and data parallel problems. Setting up the OSC interface
This will open a ‘Manage Configurations’ menu:
This menu should show a configuration called ‘OSC Opteron.’ If it does not, follow the directions at Configuring Toolbox. You can switch between this configuration and the ‘local’ configuration, which is useful for developing and debugging code locally on your desktop. You must also be connected to glenn.osc.edu. To connect to Glenn, you can issue the command: conn = ssh('your_OSC_username','glenn.osc.edu') at the MATLAB prompt. This should return the lines: Host: glenn.osc.edu A simple task parallel example using dfeval() and dfevalasync() Take as an example the function meshgrid(). This function generates X and Y arrays for 3-D plots. Let’s say we want to generate arrays using the following pairs of vectors: x1 = [1 2 3 4 5]; y1 = [1 2 3 4 5]; We can do this in parallel using the function dfeval(): [X, Y] = dfeval(@meshgrid, {{x1 y1}, {x2 y2}, {x3 y3}}, 'Configuration', 'OSC Opteron'); For dfeval(), we store the outputs in two variables, [X, Y]. For the inputs, we specify a function handle for meshgrid(); a cell array of cells, each containing an input argument pair for meshgrid(); and we set ‘Configuration’ to ‘OSC Opteron.’ Note that once we execute this command, the Command Window will enter a ‘Busy’ state while dfeval() runs. When it finishes, the outputs should be cell arrays containing the results. In this case, pairs of outputs can be accessed as: X{1}; Y{1}; We can perform the same operation using the function dfevalasync(): job = dfevalasync(@meshgrid, 2, {{x1 y1}, {x2 y2}, {x3,y3}}, 'Configuration', 'OSC Opteron'); Here, we specify as the output a job object that lets us check the state of the job. For the inputs, we specify a function handle for meshgrid(), the number of outputs this function returns for each set of inputs, a cell array of cells listing the input argument pairs for meshgrid(), and we set ‘Configuration’ to ‘OSC Opteron.’ Since dfevalasync() does not put the Command Window into a ‘Busy’ state, we need to check to see when the job is finished. This can be done by periodically running the command: get(job,'State') This function returns either ‘queued,’ ‘running,’ or ‘finished.’ Once the function returns as‘finished,’ the outputs can be retrieved using the command: outputs = getAllOutputArguments(job); In this case, outputs will be a cell array in which each row corresponds to a given set of inputs, e.g.: X1 = outputs{1,1}; Once the outputs have been retrieved, extra job files that were created by dfevalasync() can be deleted by issuing the command: destroy(job) Using pmode pmode start local 4 at the MATLAB prompt. This will open a “Parallel Command Window” with four parallel processes. Note that you do not have to have four processors on your desktop in order to run four parallel processes. The resulting Parallel Command Window is shown below:
Here, we have issued two simple commands at the P>> parallel command prompt at the bottom of the Parallel Command Window: a = 1; A list of issued commands is shown on the left-hand side of the Parallel Command Window. Both commands were evaluated on each of the four processes, referred to as Labs 1 through 4. Results for each individual lab are shown in each window. Note that rand() produces a different random number on each lab. The following sections provide an introduction to basic parallel programming in pmode. This includes communication, the use of codistributed arrays, and task parallelism using a for-drange() loop. Finally, we show how a program developed locally in pmode can be modified to run on OSC’s systems. Communication in pmode The labSend() function is used to send data from one lab to another, and labReceive() is used to receive this data. The labSend() function is non-blocking, i.e. the lab sending the data can continue to execute commands even if the receiving lab has not yet received the data. The labReceive() command is blocking, i.e. it blocks execution on the receiving lab until the data is received. An optional integer-valued tag can be used with these commands to assign a number to each message. The labProbe() function can be used to check the status of a message from a particular lab or with a particular tag. The labBroadcast() function is used to send data from one lab to all labs. Each lab is blocked until it receives the data. There is also a labBarrier() function that can be used to block execution of each lab until all labs reach the labBarrier() statement. This function is not often necessary, but can be useful in certain situations. Use of these functions may require knowing the total number of labs and each lab’s individual number. The numlabs() function returns the total number of labs. The labindex() function returns the lab index. Here, we will show how to use the labSend(), labReceive(), labBroadcast(), numlabs(), and labindex() functions. A sample script “Comm_test.m” contains the commands we will be using. First, let us do a simple communication test between labs 1 and 2. To do this, we can paste the following code into the Parallel Command Prompt: % Generate a random number 'a' on lab 1 and send it to lab 2. This code is run on all the labs. The first part of this code uses the labindex() function as part of an if statement to run commands on lab 1 only. This portion of the code generates a random value a on lab 1 and uses labSend() to send a to lab 2. The second part of this code uses labindex() as part of an if statement to run commands on lab 2 only. This portion of the code uses a labReceive() function to receive a message from lab 1 on lab 2 and store the result in a. Note that all of these commands must be issued at the Parallel Command Prompt at the same time. Otherwise, the labSend() command fails because there is no corresponding labReceive(). The resulting Parallel Command Window is shown below. As the image shows, labs 1 and 2 both have the same value for a after the labSend() and labReceive() functions have completed. Labs 3 and 4 did not execute either of the commands.
Next, we show how to use the labSend() function in a for loop to send a random value stored in b from lab 1 to labs 2 through 4. To do this, we can paste the following code into the Parallel Command Prompt: % Generate a random number 'b' on lab 1 and sends it to all other labs using a 'for' loop The result is shown below. As the image shows, all four labs have the same random value stored in b once the for loop finishes.
Finally, we will show how to use the labBroadcast() command to broadcast a random value c1 from lab 1 to all other labs. The result is stored on each lab in the variable c. To do this, we can paste the following code into the Parallel Command Prompt: % Generate a random number 'c' on lab 1 and broadcast it to all other labs using labBroadcast The result is shown below. As the image shows, all four labs have the same random value stored in c.
For more information on these and similar functions, see “Interlab Communication Within a Parallel Job” in the MATLAB documentation. Global operations in pmode As an example, suppose we generate a random value a on each lab using the command: a = rand Each lab will have a different value for a. We can find the minimum value across all labs and return the result to each lab using the command: res1 = gop(@min, a) We can add all the values of a across the labs and return the result to lab 2 using the command: res2 = gop(@plus, a, 2) The corresponding Parallel Command Window is shown below:
Codistributed arrays in pmode dist = codistributor() The first is the default distribution scheme, in which an array is distributed by columns across all labs (unless the array is a column array, in which case the array is distributed by rows). The second allows the user to specify the dimension in which the distribution occurs. The last distributes an array in blocks of size p-by-p, with m*p rows and n*p columns. This applies only to two-dimensional arrays. There are other options for more precisely specifying the desired distribution. See the documentation for codistributor() for more information. There are several ways to create a codistributed array. For instance, zeros(), rand(), and randn() all work with a codistributor object: A = zeros(4,8,codistributor()) The resulting Parallel Command Window is shown below:
Codistributed arrays can also be generated from local arrays on each lab using the codistributed() command. For instance, take the commands: localA = rand(2,3); The first two commands create an 8-by-3 array distributed by rows. The second two commands create a 6-by-6 array distributed by both rows and columns. The resulting Parallel Command Window is shown below:
Task parallelism in pmode using drange() A = round(100*rand(50,100,codistributor())); Let us suppose that we want to make a matrix B with each entry equal to the largest prime factor of the corresponding entry of A. By default, A is distributed by columns. Thus, we will have each lab process its portion of the entries, moving down its own columns of A using a for-drange() loop. We also need to initialize B before beginning the process: B = round(zeros(50,100,codistributor())); We can see a portion of the results by checking a portion of the local parts of A and B: LA = localPart(A); The resulting Parallel Command Window should look something like the one shown below:
Note that communication between labs is not allowed in a for-drange() loop, and during the loop, each lab only has access to its own portion of any codistributed arrays. For more information, see “Using a for-Loop Over a Distributed Range (for-drange)” in the MATLAB documentation. Submitting jobs to Glenn function [OutputA, OutputB] = primeFactors() The body of the function is: A = round(100*rand(50,100,codistributor())); Here, we have changed the outputs slightly. When we submit this code as a job, it will use some specified number of labs. Each lab will return outputs. To avoid repeating the same data, only lab 1 will have actual copies of A and B. We could submit this using dfeval() or dfevalasync(). We could also do this by creating a scheduler object and submitting a job. First, we must create a scheduler using the command: sched = findResource('scheduler', 'type', 'generic', 'Configuration', 'OSC Opteron'); Next, we must create a job and set its properties: job = createParallelJob(sched); The first line creates a basic job object. The next line specifies the location, on your desktop, of all files needed to run the function. The third line specifies the path, on your desktop, where files needed to run the function are located. The last two lines specify the maximum and minimum number of workers or labs to use. Here, both have been set to 4, so we should get exactly 4 labs. Finally, we create a task that specifies the function we wish to run: task = createTask(job, @primeFactors, 2, {}); The function createTask() takes at least four arguments: the first is a job object, the second is a handle to the function to run, the third is the number of output arguments this function returns, and the last is a list of input arguments. Note that this list could be made of multiple cell arrays, as with dfeval(). However, here we only want to run one instance of the function primeFactors(), and it does not take any inputs. To submit the job, we use the command: submit(job) To check the state of the job, use the command: get(job,’State’) Once this command returns the state ‘finished,’ we can get the outputs and destroy extra job files using the commands: outputs = getAllOutputArguments(job); Once we have stored the output, we can destroy any extra job files using the command: destroy(job); Each of the four labs returns outputs for the function, so outputs is a 4-by-2 cell array. However, we only really saved the matrices on lab 1, so we can retrieve the results using the command: OutputA = outputs{1,1}; Debugging Jobs errmsgs = get(job.Tasks, {'ErrorMessage'}); You can also check any logs from the scheduler: sched = findResource('scheduler', 'type', 'generic', 'Configuration', 'OSC Opteron') Retrieving Data from Glenn function status = primeFactors() We can run this code and get the results using commands similar to those used before: sched = findResource('scheduler', 'type', 'generic', 'Configuration', 'OSC Opteron'); Once the job has completed successfully, outputs will be a cell array full of “1s.” The actual data will be stored on Glenn. To log onto Glenn, you can use a file transfer utility such as the Filezilla client (http://filezilla-project.org). See Configuring Toolbox for instructions on setting up Filezilla on Glenn. Once you have logged in to Filezilla, you can find the files in your home directory (the default) on Glenn, as shown below:
You can also log onto Glenn using an SSH client and manipulate the data there using a variety of available programs. See http://www.osc.edu/supercomputing/computing/opt/index.shtml for further information on general use of Glenn. Clearing Old Job Data from Glenn and Your Desktop Note that if you remember to destroy the job after you are done, you should not need to delete any files on Glenn.
|











