Parallel Programming

last updated 5/1/06
 

Parallel processing is the use of multiple processors that are coordinated to work on the same problem.

Forms of parallelisms

 

 

 

 


Taxonomy of computer architectures

The underlying architecture unfortunately has a significant impact on the approach to parallel programming and languages.
The von Neumann architecture is heavily reflected in imperative/procedural languages and much of OO and functional languages.

Flynn's Taxonomy.

 

 

 

 

 

Shared memory vs Distributed memory

 

Hybrids are possible. Having a little shared memory can eliminate the message passing needs.

 

 

 

 


Operating System responsibilities

...in support of parallelism

An operating system must have...

  1. a means of creating and destroying processes (even in a single processor situation).
  2. a means of managing the number of processors used by processes
  3. a means of assuring mutual exclusion in the case of share memory for process communication and synchronization.
  4. A mechanism for creating and maintaining communication channels between processes for communication and synchronization.
 

 

 

 


Parallel Programming Languages

Need to provide mechanisms to support process/thread creation, synchronization and communication.

Machine independence is a requirement. However, this is difficult to achieve in order to maximize the performance of a particular architecture.

Research has centered on transforming sequential programs to parallel versions tailored to the architecture.

Matrix Multiplication Example

FOR i:=1 TO n DO
   FOR j:=1 TO n DO
      c[i,j] := 0;
      FOR k:=1 TO n DO
          c[i,j] := c[i,j] + a[i,k]*b[k,j];
      END;
   END; 
END;
Achieving parallelism can be done explicitly or possibly as a compiler option. The latter is would be preferred so legacy programs can be automatically parallelized. But this requires sophisticated compilers, ones that can "understand" the algorithm and recognize possible parallel constructs. These may work sometimes but not always find all the parallelisms.

This example indicates via compiler options, points at which parallelism can be applied.

      integer a(100,100),b(100,100),c(100,100)
      integer i,j,k, nprocs, err
      nprocs = 10
C code to read in matrices a and b goes here
      err = m_set_procs(nprocs)
C$doacross share(a,b,c), local(j,k)
      do 10 i=1,100
        do 10 j=1,100
          c(i,j) = 0
          do 10 k=1,100
            c(i,j) = c(i,j) + a(i,k)*b(k,j)
10    continue
C output matrix c here
      end
Another approach is a UNIX style fork/join creation of processes as in
#include <parallel.h>
#define SIZE 100
#define NPROCS 10

shared int a[SIZE][SIZE], b[SIZE][SIZE], c[SIZE][SIZE];

void main(void)
{ int err;
  void multiply();
/* read in a and b here */
  m_fork(multiply);
  m_kill_procs();
/* output c here */
}
void multiply(void)
{int i,j,k;
   for(i=m_get_myid(); i<SIZE; i+= NPROCS)
     for(j=0; j<SIZE; j++){
        c[i][j] = 0;
        for(k=0; k<SIZE; k++)
            c[i][j] += a[i][k]*b[k][j];
        }
}
 

 

 

 

 


Process Creation and Destruction

Parbegin-parend blocks
parbegin
    s1;
    s2;
...
    sn;
parend;

/*Parallelize the following code.*/
BEGIN
i := 1;
read(k);
j := 0;
WHILE (k <> 0) and (i < 11) DO
BEGIN
m[i] := k;
i := i+1;
read(k)
END;
j := i-1;
IF j<>0 THEN
BEGIN
writeln(j);
i := 0;
a := 0.0;
WHILE(i<=j) DO
BEGIN
IF m[i]<0 THEN m[i] := -m[i];
writeln(m[i]);
a := a + m[i];
i := i + 1
END;
writeln(a/j)
END
END.

 

Also a doparallel statement along with other similar parallel-providing structures allow control at the statement level.

x:= newprocess(p);
...
...
killprocess(x);
if(fork() ==0)
   { /* code for child */}
else 
   { /* code for parent*/} 
#define SIZE 100
#define NPROCS 10
int a[SIZE][SIZE], b[SIZE][SIZE], c[SIZE][SIZE];

void main(void)
{ int myid;
/* read in a and b here */
  for(myid=0; myid<NPROCS; myid++)
     if(fork()==0){
        multiply(myid);
        exit();
     }
}
  for(myid=0; myid<NPROCS; myid++) wait(0);
/* output c here */
}
void multiply(int myid)
{int i,j,k;
  for(i=myid; i<SIZE; i+= NPROCS)
    for(j=0; j<SIZE; j++){
      c[i][j] = 0; 
      for(k=0; k<SIZE; k++)
         c[i][j] += a[i][k]*b[k][j];
      }
}
  

 


Issues in synchronization

Easy to do - Context switching (and statement, fork() function, task creation)

Hardware virtual memory makes it easy for operating system to create independently executing tasks in their own address space

Hard to do - Synchronization. How to pass information reliably among a set of independently executing parallel tasks.

What is the output that is printed?

I = 1;
parbegin;
   I = I+1; I = I-1; write(I);
parend;
write(I);

Both first and second write can be 0, 1 or 2. Why?

 

 


Parallel soup

(1) I = I+1 and (2) I = I-1 and (3) write(I);

(4) write(I)

How to get second write of 0:

(1) I=I+1 is not a single operation. It is usually 3 instructions: Load from I, Add 1, store into I.

What if context switch between instructions 1 and 2?

(2) I=I-1 will set I to 0 then context switch back to (1) causes original value of I to be incremented to 2, which is printed as <2,2>

If I=I-1 is executed first, we could get <0,0>, etc.

 

 


Critical regions

A critical region is a sequence of program statements within a task where the task is operating on some data object shared with other tasks.

Must use some mechanism to access this data:

 

 

 


Mutual exclusion

Mutual exclusion - Only one process can be executing a given section of code at one time.

Example again:

I = 1;
parbegin
   begin block(A); I = I+1; wakeup(A) end;
    begin block(A); I = I-1; wakeup(A) end;
    begin block (A); write(I); wakeup(A) end;
parend
write(I)

Output: Second write is always I=1, but first write can be 0, 1 or 2, depending upon which and statement executes first.

 

 


Problem: Deadlock

Process A:

Block(X); Block(Y); ... Do something
Wakeup(X); Wakeup(Y);

Process B:

Block(Y); Block(X); ... Do something
Wakeup(X); Wakeup(Y)

If process A does Block(X) at same time process B does Block(Y), then

Both will succeed and then

Both will wait for the other resource forever

An unlikely occurrence, but it CAN happen.

 

 


Synchronization in Ada

Rendezvous in Ada:

Either waits for other to reach this point.

accept DataReady do
-- Entry point for task for sender to do call DataReady
-- process synchronization task
end;

How to prevent receiver of task to be blocked-

Use guarded if (select statement) ...

select
     when Device1Status = ON => accept Ready1 do . . .
           end;
or when Device2Status = ON => accept Ready2 do . . .
              end;
or when Device3Status = connected => accept Ready3 do
      . . . end;
else . . . -- No device is ready; do something
end select;

rendezvous - Task waits for either of Device 1, 2, or 3, and if none are ON, does something else

 


Monitors

A monitor is a shared data object together with the set of operations that may manipulate it (i.e., an abstract data type)

A task may manipulate the shared data object only by using the defined operations (i.e., data is encapsulated)

To enforce mutual exclusion, require that at most one of the operations defined for the data object may be executing at any given time.

We could implement monitors in Java or C++ as classes explicitly.

Synchronized classes implement monitors for you in Java.

 

 

 

 


Parallelism in non-procedural languages

LISP and Prolog in particular.