Parallel processing is the use of multiple processors that are coordinated to work on the same problem.



Hybrids are possible. Having a little shared memory can eliminate the message passing needs.
An operating system must have...
Machine independence is a requirement. However, this is difficult to achieve in order to maximize the performance of a particular architecture.
Research has centered on transforming sequential programs to parallel versions tailored to the architecture.
FOR i:=1 TO n DO FOR j:=1 TO n DO c[i,j] := 0; FOR k:=1 TO n DO c[i,j] := c[i,j] + a[i,k]*b[k,j]; END; END; END;Achieving parallelism can be done explicitly or possibly as a compiler option. The latter is would be preferred so legacy programs can be automatically parallelized. But this requires sophisticated compilers, ones that can "understand" the algorithm and recognize possible parallel constructs. These may work sometimes but not always find all the parallelisms.
This example indicates via compiler options, points at which parallelism can be applied.
integer a(100,100),b(100,100),c(100,100) integer i,j,k, nprocs, err nprocs = 10 C code to read in matrices a and b goes here err = m_set_procs(nprocs) C$doacross share(a,b,c), local(j,k) do 10 i=1,100 do 10 j=1,100 c(i,j) = 0 do 10 k=1,100 c(i,j) = c(i,j) + a(i,k)*b(k,j) 10 continue C output matrix c here endAnother approach is a UNIX style fork/join creation of processes as in
#include <parallel.h>
#define SIZE 100
#define NPROCS 10
shared int a[SIZE][SIZE], b[SIZE][SIZE], c[SIZE][SIZE];
void main(void)
{ int err;
void multiply();
/* read in a and b here */
m_fork(multiply);
m_kill_procs();
/* output c here */
}
void multiply(void)
{int i,j,k;
for(i=m_get_myid(); i<SIZE; i+= NPROCS)
for(j=0; j<SIZE; j++){
c[i][j] = 0;
for(k=0; k<SIZE; k++)
c[i][j] += a[i][k]*b[k][j];
}
}
parbegin s1; s2; ... sn; parend; /*Parallelize the following code.*/ BEGIN
i := 1;
read(k);
j := 0;
WHILE (k <> 0) and (i < 11) DO
BEGIN
m[i] := k;
i := i+1;
read(k)
END;
j := i-1;
IF j<>0 THEN
BEGIN
writeln(j);
i := 0;
a := 0.0;
WHILE(i<=j) DO
BEGIN
IF m[i]<0 THEN m[i] := -m[i];
writeln(m[i]);
a := a + m[i];
i := i + 1
END;
writeln(a/j)
END
END.
Also a doparallel statement along with other similar parallel-providing structures allow control at the statement level.
x:= newprocess(p); ... ... killprocess(x);
if(fork() ==0) { /* code for child */} else { /* code for parent*/}
#define SIZE 100
#define NPROCS 10
int a[SIZE][SIZE], b[SIZE][SIZE], c[SIZE][SIZE];
void main(void)
{ int myid;
/* read in a and b here */
for(myid=0; myid<NPROCS; myid++)
if(fork()==0){
multiply(myid);
exit();
}
}
for(myid=0; myid<NPROCS; myid++) wait(0);
/* output c here */
}
void multiply(int myid)
{int i,j,k;
for(i=myid; i<SIZE; i+= NPROCS)
for(j=0; j<SIZE; j++){
c[i][j] = 0;
for(k=0; k<SIZE; k++)
c[i][j] += a[i][k]*b[k][j];
}
}
Easy to do - Context switching (and statement, fork() function, task creation)
Hardware virtual memory makes it easy for operating system to create independently executing tasks in their own address space
Hard to do - Synchronization. How to pass information reliably among a set of independently executing parallel tasks.
What is the output that is printed?
I = 1; parbegin; I = I+1; I = I-1; write(I); parend; write(I);
Both first and second write can be 0, 1 or 2. Why?
(1) I = I+1 and (2) I = I-1 and (3) write(I);
(4) write(I)
How to get second write of 0:
(1) I=I+1 is not a single operation. It is usually 3 instructions: Load from I, Add 1, store into I.
What if context switch between instructions 1 and 2?
(2) I=I-1 will set I to 0 then context switch back to (1) causes original value of I to be incremented to 2, which is printed as <2,2>
If I=I-1 is executed first, we could get <0,0>, etc.
A critical region is a sequence of program statements within a task where the task is operating on some data object shared with other tasks.
Must use some mechanism to access this data:
Mutual exclusion - Only one process can be executing a given section of code at one time.
Example again:
I = 1;
parbegin
begin block(A); I = I+1; wakeup(A) end;
begin block(A); I = I-1; wakeup(A) end;
begin block (A); write(I); wakeup(A) end;
parend
write(I)
Output: Second write is always I=1, but first write can be 0, 1 or 2, depending upon which and statement executes first.
Process A:
Block(X); Block(Y); ... Do something
Wakeup(X); Wakeup(Y);
Process B:
Block(Y); Block(X); ... Do something
Wakeup(X); Wakeup(Y)
If process A does Block(X) at same time process B does Block(Y), then
Both will succeed and then
Both will wait for the other resource forever
An unlikely occurrence, but it CAN happen.
Rendezvous in Ada:
Either waits for other to reach this point.
accept DataReady do-- Entry point for task for sender to do call DataReady-- process synchronization taskend;
How to prevent receiver of task to be blocked-
Use guarded if (select statement) ...
select when Device1Status = ON => accept Ready1 do . . . end; or when Device2Status = ON => accept Ready2 do . . . end; or when Device3Status = connected => accept Ready3 do . . . end;else . . . -- No device is ready; do somethingend select;
rendezvous - Task waits for either of Device 1, 2, or 3, and if none are ON, does something else
A monitor is a shared data object together with the set of operations that may manipulate it (i.e., an abstract data type)
A task may manipulate the shared data object only by using the defined operations (i.e., data is encapsulated)
To enforce mutual exclusion, require that at most one of the operations defined for the data object may be executing at any given time.
We could implement monitors in Java or C++ as classes explicitly.
Synchronized classes implement monitors for you in Java.