As mentioned before, the solution of the system of linear equations (9) is not unique. The motivation of the algorithm , called the method of potentials here for reasons explained later, is to choose amongst these solutions one that minimizes the data movement. For this section (Section 4.3.4), the processor graph is assumed to be directed, with the direction of each edge going from the vertex with higher index to the one with lower index.
Let be the matrix associated with the linear system (9), let be the vector of 's and the right hand side of (9). Assuming that the Euclidean norm of the data movement is used as a measure of data movement, and that the unit communication cost between any two processors and is . Let denote the diagonal matrix of size consisting of the , then one needs to find a solution which solves the following minimization problem:
Here is the matrix associated with system (9).
It is called the vertex-edge incident matrix [68,8] of the directed
Applying the necessary condition for the constrained optimization problem
(see ), after some algebraic manipulation, one
Thus the problem of finding an optimal load balancing schedule is transformed to that of solving the linear equation (14). Once the Lagrange multipliers are found, then the load transfer vector is .
For any graph, each row of the
matrix only has two non-zeros, and , corresponding to the
head and tail vertices of the edge .
Therefore the amount of load to be transferred from processor to processor
(assuming is the head and the tail), along the edge ,
The matrix is a generalized form of the Laplacian matrix (6). For many parallel computers the unit communication cost is roughly the same between any two processors (), in which case this matrix is the same as (6).
The linear system (14) can be solved by many standard numerical algorithms. The conjugate gradient algorithm was used in  because it is simple, easy to parallelize and converges quickly. For preconditioning, the diagonal of the Laplacian may be used.
As a simple example, considering the processor graph in Figure 14.
The load for each processor is
given in brackets. The average load is 16.25 and the largest load imbalance is
The Laplacian system (assuming a weighting of ) is now
The solution of this linear equation is
Table 2 gives the result of the method of potentials and that of the diffusion algorithm on random graphs of varying connectivity. The algorithms have been implemented on a Cray T3D with MPI. It can be seen that in general the method of potentials is faster, and that for graphs of small connectivity it is considerably faster. The same conclusion can be drawn when applying the two algorithms on processor graphs from real applications .
When examining the Euclidean norm of migrating flow (Figure 16), it was surprising to find that the norm for the diffusion algorithm was usually as small as that for the method of potentials, and much smaller than that of algorithms such as the dimension exchange algorithm. It has now been proved  that diffusion type algorithms also have a minimal norm property. In fact it is possible to prove a rather general result  that if a flow calculation algorithm is based on the polynomial of the weighted Laplacian, where is a -th order polynomial such that , then the flow generated, provided that the algorithm converges, solves (12). In other words, all polynomial based flow calculation algorithms, including the diffusion algorithm (11), generate exactly the same minimal flow as the algorithm of potentials, provided that the same edge weights 's are used. In terms of the rate of convergence, the algorithm of potentials is of course better. It also converges for any positive edge weights.