One of the most popular approaches to the flow calculation problem is to use
diffusion based algorithms [6,14].
In a heat diffusion process,
the initial uneven temperature distribution in space causes the movement
of heat, and the system eventually reaches a steady-state temperature.

The diffusion algorithm, as described in [6], is given as follows.
At each iteration
of the algorithm, processor will send
an amount proportional to the difference between its load
and its neighbor's load,
, to its neighbor .
Assume
, the new load of the processor
is given by the combination of
its own load and contributions from/to its neighboring vertices, namely

(10)

Initially the load for vertex is .
In matrix form, the above equation can be rewritten as

(11)

where is a diagonal matrix of the size ,
that consists of the coefficients ,
and is a matrix of size , to be defined in
Section 4.3.4.
For the choice of the coefficients, Boillat [6] suggested

The diffusion algorithm, being a stationary iterative method of the
form (11), can converge quite slowly on graphs with
small connectivity. Boillat [6]
proved that the worst case happens when the graph is, say, a line, and in such a case
the number of iterations
needed to reach a given tolerance is
with the number of vertices. There are other variations of the diffusion
algorithm (see [33,28]). For example instead of (11), a special case of
the following equation,

is solved in [33]. The convergence of the diffusion algorithm
can also be improved using the Chebyshev polynomial
[44,17,64].

Many investigations of dynamic load balancing algorithms have used a diffusive approach,
although the details vary. For example,
in both the tiling algorithm and the iterative tree balancing algorithm
[13,25,26], a processor selects
amongst its neighbors the one with the highest load and posts a request.
In the tiling algorithm the amount of load to be sent is decided by
looking at the average of the loads in the neighborhood. In
the iterative tree balancing algorithm the requests are viewed
as a forest of trees. The flow along the
branches of the tree is then
calculated using a logarithmic time parallel scan operation.