Algorithms for load balancing of unstructured mesh based applications have seen rapid development in recent years. However, some problems remain.
Although packages such as JOSTLE and ParMETIS can partition and load balance large meshes very rapidly [83,70], the scalability of these algorithms remains a concern, particularly for applications where the load on each processor changes frequently. As pointed out in , these algorithms use only integer operations, the number of which is proportional to the size of the subdomain boundary, with no ``chunky'' floating point operations to ``hide behind''. Also they may not be load balanced and the communication cost may dominate. These factors make it difficult to achieve high scalability.
Almost all algorithms use edge-cut as the cost function for minimization. But this is not necessarily appropriate for all applications. There are a number of limitations of edge-cut based cost functions [38,12]. First of all, edge-cut is not necessarily an accurate measure of the communication time. Factors such as the number of neighbors, the size of the largest inter-processor boundary among all subdomains, as well as the start up cost of the communications all come into play. Second, for some applications, there may be more than one objective that needs to be minimized.
For applications where domain decomposition type solvers or preconditioners are used, factors such as aspect ratio of the subdomains affect the convergence, through the interface problem [22,23]. There have been some studies of special partitioning algorithms that take this into account [16,84]. In , the reduction of the aspect ratio of subdomains was achieved through the migration of vertices that were far away from the geometric center of the subdomain. In , the aspect ratio was simplified to a localized function of the interface surface areas, which was associated with the edges of the dual graph. The problem of minimizing the aspect ratio of the subdomains was transformed to that of minimizing the sum of edge weights cut by the partitioning, which can be dealt with in a similar way to minimizing the edge-cut. In both studies the minimization of aspect ratio is balanced against the minimization of edge-cut so that the communication cost resulting from the partitioning will be kept in check. It remains to be seen that these aspect ratio based partitioning algorithms do indeed improve the convergence of the solvers.
In other applications such as multi-physics calculations involving solids and liquids, the computation is carried out in phases and the mesh that is partitioned to balance the load of one phase of computation does not necessarily balance the other phases. Conventional graph partitioning algorithms can be viewed as optimization algorithms that minimize the edge-cut, subject to the load balancing constraint. For multi-physics calculations there is a need to accommodate two or more load balancing constraints. In  this multi-constraints optimization problem was looked at and a procedure suggested to solve the problem heuristically. This involved the modification of the different stages of the multi-level K-L like algorithm. In particular, in the graph coarsening stage the criterion for matching vertices was a combination of heavy edge weights and a balanced sum of vertices weights, assuming that each vertex has a vector of weights associated with it that represents the amount of work on this vertex during different phases of the computation. For the refinement stage the K-L algorithm was modified to take into account the need for minimal edge-cut and to satisfy the load balancing constraints, by favoring the migration of vertices that will improve the edge-cut as well as the load balance of the partitioning. The natural extension of the above is to multi-objective optimization problem, where instead of minimizing the edge-cut, other objectives such as aspect ratio and load balancing for multi-phase computation are included.
The limitations of the edge-cut based model for graph partitioning and efforts to overcome them have resulted in modifications of some of the more popular software packages. For example in the recent version of METIS (Version 4.01), a number of new features have been added. These include the ability to handle multi-constraints, the minimization of the total number of boundary vertices and the minimization of the maximum connectivity of subdomains.
Apart from modifications of the standard undirected graph based algorithms, alternative graph models have also been suggested to represent the underlying problems more accurately. A multilevel algorithm based on hypergraphs has been suggested to minimize communication involved in symmetric and unsymmetric matrix vector multiplications on parallel platforms , while bipartite graphs have been used as a better model for the problem of minimizing communication cost in parallel iterative algorithms and preconditioners for unsymmetric systems .
The actual migration step of the dynamic load balancing process also deserves further investigation. Few large scale applications that have full dynamic load balancing capability exist (but see, e.g., )), due to the difficulties involved with regards to the data structures and book-keeping, in actually moving entities such as meshes, fields and other physical quantities around the processors. During the dynamic load balancing process, the mesh on each processor will have to be split and part of the mesh, together with the field, will need to be sent to the appropriate neighboring processors. At the same time each processor will receive meshes and fields and combine them with what is left to form a single entity. There can be a lot of book keeping involved and such tasks are best fulfilled with a code written in an object oriented manner (see, e.g., ).