HP Labs Technical Reports
Click here for full text:
Instruction Assignment for Clustered VLIW DSP Compilers: A New Approach
Keyword(s): VLIW; clustering; assignment; ILP; DSP
Abstract: This report proposes a new heuristic/model driven approach to assign nodes of a computational DAG to clusters for a VLIW machine with a partitioned register file. Our approach exploits a heuristically found initial clustering to speed up the convergence of a deterministic descent algorithm. The initial configuration is determined through a longest path driven strategy that collects a number of paths of sub-dags starting from the DAG's leaves. The initial node assignment problem is then simplified to the assignment of these partial components to one of the k clusters. We approach the component assignment problem in two different ways depending upon some heuristically detected DAG symmetries. The descent algorithm starts from the initial configuration and modifies the assignment for each partial component by minimizing a cost function being an estimate of the schedule length for all nodes in the DAG on a given machine. The estimate is carried out by a simplified list scheduler taking quantitatively into account things like register pressure, resources allocation, etc. We compared our approach with a common heuristic known as BUG (Bottom Up Greedy) on a set of scientific and multimedia-like computational kernels. Experimental results show a reduction from 5 to 50% in the static schedule length depending from the DAG's complexity, symmetry and intrinsic parallelism and from architectural parameters like number of clusters, registers banks size, etc. Best results were obtained for large DAGs (hundreds of nodes) where the assignment of nodes to clusters is determinant to reduce the inter-cluster copies and the resource conflicts; another important factor is sometimes the reduction in register spills to/from memory due to the load balancing between clusters. These results and the low computational complexity of this approach show how the proposed method can be a viable solution for node assignment in a VLIW compiler for clustered machines.
Back to Index