Runtime System
Principal Investigators: Sarita Adve, Sanjay Kale, Rakesh Kumar, Craig Zilles
A major challenge in developing software for client platforms is hardware diversity. It is untenable to ask software vendors to adapt or optimize their programs for every platform. Instead, we believe it is important to provide an execution environment that meets the application's goals as well as it can given the resources available on the platform. Two key concepts in this statement are worth emphasizing:
Application goals: We believe that quality of service (QoS) will be increasingly important on client systems to provide a good user experience. Many performance-hungry applications can be written so as to provide the best answer that can be computed by a given deadline, and will be written this way to be responsive without jitter and long pauses. We expect applications to be annotated and organized such that a level of output quality can be selected based on the available resources.
Available resources: We expect heterogeneity in client platforms. There will be heterogeneity not only between platforms (different design and price points within a process generation and across process generations), but also within a platform. We expect future platforms to include a variety of cores: a few large, latency-optimized cores for sequential performance and many small, throughput-optimized cores for parallel workloads. Even cores designed to be identical will have different performance characteristics because of process variation. Furthermore, the resources that can be applied to each program's execution may vary over time, both as applications are launched or complete and as the hardware adapts to physical constraints (e.g., power, temperature, battery life, and aging).
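The "best answer by a given deadline" style described under Application goals can be illustrated with a minimal sketch. It is a simulation under stated assumptions, not part of the proposed system: the function name best_answer_by_deadline, the fixed per-step cost, and the Newton iteration used as the refinement step are all hypothetical choices made for illustration. A faster core (lower step cost) completes more refinement steps before the same deadline and so returns a higher-quality answer.

```python
def best_answer_by_deadline(refine, initial, deadline_us, step_cost_us):
    """Refine `initial` for as long as one more step still fits before the deadline.

    Time is simulated in integer microseconds so the example is deterministic;
    `step_cost_us` stands in for how fast the available core is (an assumption).
    """
    answer, elapsed_us, steps = initial, 0, 0
    while elapsed_us + step_cost_us <= deadline_us:
        answer = refine(answer)
        elapsed_us += step_cost_us
        steps += 1
    return answer, steps


def newton_sqrt2(x):
    """One Newton step toward sqrt(2); each step improves the estimate."""
    return 0.5 * (x + 2.0 / x)


DEADLINE_US = 16_000  # hypothetical per-frame deadline

# A large core is assumed to finish a step in 4 ms, a small core in 8 ms.
big_answer, big_steps = best_answer_by_deadline(newton_sqrt2, 1.0, DEADLINE_US, 4_000)
small_answer, small_steps = best_answer_by_deadline(newton_sqrt2, 1.0, DEADLINE_US, 8_000)

print("large core:", big_steps, "steps, estimate", big_answer)
print("small core:", small_steps, "steps, estimate", small_answer)
```

Both runs return on time; only the quality of the answer differs, which is the property that lets an execution environment trade output quality against available resources.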
Maximizing utility (the sum of the user benefits of all running programs) given the available resources is an optimization problem. To solve it efficiently, we combine task over-decomposition with a hierarchical adaptive resource allocation strategy. Our proposed run-time system derives a number of important ideas from our previous work on the Charm++ run-time. By having the programmer over-decompose the computation into significantly more work units than the expected number of hardware threads, the Charm++ run-time can both achieve good performance across a large range of core counts and tolerate heterogeneity in core capabilities. Key to achieving both is its adaptive load balancing, which also optimizes communication by co-locating communicating work units.
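The benefit of over-decomposition on heterogeneous cores can be seen in a small sketch. This is not the Charm++ API or its load balancer: it is a simple greedy scheduler (largest unit first, placed on the core that would finish it earliest), the core speeds are invented, and communication between work units is ignored. The names makespan, core_speeds, coarse, and fine are hypothetical.

```python
def makespan(work_units, core_speeds):
    """Greedy placement: take units largest first and give each one to the
    core that would finish it earliest. Returns the time the last core ends."""
    finish = [0.0] * len(core_speeds)
    for work in sorted(work_units, reverse=True):
        core = min(
            range(len(core_speeds)),
            key=lambda c: finish[c] + work / core_speeds[c],
        )
        finish[core] += work / core_speeds[core]
    return max(finish)


# Assumed platform: one large core four times faster than three small cores.
core_speeds = [4.0, 1.0, 1.0, 1.0]
total_work = 120.0

# One work unit per core versus ten work units per core, same total work.
coarse = makespan([total_work / 4] * 4, core_speeds)
fine = makespan([total_work / 40] * 40, core_speeds)

# Lower bound if the work could be split perfectly across all cores.
ideal = total_work / sum(core_speeds)

print("one unit per core: ", coarse)
print("ten units per core:", fine)
print("ideal lower bound: ", ideal)
```

With only one unit per core, the scheduler has no way to give the large core a proportionally larger share; with many small units it can approach the ideal split without the programmer knowing the core mix in advance.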
The hierarchical structure of this system (global optimization by the O/S, optimization within an allocation by the run-time) and its optimization at multiple time scales were motivated by our previous work in the GRACE project. That work demonstrated that these techniques can be used to build a cross-layer adaptive system that adapts hardware, network, O/S, and application algorithms to minimize power consumption while still meeting QoS requirements.
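The two-level structure can be sketched as follows. This is an illustration under assumptions, not the GRACE algorithm or the proposed system: the utility tables, application names, and quality levels are invented, and the global step uses a simple greedy rule (give each core to the application with the largest marginal utility gain), which is only guaranteed to maximize total utility when each application's gains shrink as it receives more cores.

```python
def allocate_cores(utility_tables, total_cores):
    """Global (O/S-level) step: hand out cores one at a time to whichever
    application gains the most utility from one more core.

    utility_tables[app][n] is the assumed utility of `app` when given n cores.
    """
    alloc = {app: 0 for app in utility_tables}

    def gain(app):
        table, n = utility_tables[app], alloc[app]
        return table[n + 1] - table[n] if n + 1 < len(table) else 0

    for _ in range(total_cores):
        best = max(alloc, key=gain)
        if gain(best) <= 0:
            break  # no application benefits from another core
        alloc[best] += 1
    return alloc


def pick_quality(cores, levels):
    """Local (run-time-level) step: choose the highest quality level that fits.

    `levels` is a list of (name, cores_needed) ordered from lowest to highest quality.
    """
    chosen = None
    for name, cores_needed in levels:
        if cores_needed <= cores:
            chosen = name
    return chosen


# Hypothetical utility tables for two applications sharing four cores.
utility_tables = {
    "video": [0, 5, 8, 9, 9],
    "game": [0, 4, 7, 9, 10],
}
allocation = allocate_cores(utility_tables, total_cores=4)
total_utility = sum(utility_tables[app][n] for app, n in allocation.items())

# Hypothetical quality levels that the game's run-time chooses among.
game_levels = [("low", 1), ("medium", 2), ("high", 4)]
game_quality = pick_quality(allocation["game"], game_levels)

print(allocation, total_utility, game_quality)
```

The global step only needs each application's utility as a function of its allocation; how an application uses its cores stays a local decision, which is what lets the two levels run at different time scales.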