Since the end of exponential clock-speed increases circa 2003, parallelism has become the primary means of obtaining speed-ups on general-purpose platforms, but fragile languages, programming abstractions, and compiler and runtime support have stifled its adoption. I will present the problems that make state-of-the-art parallel languages (and libraries) unsuitable for the general-purpose application programmer. Ideally, programmers would express all available parallelism using high-level constructs and delegate its efficient execution to the compiler and runtime. Besides sidestepping the tedious and fragile process of fine-tuning the granularity of parallelism, exposing maximal parallelism gives the scheduler maximal flexibility during execution, when more information (e.g., core availability) is accessible.
I will propose Lazy Scheduling, a technique applicable to several popular schedulers, most notably work-stealing, as one piece of the solution. It optimizes the common case in which most parallel tasks are executed locally and never need to be made available to other processors. Lazy Scheduling achieves this by using a simple heuristic to infer whether other processors are hungry for work, and by exposing parallel work to them only when the heuristic indicates they are. Unlike previous attempts, Lazy Scheduling introduces no potential deadlocks and makes no irrevocable serialization decisions that could hurt load balancing and performance. Moreover, I will show how Lazy Scheduling greatly outperforms current work-stealing when the programmer exposes abundant parallelism, and how little performance it loses compared to a carefully tuned program.
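To make the idea concrete, here is a minimal single-worker sketch of the lazy-exposure heuristic described above. All names (`Worker`, `steal_requests`, `publish_oldest`, and so on) are illustrative assumptions, not the actual runtime's API, and real implementations would use a concurrent deque and atomic counters rather than plain Python lists.

```python
# Hypothetical sketch of Lazy Scheduling over a work-stealing deque.
# Spawned tasks stay in a cheap private buffer; they are published to
# the steal-able public deque only when the heuristic (a pending steal
# request) suggests other processors are hungry for work.
from collections import deque

class Worker:
    def __init__(self):
        self.public_deque = deque()   # visible to thieves
        self.private_tasks = []       # local-only, cheap push/pop
        self.steal_requests = 0       # heuristic signal: thieves are hungry

    def spawn(self, task):
        # Eager work-stealing would push every spawned task onto the
        # public deque; here the task stays private by default. It is
        # still a real task, so the decision is not an irrevocable
        # serialization: it can be published later if demand appears.
        self.private_tasks.append(task)
        if self.steal_requests > 0:
            self.publish_oldest()

    def publish_oldest(self):
        # Expose the oldest (outermost, typically largest) private task.
        if self.private_tasks:
            self.public_deque.append(self.private_tasks.pop(0))
            self.steal_requests -= 1

    def try_steal(self):
        # A thief takes from the public deque; on failure it records a
        # steal request, nudging the victim to publish work next time.
        if self.public_deque:
            return self.public_deque.popleft()
        self.steal_requests += 1
        return None
```

In this sketch, two spawns leave the public deque empty (no publishing overhead in the common local case); a failed steal raises the hunger signal, so the next spawn publishes the oldest private task, which the thief then obtains.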
Alexandros Tzannes is a post-doctoral researcher at the University of Illinois at Urbana-Champaign, working with Prof. Vikram Adve on provably safe parallelism. He obtained his Ph.D. from the University of Maryland, College Park, under the supervision of Rajeev Barua and Uzi Vishkin, where he wrote the compiler for the Maryland XMT parallel architecture and contributed a user-level run-time task scheduler that adapts to changing load conditions. This scheduler achieves, with extremely fine-grained tasks, performance similar to what state-of-the-art schedulers achieve with tasks coarsened by an order of magnitude. His interests focus on improving the ease of programming and performance portability of parallel code through compiler, run-time, and programming-language techniques.