It is now more than a decade since parallel architectures started conquering the computing landscape, both in large-scale systems and at the desktop level with the prevalence of multicore processors. The parallel systems "jungle" nowadays contains new, specialized "species", such as multicore accelerators (in the form of multicore DSPs, CUDA/OpenCL GPUs, etc.) as well as multicore embedded systems that may support specific operations. Typically, such devices are characterized by limited resources and rather complex, specialized programming requirements. In addition, they usually constitute part of a larger system which is orchestrated by one or more general-purpose multicore processors. If parallel programming for a plain multicore processor is considered an involved task, programming a heterogeneous system with a mix of different compute units is well beyond the abilities of the average programmer.
In this talk we will introduce OpenMP, which is now considered the most popular programming model for shared-memory multicore systems. This is mostly due to its simplicity, which allows programmers to express and achieve parallelism in a highly intuitive manner. With its recent versions (4.x), OpenMP aims to provide a similarly accessible model for programming contemporary heterogeneous systems that contain a main processor (host) and one or more co-processors or accelerators (devices), all of which may consist of multiple cores. We will show the compilation process for OpenMP-based applications, using the OMPi compiler and its source-to-source transformations as a concrete example. We will also discuss the organization of the runtime system, which provides support during the execution of an application, with emphasis on architectures with limited resources. Implementation details will be provided for the popular Parallella board, a credit-card-sized embedded platform which consists of a dual-core ARM host and a 16/64-core Epiphany co-processor. Finally, a new runtime organization technique named CARS (compiler-assisted runtime system) will be presented. This novel technique relies on specialized analysis of the application code on the compiler side in order to discover and embed only the portions of the runtime infrastructure that are absolutely necessary for the operation of the particular application. This results in memory-consumption and performance gains which are in some cases quite impressive.
Vassilios Dimakopoulos is an Associate Professor in the Department of Computer Science and Engineering, University of Ioannina, where he is currently serving as Deputy Chairman and as Director of Graduate Studies. He holds a Diploma from the Department of Computer Engineering and Informatics, University of Patras, and the M.A.Sc. and Ph.D. degrees from the Department of Electrical and Computer Engineering, University of Victoria, Canada. His work revolves around parallel and distributed systems, in areas such as interconnection networks, collective communications, performance analysis, and peer-to-peer systems, among others. Lately, his research is focused primarily on parallel programming models and their efficient compilation and runtime support for general-purpose systems as well as embedded and accelerator-based platforms. With his research group he has developed OMPi, a popular, open-source compiler which is used widely for research and experimentation on the OpenMP parallel programming model.