Published work confirms that data-level parallelism is widely accepted as the most important performance lever for the efficient execution of embedded media and telecom applications. It has been exploited via a number of approaches, the most efficient being vector/SIMD architectures. A further, complementary and substantial form of parallelism exists at the thread level, but this has not been researched to the same extent in the context of embedded workloads. For the efficient execution of such applications, exploiting both forms of parallelism is of paramount importance. This calls for a new architectural approach to the software-hardware interface, since its rigidity, manifested in all desktop-based and the majority of embedded CPUs, directly affects the performance of vectorized, threaded codes.
The author advocates a holistic, mature approach in which parallelism is extracted via automatic means while, at the same time, the traditionally rigid hardware-software interface is optimized to match the temporal and spatial behaviour of the embedded workload. This ultimate goal calls for the precise study of these forms of parallelism for a number of applications executing on theoretical models, such as instruction set simulators and parallel RAM machines, as well as the development of highly parametric microarchitectural frameworks to encapsulate that functionality.
Vassilios A. Chouliaras was born in Athens, Greece in 1969. He received a B.Sc. in Physics and Laser Science from Heriot-Watt University, Edinburgh in 1993, an M.Sc. in VLSI Systems Engineering from UMIST in 1995 and a Ph.D. (publications) from Loughborough University, UK in 2005. He has worked as an independent software engineer and digital hardware designer, as an ASIC design engineer for Intracom SA and as a senior R&D engineer/processor architect for ARC International. Currently, he is a lecturer in the Department of Electronic and Electrical Engineering at Loughborough University, UK, where he leads the Electronic System Design Group's research in embedded CPUs, instruction set architecture design and System-on-Chip modeling. His research interests include superscalar and vector CPU microarchitecture, high-performance embedded CPU implementations, performance modeling, custom instruction set design and self-timed design.