Manufacturing processors with aggressive CMOS technologies paves the way to more efficient and powerful computers, but also introduces new challenges for the computer industry. As processor architectures evolve towards a massively multi-core paradigm, the efficient design and use of Networks-on-Chip (NoCs) is paramount, but the exploitation of such devices is hampered by increasing variability and defect rates. In the field of supercomputers, the energy consumption and cost of processor chips have decreased so much that the off-chip network now accounts for a very significant share of a supercomputer's cost, energy consumption and application latency.
Inside chips, we propose the adoption of more complex routing algorithms to better exploit the processor resources in the presence of defects. This approach increases the interconnect throughput by up to a factor of 10 in the presence of 20% defects. We also propose a method for applications to dynamically deploy their tasks across cores while avoiding faulty cores, avoiding less efficient cores, and minimizing the expected energy consumption. Finally, a few elements of the NoC simulation model that was developed for this thesis are explained, with a focus on its graphical visualization features.
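To make the benefit of defect-aware routing concrete, the following minimal sketch contrasts plain dimension-ordered (XY) routing with a route search that avoids faulty routers on a 2D mesh. It is only an illustration under stated assumptions: the mesh model, the function names `xy_route` and `adaptive_route`, and the use of a breadth-first search are all hypothetical stand-ins, not the routing algorithms proposed in the thesis.

```python
from collections import deque

def xy_route(src, dst, faulty):
    """Dimension-ordered routing: move along X first, then along Y.
    Returns the list of routers visited, or None if a faulty router blocks the way."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        if (x, y) in faulty:
            return None
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        if (x, y) in faulty:
            return None
        path.append((x, y))
    return path

def adaptive_route(src, dst, faulty, width, height):
    """Hypothetical stand-in for a fault-tolerant router: breadth-first search
    for a shortest path that never crosses a faulty router."""
    if src in faulty or dst in faulty:
        return None
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the parent links back to the source to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if inside and nxt not in faulty and nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # destination unreachable with the given defects

# Example: a single faulty router on the XY path of a 4x4 mesh.
faulty = {(1, 0)}
blocked = xy_route((0, 0), (3, 0), faulty)
detour = adaptive_route((0, 0), (3, 0), faulty, 4, 4)
```

In this example the XY route is blocked by the single defect, while the search still finds a detour through the neighbouring row. A centralized search is used here only because it is short to write; a real NoC router would have to take such decisions locally and in hardware.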
Outside chips, our focus is on proposing topologies that deliver better performance while reducing the interconnect cost. To this end, the team at NII has contributed several randomly generated topologies that take advantage of the small-world effect and the physical clustering of nodes to reduce both the network diameter and the cabling cost. While this approach is very effective for latency-bound applications, throughput-bound applications are generally best run on a meshed or hierarchical topology. Another issue is that supercomputers are often built in more than one stage, and hence the cabling should be incremental too, which is far from trivial for high-radix topologies.
We therefore propose a method for cabling multiple topologies in a more efficient and incremental way, thus enabling a more agile deployment of the off-chip interconnect in supercomputers.
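The small-world effect invoked above for randomly generated topologies can be illustrated with a short experiment: start from a ring and add a few random shortcut links, then compare the network diameter before and after. This is a minimal sketch under assumed parameters (a 64-node ring, 32 shortcuts, a fixed seed); the helper names `ring`, `add_random_shortcuts` and `diameter` are hypothetical, and the construction is not the topology generator used by the NII team, which also accounts for physical clustering and cabling cost.

```python
import random
from collections import deque

def ring(n):
    """Adjacency sets of an n-node ring."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def add_random_shortcuts(adj, count, seed=0):
    """Return a copy of the graph with `count` extra random links.
    Assumes `count` is well below the number of missing links."""
    rng = random.Random(seed)
    nodes = list(adj)
    new = {u: set(vs) for u, vs in adj.items()}
    added = 0
    while added < count:
        u, v = rng.sample(nodes, 2)  # two distinct nodes
        if v not in new[u]:
            new[u].add(v)
            new[v].add(u)
            added += 1
    return new

def diameter(adj):
    """Longest shortest path, via one breadth-first search per node.
    Assumes the graph is connected."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best

base = ring(64)
shortcut = add_random_shortcuts(base, 32, seed=1)
print("ring diameter:", diameter(base))
print("ring + random shortcuts diameter:", diameter(shortcut))
```

Adding links can never increase the diameter, and in practice a modest number of random shortcuts tends to shrink it sharply, which is why such topologies suit latency-bound applications. The sketch ignores cable length entirely, so it says nothing about the cabling cost discussed above.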
Fabien Chaix received an engineering degree in Electronics and Computer Systems as well as a master's degree in Micro-electronics from Grenoble University, France, in 2008. In 2013, he defended his PhD thesis entitled "Contributions for late CMOS many-core processors: NoC fault-tolerant routing and auto-adaptive applications" at the TIMA laboratory, Grenoble, France. He is currently a postdoctoral researcher in the Computer Architecture division of the National Institute of Informatics in Tokyo, Japan. His research interests include fault-tolerant designs and systems as well as on- and off-chip networks for embedded and high-performance computing.