This paper introduces packet chaining, a simple and effective method to increase allocator matching efficiency and hence network performance, particularly suited to networks with short packets and short cycle times. Packet chaining operates by chaining together packets destined for the same output so that they reuse the switch connection of a departing packet. This allows an allocator to build up an efficient matching over a number of cycles, as in incremental allocation, but without being limited by packet length. For a 64-node 2D mesh at maximum injection rate and with single-flit packets, packet chaining increases network throughput by 15% compared to a highly-tuned router using a conventional single-iteration separable iSLIP allocator, and outperforms significantly more complex allocators. Specifically, it outperforms multiple-iteration iSLIP allocators and wavefront allocators by 10% and 6% respectively, and achieves throughput comparable to that of an augmenting paths allocator. Packet chaining achieves this performance with a cycle time comparable to that of a single-iteration separable allocator. Packet chaining also reduces average network latency by 22.5% compared to a single-iteration iSLIP allocator. Finally, packet chaining increases IPC by up to 46% (16% on average) for application benchmarks because short packets are critical in a typical cache-coherent chip multiprocessor.
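The idea of reusing a departing packet's switch connection can be illustrated with a minimal sketch. It is a toy model under stated assumptions and not the router evaluated in this paper: the names `allocate_cycle`, `run`, and `REQUESTS` are hypothetical, every input is assumed to always hold another packet for each output it requests, and fixed-priority arbiters stand in for the rotating pointers of iSLIP. The fixed priorities make the conventional case look worse than a tuned allocator would, so the numbers it produces say nothing about the results reported above.

```python
def allocate_cycle(requests, held=None):
    """Single-iteration, input-first separable allocation for one cycle.

    requests: dict mapping input port -> set of requested output ports.
    held: connections (input -> output) carried over from the previous
          cycle; None or empty models a conventional allocator.
    Returns a dict input -> output describing the switch connections.
    """
    # Chaining step (toy assumption): keep a previous connection only if
    # that input still has a packet for the same output.
    matching = {i: o for i, o in (held or {}).items()
                if o in requests.get(i, set())}
    busy_outputs = set(matching.values())

    # Input stage: each free input proposes one free output.
    # Fixed priority (lowest-numbered output) is a simplification.
    proposals = {}
    for i in sorted(requests):
        if i in matching:
            continue
        free = sorted(o for o in requests[i] if o not in busy_outputs)
        if free:
            proposals.setdefault(free[0], []).append(i)

    # Output stage: each output grants the lowest-numbered proposer.
    for o, inputs in proposals.items():
        matching[min(inputs)] = o
    return matching


def run(requests, cycles, chaining):
    """Return the matching size in each cycle, with or without chaining."""
    sizes, held = [], {}
    for _ in range(cycles):
        held = allocate_cycle(requests, held if chaining else None)
        sizes.append(len(held))
    return sizes


# Saturated 3x3 switch: every input has packets for every output.
REQUESTS = {i: {0, 1, 2} for i in range(3)}
conventional = run(REQUESTS, 3, chaining=False)
chained = run(REQUESTS, 3, chaining=True)
print("conventional:", conventional)
print("chained:     ", chained)
```

In this sketch the chained allocator removes held connections from contention, so the remaining inputs compete only for free outputs and the matching grows from cycle to cycle. The conventional allocator starts from an empty matching every cycle and repeats the same conflicts.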
George Michelogiannakis is finishing his PhD studies at Stanford University. His thesis focuses on energy-efficient flow control for on-chip networks. It evaluates bufferless flow control and proposes elastic buffer flow control, which uses pipeline flip-flops for storage to provide network buffering at minimal cost and without the complications of bufferless networks. He has also investigated hierarchical on-chip networks for large-scale chip multiprocessors. His most recent work focuses on increasing allocation efficiency in network routers to match or exceed that of wavefront and augmenting paths allocators, without extending the delay path.