From mobile and cloud apps to video games to driverless vehicle control, more and more software is time-constrained: it must deliver reliable results seamlessly, consistently, and virtually instantaneously. If it doesn't, customers are unhappy, and sometimes lives are put at risk. When complex software underperforms or fails, identifying the root causes is difficult, and historically few tools have been available to help, leaving application developers to guess what might be happening. How can we do better? The key is to have low-overhead observation tools that can show exactly where all the elapsed time goes in both normal and delayed responses. Doing so makes visible each of the seven possible reasons for such delays, as we show.
Richard L. Sites wrote his first computer program in 1959 and has spent most of his career at the boundary between hardware and software, with a particular interest in CPU/software performance interactions. His past work includes writing VAX microcode, serving as co-architect of the DEC Alpha, and inventing the performance counters found in nearly all processors today. He has done low-overhead microcode and software tracing at DEC, Adobe, Google, and Tesla. Dr. Sites earned his PhD at Stanford in 1974; he holds 66 patents and is a member of the US National Academy of Engineering.