This talk introduces NetCheck, a tool for diagnosing network problems in large and complex applications. NetCheck takes traces from existing blackbox tracing mechanisms, such as strace, and uses them to diagnose network problems in real-world applications. It can diagnose faults without any specific information about the underlying network or application. NetCheck does this by (1) totally ordering the distributed set of input traces, and (2) using a network model to identify points in the totally ordered execution where the traces deviate from the behavior a programmer is likely to expect. The key insight in this work is to diagnose network problems by understanding how the programmer expects the network to operate and looking for differences in the observed behavior.
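To make the two steps concrete, here is a minimal sketch in Python. It is not NetCheck's actual algorithm or model: the event format `(op, channel, nbytes)`, the function name `order_and_check`, and the single rule that a receive cannot return more bytes than have been sent on a channel are all assumptions chosen for illustration. The sketch interleaves per-host traces into one order that the toy model accepts, and reports the events at which no acceptable order exists.

```python
from collections import deque


def order_and_check(traces):
    """Toy illustration: order per-host traces under a simple network model.

    traces: dict mapping host name -> list of (op, channel, nbytes),
            where op is "send" or "recv" (hypothetical event format).
    Returns (ordered, problems): the global order found so far, and the
    head events that the model could not accept.
    """
    queues = {host: deque(events) for host, events in traces.items()}
    in_flight = {}  # channel -> bytes sent but not yet received
    ordered, problems = [], []

    while any(queues.values()):
        progressed = False
        for host, q in queues.items():
            if not q:
                continue
            op, channel, nbytes = q[0]
            if op == "send":
                # A send is always acceptable; it adds bytes to the channel.
                in_flight[channel] = in_flight.get(channel, 0) + nbytes
            elif op == "recv" and in_flight.get(channel, 0) >= nbytes:
                # A recv is acceptable only if enough bytes are in flight.
                in_flight[channel] -= nbytes
            else:
                continue  # the model rejects this event for now
            ordered.append((host, q.popleft()))
            progressed = True
        if not progressed:
            # No pending event fits the model: report the deviation points.
            problems = [(host, q[0]) for host, q in queues.items() if q]
            break
    return ordered, problems


# Host B's trace is listed first, yet its recv is ordered after A's send.
ok_order, ok_problems = order_and_check({
    "B": [("recv", "A->B", 10)],
    "A": [("send", "A->B", 10)],
})

# B claims to receive more bytes than A ever sent, so the model flags it.
bad_order, bad_problems = order_and_check({
    "A": [("send", "A->B", 10)],
    "B": [("recv", "A->B", 20)],
})
```

In this toy version the model does double duty: it decides which interleavings are plausible, and the point where no interleaving remains is reported as the likely fault. A real model would need to cover far more socket behavior than byte counts.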
Our evaluation demonstrates that NetCheck accurately diagnoses failures without relying on any application- or network-specific information. For instance, NetCheck correctly identified the existence of NAT devices, simultaneous network disconnection/reconnection, and platform portability issues. In a more targeted evaluation, we found that NetCheck correctly detects over 95% of the network problems reported in popular projects like Python, Apache, and Ruby. When applied to traces of faults observed by a network administrator in a live network, NetCheck identified the primary cause of the fault in 90% of the cases. NetCheck performs diagnosis efficiently and can process a gigabyte-sized trace in about 2 minutes.
I will also give an overview of the Computer Science and Engineering department at NYU and discuss opportunities for PhD students, interns, and full-time developers in New York City.