Computing has been an enormous accelerator to science and industry alike, and it has led to an information explosion in many different fields. The unprecedented volume of data acquired by sensors, derived by simulations and analysis processes, and shared on the Web opens up new opportunities, but it also creates many challenges when it comes to managing and analyzing these data. In this talk, I discuss the importance of maintaining detailed provenance (also referred to as lineage and pedigree) for digital data. Provenance provides important documentation that is key to preserving data, to determining the data's quality and authorship, and to understanding, reproducing, and validating results. Besides presenting techniques we have developed to efficiently manage and re-use provenance information, I will give an overview of the provenance infrastructure we have built for the open-source VisTrails system (http://www.vistrails.org). I will also describe emerging applications and novel uses of provenance for enabling collaborative data analysis, teaching science, and publishing reproducible results.
Juliana Freire is a Professor of Computer Science at the Polytechnic Institute of New York University (NYU Poly). An important theme in Professor Freire's work is the development of data management technology to address new problems introduced by emerging applications, including the Web and e-Science. Her research interests include provenance, scientific data management, information integration, and Web mining. She is a recipient of an NSF CAREER award and an IBM Faculty Award. Her research has been funded by the National Science Foundation, the Department of Energy, the National Institutes of Health, the University of Utah, IBM, Microsoft, and Yahoo!