Studying the characteristics of image and video data can lead to a deeper understanding of the environment and offer a natural interface between users and their surroundings. However, the massive amounts of data and the associated complexity encumber the transfer of sophisticated vision algorithms to real-life systems. One approach to addressing these issues is to generate compact and descriptive representations of image data. In this talk, we investigate dimensionality reduction and sparse representations to accomplish this task. More specifically, nonlinear dimensionality reduction techniques and sparse coding are investigated in three hierarchical image layers: low-level features, mid-level structures, and high-level attributes.

For the low-level features, various techniques for dimensionality reduction, ranging from traditional image compression to the recently proposed Random Projections method, are explored. The application of these methods to computer vision algorithms, such as face detection and face recognition, is analyzed. In addition, a novel approach to super-resolution is presented that is capable of increasing the spatial resolution of a single image based on the sparse representations framework.

In the second part, mid-level structures, including image manifolds and sparse models, are utilized for face recognition and object tracking. A new method for robust object tracking under appearance and illumination variations is presented, based on a combination of a template library, online distance metric learning, and the Random Projections transformation.

In the third part, a novel framework for representing the semantic content of images is investigated. This framework employs high-level semantic attributes that aim to bridge the gap between the visual information of an image and its textual description by utilizing low-level features and mid-level structures.
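To give a flavor of the Random Projections idea mentioned above, here is a minimal sketch (not the speaker's implementation): a Gaussian random matrix, scaled by 1/sqrt(k), maps high-dimensional feature vectors to a much lower dimension while approximately preserving pairwise Euclidean distances (the Johnson-Lindenstrauss lemma). All data and dimensions below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_projection(X, k, rng):
    """Project the rows of X (n x d) to k dimensions using a Gaussian
    random matrix. The 1/sqrt(k) scaling makes pairwise Euclidean
    distances approximately distance-preserving in expectation."""
    d = X.shape[1]
    R = rng.normal(size=(d, k)) / np.sqrt(k)
    return X @ R

# Hypothetical high-dimensional image features (e.g. raw face patches).
X = rng.normal(size=(20, 5000))
Y = random_projection(X, 500, rng)

# Compare one pairwise distance before and after projection.
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
print(Y.shape, abs(proj - orig) / orig)  # distortion is typically small
```

Because the projection matrix is data-independent, it is cheap to generate and well suited to online settings such as the tracking application described in the second part.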
This paradigm opens up striking possibilities, including recognizing the category of an object from purely textual information, without providing any explicit visual example.
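A toy sketch of how such attribute-based, zero-shot recognition can work (the attribute names, scores, and classes below are all hypothetical, not from the talk): textual descriptions of unseen categories are encoded as attribute signatures, and an image is assigned to the category whose signature best matches the attributes predicted from its visual features.

```python
import numpy as np

# Attribute signatures derived from text alone, e.g.
# "a zebra is striped and four-legged but not furry".
# Attribute order: [striped, four-legged, furry]
class_attributes = {
    "zebra":  np.array([1.0, 1.0, 0.0]),
    "tiger":  np.array([1.0, 1.0, 1.0]),
    "canary": np.array([0.0, 0.0, 0.0]),
}

def zero_shot_classify(predicted_attrs, class_attributes):
    """Return the class whose textual attribute signature is nearest
    (in Euclidean distance) to the attributes predicted from the image."""
    return min(class_attributes,
               key=lambda c: np.linalg.norm(class_attributes[c] - predicted_attrs))

# Attribute scores that per-attribute classifiers (trained only on
# seen classes) might output for a test image.
image_attrs = np.array([0.9, 0.8, 0.1])
print(zero_shot_classify(image_attrs, class_attributes))  # prints "zebra"
```

The key point is that no visual example of "zebra" is ever needed; only its textual attribute description and attribute predictors trained on other classes.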
Grigorios Tsagkatakis received his B.S. and M.S. degrees in electronics and computer engineering from the Technical University of Crete, Greece, in 2005 and 2007, respectively. In 2011, he completed his Ph.D. in imaging science at the Center for Imaging Science, Rochester Institute of Technology (RIT), USA. He is currently working as a postdoctoral researcher at the Institute of Computer Science, Foundation for Research and Technology, Greece. From 2003 to 2007, he was a teaching and research assistant with the Department of Electronics and Computer Engineering at the Technical University of Crete and worked on various European-funded projects. He has also served as a teaching assistant with the Department of Computer Engineering at RIT and as a research assistant at RIT's Real Time Computer Vision Lab, working on human-computer interaction and computer vision for smartphones. His main research interests include image processing, computer vision, and machine learning. He received the Best Paper Award at the Western New York Image Processing Workshop in 2010 for his paper "A Framework for Object Class Recognition with No Visual Examples."