Emerging applications in Sensor and Peer-to-Peer networks make the concept of data integration without centralization more meaningful than ever. In these environments, data is generated continuously, and potentially automatically, across geographically diverse locations. Organizing data in centralized repositories is becoming prohibitively expensive and, in many cases, impractical. Storing data in-situ, however, complicates query processing because data relations are fragmented over a number of remote sites. Furthermore, these fragmented relations can only be accessed by traversing a network of other nodes, which makes the execution of a query an even more complex task. We claim that in many cases it might be more beneficial to find only the K highest-ranked (or Top-K) answers, for some user-defined parameter K, if this can minimize the query execution cost.
In this talk, I will present techniques to efficiently answer Top-K queries in a distributed environment. A Top-K query returns the K highest-ranked answers under a user-defined similarity function. At the same time, its evaluation should minimize some cost metric associated with retrieving the desired answer set, such as the utilization of the communication medium. I will provide an overview of state-of-the-art algorithms that solve the Top-K problem in a centralized setting and show why these are not applicable to the distributed case. I will then focus on the Threshold Join Algorithm (TJA), which is a novel solution for executing Top-K queries in a distributed environment. I will also present results from our performance study with a real middleware testbed deployed over a network of 75 workstations.
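To make the cost argument concrete, the following is a minimal sketch of a naive baseline, not of TJA or of any algorithm covered in the talk. It assumes each node holds a local score for every object and that the overall rank is the sum of the local scores; the function name, the data layout, and the summation-based scoring are illustrative assumptions. Every node ships its entire list to a single site, so the communication cost grows with the total data size rather than with K.

```python
import heapq
from collections import defaultdict


def naive_distributed_top_k(node_lists, k):
    """Naive baseline (hypothetical): ship every local list to one site.

    node_lists: one dict per node, mapping object_id -> local score.
    Returns (top_k, tuples_shipped), where top_k is a list of
    (object_id, total_score) pairs ordered from highest to lowest score.
    """
    totals = defaultdict(float)
    tuples_shipped = 0

    # Each node sends all of its (object_id, score) tuples to the querying site.
    for local_scores in node_lists:
        for object_id, score in local_scores.items():
            totals[object_id] += score  # assumed aggregate: sum of local scores
            tuples_shipped += 1

    # Only K answers are kept, although every tuple was transferred.
    top_k = heapq.nlargest(k, totals.items(), key=lambda item: item[1])
    return top_k, tuples_shipped


if __name__ == "__main__":
    nodes = [
        {"a": 9, "b": 5, "c": 1},
        {"a": 8, "b": 7, "c": 2},
    ]
    answers, cost = naive_distributed_top_k(nodes, k=2)
    print(answers, cost)
```

In this sketch the number of shipped tuples equals the total number of stored tuples regardless of K, which is the kind of cost that a distributed Top-K algorithm aims to reduce.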
Demetrios Zeinalipour-Yazti is currently a Visiting Lecturer in Computer Science at the University of Cyprus. He holds a Ph.D. (2005) and an M.Sc. (2003) in Computer Science and Engineering from the University of California - Riverside and a B.Sc. (2000) in Computer Science from the University of Cyprus. He was a visiting researcher at the network intelligence lab of Akamai Technologies in 2004. His research interests include Database Management Systems, Network Data Management, Distributed Query Processing, and Storage and Retrieval Methods for Peer-to-Peer and Sensor Networks.