Undergraduate Thesis Projects
Department of Computer Science
University of Crete
Semester: Spring 2007
Supervisor: Prof. Angelos Bilas

Performance measurements of a clustered, high-end storage system.

Previous research has produced a working prototype of a clustered storage system that is designed to scale to large amounts of storage (on the order of hundreds of disks and TBytes of capacity). The purpose of this project is to perform the initial evaluation of a subset of such a system with micro-benchmarks and realistic applications. The work will mostly involve understanding performance issues in systems software (the storage I/O stack in the Linux kernel) on a real system.

Using the resources of a graphics adapter for high-performance computing.

Graphics cards are very good at certain types of computation. What types of more "generic" operations can be performed efficiently using a graphics card? This project will examine certain types of applications and/or operations (e.g. encryption/decryption operations) and how they can be performed on graphics cards more efficiently than on general-purpose CPUs.

Performance evaluation of Linux I/O schedulers.

Linux kernel 2.6 comes with four different flavors of the disk I/O scheduler: no-op (a simple FIFO queue that only merges adjacent requests), anticipatory, deadline-based, and completely fair queueing (CFQ). Which one is the best, and for what workloads? Also, how does the choice of filesystem interact with the scheduler? (Remember that a filesystem changes the disk access pattern generated by the application, because of the metadata accesses generated by the filesystem itself.) How do ext3, ReiserFS, and XFS interact with the schedulers? How about parallel filesystems?
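When comparing schedulers, it helps to record which one is active for each run. On 2.6 kernels the setting is exposed per device in sysfs as a single line that lists the available schedulers with the active one in square brackets. The following is a minimal sketch of reading that file; the function names are illustrative, and the device name passed in is an assumption about the test machine.

```python
def parse_scheduler_line(line):
    """Parse a sysfs scheduler line such as 'noop anticipatory deadline [cfq]'.

    Returns (active, available): the bracketed name and the list of all names.
    """
    active = None
    available = []
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def read_scheduler(device, sysfs_root="/sys/block"):
    """Read the scheduler setting of a block device, e.g. device='sda'."""
    path = f"{sysfs_root}/{device}/queue/scheduler"
    with open(path) as f:
        return parse_scheduler_line(f.read())
```

Writing one of the listed names to the same file (as root) switches the scheduler for that device, so a benchmark script can iterate over the available list and repeat the same workload under each one.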

Tune and examine the overheads of an existing shared virtual memory system on top of a 10 Gbit/s interconnect.

Shared virtual memory is a technique that allows a cluster of systems that do not share memory to behave as a single multiprocessor system that can transparently execute multi-threaded applications. Such systems usually rely on high-speed interconnects to achieve good performance. The goal of this work is to port an existing, heavily optimized shared virtual memory system that runs on top of 1 Gbit/s Ethernet interfaces to 10 Gbit/s network interfaces (Ethernet or otherwise).

Porting and tuning of a kernel-level remote storage access protocol over 10 Gbit/s network interfaces.

Previous work has resulted in a kernel-level protocol that performs remote storage I/O on top of high-speed interconnects. This protocol is currently implemented on top of custom-designed network cards. The purpose of this work is to port this protocol to commercially available 10 Gbit/s network cards that have a programmable processor. The work involves porting user- and kernel-level code, and offers the potential to execute part of the code in the network interface itself to further improve system performance.

Examine the implications of removing interrupts from the receive path of the disk and network I/O stacks in the kernel.

In high-speed I/O (network or storage), a major performance problem (especially for speeds greater than 10 Gbit/s) is the cost of interrupts. The goal of this work is to examine how systems software may be restructured on systems that have multiple CPUs to replace interrupts with polling (or hybrid) techniques and so eliminate the cost associated with interrupt processing.

Create a repository of existing applications for multi-core processors.

A main trend in processor architecture is the design and implementation of multi-core CPUs that share as few hardware structures as possible (to achieve good scalability in future designs). Evaluating a design in this area requires running actual applications. The purpose of this project is to collect a set of applications that may be used for performance analysis, port them to a multi-core CPU, and provide a means for running them off-line in an automated manner.

Improving the robustness of the communication protocol in a real sensor network.

Recently, it has become possible to build sensors that, besides sensing, also have processing and communication capabilities. Such systems are usually equipped with a small CPU, little memory, and a short-range, low-speed wireless interface. Previous work in this area has resulted in an operating system that allows programmers to write and execute programs on a network of such devices. In this system, the communication protocol is one of the most significant components, as it is responsible for all interactions of each sensor with the rest of the world. The goal of this work is to examine in more detail the characteristics of the existing communication protocol and the factors that affect its robustness, and to propose mechanisms that will improve communication efficiency in a real (noisy) environment.

Development of a block-device access tracer for the Linux kernel.

An interesting problem in analyzing the performance of storage systems is collecting traces of block access patterns during the execution of certain I/O-intensive workloads (e.g. TPC-C using MySQL, on top of ReiserFS). This project involves developing a tracer kernel module that "appears" to be a block device, but actually relays read/write requests to another block device (e.g. bindings like /dev/tracer0 --> /dev/sda). The tracer module uses a dedicated disk, or partition, to store trace records in 'raw' binary format. The tracer module should be controlled and monitored via ioctl() and/or via /proc entries (e.g. a signal to start/stop tracing, a cumulative count of read/write accesses). To retrieve the records for later processing, we could use a user-space tool.
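As an illustration of what the user-space side might look like, the sketch below decodes fixed-size binary trace records from a raw buffer. The record layout (timestamp, starting sector, length, operation) is purely an assumption made for this example; the actual format would be defined by the tracer module.

```python
import struct

# Hypothetical record layout (little-endian, 24 bytes per record):
#   u64 timestamp in microseconds, u64 starting sector,
#   u32 length in sectors, u8 operation (0 = read, 1 = write), 3 padding bytes
RECORD = struct.Struct("<QQIB3x")
OP_READ, OP_WRITE = 0, 1


def pack_record(timestamp_us, sector, nr_sectors, op):
    """Encode one trace record (useful for generating test data)."""
    return RECORD.pack(timestamp_us, sector, nr_sectors, op)


def iter_records(raw):
    """Yield (timestamp_us, sector, nr_sectors, op) tuples from a raw buffer.

    A trailing partial record, if any, is ignored.
    """
    usable = len(raw) - len(raw) % RECORD.size
    for offset in range(0, usable, RECORD.size):
        yield RECORD.unpack_from(raw, offset)


def count_ops(raw):
    """Return (reads, writes) found in the buffer."""
    reads = writes = 0
    for _, _, _, op in iter_records(raw):
        if op == OP_READ:
            reads += 1
        elif op == OP_WRITE:
            writes += 1
    return reads, writes
```

A fixed-size record keeps the kernel side simple (no variable-length encoding) and lets the tool seek directly to the n-th record on the trace partition.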

Prototype of a content-addressable storage system.

Design and implement a "virtual", content-addressable block device. Block devices traditionally allow read/write accesses based on block addresses. In a content-addressable device, however, the write operation returns a (content) "key" or "tag" to the user, and the user is able to retrieve blocks using this key with the read operation. Such devices therefore do not store duplicate blocks and provide strong support for archival purposes, but they may be hard to use when updates to existing information are required. This project will build such a device and will also provide a simple file system that allows users to store and retrieve regular files.
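The write-returns-key interface described above can be sketched in a few lines of user-space code. This is only an in-memory illustration, not the kernel-level device the project calls for; the class name, the 4 KB block size, and the use of SHA-256 as the content key are all assumptions.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


class ContentAddressableStore:
    """In-memory model of a content-addressable block store."""

    def __init__(self):
        self._blocks = {}  # key (hex digest) -> block contents

    def write(self, data):
        """Store one block and return its content-derived key."""
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        key = hashlib.sha256(data).hexdigest()
        # Identical contents map to the same key, so duplicates are stored once.
        self._blocks.setdefault(key, bytes(data))
        return key

    def read(self, key):
        """Return the block previously stored under this key."""
        return self._blocks[key]

    def __len__(self):
        return len(self._blocks)
```

Because the key is derived from the contents, there is no way to update a block in place: changing the data produces a different key, which is why the file system layered on top has to track the keys that make up each file.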

Install, evaluate, and tune an existing distributed file system (GFS) on top of a mid-size clustered storage system.

Building large-scale storage systems today usually requires using a distributed file system. Although this approach introduces very high overheads and scalability limitations, it is the only realistic approach at this point. Current research aims at addressing these limitations. The goal of this work is to examine the overheads associated with using distributed file systems. This will be done by measuring the performance of an actual distributed file system (most probably GFS) on a real system with ~100 disks, using micro-benchmarks and real applications.

Design and evaluation of storage compression and duplicate elimination techniques at the block-level.

With increasing needs for storage, saving space becomes an increasingly important problem for many applications. Doing so transparently and without application or file system modifications may result in reducing the cost significantly in many application domains. This project will examine techniques for eliminating duplicate blocks as well as compressing them on top of a block-level storage system that supports only fixed block sizes. The work will occur mostly in the Linux kernel and will use a custom framework (developed locally) that facilitates kernel-level development of storage modules.
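To make the idea concrete, the sketch below models a volume that is addressed by logical block number, as a file system would expect, while storing each distinct block only once and in compressed form. It is a user-space illustration under stated assumptions: the class name, the 4 KB block size, SHA-256 fingerprints, and zlib compression are choices made for the example, not the techniques the project prescribes.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size


class DedupCompressedVolume:
    """Logical-block volume with duplicate elimination and compression."""

    def __init__(self):
        self._lba_to_fp = {}  # logical block address -> fingerprint
        self._store = {}      # fingerprint -> [reference count, compressed data]

    def _release(self, fp):
        entry = self._store[fp]
        entry[0] -= 1
        if entry[0] == 0:
            del self._store[fp]  # no logical block refers to this content

    def write(self, lba, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        fp = hashlib.sha256(data).digest()
        old = self._lba_to_fp.get(lba)
        if old == fp:
            return  # same content rewritten in place, nothing to do
        if fp in self._store:
            self._store[fp][0] += 1  # duplicate: share the existing copy
        else:
            self._store[fp] = [1, zlib.compress(data)]
        if old is not None:
            self._release(old)
        self._lba_to_fp[lba] = fp

    def read(self, lba):
        fp = self._lba_to_fp.get(lba)
        if fp is None:
            return bytes(BLOCK_SIZE)  # unwritten blocks read as zeros
        return zlib.decompress(self._store[fp][1])

    def stored_bytes(self):
        """Total size of the compressed payloads currently kept."""
        return sum(len(payload) for _, payload in self._store.values())
```

The sketch sidesteps the part that is hard on a device with fixed block sizes: compressed payloads have variable length, so a real implementation must decide how to pack them into fixed-size physical blocks and how to keep the mapping metadata persistent.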