Scientific computations, for example in bioinformatics, produce dramatically growing amounts of data. Even on modern high-performance computers, storage is often the bottleneck that prevents computing more detailed results. Our research focuses on parallel file systems as found in cluster environments.
The critical issue here is the overall performance that the system delivers. Scalability suffers because metadata operations pass through sequential parts of the parallel file system's code. What is needed are tools that give insight into the internal behavior of parallel file systems and relate this information to the user level. With such tools we can see which activity in the user program triggers which low-level read/write operations.
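The effect of sequential code on scalability can be illustrated with Amdahl's law. This is a generic sketch, not a measurement of any particular file system: the serial fraction $s$ (for example, the share of time spent in serialized metadata handling) and the number of clients $N$ are assumed symbols, and the numbers are chosen purely for illustration.

```latex
% Amdahl's law: speedup with N parallel clients when a fraction s of the work is serial
S(N) = \frac{1}{s + \dfrac{1 - s}{N}}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{s}

% Illustrative values (assumed, not measured): s = 0.05, N = 64
S(64) = \frac{1}{0.05 + \dfrac{0.95}{64}} \approx 15.4, \qquad \frac{1}{s} = 20
```

Under these assumed values, a serial share of only five percent already caps the achievable speedup at 20, no matter how many clients are added.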
The talk will present an enhanced tool environment based on PVFS2 and MPICH2. We add tracing facilities to the parallel file system, which lets us investigate its behavior and relate it to the parallel user program.
The second part of the talk will present results from a test on a cluster at the German Cancer Research Center, where we investigate different parallel file systems for image processing. We look at which aspects have to be considered when a workload involves tens of millions of individual files. Even a simple ls operation takes a long time to complete, and with parallel file systems things can get even worse.
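To get a feeling for why listing a huge directory is expensive, the following minimal Python sketch compares two ways of walking a directory: streaming the entry names only, and additionally fetching per-file metadata, as a long-format listing has to. It is not part of the tool environment described above. The function names are hypothetical, and the assumption that per-file metadata lookups dominate the cost on a given file system has to be checked by measurement.

```python
import os
import time


def count_entries(path):
    """Count directory entries by streaming them, without sorting or stat calls."""
    count = 0
    with os.scandir(path) as entries:
        for _ in entries:
            count += 1
    return count


def count_entries_with_stat(path):
    """Count entries and sum their sizes, requesting metadata for every entry."""
    count = 0
    total_bytes = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # On POSIX systems this usually costs one extra system call per
            # entry; on a parallel file system it may involve a metadata server.
            total_bytes += entry.stat(follow_symlinks=False).st_size
            count += 1
    return count, total_bytes


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

Calling `timed(count_entries, directory)` and `timed(count_entries_with_stat, directory)` on the same large directory gives a rough indication of how much of the listing time is spent on metadata lookups; cache effects mean the two runs should be repeated and interpreted with care.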