In the first part of my talk, I will discuss the design, development and implementation of a novel algorithm, Interpolated Variable Order Motifs, for the composition-based prediction of Genomic Islands (GIs). The algorithm exploits compositional biases using variable order motif distributions and captures the local composition of a sequence more reliably than fixed order methods, overcoming their limitations. Furthermore, to localize the boundaries of each predicted region optimally, Hidden Markov Model theory is applied in a change point detection framework, which predicts the true insertion point of candidate GIs more accurately.
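As a rough illustration of what combining motif distributions of several orders can look like, here is a minimal Python sketch. It is not the algorithm presented in the talk: the weighting scheme (weights proportional to motif length), the use of total variation distance, and the names `kmer_freqs` and `interpolated_score` are all assumptions made for this example.

```python
from collections import Counter


def kmer_freqs(seq, k):
    """Relative frequencies of all overlapping k-mers in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def interpolated_score(window, genome, max_k=3):
    """Compositional deviation of a window from the genome, in [0, 1].

    Assumption: orders 1..max_k are combined with weights proportional
    to k, and each order contributes the total variation distance
    between the window and genome k-mer distributions.
    """
    orders = range(1, max_k + 1)
    score = 0.0
    for k in orders:
        pw = kmer_freqs(window, k)
        pg = kmer_freqs(genome, k)
        motifs = set(pw) | set(pg)
        tv = 0.5 * sum(abs(pw.get(m, 0.0) - pg.get(m, 0.0)) for m in motifs)
        score += k * tv
    return score / sum(orders)


# Toy sequences (made up): a window with atypical composition scores higher.
genome = "ACGT" * 100
typical = "ACGT" * 10
atypical = "GGGGCCCC" * 5
print(interpolated_score(typical, genome))
print(interpolated_score(atypical, genome))
```

In this toy setting, a window whose composition matches the genome scores close to zero, while a window with a strong compositional bias scores markedly higher.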
In the second part of my talk, I will discuss whole genome based comparative techniques for studying, in a time dependent manner, the acquisition of horizontally transferred genes in the Salmonella lineage. The compositional amelioration process is modelled, and the relative time of acquisition of those genes is determined on different branches of the S. enterica phylogenetic tree by applying a maximum parsimony algorithm.
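To make the parsimony step concrete, the sketch below applies Fitch's small parsimony algorithm to a single presence/absence character on a rooted binary tree. The talk does not specify which maximum parsimony variant is used, so Fitch's algorithm is an assumption here, and the tree, the strain names and the function `fitch` are hypothetical.

```python
def fitch(tree, states):
    """Fitch small parsimony for one binary character.

    tree:   nested 2-tuples of leaf names (rooted binary tree)
    states: dict mapping leaf name -> 1 (gene present) or 0 (absent)
    Returns (candidate state set at this node, minimum number of changes).
    """
    if isinstance(tree, str):  # leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    lset, lcost = fitch(left, states)
    rset, rcost = fitch(right, states)
    common = lset & rset
    if common:  # children agree: no extra change needed
        return common, lcost + rcost
    return lset | rset, lcost + rcost + 1  # disagreement: one change


# Hypothetical tree and gene presence pattern for four strains.
tree = ((("strainA", "strainB"), "strainC"), "strainD")
presence = {"strainA": 1, "strainB": 1, "strainC": 0, "strainD": 0}

root_states, changes = fitch(tree, presence)
print(root_states, changes)
```

Under these toy inputs the gene is inferred absent at the root with a single change, which is most simply explained by one acquisition on the branch leading to strainA and strainB.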
The third part of my talk presents a methodology for explicitly quantifying and modelling the contribution of genomic features to GI structure under a probabilistic framework. A hypothesis free, bottom-up search is implemented and approximately 700 genomic regions are identified; these include both GIs and randomly sampled regions from three different genera, and together they form my training dataset. A Machine Learning approach is then used to exploit this dataset and study the structural variation of GIs.
The last part of this talk discusses the experimental validation of the in silico predictions made on a newly sequenced bacterial genome. Applying a PCR-based protocol, the presence and absence of the predicted candidate islands are probed in seventeen unsequenced closely and distantly related strains. The true borders of the predicted islands are confirmed by sequencing across the boundary site in strains lacking the island.
Graduated from the University of Athens, Faculty of Biology, in 2003, having undertaken a 1-year research project in the Biophysics and Bioinformatics laboratory under the supervision of Prof. Stavros Hamodrakas. In October 2004, joined the Wellcome Trust Sanger Institute for a PhD, registered as a graduate student at the University of Cambridge. Currently a final year Wellcome Trust PhD candidate, working on computational methods for analyzing large-scale microbial genomic data.
Interests include comparative genomics, evolution, sequence and compositional analysis, clustering and machine learning.
Author of two publicly available, standalone software packages for genome annotation and visualization.