Poster Presentation Society for Molecular Biology and Evolution Conference 2016

Enrichme – a gene set enrichment tool that naturally corrects for gene length and clustering (#552)

Hannes Svardal (1, 2), Ümit Seren (2), Magnus Nordborg (2)
  1. Wellcome Trust Sanger Institute, Cambridge, United Kingdom
  2. Gregor Mendel Institute, Austrian Academy of Sciences, Vienna, Austria

Gene set enrichment analysis is a vital tool for testing the biological relevance of genomic data, such as selection scans and genome-wide association studies, and for interpreting them. However, many current methods are biased because (i) longer genes are more likely to overlap high scores and (ii) genes with similar function are often clustered in the genome. The latter problem is often addressed by removing variants in high linkage disequilibrium (LD), but the cutoff for this is arbitrary. Here, we present a method that is not affected by either of these biases. Briefly, it determines significance by comparing scores attributed to genes and gene categories to an empirical null distribution of scores obtained by randomly rotating the data against its genomic positions, thereby keeping LD structure and gene clustering intact. Our method is implemented in the free and open-source tool enrichme. The implementation features a classical candidate enrichment mode, in which the user defines a threshold on genomic input scores to test whether genes in the vicinity of scores above the threshold are enriched in particular gene sets, such as gene ontology (GO) terms. Alternatively, rather than defining a hard cutoff for genes to test, the program can evaluate the significance of user-defined summaries of scores across genes and across gene sets. For example, one could test whether the mean, across a gene set, of the average score of each gene in the set is significantly elevated. This approach increases statistical power in cases where the relative magnitude of scores carries biological signal.
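
The rotation idea can be illustrated with a minimal sketch. It is not the enrichme implementation or its interface: the function names, the single circular sequence of scores, and the representation of genes as index ranges are all assumptions made for illustration. The summary statistic is the mean across a gene set of the per-gene mean score, and the null distribution comes from rotating the scores against fixed gene positions.

```python
import random
from statistics import fmean


def rotate(scores, k):
    """Circularly shift a list of scores by k positions (hypothetical helper)."""
    k %= len(scores)
    return scores[-k:] + scores[:-k] if k else list(scores)


def set_statistic(scores, gene_spans, gene_set):
    """Mean across the gene set of the mean score within each gene."""
    per_gene = []
    for gene in gene_set:
        start, end = gene_spans[gene]  # half-open index range into scores
        per_gene.append(fmean(scores[start:end]))
    return fmean(per_gene)


def rotation_pvalue(scores, gene_spans, gene_set, n_rotations=999, seed=0):
    """Empirical one-sided p-value from random rotations (illustrative only).

    Assumes scores are ordered along one circular sequence; a real genome
    with several chromosomes would need additional handling.
    """
    rng = random.Random(seed)
    observed = set_statistic(scores, gene_spans, gene_set)
    exceed = 0
    for _ in range(n_rotations):
        # Non-zero shift: gene positions stay fixed while the scores move,
        # so neighbouring scores (LD) and neighbouring genes stay together.
        k = rng.randrange(1, len(scores))
        if set_statistic(rotate(scores, k), gene_spans, gene_set) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (1 + exceed) / (1 + n_rotations)


# Toy data: two adjacent genes sit on a block of high scores.
toy_scores = [0.0] * 100
for i in range(10, 20):
    toy_scores[i] = 5.0
toy_genes = {"A": (10, 15), "B": (15, 20), "C": (60, 65)}

obs, p = rotation_pvalue(toy_scores, toy_genes, ["A", "B"], n_rotations=99)
print(obs, p)
```

Because whole stretches of scores move together, a long gene or a cluster of related genes overlaps high scores about as often under the null as in the observed data, which is how this kind of test avoids the two biases without an LD pruning cutoff.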