In total, 2,244 whole genomes of geographically diverse individuals from Estonia were sequenced to a median depth of 30x using Illumina HiSeq with the TruSeq PCR-free library preparation method. We found 19M SNVs and 6.6M indels with an allele count larger than two, of which 8.4M were novel. In this study we analyse both the loss-of-function variants revealed and the population structure of Estonia.
We found a total of 14,531 autosomal loss-of-function (LOF) SNVs and indels in 6,991 genes. Of these genes, 3.3% contained homozygous or compound heterozygous LOF variants with minor allele frequency below 2%. Combining the data on complete gene knockouts in individuals with their disease history and a variety of available endophenotypes (proteomics, NMR, biochemistry) will help us study the function of these genes and lead to a better understanding of the phenotypic consequences of variation within them.
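As an illustration of what a complete gene knockout means here, the following minimal Python sketch flags genes in which an individual carries a rare LOF allele on both haplotypes, which covers both the homozygous and the compound heterozygous case. It is not the pipeline used in this study: it assumes phased genotypes, and the function name `complete_knockouts`, the input layout, and the toy gene and sample names are all hypothetical. Only the 2% frequency cutoff comes from the text.

```python
from collections import defaultdict

MAF_THRESHOLD = 0.02  # from the text: minor allele frequency below 2%


def complete_knockouts(variants, genotypes):
    """Return {sample: set of genes with a rare LOF allele on both haplotypes}.

    variants:  {variant_id: (gene, maf)}                    (hypothetical layout)
    genotypes: {sample: {variant_id: (hap0, hap1)}}, 1 = LOF allele present.
    Assumes genotypes are phased, so cis and trans configurations differ.
    """
    result = defaultdict(set)
    for sample, calls in genotypes.items():
        # For each gene, track whether haplotype 0 / haplotype 1 carries a LOF allele.
        hit = defaultdict(lambda: [False, False])
        for variant_id, (hap0, hap1) in calls.items():
            gene, maf = variants[variant_id]
            if maf >= MAF_THRESHOLD:
                continue  # keep rare variants only
            if hap0:
                hit[gene][0] = True
            if hap1:
                hit[gene][1] = True
        for gene, (on_hap0, on_hap1) in hit.items():
            if on_hap0 and on_hap1:
                result[sample].add(gene)
    return dict(result)


# Toy data, invented for illustration only.
variants = {
    "v1": ("GENE_A", 0.01),
    "v2": ("GENE_A", 0.005),
    "v3": ("GENE_B", 0.01),
    "v4": ("GENE_C", 0.10),
}
genotypes = {
    "s1": {"v1": (1, 0), "v2": (0, 1)},  # compound heterozygote (in trans)
    "s2": {"v1": (1, 0), "v2": (1, 0)},  # both variants on one haplotype (in cis)
    "s3": {"v3": (1, 1)},                # homozygote
    "s4": {"v4": (1, 1)},                # homozygote, but variant is too common
}
knockouts = complete_knockouts(variants, genotypes)
```

In this toy example only `s1` and `s3` are reported. With unphased data, two heterozygous LOF variants in the same gene could only be treated as candidate compound heterozygotes.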
To study the fine-scale genetic structure of the Estonian population, we concentrate on a subset of the genomes (N=436) that comprehensively covers rural Estonia, in order to minimize the mixing effect of historical urbanization. We further combine these genomes with a pan-Eurasian panel of high-coverage genomes from hundreds of populations. Using haplotype-based and allele-frequency-based methods, we show that the genetic structure within Estonia is largely in line with the division into inland and maritime Estonia that has been proposed on the basis of archaeological findings. Furthermore, we identify and quantify the relative contributions of the three major genetic domains of the European gene pool in Estonians and estimate split times from linguistically and geographically adjacent populations. We use fineSTRUCTURE and the inter-population doubleton distribution to reveal patterns of genetic sharing between Estonians and other European populations, and we infer population history as a series of population splits and admixture events in prehistoric and historic times.
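The idea behind doubleton sharing can be sketched in a few lines of Python. A doubleton is a variant whose alternate allele is seen exactly twice in the combined sample; when the two copies sit in different individuals, the pair of populations they come from is tallied. This is a simplified illustration and not the analysis performed in this study: the function name `doubleton_sharing`, the input layout, and the population labels are assumptions, and no normalization for sample size is applied.

```python
from collections import Counter


def doubleton_sharing(variant_carriers, sample_pop):
    """Count doubletons shared between (or within) populations.

    variant_carriers: one list per variant, holding a sample id for every
                      alternate allele copy observed (hypothetical layout).
    sample_pop:       {sample_id: population label}.
    """
    counts = Counter()
    for carriers in variant_carriers:
        # Keep variants with exactly two alternate copies in two different people.
        if len(carriers) != 2 or carriers[0] == carriers[1]:
            continue
        pops = (sample_pop[carriers[0]], sample_pop[carriers[1]])
        counts[tuple(sorted(pops))] += 1  # unordered population pair
    return counts


# Toy data, invented for illustration only.
sample_pop = {"e1": "EST", "e2": "EST", "f1": "FIN", "l1": "LAT"}
variant_carriers = [
    ["e1", "f1"],        # doubleton shared between EST and FIN
    ["e1", "e2"],        # doubleton within EST
    ["e2", "f1"],        # doubleton shared between EST and FIN
    ["e1", "e1"],        # homozygous in one person: not a shared doubleton
    ["e1", "f1", "l1"],  # three copies: not a doubleton
    ["l1"],              # singleton
]
sharing = doubleton_sharing(variant_carriers, sample_pop)
```

Because doubletons are rare and typically recent, an excess of cross-population pairs relative to sample sizes points to recent shared ancestry. A real analysis would normalize these counts by the number of possible sample pairs.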