Genome-wide association studies (GWAS) of complex traits have seen numerous successes over the past decade. However, much of this work has been performed only in populations of European descent. To address this disparity, we developed the Multi-Ethnic Genotyping Array (MEGA), a single platform designed for balanced GWAS coverage across the globe that incorporates a catalog of functional variation.
To maximize trans-ethnic utility, we designed the GWAS backbone to be informed by whole-genome sequences from the 26 populations of the 1000 Genomes Project (TGP) and bolstered by tag SNPs from 642 high-coverage whole genomes from individuals of African descent in the CAAPA consortium. We developed a novel cross-population tag SNP selection strategy to capture low-frequency variants across the diverse populations in Phase 3 of TGP. Importantly, because the strategy optimizes imputation accuracy rather than pairwise LD, the array performs well across all continental TGP super-populations (>90% imputation accuracy for MAF >= 1%). We deconvolved admixture to evaluate per-ancestry imputation performance, and devised a whole-genome sequencing panel to balance existing reference datasets. A reference panel of several thousand individuals, including the Human Genome Diversity Panel and a large panel of indigenous Americans, will be available on MEGA to aid in rare variant calling, ancestry characterization, and admixture analyses.
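As an illustration of the kind of metric behind a statement such as ">90% imputation accuracy for MAF >= 1%", the sketch below scores each site by the squared correlation between true genotypes and imputed dosages, then averages over sites above a minor allele frequency threshold. This is a minimal sketch under stated assumptions: the abstract does not specify how imputation accuracy was computed, and the function names (`dosage_r2`, `minor_allele_frequency`, `mean_r2_above_maf`) and the input layout are hypothetical.

```python
from statistics import mean


def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes (0/1/2 allele
    counts) and imputed dosages (0.0 to 2.0) at one site.
    Assumption: this is one common accuracy measure, not necessarily
    the one used for MEGA."""
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mean_true = mean(true_genotypes)
    mean_imputed = mean(imputed_dosages)
    cov = sum((t - mean_true) * (d - mean_imputed)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_true = sum((t - mean_true) ** 2 for t in true_genotypes)
    var_imputed = sum((d - mean_imputed) ** 2 for d in imputed_dosages)
    if var_true == 0 or var_imputed == 0:
        # Correlation is undefined without variation; score it as 0.
        return 0.0
    return cov * cov / (var_true * var_imputed)


def minor_allele_frequency(genotypes):
    """MAF computed from diploid allele counts (0/1/2 per individual)."""
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1 - freq)


def mean_r2_above_maf(sites, maf_threshold=0.01):
    """Average per-site r^2 over sites whose MAF meets the threshold.
    `sites` is a list of (true_genotypes, imputed_dosages) pairs.
    Returns None when no site passes the filter."""
    scores = [dosage_r2(true, imputed)
              for true, imputed in sites
              if minor_allele_frequency(true) >= maf_threshold]
    return mean(scores) if scores else None
```

Applying `mean_r2_above_maf` separately to the samples of each super-population would give a per-population summary comparable in spirit to the figure quoted above.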
To date, we have genotyped >50,000 African-American, Hispanic/Latino, Asian American, Native American, and Hawaiian individuals from PAGE cohorts. From these diverse populations we can infer an extraordinary breadth of population structure, admixture, and differential relatedness, with important implications for complex trait association studies within and across ethnicities. Here, we highlight the need for methods that can capture and model such high levels of diversity, both to optimize statistical power and to improve biological interpretation.