The human genome harbors ~7000 ohnologs, gene duplicates that derive from the whole-genome duplication (WGD) events at the origin of vertebrates. Ohnologs are often associated with human diseases, so a comprehensive catalog of them is important. High-confidence identification of ohnologs hinges on synteny analysis and on inference of pre- and post-WGD ancestral genome structures, but the ancient timing of the teleost and vertebrate WGD events impedes high-accuracy inference. Because of this difficulty, previous studies excluded a large part of the human genome with ambiguous synteny, resulting in low-coverage reconstructions.
To deal explicitly with reconstruction uncertainty, we developed a probabilistic model of macrosynteny conservation and devised variational Bayes algorithms for inferring the structure of pre-WGD genomes. We obtained high-coverage reconstructions of the ancestral vertebrate and teleost genomes by applying the method to the human, mouse, chicken, spotted gar, zebrafish, stickleback, Tetraodon, medaka, and amphioxus genomes. The results show that previously excluded regions in modern vertebrate genomes tend to be composed of multiple smaller synteny blocks with varying reconstruction probabilities, which reflect reconstruction uncertainty arising from incomplete genome assembly, intensive local rearrangements, and other factors. Our reconstructions provide an improved picture of early vertebrate genome evolution, showing how ancestral vertebrate chromosomes are retained in modern genomes, how inter-chromosomal rearrangements occurred in individual vertebrate lineages, and how specific regions in the human genome originated through the vertebrate WGDs.