The identification of genomic rearrangements with high sensitivity and specificity remains a major challenge. Whilst new sequencing technologies enable the detection of a wider range of events, there is still scope for significant improvement of short-read-based approaches. Here, we present the Genome Rearrangement IDentification Software Suite (GRIDSS). GRIDSS is composed of an assembler that performs alignment-constrained whole-genome breakend assembly using a novel positional de Bruijn graph algorithm, and a probabilistic structural variant caller that combines assembly, split read, and read pair evidence in a unified variant scoring model.
Our novel assembly approach identifies breakend contigs, that is, contigs assembled from a single side of a breakpoint. By incorporating positional information into the assembly graph, GRIDSS performs whole-genome assembly of all contigs, with no separation into windows or prior identification of candidate regions required. Although this approach results in an assembly graph that is approximately 100 times larger than the equivalent whole-genome de novo de Bruijn graph, our method assembles a 50x WGS dataset in less than 4 CPU hours using 16 GB of memory.
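To illustrate why adding positional information enlarges the graph, the following minimal sketch builds a toy positional de Bruijn graph in which each node is a (k-mer, position) pair rather than a k-mer alone. It is a simplified illustration and not the GRIDSS implementation: the function names, the toy reads, and the assumption that every read carries a single expected start position are all hypothetical.

```python
from collections import defaultdict


def positional_kmers(read, start, k):
    """Yield (kmer, position) nodes for a read whose first base is expected at `start`."""
    for offset in range(len(read) - k + 1):
        yield (read[offset:offset + k], start + offset)


def build_positional_graph(anchored_reads, k):
    """Build a toy positional de Bruijn graph.

    `anchored_reads` is an iterable of (sequence, expected_start) pairs
    (a hypothetical input format chosen for this sketch).
    Returns (weights, edges): node support counts and successor sets.
    """
    weights = defaultdict(int)
    edges = defaultdict(set)
    for seq, start in anchored_reads:
        nodes = list(positional_kmers(seq, start, k))
        for node in nodes:
            weights[node] += 1
        # Consecutive k-mers of a read are linked, one position apart.
        for a, b in zip(nodes, nodes[1:]):
            edges[a].add(b)
    return dict(weights), dict(edges)


# Toy data: the same sequence anchored at two distant loci, plus an overlapping read.
reads = [("ACGTACGT", 100), ("ACGTACGT", 500), ("CGTACGTT", 101)]
weights, edges = build_positional_graph(reads, k=4)

# Collapsing positions gives the node set of an ordinary de Bruijn graph.
plain_kmers = {kmer for kmer, _ in weights}
print(len(weights), "positional nodes vs", len(plain_kmers), "plain k-mers")
```

In this toy example, identical k-mers anchored at distant positions remain separate nodes, so reads from different loci cannot be joined into one contig. The cost is a larger node set than the position-free graph built from the same reads.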
GRIDSS achieves high sensitivity and specificity on cell line and patient tumour datasets. Results on well-characterised Genome in a Bottle data demonstrate improved sensitivity whilst retaining a false discovery rate less than half that of other recent methods. GRIDSS detects micro-homologies and non-templated sequence insertions at the breakpoint, and can perform combined variant discovery on multiple related samples and population data. GRIDSS is freely available at https://github.com/PapenfussLab/gridss.