A graph-based approach to diploid genome assembly
Garg S; Rautiainen M; Novak AM; Garrison E; Durbin R; Marschall T
Bioinformatics 2018[Jul]; 34(13): i105-i114
PMID: 29949989
MOTIVATION: Constructing high-quality haplotype-resolved de novo assemblies of
diploid genomes is important for revealing the full extent of structural
variation and its role in health and disease. Current assembly approaches often
collapse the two sequences into one haploid consensus sequence and, therefore,
fail to capture the diploid nature of the organism under study. Thus, building an
assembler capable of producing accurate and complete diploid assemblies, while
being resource-efficient with respect to sequencing costs, is a key challenge to
be addressed by the bioinformatics community.

RESULTS: We present a novel graph-based approach to diploid assembly, which
combines accurate Illumina data and long-read Pacific Biosciences (PacBio) data.
We demonstrate the effectiveness of our method on a pseudo-diploid yeast genome
and show that we require as little as 50× coverage Illumina data and 10× PacBio
data to generate accurate and complete assemblies. Additionally, we show that our
approach has the ability to detect and phase structural variants.

AVAILABILITY AND IMPLEMENTATION: https://github.com/whatshap/whatshap.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics
online.