Comparative Analysis of Short-Read and Long-Read Sequencing and Bioinformatic Workflows for Variant Detection and Interpretation in Hereditary Breast Cancer
Abstract
Advances in DNA sequencing have transformed genomic research and clinical genetics. Since the introduction of Sanger sequencing, technologies have advanced to high-throughput whole-genome platforms, accelerating disease gene discovery. Modern sequencing approaches, including short-read and long-read sequencing, differ in read length, accuracy, and throughput, and these differences determine which types of variants can be detected across genomic contexts. Bioinformatic workflows further shape detection, with downstream effects on variant interpretation and clinical actionability. In this study, whole-genome sequencing was carried out on three African American women with early-onset, BRCA1/2-negative breast cancer using Illumina short-read and PacBio long-read platforms. Short variant detection was assessed using multiple alignment and variant-calling pipelines, while structural variant detection employed platform-specific approaches. Analyses included genome-wide comparisons of variant counts and concordance across platforms and pipelines, as well as variant-level assessments in cancer susceptibility genes, which enabled annotation comparisons across ClinVar, InterVar, and AnnotSV. A separate genome-wide analysis of pathogenic variants compared results across sequencing platforms and annotation frameworks. Variant detection patterns were strongly influenced by sequencing platform, with additional effects from bioinformatic workflows. For short variants, concordance was high across Illumina pipelines, whereas PacBio results varied more by aligner and caller and included more variants overall. Within cancer susceptibility genes, patterns mirrored the genome-wide results, with additional differences in variant annotation: ClinVar classified more short variants as pathogenic than InterVar did, and PacBio identified more pathogenic variants and variants of uncertain significance than Illumina.
For structural variants in cancer genes, PacBio detected more variants, particularly in complex genomic regions, with limited concordance between platforms and no clear pathogenic findings. Genome-wide analyses of pathogenic variants similarly identified more variants with PacBio, many of which were PacBio-specific. No clearly pathogenic variants indicative of increased inherited cancer risk were identified. Overall, these findings demonstrate that sequencing platforms, bioinformatic workflows, and annotation frameworks all influence variant detection and classification. Although long-read sequencing improves variant detection, particularly in challenging genomic regions, increased sensitivity does not directly translate to clinical insight, because many variants remain difficult to interpret and lack clear associations with hereditary cancer risk. These results underscore the need to combine sequencing strategies, optimize workflows, and improve variant annotation for clinical interpretation.
