Quality score compression apparatus and method for improving downstream accuracy
US11762813B2 · kind B2 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Feb 5, 2019 |
| Grant date | Sep 19, 2023 |
| Priority date | — |
| Expiry date | Dec 22, 2041 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G16C99/00
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
This disclosure provides a highly efficient and scalable tool that compresses quality scores, preferably by capitalizing on sequence redundancy. In one embodiment, compression is achieved by smoothing a large fraction of quality score values based on the k-mer neighborhood of their corresponding positions in read sequences. The approach exploits the intuition that any divergent base in a k-mer likely corresponds to either a single-nucleotide polymorphism (SNP) or a sequencing error. A preferred approach is therefore to preserve quality scores only at probable variant locations and to compress the quality scores of concordant bases, preferably by resetting them to a default value. By viewing individual read datasets through the lens of k-mer frequencies in a corpus of reads, the approach ensures that compression "lossiness" does not degrade downstream accuracy.
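A minimal Python sketch of the k-mer smoothing idea described in the abstract may help make it concrete. It rests on assumptions the abstract does not state: reads contain only `ACGT`, quality scores are plain integer Phred values, a k-mer is "frequent" if it occurs at least `min_count` times in the corpus, and a base is treated as divergent when substituting it turns a rare k-mer into a frequent one. The function names `count_kmers` and `smooth_qualities` and the example values (k = 5, threshold 2, default score 30) are hypothetical. This is an illustration, not the patented implementation.

```python
from collections import Counter

BASES = "ACGT"  # assumption: reads contain only these four bases


def count_kmers(reads, k):
    """Count every k-mer (length-k substring) across the read corpus."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def smooth_qualities(read, quals, counts, k, min_count, default_q):
    """Return a copy of `quals` with concordant positions reset to `default_q`.

    concordant: the position lies inside some k-mer seen >= min_count times.
    divergent:  substituting the base at this position turns a rare k-mer
                into a frequent one (a likely SNP or sequencing error).
    Divergent positions, and positions never covered by a frequent k-mer,
    keep their original score (a conservative, hypothetical choice).
    """
    n = len(read)
    concordant = [False] * n
    divergent = [False] * n
    for i in range(n - k + 1):
        kmer = read[i:i + k]
        if counts[kmer] >= min_count:
            for p in range(i, i + k):
                concordant[p] = True
            continue
        # Rare k-mer: look for a frequent neighbour one substitution away.
        for off in range(k):
            for b in BASES:
                if b == kmer[off]:
                    continue
                neighbour = kmer[:off] + b + kmer[off + 1:]
                if counts[neighbour] >= min_count:  # Counter yields 0 if absent
                    divergent[i + off] = True
    return [default_q if concordant[p] and not divergent[p] else q
            for p, q in enumerate(quals)]


if __name__ == "__main__":
    ref = "ACGTTGCATGA"
    variant = "ACGTTCCATGA"  # G->C substitution at position 5
    counts = count_kmers([ref, ref, variant], k=5)
    print(smooth_qualities(variant, [40] * 11, counts, k=5, min_count=2, default_q=30))
    # prints [30, 30, 30, 30, 30, 40, 30, 30, 30, 30, 30]
```

In the example, every base of the variant read is reset to the default except the substituted base, whose original score survives. Replacing most scores with one value is what makes the quality stream highly compressible. One limitation of this sketch: if enough reads share a variant, its k-mers become frequent and its scores are reset too. How the k-mer corpus and threshold are chosen therefore matters in practice.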
Source: USPTO / EPO open patent data (bibliographic data and citation counts).