K-mer based genomic reference data compression
US11515011B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Aug 9, 2019 |
| Grant date | Nov 29, 2022 |
| Priority date | — |
| Expiry date | Apr 30, 2041 |
Classification
- Technology area (CPC H): Electricity
- CPC primary: H03M7/3088
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A computer-implemented method includes receiving genomic data associated with a plurality of genomes and identifying k-mer sets within the genomic data. The method includes constructing a k-mer subset tree according to the following process: performing iterative pairwise comparisons on the k-mer sets, wherein the iterative pairwise comparisons identify fragments with the most shared k-mers, merging the identified fragments into non-leaf nodes of the k-mer subset tree, and placing each remaining k-mer into a leaf node of the k-mer subset tree. The method includes storing the k-mer subset tree. A computer program product for data compression includes a computer readable storage medium having program instructions embodied therewith. The program instructions are executable by a computer to cause the computer to perform the foregoing method. A system includes a processor and logic. The logic is configured to perform the foregoing method.
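The tree construction described in the abstract can be illustrated with a minimal Python sketch. This is one possible reading of the abstract, not the patented implementation: the names `kmer_set`, `Node`, `build_subset_tree`, and `reconstruct` are hypothetical, and the greedy pair selection, the tie-breaking, and the choice to store shared k-mers in the parent node are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import combinations


def kmer_set(sequence: str, k: int) -> frozenset:
    """Return the set of all length-k substrings of `sequence`."""
    return frozenset(sequence[i:i + k] for i in range(len(sequence) - k + 1))


@dataclass
class Node:
    kmers: set                                   # k-mers stored at this node only
    children: list = field(default_factory=list)
    label: str = ""                              # genome name for leaves


def build_subset_tree(genomes: dict, k: int) -> Node:
    """Greedily merge the two subtrees whose top nodes share the most k-mers.

    Assumption: shared k-mers move up into a new non-leaf parent, and
    whatever is left over stays in the children (ultimately the leaves).
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    active = [Node(kmers=set(kmer_set(seq, k)), label=name)
              for name, seq in genomes.items()]
    while len(active) > 1:
        # Pairwise comparison: pick the pair with the largest intersection.
        a, b = max(combinations(active, 2),
                   key=lambda pair: len(pair[0].kmers & pair[1].kmers))
        shared = a.kmers & b.kmers
        a.kmers -= shared                        # children keep only the remainder
        b.kmers -= shared
        parent = Node(kmers=shared, children=[a, b])
        active = [n for n in active if n is not a and n is not b] + [parent]
    return active[0]


def reconstruct(root: Node) -> dict:
    """Recover each genome's k-mer set as the union along its root-to-leaf path."""
    result = {}

    def walk(node: Node, inherited: frozenset) -> None:
        current = inherited | node.kmers
        if not node.children:
            result[node.label] = current
        for child in node.children:
            walk(child, current)

    walk(root, frozenset())
    return result
```

In this sketch, a k-mer shared by two genomes is stored once in their common ancestor instead of once per genome, which is where the space saving comes from. Calling `reconstruct(build_subset_tree(genomes, k))` returns the original k-mer set for every genome, so the representation is lossless at the k-mer level. The pairwise search is quadratic in the number of active nodes per merge, so this form is only suitable for small inputs.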
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.