Patent · US Active

K-mer based genomic reference data compression

US11515011B2 · kind B2 · utility

Cited by: 0
References: 1
Claims: 18
Family size: 0

Assignee

Inventors

Key dates

Filing date: Aug 9, 2019
Grant date: Nov 29, 2022
Priority date
Expiry date: Apr 30, 2041

Classification

  • Technology area (CPC H): Electricity
  • CPC primary: H03M7/3088
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A computer-implemented method includes receiving genomic data associated with a plurality of genomes and identifying k-mer sets within the genomic data. The method includes constructing a k-mer subset tree according to the following process: performing iterative pairwise comparisons on the k-mer sets, wherein the iterative pairwise comparisons identify fragments with the most shared k-mers, merging the identified fragments into non-leaf nodes of the k-mer subset tree, and placing each remaining k-mer into a leaf node of the k-mer subset tree. The method includes storing the k-mer subset tree. A computer program product for data compression includes a computer readable storage medium having program instructions embodied therewith. The program instructions are executable by a computer to cause the computer to perform the foregoing method. A system includes a processor and logic. The logic is configured to perform the foregoing method.
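
The process outlined in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation; the claims define the actual method. It assumes one possible reading: each genome is reduced to a set of k-mers, the pair of sets sharing the most k-mers is merged greedily, the shared k-mers move up into a new non-leaf node, and whatever remains stays in the nodes below. All names (`Node`, `kmer_set`, `build_tree`, `reconstruct`, `stored_kmers`) are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    kmers: frozenset                      # k-mers stored at this node only
    children: list = field(default_factory=list)
    label: str = ""                       # genome name for leaves, empty otherwise


def kmer_set(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return frozenset(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def build_tree(genomes, k):
    """Greedy bottom-up construction (assumed strategy). genomes: name -> sequence."""
    # Each work item pairs a node with the k-mers not yet placed in the tree.
    work = [(Node(frozenset(), label=name), kmer_set(seq, k))
            for name, seq in sorted(genomes.items())]
    while len(work) > 1:
        # Pairwise comparison: pick the two items sharing the most k-mers.
        i, j = max(combinations(range(len(work)), 2),
                   key=lambda p: len(work[p[0]][1] & work[p[1]][1]))
        (node_a, set_a), (node_b, set_b) = work[i], work[j]
        shared = set_a & set_b
        # K-mers not shared with the sibling remain in the lower nodes.
        node_a.kmers = set_a - shared
        node_b.kmers = set_b - shared
        # Shared k-mers are carried by a new non-leaf node.
        parent = Node(frozenset(), children=[node_a, node_b])
        work = [w for idx, w in enumerate(work) if idx not in (i, j)]
        work.append((parent, shared))
    root, remaining = work[0]
    root.kmers = remaining
    return root


def reconstruct(node, inherited=frozenset()):
    """Recover each genome's k-mer set as the union along its root-to-leaf path."""
    have = inherited | node.kmers
    if not node.children:
        return {node.label: have}
    result = {}
    for child in node.children:
        result.update(reconstruct(child, have))
    return result


def stored_kmers(node):
    """Count the k-mers physically stored across all nodes of the tree."""
    return len(node.kmers) + sum(stored_kmers(c) for c in node.children)
```

Calling `build_tree({"a": "ACGTAC", "b": "ACGTTT", "c": "GGGGGG"}, 3)` and passing the result to `reconstruct` returns the original k-mer set for each genome, while `stored_kmers` reports fewer entries than the sum of the individual sets, because k-mers common to "a" and "b" are stored once in their shared parent. Serializing the tree to storage is left out of the sketch.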

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.