Data compression using dictionaries
US11620263B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Event | Date |
| --- | --- |
| Filing date | Aug 17, 2021 |
| Grant date | Apr 4, 2023 |
| Priority date | — |
| Expiry date | Aug 17, 2041 |
Classification
- Technology area (CPC H): Electricity
- CPC primary: H03M7/3077
- WIPO field: Basic communication processes
- WIPO sector: Electrical engineering
Abstract
Data units of a dataset may be compressed by grouping the data units into clusters, selecting a reference unit for each cluster, and compressing the data units of each cluster using that cluster's reference unit as a dictionary. The computational efficiency of the clustering algorithm may be improved by applying it not to the data units themselves but to hash values of the data units, where the hash values are much smaller than the data units. The hash function may be a locality-sensitive hash (LSH) function. The reference unit of a cluster may be determined in any of a variety of ways, for example, by selecting a centroid or exemplar of the cluster. Clusters, including their reference values, may be indexed in a cluster index (e.g., a Faiss index), which may be searched to assign subsequently added or modified data units to clusters.
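The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and every function name in it is hypothetical. It rests on several assumptions that the abstract does not state: SimHash over byte shingles stands in for the LSH function, a greedy Hamming-distance threshold stands in for the clustering algorithm, the exemplar is taken to be the member with the smallest total distance to the others, a linear scan stands in for a cluster index such as Faiss, and zlib's preset-dictionary feature (`zdict`) stands in for "using the reference unit as a dictionary".

```python
import hashlib
import zlib


def simhash(data: bytes, shingle: int = 4, bits: int = 64) -> int:
    """Assumed LSH: similar inputs tend to get hashes with a small Hamming distance."""
    counts = [0] * bits
    for i in range(max(1, len(data) - shingle + 1)):
        digest = hashlib.blake2b(data[i:i + shingle], digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for b in range(bits):
            counts[b] += 1 if (h >> b) & 1 else -1
    return sum(1 << b for b in range(bits) if counts[b] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash values."""
    return bin(a ^ b).count("1")


def cluster_hashes(hashes, threshold: int = 16):
    """Greedy clustering that looks only at the small hash values, never the units."""
    clusters = []  # each cluster is a list of unit indices; members[0] is its seed
    for i, h in enumerate(hashes):
        for members in clusters:
            if hamming(hashes[members[0]], h) <= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def pick_exemplar(members, hashes) -> int:
    """Reference unit = member with the smallest total distance to the others."""
    return min(members, key=lambda i: sum(hamming(hashes[i], hashes[j]) for j in members))


def compress_with_reference(unit: bytes, reference: bytes) -> bytes:
    """Compress one unit with the cluster's reference unit as a preset dictionary."""
    c = zlib.compressobj(level=9, zdict=reference)
    return c.compress(unit) + c.flush()


def decompress_with_reference(blob: bytes, reference: bytes) -> bytes:
    """Inverse of compress_with_reference; needs the same reference unit."""
    d = zlib.decompressobj(zdict=reference)
    return d.decompress(blob) + d.flush()


def compress_dataset(units):
    """Cluster by hash, pick a reference per cluster, compress the other members."""
    hashes = [simhash(u) for u in units]
    result = []
    for members in cluster_hashes(hashes):
        ref = pick_exemplar(members, hashes)
        blobs = {i: compress_with_reference(units[i], units[ref])
                 for i in members if i != ref}
        # The reference itself is compressed without a dictionary so it can be
        # recovered first and then used to decode the rest of its cluster.
        result.append({"reference": ref,
                       "reference_blob": zlib.compress(units[ref], 9),
                       "blobs": blobs})
    return result


def assign_to_cluster(unit: bytes, reference_hashes) -> int:
    """Linear-scan stand-in for a cluster index: nearest reference hash wins."""
    h = simhash(unit)
    return min(range(len(reference_hashes)),
               key=lambda k: hamming(reference_hashes[k], h))
```

To read a unit back, decompress the cluster's `reference_blob` with `zlib.decompress` and pass the result to `decompress_with_reference` together with the unit's blob. The threshold of 16 bits and the shingle length of 4 bytes are arbitrary illustrative values; a real system would tune them or replace the greedy pass with a proper clustering algorithm.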
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.