Selection of digest hash function for different data sets
US11308036B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Apr 11, 2019 |
| Grant date | Apr 19, 2022 |
| Priority date | — |
| Expiry date | Feb 28, 2040 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F3/0641
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Techniques for processing data may include: receiving a plurality of data chunks for a data set; performing data deduplication processing for the plurality of data chunks; determining, in accordance with one or more criteria, whether a frequency distribution of a frequency histogram of digest byte frequencies is sufficiently uniform; and responsive to determining that the frequency distribution of the frequency histogram is not sufficiently uniform, performing processing to update data deduplication settings for the data set. Updating the data deduplication settings may include using a stronger hash algorithm and/or a larger size digest when generating subsequent digests. The data deduplication processing may include: determining, using a current hash algorithm, a plurality of digests for the plurality of data chunks of the data set; and updating the frequency histogram of digest byte frequencies for the data set in accordance with the plurality of digests.
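The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not say which uniformity criteria, hash algorithms, or digest sizes are used, so everything concrete here is an assumption made for illustration: the `HASH_LADDER` sequence, the chi-square-per-degree-of-freedom test, the `threshold` and `min_samples` values, and the names `DedupState` and `chi_square_per_dof`.

```python
import hashlib
from collections import Counter

# Hypothetical escalation ladder: (hashlib algorithm name, digest size in bytes).
# Each step is a stronger algorithm with a larger digest than the one before.
HASH_LADDER = [("sha1", 20), ("sha256", 32), ("sha512", 64)]


def chi_square_per_dof(histogram, total):
    """Chi-square statistic of a byte histogram against a uniform
    distribution over 256 byte values, divided by 255 degrees of freedom.
    Values near 1 suggest uniformity; large values suggest skew."""
    expected = total / 256
    stat = sum((histogram.get(b, 0) - expected) ** 2 / expected
               for b in range(256))
    return stat / 255


class DedupState:
    """Per-data-set deduplication settings plus the digest byte histogram."""

    def __init__(self, threshold=1.5, min_samples=4096):
        self.level = 0                  # index into HASH_LADDER
        self.threshold = threshold      # assumed uniformity criterion
        self.min_samples = min_samples  # assumed minimum bytes before testing
        self.histogram = Counter()      # digest byte value -> occurrences
        self.total = 0                  # digest bytes counted so far
        self.seen = set()               # digests of chunks already stored

    def digest(self, chunk):
        """Digest a chunk with the current hash algorithm."""
        name, size = HASH_LADDER[self.level]
        return hashlib.new(name, chunk).digest()[:size]

    def process(self, chunks):
        """Deduplicate a batch of chunks, update the histogram, and step up
        the hash settings if the histogram is not sufficiently uniform.
        Returns the number of duplicate chunks found in the batch."""
        duplicates = 0
        for chunk in chunks:
            d = self.digest(chunk)
            if d in self.seen:
                duplicates += 1
            else:
                self.seen.add(d)
            self.histogram.update(d)    # counts each byte of the digest
            self.total += len(d)

        skewed = (self.total >= self.min_samples and
                  chi_square_per_dof(self.histogram, self.total) > self.threshold)
        if skewed and self.level < len(HASH_LADDER) - 1:
            # Subsequent digests use the next algorithm. This sketch does not
            # re-hash chunks already recorded under the previous algorithm.
            self.level += 1
            self.histogram.clear()
            self.total = 0
        return duplicates
```

For example, `DedupState().process([b"same"] * 1000)` reports 999 duplicates and moves the state to the second ladder entry, because a histogram built from one repeated digest is heavily skewed. A real system would likely choose its criteria and its handling of previously stored digests differently.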
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.