Patent · US Active

Selection of digest hash function for different data sets

US11308036B2 · kind B2 · utility

Cited by: 7
References: 3
Claims: 22
Family size: 0

Assignee

Inventors

Key dates

Filing date: Apr 11, 2019
Grant date: Apr 19, 2022
Priority date
Expiry date: Feb 28, 2040

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F3/0641
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Techniques for processing data may include: receiving a plurality of data chunks for a data set; performing data deduplication processing for the plurality of data chunks; determining, in accordance with one or more criteria, whether a frequency distribution of a frequency histogram of digest byte frequencies is sufficiently uniform; and responsive to determining that the frequency distribution of the frequency histogram is not sufficiently uniform, performing processing to update data deduplication settings for the data set. Updating the data deduplication settings may include using a stronger hash algorithm and/or a larger size digest when generating subsequent digests. The data deduplication processing may include: determining, using a current hash algorithm, a plurality of digests for the plurality of data chunks of the data set; and updating the frequency histogram of digest byte frequencies for the data set in accordance with the plurality of digests.
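
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The uniformity criterion (a maximum relative deviation from the expected per-byte count), the `tolerance` parameter, the `HASH_LADDER` ordering, and all function names are assumptions made for illustration; the abstract only says that "one or more criteria" are used and that a stronger hash algorithm and/or a larger digest may be selected.

```python
import hashlib
from collections import Counter

# Hypothetical ordering of hash algorithms from weaker to stronger.
# The abstract does not name specific algorithms; this list is an assumption.
HASH_LADDER = ["sha1", "sha256", "sha512"]


def update_histogram(histogram, chunks, algorithm):
    """Digest each chunk and count how often each byte value (0-255)
    appears across the resulting digests."""
    for chunk in chunks:
        digest = hashlib.new(algorithm, chunk).digest()
        histogram.update(digest)  # iterating bytes yields ints 0-255
    return histogram


def is_sufficiently_uniform(histogram, tolerance=0.5):
    """Assumed criterion: every byte value's count lies within
    `tolerance` (as a fraction) of the expected uniform count."""
    total = sum(histogram.values())
    if total == 0:
        return True  # nothing observed yet, nothing to object to
    expected = total / 256
    return all(
        abs(histogram.get(value, 0) - expected) <= tolerance * expected
        for value in range(256)
    )


def next_algorithm(current):
    """Return the next stronger algorithm in the assumed ladder,
    or the current one if it is already the strongest."""
    index = HASH_LADDER.index(current)
    return HASH_LADDER[min(index + 1, len(HASH_LADDER) - 1)]


def process_chunks(chunks, settings, histogram):
    """Run one deduplication pass and update the settings for the data set
    if the digest byte histogram is not sufficiently uniform."""
    update_histogram(histogram, chunks, settings["algorithm"])
    if not is_sufficiently_uniform(histogram, settings["tolerance"]):
        settings["algorithm"] = next_algorithm(settings["algorithm"])
        histogram.clear()  # start a fresh histogram for the new algorithm
    return settings
```

A caller would keep one `settings` dictionary and one `Counter` per data set and call `process_chunks` as chunks arrive. In practice the check would only be meaningful after enough digests have accumulated, since a histogram built from a handful of digests is far from uniform by chance alone.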

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.