Patent · US Active

Progressive sampling for deduplication indexing

US8311964B1 · kind B1 · utility

Cited by: 48
References: 26
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Nov 12, 2009
Grant date: Nov 13, 2012
Priority date
Expiry date: May 12, 2031

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F3/067
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A system and method for efficiently reducing a number of duplicate blocks of stored data. A file server both removes duplicate data and prevents duplicate data from being stored in the shared storage. A sampling rate may be used to determine which fingerprints, or hash values, are stored in an index. The sampling rate may be modified in response to changes in characteristics of the system, such as a change in the shared storage size, a change in a utilization of the shared storage, a change in the size of the storage unit, and reaching a threshold corresponding to utilization of the index. Also, a small cache may be maintained for holding fingerprint and pointer pair values prefetched from the shared storage. Each prefetched pair may be associated with data corresponding to a previous hit in the index. The association may be related to spatial locality, temporal locality, or otherwise.
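The mechanism the abstract describes can be illustrated with a minimal Python sketch. It is not the patented implementation. The class name `SampledDedupStore`, the modulo-based sampling test, the choice to double the sampling interval when the index grows past a limit, and the parameters `sample_every`, `max_index_entries`, `cache_size`, and `prefetch` are all assumptions made for illustration. Spatial locality is modeled simply as "the blocks stored right after a block that hit in the index".

```python
import hashlib
from collections import OrderedDict
from typing import Tuple


class SampledDedupStore:
    """Toy in-memory model of a sampled fingerprint index (illustrative only)."""

    def __init__(self, sample_every=1, max_index_entries=1024,
                 cache_size=8, prefetch=2):
        self.blocks = []                  # stands in for shared storage: (fingerprint, data)
        self.index = {}                   # sampled fingerprint -> block location
        self.cache = OrderedDict()        # small LRU cache of prefetched pairs
        self.sample_every = sample_every  # index roughly 1 of every N fingerprints
        self.max_index_entries = max_index_entries
        self.cache_size = cache_size
        self.prefetch = prefetch

    @staticmethod
    def fingerprint(data: bytes) -> int:
        # Hypothetical choice: first 8 bytes of SHA-256 as an integer fingerprint.
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _prefetch_neighbors(self, loc: int) -> None:
        # After an index hit, load the pairs of the blocks stored next to it.
        end = min(loc + 1 + self.prefetch, len(self.blocks))
        for n in range(loc + 1, end):
            fp = self.blocks[n][0]
            self.cache[fp] = n
            self.cache.move_to_end(fp)
            while len(self.cache) > self.cache_size:
                self.cache.popitem(last=False)  # evict least recently used pair

    def _coarsen_sampling(self) -> None:
        # Assumed policy: double the interval and drop entries that no longer qualify.
        self.sample_every *= 2
        self.index = {fp: loc for fp, loc in self.index.items()
                      if fp % self.sample_every == 0}

    def write(self, data: bytes) -> Tuple[int, bool]:
        """Store a block; return (location, was_deduplicated)."""
        fp = self.fingerprint(data)
        loc = self.cache.get(fp)
        if loc is not None:
            self.cache.move_to_end(fp)
            return loc, True
        loc = self.index.get(fp)
        if loc is not None:
            self._prefetch_neighbors(loc)
            return loc, True
        # No match found: store the block and index it only if it is sampled.
        loc = len(self.blocks)
        self.blocks.append((fp, data))
        if fp % self.sample_every == 0:
            self.index[fp] = loc
            if len(self.index) > self.max_index_entries:
                self._coarsen_sampling()
        return loc, False
```

With `sample_every` greater than 1, a duplicate whose fingerprint is not sampled is found only if an earlier index hit prefetched its pair into the cache. Otherwise it is stored again, which is the trade-off a sampled index makes for a smaller memory footprint.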

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.