Patent · US Active

Distributed deduplication using global chunk data structure and epochs

US8930648B1 · kind B1 · utility

Cited by: 47
References: 1
Claims: 28
Family size: 0

Assignee

Inventors

Key dates

Filing date: May 23, 2012
Grant date: Jan 6, 2015
Priority date
Expiry date: Feb 7, 2033

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F12/0292
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Techniques for a data storage cluster, and a method for scalably deduplicating data in that cluster by (among other things) using an epoch-based global chunk data structure, are disclosed herein. A global chunk data structure for an epoch is distributed across, and maintained at, a plurality of metadata nodes within the data storage cluster. Fingerprints and identifiers of data chunks that are written to the cluster after a particular epoch are written to delta chunk data structures stored in different metadata nodes of the cluster. When the data storage cluster advances to the next epoch, the global chunk data structure is updated using the delta chunk data structures. At any given time, data deduplication in the data storage cluster can be conducted based on the global chunk data structure for the current epoch.
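
The epoch mechanism described in the abstract can be illustrated with a minimal, single-process Python sketch. It is not the patented implementation. Every name in it (`Cluster`, `MetadataNode`, `write`, `advance_epoch`, `lookup`) is hypothetical, and the use of SHA-256 as the fingerprint, the partitioning of the global structure by fingerprint modulo node count, and the "first identifier wins" merge rule are assumptions chosen only to make the example runnable.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(chunk: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever fingerprint the cluster uses.
    return hashlib.sha256(chunk).hexdigest()


@dataclass
class MetadataNode:
    # This node's partition of the global chunk structure (fingerprint -> chunk id).
    global_part: dict = field(default_factory=dict)
    # Delta structure: (fingerprint, chunk id) pairs recorded since the last epoch.
    delta: list = field(default_factory=list)


class Cluster:
    def __init__(self, num_nodes: int):
        self.nodes = [MetadataNode() for _ in range(num_nodes)]
        self.epoch = 0
        self._next_id = 0

    def _owner(self, fp: str) -> MetadataNode:
        # Assumption: the global structure is partitioned by fingerprint value.
        return self.nodes[int(fp, 16) % len(self.nodes)]

    def write(self, chunk: bytes, node_index: int) -> int:
        """Store a chunk and record its fingerprint in the receiving node's delta."""
        chunk_id = self._next_id
        self._next_id += 1
        self.nodes[node_index].delta.append((fingerprint(chunk), chunk_id))
        return chunk_id

    def advance_epoch(self) -> None:
        """Fold every node's delta into the global structure, then start a new epoch."""
        for node in self.nodes:
            for fp, chunk_id in node.delta:
                # Assumption: the first identifier merged for a fingerprint is kept.
                self._owner(fp).global_part.setdefault(fp, chunk_id)
            node.delta.clear()
        self.epoch += 1

    def lookup(self, chunk: bytes):
        """Return the chunk id known to the current epoch's global structure, or None."""
        fp = fingerprint(chunk)
        return self._owner(fp).global_part.get(fp)
```

In this sketch, a chunk written during an epoch is visible only in a delta structure, so `lookup` returns `None` for it until `advance_epoch` merges the deltas. After the merge, a second copy of the same chunk can be resolved to the identifier kept in the global structure.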

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.