Distributed deduplication using global chunk data structure and epochs
US8930648B1 · kind B1 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | May 23, 2012 |
| Grant date | Jan 6, 2015 |
| Priority date | — |
| Expiry date | Feb 7, 2033 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F12/0292
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Techniques for a data storage cluster and a method for deduplicating data in the data storage cluster in a scalable manner, by (among other things) using an epoch-based global chunk data structure, are disclosed herein. A global chunk data structure for an epoch is distributed and maintained at a plurality of metadata nodes within the data storage cluster. Fingerprints and identifiers of data chunks written to the cluster after a particular epoch are written to delta chunk data structures stored in different metadata nodes of the cluster. When the data storage cluster advances to the next epoch, the global chunk data structure is updated using the delta chunk data structures. At any given time, data deduplication in the data storage cluster can be conducted based on the global chunk data structure for the current epoch.
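The abstract's write path and epoch advance can be illustrated with a minimal, single-process sketch. It is not the patented implementation: the names `MetadataNode`, `DedupCluster`, `write_chunk`, and `advance_epoch` are hypothetical, SHA-256 is assumed as the fingerprint function, and routing each fingerprint to a metadata node by hashing is an assumption the abstract does not state. Deduplication consults only the global structure frozen for the current epoch. New fingerprints go into per-node delta structures, which are folded into the global structure when the epoch advances.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class MetadataNode:
    """Hypothetical metadata node: one shard of the global chunk data
    structure plus a delta structure for chunks written this epoch."""
    global_shard: dict[str, int] = field(default_factory=dict)  # fingerprint -> chunk id (current epoch)
    delta: dict[str, int] = field(default_factory=dict)         # fingerprint -> chunk id (written since epoch start)


class DedupCluster:
    """Hypothetical cluster model; chunk payloads are kept in a plain dict."""

    def __init__(self, num_metadata_nodes: int) -> None:
        self.nodes = [MetadataNode() for _ in range(num_metadata_nodes)]
        self.epoch = 0
        self.chunks: dict[int, bytes] = {}  # stand-in for data storage
        self._next_id = 0

    def _owner(self, fp: str) -> MetadataNode:
        # Assumption: each fingerprint is owned by one node, chosen by hashing.
        return self.nodes[int(fp, 16) % len(self.nodes)]

    def write_chunk(self, data: bytes) -> int:
        fp = hashlib.sha256(data).hexdigest()  # assumed fingerprint function
        node = self._owner(fp)
        # Deduplicate only against the global structure for the current epoch.
        existing = node.global_shard.get(fp)
        if existing is not None:
            return existing
        # Not known in this epoch: store the chunk under a new identifier.
        chunk_id = self._next_id
        self._next_id += 1
        self.chunks[chunk_id] = data
        # Record fingerprint and identifier in the owning node's delta structure
        # (the first identifier seen this epoch is kept).
        node.delta.setdefault(fp, chunk_id)
        return chunk_id

    def advance_epoch(self) -> None:
        # Update each global shard from its node's delta, then start an empty delta.
        for node in self.nodes:
            for fp, chunk_id in node.delta.items():
                node.global_shard.setdefault(fp, chunk_id)
            node.delta = {}
        self.epoch += 1
```

In this sketch, writing `b"hello"` twice within epoch 0 stores it twice (ids 0 and 1), because the global structure for epoch 0 does not yet contain it. After `advance_epoch()`, a third write of `b"hello"` returns id 0. This shows the trade-off the abstract implies: lookups run against a stable, per-epoch global structure, and same-epoch duplicates are caught only after the next epoch boundary. Reconciling those extra copies is not modeled here.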
Source: USPTO / EPO open patent data.