Optimized deduplication based on backup frequency in a distributed data storage system
US11513708B2 · kind B2 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Jan 20, 2021 |
| Grant date | Nov 29, 2022 |
| Priority date | — |
| Expiry date | Jan 20, 2041 |
Classification
- Technology area (CPC H): Electricity
- CPC primary: H04L67/1097
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Disclosed deduplication techniques at a distributed data storage system guarantee that space reclamation will not affect deduplicated data integrity, even without perfect synchronization between components. By understanding certain “behavioral” characteristics and schedule cadences of the backup operations that generate backup copies received at the distributed data storage system, the system proactively ages data blocks that are not re-written by subsequent backup copies, while promoting continued retention of data blocks that are re-written. An expiry scheme operates with block-level granularity. Each unique deduplicated data block is given an expiry timeframe based on the block's arrival time at the distributed data storage system (i.e., when a backup copy supplies the block) and further based on the backup frequencies of the various virtual disks that reference a unique system-wide identifier of the block, which is based on the block's hash value. Communications between components are kept to an as-needed basis. Cloud-based and multi-cloud configurations are disclosed.
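The block-level expiry scheme described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The abstract states only that expiry depends on a block's arrival time and on the backup frequencies of the virtual disks referencing the block's hash-based identifier. Everything else here is a hypothetical assumption: the SHA-256 digest, the integer block IDs, the `DedupIndex`/`BlockEntry` names, the `EXPIRY_CYCLES` multiplier, and the rule of scaling by the longest backup interval among referencing virtual disks.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical tuning knob (assumption, not from the patent): a block survives
# this many backup intervals of its least frequently backed-up referencing vdisk.
EXPIRY_CYCLES = 2


@dataclass
class BlockEntry:
    block_id: int        # system-wide identifier for the unique block
    arrival_time: float  # latest time a backup copy supplied this block
    referencing_vdisks: set[str] = field(default_factory=set)


class DedupIndex:
    """Maps block hashes to unique IDs and tracks per-block expiry (sketch)."""

    def __init__(self, backup_interval_by_vdisk: dict[str, float]):
        # Seconds between scheduled backups per virtual disk (assumed known
        # from the backup schedule).
        self.backup_interval = dict(backup_interval_by_vdisk)
        self.by_hash: dict[str, BlockEntry] = {}
        self.next_id = 0

    def write_block(self, vdisk: str, data: bytes, now: float) -> int:
        """Record that a backup of `vdisk` supplied `data` at time `now`."""
        digest = hashlib.sha256(data).hexdigest()
        entry = self.by_hash.get(digest)
        if entry is None:
            # First arrival: assign a new system-wide ID.
            entry = BlockEntry(self.next_id, now)
            self.by_hash[digest] = entry
            self.next_id += 1
        else:
            # Re-written by a later backup: refresh arrival time (never move it
            # backwards) so the block's retention is extended.
            entry.arrival_time = max(entry.arrival_time, now)
        entry.referencing_vdisks.add(vdisk)
        return entry.block_id

    def expiry(self, entry: BlockEntry) -> float:
        # Use the longest interval so that no referencing vdisk's next
        # scheduled backup is missed before the block can expire.
        longest = max(self.backup_interval[v] for v in entry.referencing_vdisks)
        return entry.arrival_time + EXPIRY_CYCLES * longest

    def reclaim(self, now: float) -> int:
        """Delete blocks whose expiry has passed; return how many were removed."""
        expired = [h for h, e in self.by_hash.items() if self.expiry(e) <= now]
        for h in expired:
            del self.by_hash[h]
        return len(expired)
```

In this sketch, a block shared by a daily-backed-up disk and a weekly-backed-up disk is kept for two weekly cycles. A block that only a daily disk supplied, and that no later backup re-wrote, ages out after two days. For simplicity, the sketch never removes a virtual disk from a block's reference set.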
Source: USPTO / EPO open patent data (objective bibliographic data and citation counts).