Methods for optimized variable-size deduplication using two stage content-defined chunking and devices thereof
US10866928B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jan 14, 2019 |
| Grant date | Dec 15, 2020 |
| Priority date | — |
| Expiry date | Jul 28, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/152
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Methods, non-transitory machine readable media, and computing devices that compare a hash value to a predefined value for sliding windows in parallel for segments partitioned from an input data stream. A bit array is parsed according to minimum and maximum chunk sizes to identify chunk boundaries for the input data stream. The bit array is populated based on a result of the comparison and portions of the bit array are parsed in parallel. Unique chunks of the input data stream defined by the chunk boundaries are stored in a storage device. Accordingly, this technology utilizes parallel processing in two stages. In a first stage, rolling window based hashing is performed concurrently to identify potential chunk boundaries. In a second stage, actual chunk boundaries are selected based on minimum and maximum chunk size constraints. This technology advantageously facilitates significant deduplication ratio improvement as well as improved parallel chunking performance.
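The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the window size, hash function, mask, target value, chunk size limits, and all function names (`window_hash`, `mark_candidates`, `stage1`, `stage2`, `dedup_store`) are assumptions chosen for illustration. The hash is recomputed for each window rather than rolled, the thread pool only shows the per-segment structure (CPython threads do not speed up CPU-bound work), and stage 2 is parsed sequentially here, whereas the abstract describes parsing portions of the bit array in parallel.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# All constants below are illustrative assumptions, not values from the patent.
WINDOW = 16       # sliding window length in bytes
MASK = 0x3F       # low bits of the hash that are compared
TARGET = 0        # the "predefined value" the masked hash must equal
MIN_CHUNK = 32    # minimum chunk size in bytes
MAX_CHUNK = 256   # maximum chunk size in bytes


def window_hash(window: bytes) -> int:
    """Simple polynomial hash of one window (recomputed, not rolled)."""
    h = 0
    for b in window:
        h = (h * 31 + b) & 0xFFFFFFFF
    return h


def mark_candidates(data: bytes, start: int, end: int) -> bytearray:
    """Stage 1 for one segment: flag positions whose window hash matches."""
    bits = bytearray(end - start)
    # A window ending at i may reach back into the previous segment, so the
    # result does not depend on where the segments were cut.
    for i in range(max(start, WINDOW - 1), end):
        if window_hash(data[i - WINDOW + 1 : i + 1]) & MASK == TARGET:
            bits[i - start] = 1
    return bits


def stage1(data: bytes, segment_size: int, workers: int = 4) -> bytes:
    """Build the candidate-boundary bit array, one task per segment."""
    ranges = [(s, min(s + segment_size, len(data)))
              for s in range(0, len(data), segment_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda r: mark_candidates(data, *r), ranges))
    return b"".join(parts)


def stage2(bits: bytes) -> list[int]:
    """Stage 2: pick actual boundaries (exclusive chunk ends) from the bits."""
    n = len(bits)
    boundaries = []
    start = 0
    while start < n:
        lo = start + MIN_CHUNK - 1           # earliest allowed last byte
        hi = min(start + MAX_CHUNK, n) - 1   # latest allowed last byte
        cut = hi                             # forced cut if no candidate fits
        for i in range(lo, hi + 1):
            if bits[i]:
                cut = i
                break
        boundaries.append(cut + 1)
        start = cut + 1
    return boundaries


def dedup_store(data: bytes, boundaries: list[int], store: dict) -> list[str]:
    """Store each unique chunk once, keyed by SHA-256; return the recipe."""
    recipe = []
    prev = 0
    for end in boundaries:
        chunk = data[prev:end]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # only the first copy is kept
        recipe.append(digest)
        prev = end
    return recipe
```

Calling `stage1(data, segment_size)`, then `stage2(bits)`, then `dedup_store(data, boundaries, store)` with a plain `dict` as the store yields a list of chunk digests from which the input can be reassembled. Because stage 1 only records candidate positions and defers the minimum and maximum size constraints to stage 2, the segments can be hashed independently without knowing where earlier chunks ended.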
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.