Disk-image deduplication with hash subset in memory
US10552075B2 · kind B2 · utility
Assignee
Inventor
Key dates
| Filing date | Jan 23, 2018 |
| Grant date | Feb 4, 2020 |
| Priority date | — |
| Expiry date | Feb 5, 2038 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F9/45533
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Deduplication of virtual-machine disk images and other disk images can involve identifying the first cluster in each file. The clusters are hashed. The first-in-file hashes (generated from first-in-file clusters) are stored in an in-memory index, while the full set of hashes is streamed to find matches with the hashes stored in the in-memory index. First-in-file hashes in the stream are always compared against the index, while other hashes in the stream are compared only if the immediately preceding hash resulted in a match. Comparing non-first-in-file hashes requires disk accesses, but because such comparisons are conditioned on first-in-file matches, they are relatively likely to produce sequences of matches. The net effect is relatively fast deduplication, with compression approaching that of a full comparison of all hashes.
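A minimal Python sketch of the matching scheme the abstract describes, offered as an illustration rather than the patented implementation. It assumes fixed-size clusters (a hypothetical `CLUSTER_SIZE`), SHA-256 as the hash, and a dict standing in for the on-disk hash store. It also assumes that a non-first-in-file hash is compared against the cluster at the same offset in the file whose first cluster matched. The names `cluster_hashes` and `dedup` are illustrative, not taken from the patent.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# Hypothetical cluster size; real disk-image formats define their own.
CLUSTER_SIZE = 4


def cluster_hashes(data: bytes, size: int = CLUSTER_SIZE) -> List[bytes]:
    """Split a file into fixed-size clusters and hash each one."""
    return [hashlib.sha256(data[i:i + size]).digest()
            for i in range(0, len(data), size)]


def dedup(files: Dict[str, bytes]) -> List[Tuple[str, int, str]]:
    """Return (file, cluster_index, source_file) for each duplicate cluster found."""
    # Full per-file hash lists: a stand-in for hashes kept on disk.
    on_disk = {name: cluster_hashes(data) for name, data in files.items()}

    # In-memory index holds only first-in-file hashes,
    # mapped to the earliest file whose first cluster has that hash.
    index: Dict[bytes, str] = {}
    for name, hashes in on_disk.items():
        if hashes:
            index.setdefault(hashes[0], name)

    matches: List[Tuple[str, int, str]] = []
    for name, hashes in on_disk.items():  # stream every hash in order
        src: Optional[str] = None  # source file of the current run of matches
        for i, h in enumerate(hashes):
            if i == 0:
                # First-in-file hash: always compared against the in-memory index.
                cand = index.get(h)
                src = cand if cand is not None and cand != name else None
            elif src is not None:
                # Non-first hash: compared (a disk read in practice) only because
                # the preceding hash matched; check the same offset in src.
                src_hashes = on_disk[src]
                if i >= len(src_hashes) or src_hashes[i] != h:
                    src = None  # run broken; later hashes in this file are skipped
            if src is not None:
                matches.append((name, i, src))
    return matches
```

With `files = {"a.img": b"AAAABBBBCCCC", "b.img": b"AAAABBBBDDDD", "c.img": b"XXXXBBBBCCCC"}`, `dedup(files)` returns `[("b.img", 0, "a.img"), ("b.img", 1, "a.img")]`. The run in `b.img` stops at its third cluster. `c.img` is never compared beyond its first cluster, even though its last two clusters duplicate `a.img`. This shows the trade-off: fewer disk-backed comparisons in exchange for missing duplicates that do not follow a first-in-file match.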
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.