Patent · US Active

Disk-image deduplication with hash subset in memory

US10552075B2 · kind B2 · utility

Cited by: 0
References: 3
Claims: 20
Family size: 0

Assignee

Inventor

Key dates

Filing date: Jan 23, 2018
Grant date: Feb 4, 2020
Priority date
Expiry date: Feb 5, 2038

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F9/45533
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Deduplication of virtual-machine disk images and other disk images can involve identifying the first clusters in a file. The clusters are hashed. The first-in-file hashes (generated from first-in-file clusters) are stored in an in-memory index, while the full set of hashes is streamed in order to find matches with the hashes stored in the in-memory index. First-in-file hashes in the stream are compared, while other hashes in the stream are compared only if the immediately preceding hash resulted in a match. Comparing non-first-in-file hashes requires disk accesses, but since such comparisons are conditioned on first-in-file matches, they are relatively likely to result in sequences of matches. The net effect is relatively fast deduplication, with compression approaching that resulting from a full comparison of all hashes.
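
The matching rule in the abstract can be illustrated with a minimal sketch. It assumes fixed-size clusters, SHA-256 hashes, and a Python list standing in for the disk-resident hash set. The names ReferenceStore, cluster_hashes, dedup, and CLUSTER are hypothetical and do not come from the patent, and the claimed method may differ in detail.

```python
import hashlib

CLUSTER = 4  # tiny cluster size, chosen only to keep the example readable (assumption)


def cluster_hashes(data: bytes):
    """Split one file into fixed-size clusters and hash each cluster."""
    return [hashlib.sha256(data[i:i + CLUSTER]).hexdigest()
            for i in range(0, len(data), CLUSTER)]


class ReferenceStore:
    """Hypothetical store of hashes for files that were already ingested."""

    def __init__(self, files):
        self.on_disk = []      # full hash list; stands in for disk-resident data
        self.first_index = {}  # in-memory subset: first-in-file hash -> position
        self.disk_reads = 0    # counts simulated disk accesses
        for data in files:
            hashes = cluster_hashes(data)
            if hashes:
                # Only the first cluster's hash of each file goes into memory.
                self.first_index.setdefault(hashes[0], len(self.on_disk))
            self.on_disk.extend(hashes)

    def read_hash(self, pos):
        """Simulated disk access to one stored hash."""
        self.disk_reads += 1
        return self.on_disk[pos] if pos < len(self.on_disk) else None


def dedup(store, files):
    """Return (matched, unmatched) cluster counts for the incoming files."""
    matched = unmatched = 0
    for data in files:
        cursor = None  # stored position expected to match the next cluster
        for i, h in enumerate(cluster_hashes(data)):
            if i == 0:
                # First-in-file hash: compared against the in-memory index only.
                pos = store.first_index.get(h)
                hit = pos is not None
            elif cursor is not None:
                # Previous hash matched: follow the sequence with a disk read.
                pos = cursor
                hit = store.read_hash(pos) == h
            else:
                # Previous hash did not match: this hash is not compared at all.
                hit = False
            if hit:
                matched += 1
                cursor = pos + 1
            else:
                unmatched += 1
                cursor = None
    return matched, unmatched
```

In this sketch, a duplicate cluster that follows a mismatch is stored again rather than detected, which is the trade-off the abstract describes: fewer disk accesses in exchange for compression that approaches, but does not equal, a full comparison of all hashes.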

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.