Patent · US Active

Disk-image deduplication with hash subset in memory

US10552075B2 · kind B2 · utility

0 Cited by

3 References

20 Claims

0 Family size

Assignee

VMware LLC · US

Inventor

Oleg Zaydman · San Jose, US

Key dates

Filing date	Jan 23, 2018
Grant date	Feb 4, 2020
Priority date	—
Expiry date	Feb 5, 2038

Classification

Technology area (CPC G)	Physics
CPC primary	G06F9/45533
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Deduplication of virtual-machine disk images and other disk images can involve identifying the first clusters in a file. The clusters are hashed. The first-in-file hashes (generated from first-in-file clusters) are stored in an in-memory index, while the full set of hashes is streamed in order to find matches with the hashes stored in the in-memory index. First-in-file hashes in the stream are compared, while other hashes in the stream are compared only if the immediately preceding hash resulted in a match. Comparing non-first-in-file hashes requires disk accesses, but since such comparisons are conditioned on first-in-file matches, they are relatively likely to result in sequences of matches. The net effect is a relatively fast deduplication with compression approaching that resulting from a full comparison of all hashes.
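
The matching rule described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (cluster_hashes, build_index, dedup_stream), the 4-byte cluster size, the use of SHA-256, and the dictionary standing in for on-disk hash lists are all assumptions made for illustration. The sketch keeps only first-in-file hashes in the in-memory index and counts a simulated disk read each time a non-first-in-file hash is compared.

```python
import hashlib

CLUSTER_SIZE = 4  # bytes; unrealistically small, chosen only for illustration


def cluster_hashes(data, size=CLUSTER_SIZE):
    """Split data into fixed-size clusters and hash each one (SHA-256 is an assumption)."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def build_index(stored_files):
    """In-memory index: first-in-file hash -> name of the stored file it starts."""
    index = {}
    for name, hashes in stored_files.items():
        if hashes:
            index.setdefault(hashes[0], name)
    return index


def dedup_stream(stream, index, stored_files):
    """Process (hash, is_first_in_file) pairs in order.

    Returns (matches, disk_reads). stored_files stands in for hash lists
    kept on disk, so each lookup into it is counted as one disk read.
    """
    matches = []
    cursor = None  # (file name, position of the next stored hash), set only after a match
    disk_reads = 0
    for h, is_first in stream:
        if is_first:
            # First-in-file hashes are always compared, using memory only.
            name = index.get(h)
            cursor = (name, 1) if name is not None else None
            matches.append(name is not None)
        elif cursor is not None:
            # Compared only because the immediately preceding hash matched.
            name, pos = cursor
            stored = stored_files[name]
            disk_reads += 1
            if pos < len(stored) and stored[pos] == h:
                cursor = (name, pos + 1)
                matches.append(True)
            else:
                cursor = None
                matches.append(False)
        else:
            # Preceding hash did not match: skip the comparison entirely.
            matches.append(False)
    return matches, disk_reads


# Hypothetical data: one stored file, two incoming files.
stored_files = {"base.img": cluster_hashes(b"AAAABBBBCCCC")}
index = build_index(stored_files)

stream = []
for incoming in (b"AAAABBBBXXXXCCCC", b"ZZZZBBBB"):
    hashes = cluster_hashes(incoming)
    stream.extend((h, i == 0) for i, h in enumerate(hashes))

matches, disk_reads = dedup_stream(stream, index, stored_files)
print(matches, disk_reads)
```

In this toy run the first incoming file matches on its first two clusters, and the run ends at the third. Its fourth cluster is never compared, even though an identical cluster exists in the stored file, which shows the trade-off the abstract describes: some duplicates are missed in exchange for fewer disk accesses. The second incoming file fails the first-in-file lookup, so none of its clusters trigger a disk read.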

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.