Patent · US Active

Identifying common file-segment sequences

US10430379B2 · kind B2 · utility

Cited by	0

References	2

Claims	8

Family size	0

Assignee

VMware LLC · US

Inventor

Oleg Zaydman · San Jose, US

Key dates

Filing date	Jun 5, 2017
Grant date	Oct 1, 2019
Priority date	—
Expiry date	Mar 31, 2038

Classification

Technology area (CPC G)	Physics
CPC primary	G06F2009/45562
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Virtual-machine images (VMIs) can be compressed by identifying common cluster sequences shared across VMIs. To identify these sequences, hashes are generated for each cluster in each VMI, resulting in hash files for respective VMIs. The hashes are partitioned to address memory constraints. For each partition, its hashes are entered into buckets of a hash map according to their respective hash values. Each (non-empty) bucket associates a key hash value with one or more pointers to locations in the hash files. Clusters of hashes are fetched from the hash files referenced by multi-pointer buckets. The hash clusters are scanned across clusters to identify common hash sequences. Common cluster sequences are then identified based on the common hash sequences. This process works with files other than VMIs and with segment sizes other than clusters.
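
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation; it only mirrors the flow described above under several assumptions: the names hash_file and common_sequences, the 4-byte cluster size, the use of SHA-256, partitioning by the first hash byte, in-memory lists standing in for hash files, and the restriction to matches between different files are all hypothetical choices made for illustration.

```python
import hashlib
from collections import defaultdict

CLUSTER_SIZE = 4  # assumption: tiny cluster size so the example stays readable


def hash_file(data: bytes, cluster_size: int = CLUSTER_SIZE) -> list[bytes]:
    """Return one hash per fixed-size cluster (stands in for a 'hash file')."""
    return [
        hashlib.sha256(data[i:i + cluster_size]).digest()
        for i in range(0, len(data), cluster_size)
    ]


def common_sequences(hash_files: dict[str, list[bytes]],
                     num_partitions: int = 2,
                     min_len: int = 2):
    """Find runs of identical hashes shared by two different files.

    Returns sorted tuples (file_a, start_a, file_b, start_b, length).
    """
    seeds = set()
    # Process one partition at a time so only part of the hashes
    # occupies the hash map at once (the memory constraint in the abstract).
    for p in range(num_partitions):
        buckets = defaultdict(list)  # hash value -> pointers (file, index)
        for name, hashes in hash_files.items():
            for idx, h in enumerate(hashes):
                if h[0] % num_partitions == p:  # assumed partitioning rule
                    buckets[h].append((name, idx))
        # Only multi-pointer buckets can indicate shared clusters.
        for pointers in buckets.values():
            for i, a in enumerate(pointers):
                for b in pointers[i + 1:]:
                    if a[0] != b[0]:  # simplification: cross-file matches only
                        seeds.add((a, b))

    runs = []
    for (na, ia), (nb, ib) in seeds:
        ha, hb = hash_files[na], hash_files[nb]
        # Skip seeds that sit inside a run; the run's first seed covers them.
        if ia > 0 and ib > 0 and ha[ia - 1] == hb[ib - 1]:
            continue
        # Scan forward through both hash files while the hashes keep matching.
        n = 0
        while ia + n < len(ha) and ib + n < len(hb) and ha[ia + n] == hb[ib + n]:
            n += 1
        if n >= min_len:
            runs.append((na, ia, nb, ib, n))
    return sorted(runs)
```

For example, hashing b"AAAABBBBCCCCDDDD" as file "a" and b"XXXXBBBBCCCCYYYY" as file "b" and passing both hash lists to common_sequences reports a single shared run of two clusters starting at cluster index 1 in each file. A real system would then map each common hash sequence back to the underlying cluster data and would likely verify the bytes, since equal hashes only suggest equal clusters.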

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.