Patent · US Active

Use of similarity hash to route data for improved deduplication in a storage server cluster

US8607017B2 · kind B2 · utility

Cited by: 10
References: 2
Claims: 29
Family size: 0

Assignee

Inventor

Key dates

Filing date: Sep 14, 2012
Grant date: Dec 10, 2013
Priority date
Expiry date: Sep 14, 2032

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F2206/1012
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A technique for routing data for deduplication in a storage server cluster includes computing, for each node in the cluster, a value collectively representative of the data stored on the node, such as a “geometric center” of the node. New or modified data is routed to the node which has stored data identical or most similar to the new or modified data, as determined based on those values. Each node stores a plurality of chunks of data, where each chunk includes multiple deduplication segments. A content hash is computed for each deduplication segment in each node, and a similarity hash is computed for each chunk from the content hashes of all segments in the chunk. A geometric center of a node is computed from the similarity hashes of the chunks stored in the node.
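
The routing flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not say how the similarity hash is derived from the content hashes, how the geometric center is calculated, or which distance measure is used, so everything here is an assumption made for illustration: a 64-bit truncated SHA-256 as the content hash, a SimHash-style bitwise majority vote as the similarity hash, a per-bit mean as the geometric center, and squared Euclidean distance for routing. All function names are hypothetical.

```python
import hashlib

HASH_BITS = 64  # assumed hash width; the abstract does not specify one


def content_hash(segment: bytes) -> int:
    """Content hash of one deduplication segment (assumed: truncated SHA-256)."""
    return int.from_bytes(hashlib.sha256(segment).digest()[:8], "big")


def similarity_hash(segments) -> int:
    """Similarity hash of a chunk, built from the content hashes of all of
    its segments (assumed: SimHash-style majority vote on each bit)."""
    votes = [0] * HASH_BITS
    for seg in segments:
        h = content_hash(seg)
        for bit in range(HASH_BITS):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit, v in enumerate(votes) if v > 0)


def bits(h: int):
    """Expand a hash into a list of 0/1 coordinates, lowest bit first."""
    return [(h >> b) & 1 for b in range(HASH_BITS)]


def geometric_center(sim_hashes):
    """Geometric center of a node (assumed: per-bit mean of the similarity
    hashes of the chunks stored on that node)."""
    n = len(sim_hashes)
    return [sum(col) / n for col in zip(*(bits(h) for h in sim_hashes))]


def route(chunk_sim_hash: int, centers: dict) -> str:
    """Return the name of the node whose center is closest to the chunk."""
    v = bits(chunk_sim_hash)

    def dist(center):
        return sum((a - b) ** 2 for a, b in zip(v, center))

    return min(centers, key=lambda node: dist(centers[node]))


# Usage: two single-chunk nodes; a chunk identical to node A's is routed to A.
chunk_a = [b"alpha-0", b"alpha-1", b"alpha-2"]
chunk_b = [b"beta-0", b"beta-1", b"beta-2"]
centers = {
    "A": geometric_center([similarity_hash(chunk_a)]),
    "B": geometric_center([similarity_hash(chunk_b)]),
}
target = route(similarity_hash(chunk_a), centers)
```

In this sketch each node is summarized by one vector of HASH_BITS values, so a routing decision compares the incoming chunk against one value per node rather than against every stored chunk, which is the practical appeal of a per-node representative value.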

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.