Patent · US Active

System and method for preprocessing a data set to improve deduplication

US8285957B1 · kind B1 · utility

Cited by: 41
References: 1
Claims: 52
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jan 12, 2010
Grant date: Oct 9, 2012
Priority date
Expiry date: Oct 20, 2030

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F11/1453
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

The technique introduced here includes a system and method for preprocessing a data set to improve deduplication, specifically by reducing deduplication latency. The technique illustratively uses one or more preprocessing steps, including a “skipping” step and a “folding” step, which can be applied to a data set before deduplication to reduce the time deduplication takes. The folding step is applied to segments of the data set to reduce their length. The skipping step can be applied to the data set before the folding step to remove particular segments, further improving deduplication performance in certain circumstances. The combined effect of the skipping and folding steps is a data set of reduced total length to examine when identifying duplicate data, which reduces the time required for deduplication.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.