System and method for preprocessing a data set to improve deduplication
US8285957B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Jan 12, 2010 |
| Grant date | Oct 9, 2012 |
| Priority date | — |
| Expiry date | Oct 20, 2030 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F11/1453
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
The technique introduced here includes a system and method for preprocessing a data set to improve deduplication, and more specifically for reducing latency. The technique illustratively utilizes one or more preprocessing steps, including a “skipping” step and a “folding” step, which can be applied to a data set prior to deduplication to reduce the time consumed by deduplication. The folding step is applied to segments of the data set to reduce the length of the segments. The skipping step can be applied to the data set prior to the folding step to remove particular segments of the data set, to further improve deduplication performance in certain circumstances. The overall effect of the skipping and folding steps of this technique is to produce a data set of reduced total length for consideration in identifying duplicate data, which aids in reducing the time required for deduplication.
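The skip-then-fold order described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how segments are chosen for skipping or how folding shortens a segment, so everything concrete here is an assumption made for illustration: the fixed segment size, the all-zero skip rule (`is_all_zero`), the XOR-based `fold`, and all function names are hypothetical and are not taken from the patent.

```python
from typing import Callable, List


def split_segments(data: bytes, size: int) -> List[bytes]:
    """Cut the data set into fixed-size segments (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def is_all_zero(segment: bytes) -> bool:
    """Hypothetical skip rule: drop segments that contain only zero bytes."""
    return not any(segment)


def fold(segment: bytes, width: int) -> bytes:
    """Hypothetical fold: XOR successive width-byte pieces into one piece."""
    out = bytearray(width)
    for i, byte in enumerate(segment):
        out[i % width] ^= byte
    return bytes(out)


def preprocess(
    data: bytes,
    size: int,
    width: int,
    should_skip: Callable[[bytes], bool],
) -> List[bytes]:
    """Apply the skipping step first, then fold each remaining segment."""
    kept = [s for s in split_segments(data, size) if not should_skip(s)]
    return [fold(s, width) for s in kept]


if __name__ == "__main__":
    data = b"\x00" * 8 + b"abcdefgh" + b"abcdefgh"
    folded = preprocess(data, size=8, width=4, should_skip=is_all_zero)
    # 24 input bytes shrink to 8 bytes for the duplicate search.
    print(len(data), sum(len(s) for s in folded))
    # Equal folded segments are only candidates for duplicates.
    print(folded[0] == folded[1])
```

In this sketch the duplicate search would compare the shorter folded segments instead of the originals. Because any length-reducing fold can map different segments to the same output, a match on folded segments would presumably still need to be confirmed against the original data.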
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.