System and method for eliminating duplicate data by generating data fingerprints using adaptive fixed-length windows
US8180740B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Aug 12, 2009 |
| Grant date | May 15, 2012 |
| Priority date | — |
| Expiry date | Jul 23, 2030 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F2201/83
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method and system for generating data fingerprints is used to de-duplicate a data set having a high level of redundancy. A fingerprint generator generates a data fingerprint based on a data window. Each byte of the data set is added to the fingerprint generator and used to detect an anchor within the received data. If no anchor is detected, the system continues receiving bytes until a predefined window size is reached. When the window size is reached, the system records a data fingerprint based on the data window and resets the window size. If an anchor is detected, the system extends the window size such that the window ends a specified length after the location of the anchor. If the extended window is greater than a maximum size, the system ignores the anchor. The generated fingerprints are compared to a fingerprint database. The data set is then de-duplicated by replacing matching data segments with references to corresponding stored data segments.
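The windowing rule in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation. The function names, the parameters (`window_size`, `tail`, `max_size`), the sum-based anchor test, and the use of SHA-1 as the fingerprint are all assumptions chosen for illustration. The abstract does not say what happens when an anchor would end the window before the current target, so the sketch only ever extends the window.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


def sum_anchor(data: bytes, i: int, span: int = 4, mask: int = 0x1F) -> bool:
    """Placeholder anchor test (an assumption, not taken from the patent):
    the low bits of the sum of the last `span` bytes are all zero."""
    if i + 1 < span:
        return False
    return (sum(data[i - span + 1 : i + 1]) & mask) == 0


def split_windows(
    data: bytes,
    window_size: int = 8,   # predefined window size used when no anchor is seen
    tail: int = 2,          # bytes the window should extend past an anchor
    max_size: int = 12,     # anchors that would exceed this size are ignored
    is_anchor: Callable[[bytes, int], bool] = sum_anchor,
) -> List[bytes]:
    """Split `data` into windows following the rule described in the abstract."""
    segments: List[bytes] = []
    start = 0
    target = window_size
    for i in range(len(data)):
        length = i - start + 1  # bytes in the current window, including data[i]
        if is_anchor(data, i):
            extended = length + tail
            # Extend only if the window grows and stays within the maximum;
            # otherwise the anchor is ignored.
            if target < extended <= max_size:
                target = extended
        if length == target:
            segments.append(data[start : i + 1])
            start = i + 1
            target = window_size  # reset the window size for the next window
    if start < len(data):
        segments.append(data[start:])  # trailing partial window
    return segments


def deduplicate(
    segments: List[bytes], store: Dict[str, bytes]
) -> List[Tuple[str, str]]:
    """Replace segments whose fingerprint is already in `store` with references.
    SHA-1 is used as the fingerprint here purely for illustration."""
    result: List[Tuple[str, str]] = []
    for seg in segments:
        fp = hashlib.sha1(seg).hexdigest()
        if fp in store:
            result.append(("ref", fp))   # duplicate: keep only a reference
        else:
            store[fp] = seg              # first occurrence: store the data
            result.append(("new", fp))
    return result
```

For example, with a hypothetical anchor rule that treats the byte `|` as an anchor (`is_anchor=lambda d, i: d[i] == ord("|")`), the 16-byte input `b"aaaaaa|bbbbbbbbb"` is split into a 9-byte window ending two bytes after the anchor, followed by the 7 remaining bytes. Without any anchor, the same input length yields two 8-byte windows.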
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.