Data deduplication dictionary system
US8250325B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Apr 1, 2010 |
| Grant date | Aug 21, 2012 |
| Priority date | — |
| Expiry date | Jan 21, 2031 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F11/1453
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A data deduplication method that uses a small dictionary of hash digests held in fast-access memory. The method includes receiving customer data, dividing the data into smaller chunks, and assigning a hash value to each chunk. For each chunk, the method looks for a duplicate chunk by accessing the small dictionary in memory with the chunk's hash value. When no entry is found, the small dictionary is updated to include the hash value, so the dictionary initially fills with the earliest received data. When an entry is found, the entry's hash value is compared with the lookup value; if they match, reference data is returned and the entry's counter is incremented. If they do not match, additional accesses are attempted, for example using additional indexes calculated from the hash value. Collisions may trigger an entry replacement, so that some initially entered entries are replaced when they are determined not to be among the most frequently repeating values, for example based on their counter values.
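The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the fixed-size chunking, the SHA-256 digest, the way extra indexes are cut from successive 4-byte slices of the digest, and the eviction rule (replace the lowest-count colliding entry only if its count is at most `evict_max_count`) are all assumptions chosen for illustration, as are the names `SmallDictionary`, `Entry`, and `deduplicate`.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class Entry:
    digest: bytes   # full hash value of the chunk
    ref: int        # hypothetical "reference data": index of the stored chunk
    count: int = 1  # how many times this digest has been seen


class SmallDictionary:
    """Fixed-size, in-memory digest table (illustrative, not the patent's layout)."""

    def __init__(self, slots: int = 1024, max_probes: int = 3,
                 evict_max_count: int = 1):
        self.slots = slots
        self.max_probes = max_probes  # at most 8 here: SHA-256 gives 8 slices of 4 bytes
        self.evict_max_count = evict_max_count  # assumed eviction threshold
        self.table: List[Optional[Entry]] = [None] * slots

    def _indexes(self, digest: bytes) -> Iterator[int]:
        # Assumption: each additional index comes from the next 4-byte slice.
        for i in range(self.max_probes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.slots

    def lookup_or_insert(self, digest: bytes, ref: int) -> Optional[int]:
        """Return the stored reference for a duplicate, or None if not found."""
        collided: List[int] = []
        for idx in self._indexes(digest):
            entry = self.table[idx]
            if entry is None:
                # No entry: record the digest, so early data fills the table.
                self.table[idx] = Entry(digest, ref)
                return None
            if entry.digest == digest:
                # Match: bump the counter and hand back the reference.
                entry.count += 1
                return entry.ref
            collided.append(idx)  # occupied by a different digest; try next index
        if collided:
            # Every probe collided. Assumed policy: replace the least-repeated
            # candidate, but only if its counter is at or below the threshold.
            victim = min(collided, key=lambda i: self.table[i].count)
            if self.table[victim].count <= self.evict_max_count:
                self.table[victim] = Entry(digest, ref)
        return None


def deduplicate(data: bytes, dictionary: SmallDictionary,
                chunk_size: int = 4096) -> Tuple[List[bytes], List[int]]:
    """Split data into fixed-size chunks; return (unique chunks, per-chunk refs)."""
    store: List[bytes] = []
    recipe: List[int] = []
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        ref = dictionary.lookup_or_insert(digest, len(store))
        if ref is None:
            # Not found in the small dictionary: store the chunk as new.
            ref = len(store)
            store.append(chunk)
        recipe.append(ref)
    return store, recipe
```

Calling `deduplicate(data, SmallDictionary())` returns the list of stored chunks and a per-chunk list of references from which the input can be rebuilt with `b"".join(store[r] for r in recipe)`. Because the table is deliberately small, a repeated chunk whose entry was evicted, or never admitted, is simply stored again; the sketch trades some missed duplicates for a lookup that never leaves fast memory.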
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.