Hash-based duplicate data element systems and methods
US11789916B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Dec 14, 2021 |
| Grant date | Oct 17, 2023 |
| Priority date | — |
| Expiry date | Jan 21, 2042 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/93
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method for reducing storage of duplicated documents is provided. Methods may include hashing each document stored in a centralized data repository by executing a hashing algorithm on the document, outputting a hash-value, and adding the hash-value and a hash pointer to a hash table. Methods may further include crawling the hash table to identify duplicate hash-values. For each hash-value recorded on the hash table two or more times, methods may include combining the two or more duplicate hash-values into a cluster and, for each cluster, identifying on the hash table a unique hash-value. For the unique hash-value, methods may include maintaining the unique hash-value on the hash table and maintaining the document corresponding to the unique hash-value in the memory address. For each remaining duplicate hash-value stored in the cluster, methods may include deleting the corresponding document from the memory address and storing a reference pointer at the memory address.
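The steps in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the in-memory dict standing in for the centralized repository, the choice of SHA-256, the `("ref", address)` tuple used as a reference pointer, and the function name `deduplicate` are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def deduplicate(store):
    """Replace duplicate documents in `store` with reference pointers.

    `store` is a hypothetical stand-in for the repository: it maps a
    memory address (any hashable key) to the document bytes held there.
    """
    # Hash every document and record (hash-value, address) in a hash table.
    # SHA-256 is an assumed choice; the abstract names no algorithm.
    hash_table = [
        (hashlib.sha256(doc).hexdigest(), addr) for addr, doc in store.items()
    ]

    # Crawl the table and group addresses that share a hash-value into clusters.
    clusters = defaultdict(list)
    for digest, addr in hash_table:
        clusters[digest].append(addr)

    # For each cluster, keep the first document and replace every remaining
    # duplicate with a reference pointer to the kept copy.
    for addrs in clusters.values():
        keeper, *duplicates = addrs
        for addr in duplicates:
            store[addr] = ("ref", keeper)
    return store


store = {"a": b"hello", "b": b"world", "c": b"hello"}
result = deduplicate(store)
```

In this sketch the entry at address `"c"` ends up holding a pointer to `"a"`, while `"a"` and `"b"` keep their original bytes. A production system would also need to handle hash collisions and pointer resolution on read, neither of which the abstract details.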
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.