Patent · US Active

Hash-based duplicate data element systems and methods

US11789916B2 · kind B2 · utility

Cited by: 0
References: 0
Claims: 18
Family size: 0

Assignee

Inventors

Key dates

Filing date: Dec 14, 2021
Grant date: Oct 17, 2023
Priority date
Expiry date: Jan 21, 2042

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/93
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method for reducing storage of duplicate documents is provided. Methods may include hashing each document stored in a centralized data repository by executing a hashing algorithm on the document, outputting a hash-value, and adding the hash-value and a hash pointer to a hash table. Methods may further include crawling the hash table to identify duplicate hash-values. For each hash-value recorded on the hash table two or more times, methods may include combining the duplicate hash-values into a cluster and, for each cluster, identifying on the hash table a unique hash-value. For the unique hash-value, methods may include maintaining the unique hash-value on the hash table and maintaining the document corresponding to the unique hash-value at its memory address. For each remaining duplicate hash-value in the cluster, methods may include deleting the corresponding document from its memory address and storing a reference pointer at that memory address.
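
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the repository is an in-memory dict keyed by a string "memory address", that SHA-256 stands in for the unspecified hashing algorithm, and that the first address seen in each cluster is the one kept. The names `Reference`, `build_hash_table`, and `deduplicate` are hypothetical.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """Hypothetical reference pointer left where a duplicate was deleted."""
    target: str  # address of the document that was kept


def build_hash_table(repository):
    """Hash every stored document; return (hash_value, address) entries."""
    return [
        (hashlib.sha256(doc).hexdigest(), address)  # SHA-256 is an assumption
        for address, doc in repository.items()
        if isinstance(doc, bytes)  # skip addresses that already hold a Reference
    ]


def deduplicate(repository):
    """Replace duplicate documents with references; return the pruned hash table."""
    # Crawl the hash table and group addresses that share a hash value.
    clusters = defaultdict(list)
    for hash_value, address in build_hash_table(repository):
        clusters[hash_value].append(address)

    hash_table = {}
    for hash_value, addresses in clusters.items():
        # Keep the first address as the unique entry (an arbitrary choice here).
        keeper, *duplicates = addresses
        hash_table[hash_value] = keeper
        for address in duplicates:
            # Delete the duplicate document and store a reference in its place.
            repository[address] = Reference(target=keeper)
    return hash_table


repository = {
    "addr-1": b"quarterly report",
    "addr-2": b"meeting notes",
    "addr-3": b"quarterly report",
}
hash_table = deduplicate(repository)
```

After the call, `addr-1` and `addr-2` still hold their documents, `addr-3` holds a `Reference` pointing at `addr-1`, and the hash table has one entry per distinct hash value. The sketch treats equal hashes as equal content. A production system might compare the bytes before deleting anything.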

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.