Patent · US Active

Optimizing the performance of duplicate identification by content

US7617195B2 · kind B2 · utility

Cited by	13
References	16
Claims	23
Family size	0

Assignee

Xerox Corporation · US

Inventors

Tao Liang · Tangxia, CN
Xianing Zhu · Cupertino, US
Francois Ragnet · Venon, FR
Michel Gastaldo · Meylan, FR
Nicolas Monet · Seongnam-si, KR

Key dates

Filing date	Mar 28, 2007
Grant date	Nov 10, 2009
Priority date	—
Expiry date	Apr 8, 2028

Classification

Technology area (CPC Y)	Emerging Cross-Sectional Technologies
CPC primary	Y10S707/99936
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

In accordance with the disclosure, there is provided a method for identifying duplicate documents, comprising drafting a first document and creating a near unique representative string (NRS) based on the document content. The method further comprises searching for other documents with the same NRS and selectively assigning a duplicate group identification (DGI) to the first document: the DGI is unique if no NRS matches are found; otherwise it is the same as the DGI of the existing duplicate document whose NRS matches. The method further comprises placing the DGI into the metadata of the first document and, on user demand, recalling a list of duplicates of a particular document by searching the metadata and selecting documents that have the same DGI.
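
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how the NRS is computed, so the sketch assumes a SHA-256 hash of whitespace-normalized content, assumes a random UUID as the DGI, and uses an in-memory NRS-to-DGI index in place of a repository search. The names `Document`, `Repository`, `compute_nrs`, `add`, and `duplicates_of` are hypothetical.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class Document:
    content: str
    metadata: dict = field(default_factory=dict)


class Repository:
    """Toy in-memory document store (hypothetical, for illustration only)."""

    def __init__(self):
        self.documents = []
        self._dgi_by_nrs = {}  # assumed index: NRS -> DGI

    @staticmethod
    def compute_nrs(content: str) -> str:
        # Assumption: the NRS is a SHA-256 digest of the content with
        # runs of whitespace collapsed. The abstract does not specify this.
        normalized = " ".join(content.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def add(self, doc: Document) -> Document:
        nrs = self.compute_nrs(doc.content)
        # Look for an existing document with the same NRS.
        dgi = self._dgi_by_nrs.get(nrs)
        if dgi is None:
            # No match: assign a new, unique DGI (assumed to be a UUID).
            dgi = uuid.uuid4().hex
            self._dgi_by_nrs[nrs] = dgi
        # Place the DGI (and the NRS) into the document's metadata.
        doc.metadata["nrs"] = nrs
        doc.metadata["dgi"] = dgi
        self.documents.append(doc)
        return doc

    def duplicates_of(self, doc: Document) -> list:
        # Recall duplicates on demand by scanning metadata for the same DGI.
        dgi = doc.metadata["dgi"]
        return [
            other
            for other in self.documents
            if other is not doc and other.metadata.get("dgi") == dgi
        ]


repo = Repository()
first = repo.add(Document("Quarterly report  draft"))
copy = repo.add(Document("Quarterly report draft"))
other = repo.add(Document("Meeting notes"))
```

In this sketch `first` and `copy` differ only in whitespace, so they receive the same DGI, while `other` receives its own. Because the DGI is stored at insertion time, listing duplicates later needs only a metadata lookup rather than a fresh content comparison.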

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.