Patent · US Expired

Method and apparatus for characterizing documents based on clusters of related words

US7383258B2 · kind B2 · utility

Cited by: 109
References: 9
Claims: 60
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 30, 2003
Grant date: Jun 3, 2008
Priority date
Expiry date: Jul 26, 2025

Classification

  • Technology area (CPC Y): Emerging Cross-Sectional Technologies
  • CPC primary: Y10S707/99943
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

One embodiment of the present invention provides a system that characterizes a document with respect to clusters of conceptually related words. Upon receiving a document containing a set of words, the system selects “candidate clusters” of conceptually related words that are related to the set of words. These candidate clusters are selected using a model that explains how sets of words are generated from clusters of conceptually related words. Next, the system constructs a set of components to characterize the document, wherein the set of components includes components for candidate clusters. Each component in the set of components indicates a degree to which a corresponding candidate cluster is related to the set of words.
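
The flow described in the abstract (select candidate clusters, then build one component per candidate) can be illustrated with a minimal Python sketch. This is not the patented method: the abstract does not specify the generative model, so the toy cluster table, its probabilities, the overlap-based candidate test, and the names `CLUSTERS`, `select_candidates`, and `characterize` are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy model: each cluster maps words to an assumed probability
# that the cluster "generates" that word. All values are invented.
CLUSTERS = {
    "pets": {"dog": 0.4, "cat": 0.4, "leash": 0.2},
    "finance": {"bank": 0.5, "loan": 0.3, "rate": 0.2},
    "rivers": {"bank": 0.3, "water": 0.5, "fish": 0.2},
}


def select_candidates(words, clusters):
    """Keep clusters that share at least one word with the document.

    Assumption: simple vocabulary overlap stands in for the model-based
    selection that the abstract mentions but does not detail.
    """
    vocab = set(words)
    return [name for name, dist in clusters.items() if dist.keys() & vocab]


def characterize(words, clusters):
    """Return one component per candidate cluster, normalized to sum to 1.

    Each component is the cluster's share of the total word-weighted score,
    a stand-in for "the degree to which the cluster is related to the words".
    """
    counts = Counter(words)
    candidates = select_candidates(words, clusters)
    # Counter returns 0 for words absent from the document.
    raw = {
        name: sum(counts[w] * p for w, p in clusters[name].items())
        for name in candidates
    }
    total = sum(raw.values())
    return {name: score / total for name, score in raw.items()} if total else {}


if __name__ == "__main__":
    doc = ["bank", "loan", "rate", "water"]
    print(characterize(doc, CLUSTERS))
```

In this sketch the word "bank" is shared by two clusters, so both become candidates, and the remaining words determine which component ends up larger. A real system would replace the overlap test and the scoring rule with inference in the generative model the abstract refers to.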

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.