Patent · US Expired

Method and system for performing phrase/word clustering and cluster merging

US6578032B1 · kind B1 · utility

Cited by: 88
References: 8
Claims: 17
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jun 28, 2000
Grant date: Jun 10, 2003
Priority date
Expiry date: Apr 8, 2021

Classification

  • Technology area (CPC Y): Emerging Cross-Sectional Technologies
  • CPC primary: Y10S707/99938
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Text classification has become an important aspect of information technology. Present text classification techniques range from simple text matching to more complex clustering methods. Clustering describes a process of discovering structure in a collection of characters. The invention automatically analyzes a text string and either updates an existing cluster or creates a new cluster. To that end, the invention may use a character n-gram matching process in addition to other heuristic-based clustering techniques. In the character n-gram matching process, each text string is first normalized using several heuristics. It is then divided into a set of overlapping character n-grams, where n is the number of adjacent characters. If the commonality between the text string and the existing cluster members satisfies a pre-defined threshold, the text string is added to the cluster. If, on the other hand, the commonality does not satisfy the pre-defined threshold, a new cluster may be created. Each cluster may have a selected topic name. The topic name allows whole clusters to be compared in a similar way to the individual clusters, and merged when a predetermined level of commonality exists…
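
The character n-gram matching step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not specify the normalization heuristics, the commonality measure, the value of n, or the threshold. The sketch assumes lowercasing and punctuation stripping for normalization, a Dice coefficient over n-gram sets for commonality, n = 3, and a threshold of 0.5. All function names (`normalize`, `char_ngrams`, `commonality`, `assign`) are hypothetical.

```python
import re

def normalize(text):
    """Assumed heuristics: lowercase, drop punctuation, collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())

def char_ngrams(text, n=3):
    """Return the set of overlapping character n-grams of the text."""
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def commonality(a, b):
    """Dice coefficient between two n-gram sets (an assumed measure)."""
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

def assign(text, clusters, threshold=0.5, n=3):
    """Add text to the best-matching cluster, or create a new cluster.

    clusters is a list of lists of strings; the index of the cluster
    that received the text is returned.
    """
    grams = char_ngrams(normalize(text), n)
    best_index, best_score = None, 0.0
    for index, members in enumerate(clusters):
        # Score a cluster by its most similar existing member.
        score = max(commonality(grams, char_ngrams(normalize(m), n))
                    for m in members)
        if score > best_score:
            best_index, best_score = index, score
    if best_index is not None and best_score >= threshold:
        clusters[best_index].append(text)
        return best_index
    clusters.append([text])
    return len(clusters) - 1
```

Calling `assign` repeatedly on an initially empty list builds the clusters incrementally: a string whose best score meets the threshold joins that cluster, and any other string starts a new one. Under the same assumptions, cluster merging could reuse `commonality` on the n-grams of each cluster's topic name.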

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.