Method and system for performing phrase/word clustering and cluster merging
US7519590B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jun 9, 2003 |
| Grant date | Apr 14, 2009 |
| Priority date | — |
| Expiry date | Jun 8, 2024 |
Classification
- Technology area (CPC Y): Emerging Cross-Sectional Technologies
- CPC primary: Y10S707/99938
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Text classification has become an important aspect of information technology. Present text classification techniques range from simple text matching to more complex clustering methods. Clustering describes a process of discovering structure in a collection of characters. The invention automatically analyzes a text string and either updates an existing cluster or creates a new cluster. To that end, the invention may use a character n-gram matching process in addition to other heuristic-based clustering techniques. In the character n-gram matching process, each text string is first normalized using several heuristics. The normalized string is then divided into a set of overlapping character n-grams, where n is the number of adjacent characters in each n-gram. If the commonality between the text string and the existing cluster members satisfies a pre-defined threshold, the text string is added to the cluster. If, on the other hand, the commonality does not satisfy the pre-defined threshold, a new cluster may be created. Each cluster may have a selected topic name. The topic name allows whole clusters to be compared in a similar way to the individual clusters, and merged when a predetermined level of commonality exists…
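The flow described in the abstract (normalize, split into overlapping character n-grams, compare against existing clusters, then add or create) can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not specify the normalization heuristics, the commonality measure, the threshold value, or how a topic name is selected, so everything here is an assumption made for illustration: lowercasing and punctuation stripping as normalization, the Dice coefficient over n-gram sets as commonality, a threshold of 0.5, the first member's normalized text as the topic name, and the function names `normalize`, `char_ngrams`, `commonality`, `assign`, and `merge_clusters`.

```python
import re


def normalize(text):
    # Assumed heuristics: lowercase, drop punctuation, collapse whitespace.
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def char_ngrams(text, n=3):
    # Overlapping character n-grams; strings shorter than n yield themselves.
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def commonality(a, b):
    # Assumed measure: Dice coefficient between two n-gram sets.
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def assign(text, clusters, n=3, threshold=0.5):
    """Add text to the best-matching cluster, or create a new one.

    Returns the index of the cluster that received the text.
    """
    grams = char_ngrams(normalize(text), n)
    best_idx, best_score = None, 0.0
    for idx, cluster in enumerate(clusters):
        # Score against every existing member and keep the best match.
        score = max(
            commonality(grams, char_ngrams(normalize(member), n))
            for member in cluster["members"]
        )
        if score > best_score:
            best_idx, best_score = idx, score
    if best_idx is not None and best_score >= threshold:
        clusters[best_idx]["members"].append(text)
        return best_idx
    # Assumed topic-name rule: normalized text of the first member.
    clusters.append({"topic": normalize(text), "members": [text]})
    return len(clusters) - 1


def merge_clusters(clusters, n=3, threshold=0.5):
    # Compare whole clusters by topic name and merge those that are close enough.
    merged = []
    for cluster in clusters:
        grams = char_ngrams(cluster["topic"], n)
        for target in merged:
            if commonality(grams, char_ngrams(target["topic"], n)) >= threshold:
                target["members"].extend(cluster["members"])
                break
        else:
            merged.append(
                {"topic": cluster["topic"], "members": list(cluster["members"])}
            )
    return merged
```

In this sketch, calling `assign` on each incoming string grows the `clusters` list in place, and `merge_clusters` can be run afterwards to fold together clusters whose topic names overlap. Reusing the same n-gram comparison for topic names mirrors the abstract's statement that whole clusters are compared in a similar way, though the actual comparison the patent uses may differ.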
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.