Patent · US Expired

Clustering strings using N-grams

US7644076B1 · kind B1 · utility

Cited by: 30
References: 17
Claims: 7
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 12, 2003
Grant date: Jan 5, 2010
Priority date:
Expiry date: Sep 12, 2023

Classification

  • Technology area (CPC Y): Emerging Cross-Sectional Technologies
  • CPC primary: Y10S707/99936
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method and computer program for clustering a string are described. The string includes a plurality of characters. R unique n-grams T_1 ... T_R are identified in the string. For every unique n-gram T_S, if the frequency of T_S in a set of n-gram statistics is not greater than a first threshold, the string is associated with a cluster associated with T_S. Otherwise, for every other n-gram T_V in the string (T_1 ... T_R, except T_S), if the frequency of n-gram T_V is greater than the first threshold, and if the frequency of n-gram pair T_S-T_V is not greater than a second threshold, the string is associated with a cluster associated with the n-gram pair T_S-T_V. Otherwise, for every other n-gram T_X in the string (T_1 ... T_R, except T_S and T_V), the string is associated with a cluster associated with the n-gram triple T_S-T_V-T_X. Otherwise, nothing is done.
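
The control flow in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation. The abstract does not specify any of the following, so they are assumptions: character-level n-grams, frequency statistics supplied as plain dictionaries, pairs and triples treated as unordered (keyed by sorted tuples), and the names `unique_ngrams`, `cluster_keys`, `ngram_freq`, `pair_freq`, `t1`, and `t2`.

```python
def unique_ngrams(s, n=3):
    """Return the unique character n-grams of s in first-seen order.

    Character-level n-grams are an assumption; the abstract only says
    that the string consists of characters.
    """
    return list(dict.fromkeys(s[i:i + n] for i in range(len(s) - n + 1)))


def cluster_keys(s, ngram_freq, pair_freq, t1, t2, n=3):
    """Return the set of cluster keys the string s is associated with.

    ngram_freq: hypothetical dict mapping n-gram -> frequency
    pair_freq:  hypothetical dict mapping sorted (n-gram, n-gram) -> frequency
    t1, t2:     the first and second thresholds from the abstract
    """
    grams = unique_ngrams(s, n)
    keys = set()
    for ts in grams:
        # Rare n-gram: it identifies a cluster on its own.
        if ngram_freq.get(ts, 0) <= t1:
            keys.add((ts,))
            continue
        for tv in grams:
            # Only pair T_S with other n-grams that are also frequent;
            # otherwise nothing is done for this T_V.
            if tv == ts or ngram_freq.get(tv, 0) <= t1:
                continue
            pair = tuple(sorted((ts, tv)))
            if pair_freq.get(pair, 0) <= t2:
                # Rare pair: it identifies a cluster.
                keys.add(pair)
                continue
            # Frequent pair: fall back to triples with every remaining n-gram.
            for tx in grams:
                if tx != ts and tx != tv:
                    keys.add(tuple(sorted((ts, tv, tx))))
    return keys
```

With bigrams (`n=2`), the string `"abcd"` yields the n-grams `ab`, `bc`, and `cd`. If `ab` is rare while `bc` and `cd` are frequent, the string is assigned to the single n-gram cluster for `ab`, and to either the pair cluster `bc`-`cd` or the triple cluster `ab`-`bc`-`cd`, depending on whether the pair frequency exceeds the second threshold.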

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.