Patent · US · Expired

Multi-language document search and retrieval system

US6466901B1 · kind B1 · utility

Cited by: 52
References: 9
Claims: 18
Family size: 0

Assignee

Inventors

Key dates

Filing date: Nov 30, 1998
Grant date: Oct 15, 2002
Priority date:
Expiry date: Nov 30, 2018

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/289
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A multi-lingual indexing and search system performs tokenization and stemming in a manner which is independent of whether index entries and search terms appear as words in a dictionary. During the tokenization phase of the process, a string of text is separated into individual word tokens, and predetermined types of tokens are eliminated from further processing. The stemming phase of the process reduces words to grammatical stems by removing known word-endings associated with the various languages to be supported. Known word endings are removed from the word tokens without any effort to guarantee that the remaining stem is contained in a dictionary. In a preferred implementation, the stemming process is only applied to nouns.
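The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The stop list, the per-language suffix tables, the minimum stem length, and the names `tokenize`, `stem`, and `index_terms` are all assumptions made for illustration, since the abstract does not enumerate any of them. The noun-only restriction of the preferred implementation is not modeled.

```python
import re

# Hypothetical token types to eliminate: stop words and pure numbers.
STOP_TOKENS = {"the", "a", "an", "of", "and"}

# Hypothetical word-ending tables, one per supported language.
SUFFIXES = {
    "en": ["ies", "es", "s", "y"],
    "de": ["en", "er", "e", "s"],
}

# Assumed guard so that very short words are not stripped to nothing.
MIN_STEM_LEN = 3


def tokenize(text):
    """Split text into lowercase word tokens and drop the eliminated types."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if not t.isdigit() and t not in STOP_TOKENS]


def stem(token, lang):
    """Strip the longest known ending; no dictionary lookup is performed."""
    for suffix in sorted(SUFFIXES.get(lang, []), key=len, reverse=True):
        if token.endswith(suffix) and len(token) - len(suffix) >= MIN_STEM_LEN:
            return token[: -len(suffix)]
    return token


def index_terms(text, lang):
    """Apply tokenization, then stemming, to produce index or query terms."""
    return [stem(t, lang) for t in tokenize(text)]


print(index_terms("The 3 libraries of Paris", "en"))  # -> ['librar', 'pari']
print(stem("library", "en"))                          # -> 'librar'
```

Neither "librar" nor "pari" is a dictionary word, which is the point the abstract makes: index entries and search terms only need to reduce to the same stem, so a query for "library" matches a document containing "libraries".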

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.