Patent · US · Expired

Statistical thesaurus, method of forming same, and use thereof in query expansion in automated text searching

US5926811A · kind A · utility

Cited by	150
References	12
Claims	19
Family size	0

Assignee

LexisNexis · US

Inventors

David J. Miller · Portland, US
Xin Lu · Tokyo, JP
John Holt · Centerville, US

Key dates

Filing date	Mar 15, 1996
Grant date	Jul 20, 1999
Priority date	—
Expiry date	Mar 15, 2016

Classification

Technology area (CPC Y)	Emerging Cross-Sectional Technologies
CPC primary	Y10S707/99937
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A statistical thesaurus is built dynamically, from the same text collection that is being searched, allowing improved generation of expanded query terms. The thesaurus is dynamic in that thesaurus records are collected, ranked, accessed, and applied dynamically. Thesaurus "records" are actually formed as indexed documents arranged in "collections". The collections are preferably distinguished based on text source (court cases versus news wires versus patents, and so forth). Each record has terms assembled in indexed groups (or segments) which inherently reflect a ranking based on relevance to an initial query. After an initial query is received, the appropriate collection(s) of records may be searched by a conventional search and retrieval engine, the searches inherently returning records ranked by degree of relevance due to the record indexing scheme. A record ranking scheme avoids contamination of relevant records by less relevant records. The record selection and the expansion query term generation processes are each divided into parallel threads. The separate threads correspond to respective text sources to enable the improved expansion query term generation to be provided in r…
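The flow the abstract describes (per-source collections of thesaurus records, term groups ordered by relevance, record ranking that drops weak matches, and one thread per text source) can be illustrated with a minimal sketch. This is not the patented implementation. The data, the names `THESAURUS`, `score_record`, `expand_in_collection`, and `expand_query`, and the 1/(rank+1) weighting are all hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy thesaurus: one collection per text source.
# Each record is a list of term groups (segments), most relevant group first.
THESAURUS = {
    "cases": [
        [["negligence", "liability"], ["tort", "damages"]],
        [["contract", "breach"], ["remedy"]],
    ],
    "news": [
        [["liability", "lawsuit"], ["settlement"]],
    ],
}


def score_record(record, query_terms):
    """Score a record; matches in earlier groups count more (assumed rule)."""
    score = 0.0
    for rank, group in enumerate(record):
        score += len(query_terms & set(group)) / (rank + 1)
    return score


def expand_in_collection(name, query_terms, top_records=1, max_terms=3):
    """Rank one collection's records and harvest expansion terms from the best."""
    ranked = sorted(
        THESAURUS[name],
        key=lambda record: score_record(record, query_terms),
        reverse=True,
    )
    # Keep only the top-ranked records that actually match, so weakly related
    # records do not contaminate the expansion terms.
    kept = [r for r in ranked[:top_records] if score_record(r, query_terms) > 0]
    terms = []
    for record in kept:
        for group in record:
            for term in group:
                if term not in query_terms and term not in terms:
                    terms.append(term)
    return terms[:max_terms]


def expand_query(query, collections):
    """Run one expansion task per collection in parallel threads."""
    query_terms = set(query.lower().split())
    with ThreadPoolExecutor(max_workers=max(1, len(collections))) as pool:
        futures = {
            name: pool.submit(expand_in_collection, name, query_terms)
            for name in collections
        }
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    print(expand_query("liability", ["cases", "news"]))
```

Keeping the per-collection results separate, rather than merging them, mirrors the abstract's point that threads correspond to respective text sources; how the actual system combines or presents those results is not stated in the excerpt above.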

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.