Document search method wherein stored documents and search queries comprise segmented text data of spaced, nonconsecutive text elements and words segmented by predetermined symbols
US5748953A · kind A · utility
Assignee
Inventors
Key dates
| Filing date | May 18, 1995 |
| Grant date | May 5, 1998 |
| Priority date | — |
| Expiry date | May 18, 2015 |
Classification
- Technology area (CPC Y): Emerging Cross-Sectional Technologies
- CPC primary: Y10S707/99936
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A neighboring plural-character occurrence bitmap of practical capacity, capable of eliminating noise by hashing, is realized. It greatly reduces the number of documents that must be searched, so an effectively high-speed full text search is achieved even when the search term combines English characters and words. Text data is segmented into words, and n-character strings at every (m+1)-th character position are extracted from each word. A neighboring plural-character occurrence bitmap is created in which each entry records the presence of a particular neighboring plural-character string. N-character strings at every (m+1)-th character position are likewise extracted from a search term, and a search control program searches the neighboring plural-character occurrence bitmap with them. Because the bitmap is searched before the condensed texts, documents not relevant to the search term can be discarded early, which makes a high-speed full text search possible.
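The abstract does not spell out the extraction rule or the bitmap layout, so the following is only a minimal sketch of the two-stage idea under stated assumptions. Words are split on any non-alphanumeric symbol. Each key is taken to be n characters spaced m+1 positions apart inside one word, which is one reading of "n-character strings at every (m+1)-th character position". Each bitmap row is a Python integer holding one bit per document, and rows are keyed exactly rather than hashed. The names `segment_words`, `spaced_keys`, `build_bitmap`, and `search` are hypothetical, and a plain substring check stands in for the patent's search of condensed texts.

```python
import re
from collections import defaultdict


def segment_words(text: str) -> list[str]:
    # Assumption: any non-alphanumeric symbol acts as a word delimiter.
    return re.findall(r"[a-z0-9]+", text.lower())


def spaced_keys(word: str, n: int, m: int) -> set[str]:
    # Hypothetical key rule: n characters taken (m + 1) positions apart,
    # e.g. n=2, m=1 turns "search" into {"sa", "er", "ac", "rh"}.
    step = m + 1
    span = (n - 1) * step  # offset of a key's last character from its first
    return {word[i : i + span + 1 : step] for i in range(len(word) - span)}


def build_bitmap(docs: list[str], n: int, m: int) -> dict[str, int]:
    # One row per key; bit d of the row is set if document d contains the key.
    rows: dict[str, int] = defaultdict(int)
    for doc_id, text in enumerate(docs):
        for word in segment_words(text):
            for key in spaced_keys(word, n, m):
                rows[key] |= 1 << doc_id
    return dict(rows)


def search(docs: list[str], rows: dict[str, int], term: str, n: int, m: int) -> list[int]:
    # Stage 1: AND the rows of every key in the term to discard documents early.
    candidates = (1 << len(docs)) - 1  # every document starts as a candidate
    for word in segment_words(term):
        for key in spaced_keys(word, n, m):
            candidates &= rows.get(key, 0)  # an unseen key rules out every document
    # Stage 2: confirm survivors against the text (stand-in for condensed-text search).
    needle = " ".join(segment_words(term))
    return [
        d
        for d in range(len(docs))
        if (candidates >> d) & 1 and needle in " ".join(segment_words(docs[d]))
    ]


docs = [
    "High speed full text search.",
    "Bitmap index for documents",
    "Searching condensed texts",
]
rows = build_bitmap(docs, n=2, m=1)
print(search(docs, rows, "full text", 2, 1))  # [0]
print(search(docs, rows, "search", 2, 1))     # [0, 2]
```

In the second query, document 1 is rejected by the bitmap alone, without its text being read. Because every key of a substring of a word is also a key of that word, the bitmap stage in this sketch never drops a document that contains the term inside a word; it can only pass extra candidates, which the second stage then removes.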
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.