Phrase matching for document classification
US8401842B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Mar 11, 2008 |
| Grant date | Mar 19, 2013 |
| Priority date | — |
| Expiry date | Jan 18, 2032 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/205
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Phrase matching processes match phrases comprising a plurality of keywords against document text. The processes construct hit lists of the keywords found in the document text and operate on the keywords either in phrase order or without regard to the order in which the keywords occur in the phrase. They form sorted sets of all keywords and compare occurrences of the keywords in the sorted sets to a predefined proximity constraint. For unordered phrases, the proximity constraint defines a maximum span between the keywords in the highest and lowest positions in the sorted set as MaxSpan = p(k − 1), where p is a proximity and k is the number of keywords in the phrase. For ordered phrases, the distance between each pair of successive phrase keywords, taken in phrase order, must be less than or equal to the proximity p.
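The two proximity tests described in the abstract can be illustrated with a minimal Python sketch. This is not the patented procedure itself: the function names (`build_hit_lists`, `match_unordered`, `match_ordered`), the use of token indices as positions, the sliding-window search, and the assumption that the phrase keywords are distinct are all illustrative choices that the abstract does not specify.

```python
from bisect import bisect_left
from collections import defaultdict


def build_hit_lists(tokens, keywords):
    """Map each phrase keyword to the sorted token positions where it occurs."""
    wanted = set(keywords)
    hits = defaultdict(list)
    for pos, tok in enumerate(tokens):
        if tok in wanted:
            hits[tok].append(pos)
    return hits


def match_unordered(tokens, keywords, p):
    """True if all keywords occur within a span of at most p * (k - 1).

    Assumes the keywords are distinct (an assumption of this sketch).
    """
    k = len(keywords)
    max_span = p * (k - 1)
    hits = build_hit_lists(tokens, keywords)
    if any(kw not in hits for kw in keywords):
        return False
    # One position-sorted set holding every hit of every keyword.
    merged = sorted((pos, kw) for kw, plist in hits.items() for pos in plist)
    counts = defaultdict(int)
    distinct = 0
    left = 0
    for pos, kw in merged:
        counts[kw] += 1
        if counts[kw] == 1:
            distinct += 1
        # Shrink the window until highest - lowest position fits in max_span.
        while pos - merged[left][0] > max_span:
            left_kw = merged[left][1]
            counts[left_kw] -= 1
            if counts[left_kw] == 0:
                distinct -= 1
            left += 1
        if distinct == k:
            return True
    return False


def match_ordered(tokens, keywords, p):
    """True if the keywords occur in phrase order with each successive gap <= p."""
    hits = build_hit_lists(tokens, keywords)
    if any(kw not in hits for kw in keywords):
        return False
    # Positions of the current keyword that end a valid chain so far.
    reachable = hits[keywords[0]]
    for kw in keywords[1:]:
        nxt = []
        for pos in hits[kw]:
            i = bisect_left(reachable, pos)  # reachable[i - 1] is the nearest earlier hit
            if i > 0 and pos - reachable[i - 1] <= p:
                nxt.append(pos)
        if not nxt:
            return False
        reachable = nxt
    return True


tokens = "the quick brown fox jumps over the lazy dog".split()
print(match_unordered(tokens, ["fox", "quick"], 2))  # span 2 <= MaxSpan 2
print(match_ordered(tokens, ["fox", "quick"], 2))    # "quick" never follows "fox"
```

In this sketch, the unordered test accepts "fox" and "quick" in either order because their positions (3 and 1) span 2, which equals MaxSpan = 2(2 − 1). The ordered test rejects the same phrase because no occurrence of "quick" follows "fox" within the proximity.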
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.