Method and apparatus for automatically identifying keywords within a document
US6470307B1 · kind B1 · utility
Assignee
Inventor
Key dates
| Filing date | Jun 23, 1997 |
| Grant date | Oct 22, 2002 |
| Priority date | — |
| Expiry date | Jun 23, 2017 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/30
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A trainable method of extracting keywords of one or more words is disclosed. According to the method, every word within a document that is not a stop word is stemmed and evaluated and receives a score. The scoring is performed based on a plurality of parameters which are adjusted through training prior to use of the method for keyword extraction. Each word having a high score is then replaced by a word phrase that is delimited by punctuation or stop words. The word phrase is selected from word phrases having the stemmed word therein. Repeated keywords are removed. The keywords are expanded and capitalisation is determined. The resulting list forms extracted keywords.
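The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and every concrete choice in it is an assumption: the stop-word list, the crude suffix-stripping `stem` function, the two scoring parameters `freq_weight` and `first_pos_weight` (fixed by hand here, whereas the abstract says the parameters are adjusted through training), and the rule of picking the most frequent, then shortest, containing phrase. The keyword expansion and capitalisation steps are left out.

```python
import re
from collections import Counter

# Hypothetical stop-word list; this record does not enumerate one.
STOP_WORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def stem(word):
    """Crude suffix stripper for illustration only (an assumed stand-in stemmer)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def split_phrases(text):
    """Split text into word phrases delimited by punctuation or stop words."""
    phrases = []
    for chunk in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in chunk.split():
            if word in STOP_WORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def extract_keywords(text, num_keywords=3, freq_weight=1.0, first_pos_weight=0.5):
    """Score stemmed non-stop words, then replace top stems with containing phrases."""
    phrases = split_phrases(text)
    stems = [stem(w) for p in phrases for w in p]
    if not stems:
        return []

    counts = Counter(stems)
    first_pos = {}
    for i, s in enumerate(stems):
        first_pos.setdefault(s, i)

    # Assumed scoring: frequency plus a bonus for appearing early in the document.
    total = len(stems)
    scores = {
        s: freq_weight * counts[s] + first_pos_weight * (1 - first_pos[s] / total)
        for s in counts
    }
    top = sorted(scores, key=lambda s: (-scores[s], first_pos[s]))[:num_keywords]

    keywords = []
    for s in top:
        # Candidate phrases are those containing a word with this stem.
        candidates = Counter(tuple(p) for p in phrases if s in map(stem, p))
        best = min(candidates, key=lambda p: (-candidates[p], len(p)))
        phrase = " ".join(best)
        if phrase not in keywords:  # remove repeated keywords
            keywords.append(phrase)
    return keywords


sample = "Neural networks learn quickly. Neural networks are popular. The cat sat on the mat."
result = extract_keywords(sample, num_keywords=2)
print(result)
```

In this sample the two highest-scoring stems, `neural` and `network`, both map to the same phrase, so the duplicate-removal step leaves a single keyword. A trained system would replace the hand-set weights with values fitted on documents that have known keywords.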
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.