Method for extracting multi-word technical terms from text
US5423032A · kind A · utility
Assignee
Inventors
Key dates
| Filing date | Jan 3, 1992 |
| Grant date | Jun 6, 1995 |
| Priority date | — |
| Expiry date | Jan 3, 2012 |
Classification
- Technology area (CPC Y): Emerging Cross-Sectional Technologies
- CPC primary: Y10S707/99935
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method and apparatus for extracting multi-word technical terms from a text file in a computer system. Word strings are selected from the text that have at least two words and at most a specified maximum number of words, that include none of a special set of selected tokens, and that include only selected characters. Word strings that occur fewer than a specified minimum number of times in the text file are deleted. The remaining strings form a set of word strings very likely to be multi-word technical terms. The quality of this set can be improved further by deleting word strings that do not satisfy certain grammatical constraints.
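The selection and frequency steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `extract_terms`, the excluded-token list `STOP_TOKENS`, the allowed-character pattern `WORD`, and the default values of `max_words` and `min_count` are all assumptions chosen for illustration, since the abstract does not specify them. The optional grammatical filtering step is omitted.

```python
import re
from collections import Counter

# Assumed set of excluded tokens; the abstract does not list the actual set.
STOP_TOKENS = {"the", "a", "an", "of", "and", "or", "in", "to", "is", "are", "for"}

# Assumed "selected characters": lowercase letters, with internal hyphens allowed.
WORD = re.compile(r"[a-z]+(?:-[a-z]+)*")

# Tokens are either candidate words or single non-space characters (punctuation,
# digits), so a string spanning punctuation fails the character test below.
TOKEN = re.compile(r"[a-z]+(?:-[a-z]+)*|\S")


def extract_terms(text, max_words=3, min_count=2):
    """Return {word string: count} for candidate multi-word terms in text."""
    tokens = TOKEN.findall(text.lower())
    counts = Counter()
    for n in range(2, max_words + 1):           # at least two words, at most max_words
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if any(t in STOP_TOKENS for t in gram):
                continue                        # contains an excluded token
            if not all(WORD.fullmatch(t) for t in gram):
                continue                        # contains a non-selected character
            counts[" ".join(gram)] += 1
    # Delete word strings that occur fewer than min_count times.
    return {term: c for term, c in counts.items() if c >= min_count}


sample = (
    "The finite state machine reads input. "
    "A finite state machine is simple. "
    "The parse tree is built once."
)
terms = extract_terms(sample)
print(sorted(terms))
```

In this sketch, shorter strings contained in a longer repeated string (for example "finite state" inside "finite state machine") are kept as separate candidates. Whether and how the patent prunes such overlaps is not stated in the abstract.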
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.