System and method for identifying compounds through iterative analysis
US7555428B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Aug 21, 2003 |
| Grant date | Jun 30, 2009 |
| Priority date | — |
| Expiry date | Mar 17, 2025 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/284
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A system and method for identifying compounds through iterative analysis of a measure of association is disclosed. A limit on the number of tokens per compound is specified. Compounds within a text corpus are then evaluated iteratively. The number of occurrences of one or more n-grams within the text corpus is determined. Each n-gram includes up to a maximum number of tokens, each of which is provided in a vocabulary for the text corpus. At least one n-gram whose number of tokens equals the limit is identified based on the number of occurrences. A measure of association between the tokens in the identified n-gram is determined. Each identified n-gram with a sufficient measure of association is added to the vocabulary as a compound token, and the limit is adjusted.
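The loop described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation. The abstract does not name the measure of association, the occurrence threshold, or the direction in which the limit is adjusted, so the sketch assumes a log ratio of observed to expected-under-independence counts, a minimum count, and a limit that decreases by one token per pass. All function names, parameters (`min_count`, `min_assoc`, `joiner`), and the sample corpus are hypothetical.

```python
import math
from collections import Counter


def count_ngrams(tokens, n):
    """Count every contiguous run of n tokens in the corpus."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def association(ngram, ngram_count, unigram_counts, total):
    """Assumed measure: log(observed count / count expected if tokens were independent)."""
    expected = total
    for tok in ngram:
        expected *= unigram_counts[tok] / total
    return math.log(ngram_count / expected)


def merge(tokens, ngram, joiner="_"):
    """Replace each occurrence of ngram with a single compound token."""
    n, out, i = len(ngram), [], 0
    while i < len(tokens):
        if tuple(tokens[i:i + n]) == ngram:
            out.append(joiner.join(ngram))
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out


def find_compounds(tokens, limit, min_count=2, min_assoc=2.0):
    """Iteratively promote strongly associated n-grams to compound tokens.

    Starts with n-grams of `limit` tokens and (by assumption) shrinks the
    limit by one on each pass, stopping after the two-token pass.
    """
    tokens = list(tokens)
    vocabulary = set(tokens)
    while limit >= 2:
        unigram_counts = Counter(tokens)
        total = len(tokens)
        # Candidates are visited from most to least frequent.
        for ngram, count in count_ngrams(tokens, limit).most_common():
            if count < min_count:
                break
            if association(ngram, count, unigram_counts, total) < min_assoc:
                continue
            merged = merge(tokens, ngram)
            if len(merged) < len(tokens):  # the n-gram still occurs after earlier merges
                tokens = merged
                vocabulary.add("_".join(ngram))
        limit -= 1  # assumed adjustment: move on to shorter n-grams
    return vocabulary, tokens


corpus = ("new york city is large and new york city is busy "
          "and hot dog stands fill new york city").split()
vocabulary, rewritten = find_compounds(corpus, limit=3)
```

Because compounds found in an earlier pass are written back into the token stream, later passes can treat them as single tokens, which is one way to read "added to the vocabulary as a compound token" in the abstract. Association scores in this sketch are computed once per pass from the counts at the start of that pass, a simplification chosen to keep the example short.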
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.