Patent · US · Expired

System and method for identifying compounds through iterative analysis

US7555428B1 · kind B1 · utility

Cited by: 25
References: 6
Claims: 15
Family size: 0

Assignee

Inventors

Key dates

Filing date: Aug 21, 2003
Grant date: Jun 30, 2009
Priority date
Expiry date: Mar 17, 2025

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/284
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A system and method for identifying compounds through iterative analysis of a measure of association is disclosed. A limit on the number of tokens per compound is specified. Compounds within a text corpus are iteratively evaluated. The number of occurrences of one or more n-grams within the text corpus is determined. Each n-gram includes up to a maximum number of tokens, each of which is provided in a vocabulary for the text corpus. At least one n-gram whose number of tokens equals the limit is identified based on the number of occurrences. A measure of association between the tokens in the identified n-gram is determined. Each identified n-gram with a sufficient measure of association is added to the vocabulary as a compound token, and the limit is adjusted.
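The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not say which measure of association is used, how the limit is adjusted, or how compound tokens are represented, so the sketch assumes a pointwise-mutual-information-style score, a limit that decreases by one per pass, and compounds written by joining tokens with an underscore. The names find_compounds, min_count, and min_assoc are hypothetical.

```python
import math
from collections import Counter


def count_ngrams(tokens, n):
    """Count every contiguous n-gram of length n in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def association(ngram, ngram_count, unigram_counts, total):
    """PMI-style score (an assumption; the abstract names no specific measure).

    log of the observed n-gram rate over the rate expected if its tokens
    were independent. Both rates use the token count as the denominator.
    """
    p_joint = ngram_count / total
    p_indep = 1.0
    for tok in ngram:
        p_indep *= unigram_counts[tok] / total
    return math.log(p_joint / p_indep)


def merge(tokens, ngram, compound):
    """Replace each left-to-right occurrence of ngram with one compound token."""
    n, out, i = len(ngram), [], 0
    while i < len(tokens):
        if tuple(tokens[i:i + n]) == ngram:
            out.append(compound)
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out


def find_compounds(tokens, limit=3, min_count=2, min_assoc=1.0):
    """Iteratively promote strongly associated n-grams to compound tokens."""
    tokens = list(tokens)
    vocabulary = set(tokens)
    compounds = []
    while limit >= 2:
        unigram_counts = Counter(tokens)
        total = len(tokens)
        counts = count_ngrams(tokens, limit)
        # Candidates: n-grams of exactly `limit` tokens that occur often enough
        # and whose tokens are sufficiently associated.
        accepted = [
            ng for ng, c in counts.most_common()
            if c >= min_count
            and association(ng, c, unigram_counts, total) >= min_assoc
        ]
        for ng in accepted:
            compound = "_".join(ng)  # assumed representation of a compound token
            merged = merge(tokens, ng, compound)
            if len(merged) < len(tokens):  # skip n-grams consumed by an earlier merge
                tokens = merged
                vocabulary.add(compound)
                compounds.append(compound)
        limit -= 1  # assumed adjustment: shorten the limit for the next pass
    return compounds, vocabulary
```

Calling find_compounds on a tokenized corpus with limit=3 evaluates trigrams first and then bigrams over the partly merged token stream, so a compound found in an earlier pass can take part in later n-grams as a single token. The thresholds are illustrative and would need tuning for a real corpus.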

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.