Patent · US · Expired

Natural-language processing system using a large corpus

US7392174B2 · kind B2 · utility

Cited by: 13
References: 11
Claims: 24
Family size: 0

Inventor

Key dates

Filing date: Mar 20, 2001
Grant date: Jun 24, 2008
Priority date
Expiry date: Jan 27, 2023

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/216
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A computer-parsing system using vectors (lists) to represent natural-language elements, providing a robust, distributed way to score grammaticality of an input string by using as a source material a large corpus of natural-language text. The system uses recombining of asymmetric associations of syntactically similar strings to form the vectors. The system uses equivalence lists for subparts of the string to build equivalence lists for longer strings in an order controlled by the potential parse to be scored. The power of recombination of vector elements in building longer strings provides a means of representing collocational complexity. Grammaticality scoring is based upon the number and similarity of the vector elements.
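
The mechanism in the abstract can be illustrated with a toy sketch. This is not the patented method, only a loose reading of it under stated assumptions: the four-sentence corpus is invented, two words are treated as "syntactically similar" when they share at least one (left neighbour, right neighbour) context, a recombined string is kept only if it occurs contiguously in the corpus, and the score counts surviving list elements without weighting them by similarity. All function names are hypothetical.

```python
from collections import defaultdict

# Invented toy corpus (assumption); a real system would use a large one.
CORPUS = ["the cat sat", "the dog sat", "the cat ran", "a dog ran"]


def build_contexts(sentences):
    """Map each word to the set of (left, right) neighbour pairs it occurs in."""
    contexts = defaultdict(set)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            contexts[tokens[i]].add((tokens[i - 1], tokens[i + 1]))
    return contexts


def build_ngrams(sentences):
    """Collect every contiguous token sequence attested in the corpus."""
    ngrams = set()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens)):
            for j in range(i + 1, len(tokens) + 1):
                ngrams.add(tuple(tokens[i:j]))
    return ngrams


def word_equivalents(word, contexts):
    """Equivalence list for one word: words sharing a context with it."""
    own = contexts.get(word, set())
    return {(other,) for other, ctx in contexts.items() if ctx & own}


def equivalents(parse, contexts, ngrams):
    """Build the equivalence list for a parse, bottom-up.

    A parse is either a word (str) or a pair (left, right) of sub-parses,
    so the bracketing controls the order in which lists are recombined.
    """
    if isinstance(parse, str):
        return word_equivalents(parse, contexts)
    left, right = parse
    left_eq = equivalents(left, contexts, ngrams)
    right_eq = equivalents(right, contexts, ngrams)
    # Recombine element pairs, keeping only strings attested in the corpus.
    return {l + r for l in left_eq for r in right_eq if l + r in ngrams}


def score(parse, contexts, ngrams):
    """Crude grammaticality score: size of the final equivalence list."""
    return len(equivalents(parse, contexts, ngrams))


CONTEXTS = build_contexts(CORPUS)
NGRAMS = build_ngrams(CORPUS)

if __name__ == "__main__":
    # "a cat sat" never occurs in the corpus, yet it scores above zero
    # because attested strings built from equivalent words support it.
    print(score((("a", "cat"), "sat"), CONTEXTS, NGRAMS))
    # A scrambled order finds no attested recombinations.
    print(score((("sat", "a"), "cat"), CONTEXTS, NGRAMS))
```

In this sketch an unseen but well-formed string still receives support from attested strings made of equivalent words, while a scrambled string receives none. Bracketing the same words differently changes the intermediate lists, which is one way to read "an order controlled by the potential parse to be scored".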

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.