Patent · US Active

Method and apparatus for discovering and classifying polysemous word instances in web documents

US8135715B2 · kind B2 · utility

Cited by	2
References	8
Claims	12
Family size	0

Assignee

YAHOO HOLDINGS, INC. · US

Inventor

Richard M. King · Cincinnati, US

Key dates

Filing date	Dec 14, 2007
Grant date	Mar 13, 2012
Priority date	—
Expiry date	Jun 25, 2029

Classification

Technology area (CPC G)	Physics
CPC primary	G06F40/284
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method and apparatus for discovering polysemous words and classifying polysemous words found in web documents. Document corpora in every natural language contain words that have multiple usage contexts or multiple meanings. Semantic analysis is not feasible for classifying every word occurrence in every document on the web, which together contain trillions of words. In addition, semantic analysis typically cannot distinguish multiple usages of a given meaning of a given word. In one embodiment of this invention, polysemous words in natural languages can be discovered by analyzing the co-occurrence of other words with the polysemous word in web documents. In one embodiment, the multiple meanings and usages of a polysemous word can be determined by analyzing the co-occurrences of other words with the polysemous word. In one embodiment, overcorrelation tables and three-word correlation tables are generated to analyze the words found in web documents.
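The abstract names co-occurrence analysis, overcorrelation tables and three-word correlation tables but does not define them. The following is a minimal sketch of one plausible reading, not the patented method. It assumes that "overcorrelation" means a word's observed document co-occurrence with the target divided by the count expected if the two words were independent, and that a "three-word correlation" measures whether two of the target's neighbours appear together in the target's documents more often than their separate co-occurrences predict. Neighbours linked by a high three-word ratio are grouped, and each group is treated as a candidate sense; more than one group flags the target as possibly polysemous. The function names, the 1.2 threshold and the toy corpus are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(docs):
    """Document frequencies of single words, word pairs and word triples (hypothetical helper)."""
    single, pairs, triples = Counter(), Counter(), Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))  # each word counted once per document
        single.update(words)
        pairs.update(combinations(words, 2))      # sorted input -> sorted tuple keys
        triples.update(combinations(words, 3))
    return single, pairs, triples


def overcorrelation_table(target, single, pairs, n_docs, min_ratio=1.2):
    """Assumed overcorrelation: observed / expected, expected = df(target) * df(word) / n_docs."""
    table = {}
    if single[target] == 0:
        return table
    for word, df in single.items():
        if word == target:
            continue
        observed = pairs[tuple(sorted((target, word)))]
        ratio = observed * n_docs / (single[target] * df)
        if ratio >= min_ratio:
            table[word] = ratio
    return table


def three_word_correlation(target, a, b, pairs, triples, df_target):
    """Assumed measure: how often a and b co-occur within the target's documents,
    relative to what their separate co-occurrences with the target predict."""
    together = triples[tuple(sorted((target, a, b)))]
    expected = pairs[tuple(sorted((target, a)))] * pairs[tuple(sorted((target, b)))] / df_target
    return together / expected if expected else 0.0


def sense_clusters(target, docs, min_ratio=1.2):
    """Group the target's overcorrelated neighbours into candidate senses (union-find)."""
    single, pairs, triples = count_cooccurrences(docs)
    if single[target] == 0:
        return []
    neighbours = sorted(overcorrelation_table(target, single, pairs, len(docs), min_ratio))
    parent = {w: w for w in neighbours}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    # Link two neighbours when they tend to appear together alongside the target.
    for a, b in combinations(neighbours, 2):
        if three_word_correlation(target, a, b, pairs, triples, single[target]) >= min_ratio:
            parent[find(a)] = find(b)

    groups = {}
    for w in neighbours:
        groups.setdefault(find(w), []).append(w)
    return sorted(groups.values())


if __name__ == "__main__":
    docs = [
        "bank interest loan",
        "bank interest loan",
        "bank river fishing",
        "bank river fishing",
        "cat mat",
        "dog food",
    ]
    print(sense_clusters("bank", docs))  # [['fishing', 'river'], ['interest', 'loan']]
```

On this toy corpus "bank" yields two neighbour groups, one financial and one geographic, because "interest" and "river" never appear together in the documents that contain "bank". A web-scale version would need streamed or sampled counts rather than in-memory triple counting, which grows cubically with document length.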

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.