Patent · US Active

System and method for content extraction from unstructured sources

US8073865B2 · kind B2 · utility

Cited by: 11
References: 3
Claims: 16
Family size: 0

Assignee

Inventor

Key dates

Filing date: Sep 14, 2009
Grant date: Dec 6, 2011
Priority date
Expiry date: Jul 8, 2030

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/80
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A system and method for extracting content from unstructured sources are disclosed. The method includes analyzing the web pages of a website, storing text and image data for each web page, extracting a plurality of entities from the web page data, scoring each entity to provide an overall score for that entity, and defining a product based on the plurality of entities and their overall scores.
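
The steps named in the abstract (store page data, extract entities, score each entity, define a product) can be illustrated with a minimal Python sketch. This is not the patented method: every name below (`PageData`, `Entity`, `extract_entities`, `score_entities`, `define_product`, `KIND_PRIOR`) is hypothetical, and the regular expression, the position-based heuristic, and the per-kind weights are assumptions chosen only to make the flow concrete.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PageData:
    """Stored text and image data for one web page (hypothetical structure)."""
    url: str
    text: str
    image_urls: list = field(default_factory=list)


@dataclass
class Entity:
    """A candidate piece of product information found on a page."""
    kind: str      # assumed kinds: "name", "price", "image"
    value: str
    score: float = 0.0


# Assumption: prices look like "$12" or "$12.34".
PRICE_RE = re.compile(r"\$\d+(?:\.\d{2})?")

# Assumption: a fixed prior weight per entity kind, each in [0, 1].
KIND_PRIOR = {"name": 1.0, "price": 0.8, "image": 0.6}


def extract_entities(page):
    """Extract candidate entities from the stored page data."""
    entities = [Entity("price", match) for match in PRICE_RE.findall(page.text)]
    lines = [ln.strip() for ln in page.text.splitlines() if ln.strip()]
    if lines:
        # Toy rule: treat the first non-empty line as the candidate name.
        entities.append(Entity("name", lines[0]))
    entities.extend(Entity("image", url) for url in page.image_urls)
    return entities


def score_entities(entities, page):
    """Give each entity an overall score: mean of a position score and a kind prior."""
    for entity in entities:
        pos = page.text.find(entity.value)
        if pos < 0:
            # Not found in the text (e.g. image URLs): use a neutral position score.
            pos_score = 0.5
        else:
            # Earlier occurrences in the text score closer to 1.0.
            pos_score = 1.0 - pos / len(page.text)
        entity.score = 0.5 * pos_score + 0.5 * KIND_PRIOR.get(entity.kind, 0.5)
    return entities


def define_product(entities):
    """Define a product from the highest-scoring entity of each kind."""
    best = {}
    for entity in entities:
        if entity.kind not in best or entity.score > best[entity.kind].score:
            best[entity.kind] = entity
    return {kind: entity.value for kind, entity in best.items()}
```

In this sketch, a page whose text lists two prices yields two `price` entities, and `define_product` keeps the one that appears earlier because its position score is higher. A real system would presumably combine many more signals into the overall score; the two-term average here is only a placeholder.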

Source: USPTO / EPO open patent data. Bibliographic data and citation counts are objective.