System and method for content extraction from unstructured sources
US8073865B2 · kind B2 · utility
Cited by: 11
References: 3
Claims: 16
Family size: 0
Assignee: not listed
Inventor: not listed
Key dates
| Event | Date |
|---|---|
| Filing date | Sep 14, 2009 |
| Grant date | Dec 6, 2011 |
| Priority date | — |
| Expiry date | Jul 8, 2030 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/80
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A system and method for extracting content from unstructured sources is disclosed. The method includes analyzing web pages of a website, storing text and image data for each web page of the website, extracting a plurality of entities from the web page data, scoring each entity of the plurality of entities to provide an overall score for each entity, and defining a product based on the plurality of entities and the overall score for each entity.
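The abstract names five steps (analyze pages, store text and image data, extract entities, score them, define a product) but does not say how entities are extracted or scored. The Python sketch below is one hypothetical reading of that pipeline, not the patented method. It assumes pages have already been fetched and stored as `PageData` records (text plus image URLs). It uses simple regular expressions for two assumed entity kinds, prices and title-cased product names, and treats each page image as a candidate. Each entity is scored by how often it occurs plus a bonus if it appears prominently (early in the text, or as a page's first image). The product is defined by taking the top-scoring entity of each kind. Every name, regex, threshold, and weight here is an illustrative assumption.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PageData:
    """Stored text and image data for one web page (hypothetical structure)."""
    url: str
    text: str
    images: list[str] = field(default_factory=list)


# Hypothetical extraction heuristics; the patent abstract specifies none.
PRICE_RE = re.compile(r"\$\d+(?:\.\d{2})?")
# A run of 2 to 4 title-cased words, e.g. "Acme Trail Runner".
NAME_RE = re.compile(r"\b[A-Z][a-z0-9]+(?: [A-Z][a-z0-9]+){1,3}\b")


def extract_entities(page):
    """Return (kind, value, position) triples found on one page."""
    found = [("price", m.group(), m.start()) for m in PRICE_RE.finditer(page.text)]
    found += [("name", m.group(), m.start()) for m in NAME_RE.finditer(page.text)]
    # For images, the position is the image's index on the page.
    found += [("image", src, i) for i, src in enumerate(page.images)]
    return found


def score_entities(pages, early_chars=80):
    """Combine per-feature scores into one overall score per (kind, value)."""
    frequency = defaultdict(int)
    prominence = defaultdict(float)
    for page in pages:
        for kind, value, pos in extract_entities(page):
            key = (kind, value)
            frequency[key] += 1
            # Early text, or the first image on a page, counts as prominent.
            early = (pos == 0) if kind == "image" else (pos < early_chars)
            if early:
                prominence[key] = 1.0
    # Hypothetical weighting: occurrence count plus a fixed prominence bonus.
    return {key: frequency[key] + 2.0 * prominence[key] for key in frequency}


def define_product(pages):
    """Pick the highest-scoring entity of each kind as a product attribute."""
    best = {}
    for (kind, value), score in score_entities(pages).items():
        if kind not in best or score > best[kind][1]:
            best[kind] = (value, score)
    return {kind: value for kind, (value, _) in best.items()}


if __name__ == "__main__":
    site = [
        PageData("https://example.com/p/1",
                 "Acme Trail Runner now $89.99. Free shipping on Acme Trail Runner.",
                 ["runner.jpg", "logo.png"]),
        PageData("https://example.com/p/2",
                 "Reviews of Acme Trail Runner: great value at $89.99.",
                 ["runner.jpg"]),
    ]
    print(define_product(site))
    # prints {'price': '$89.99', 'name': 'Acme Trail Runner', 'image': 'runner.jpg'}
```

In this example `logo.png` loses to `runner.jpg` because it appears once and never as a page's first image. Summing weighted feature scores is only one way to form an overall score, and the bonus weight of 2.0 is arbitrary.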
Source: USPTO / EPO open patent data (bibliographic data and citation counts).