Extracting structured data from weblogs
US11556598B2 · kind B2 · utility
Assignee
Inventor
Key dates
| Filing date | Jan 14, 2019 |
| Grant date | Jan 17, 2023 |
| Priority date | — |
| Expiry date | Nov 19, 2041 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/205
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Methods and apparatus for extracting structured data from weblogs are disclosed. In some examples, the methods and apparatus include a web crawler to access a home page of a weblog, and identify a feed associated with the weblog. The methods and apparatus also include a feed finder to determine whether items in the feed contain sufficient content for feed-guided segmentation. The methods and apparatus also include a feed classifier to determine whether the items in the feed contain full content of the weblog. The methods and apparatus also include a wrapper to map data found in the feed into a representation of a weblog post, and screen scrape the weblog into the representation of the weblog post.
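The flow the abstract describes (find the feed from the home page, judge whether its items carry full content, then map items into a post representation) can be illustrated with a short sketch. This is not the patented implementation. It assumes an RSS 2.0 feed advertised through a `<link rel="alternate">` tag, and every name in it (`find_feed_urls`, `parse_rss_items`, `looks_like_full_content`, `Post`, the `min_chars` threshold) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# MIME types commonly used to advertise a feed in a page's <head>.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects href values of <link rel="alternate"> tags that advertise a feed."""

    def __init__(self):
        super().__init__()
        self.feed_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()
        mime = (a.get("type") or "").lower()
        if "alternate" in rel and mime in FEED_TYPES and a.get("href"):
            self.feed_urls.append(a["href"])


def find_feed_urls(home_html):
    """Crawler step (hypothetical): list feed URLs advertised on the home page."""
    parser = FeedLinkParser()
    parser.feed(home_html)
    return parser.feed_urls


@dataclass
class Post:
    """Hypothetical representation of a weblog post."""
    title: str
    link: str
    body: str


def parse_rss_items(feed_xml):
    """Wrapper step (hypothetical): map RSS <item> elements into Post records."""
    root = ET.fromstring(feed_xml)
    return [
        Post(
            title=(item.findtext("title") or "").strip(),
            link=(item.findtext("link") or "").strip(),
            body=(item.findtext("description") or "").strip(),
        )
        for item in root.iter("item")
    ]


def looks_like_full_content(posts, min_chars=500):
    """Classifier step (assumed heuristic): treat the feed as full-content when
    most items are long and do not end in a truncation marker."""
    if not posts:
        return False
    markers = ("...", "\u2026", "[...]")
    full = [p for p in posts if len(p.body) >= min_chars and not p.body.endswith(markers)]
    return len(full) > len(posts) / 2
```

When `looks_like_full_content` returns false, a real system would presumably fall back to scraping the post pages themselves, using the feed items as hints for where each post starts and ends; that step is omitted here because the abstract does not specify how it is done.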
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.