Patent · US Active

Extracting structured data from weblogs

US11556598B2 · kind B2 · utility

Cited by: 0
References: 163
Claims: 25
Family size: 0

Assignee

Inventor

Key dates

Filing date: Jan 14, 2019
Grant date: Jan 17, 2023
Priority date
Expiry date: Nov 19, 2041

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/205
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Methods and apparatus for extracting structured data from weblogs are disclosed. In some examples, the methods and apparatus include a web crawler to access a home page of a weblog, and identify a feed associated with the weblog. The methods and apparatus also include a feed finder to determine whether items in the feed contain sufficient content for feed-guided segmentation. The methods and apparatus also include a feed classifier to determine whether the items in the feed contain full content of the weblog. The methods and apparatus also include a wrapper to map data found in the feed into a representation of a weblog post, and screen scrape the weblog into the representation of the weblog post.
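
The pipeline the abstract describes (locate a feed from the home page, check whether its items carry enough content, then map each item into a post representation) can be illustrated with a minimal sketch. This is not the patented implementation. It assumes the feed is advertised through an HTML `<link rel="alternate">` tag and is RSS 2.0. The names `find_feeds`, `wrap_rss_items`, `has_sufficient_content`, `Post`, and the `min_chars` threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# MIME types commonly used to advertise RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects href values of <link rel="alternate"> tags that advertise a feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()
        mime = (a.get("type") or "").lower()
        if "alternate" in rel and mime in FEED_TYPES and a.get("href"):
            self.feeds.append(a["href"])


def find_feeds(home_page_html):
    """Return feed URLs (possibly relative) advertised on a weblog home page."""
    parser = FeedLinkParser()
    parser.feed(home_page_html)
    return parser.feeds


@dataclass
class Post:
    """Hypothetical representation of a weblog post."""
    title: str
    link: str
    body: str


def wrap_rss_items(rss_xml):
    """Map each RSS 2.0 <item> element into a Post record."""
    root = ET.fromstring(rss_xml)
    return [
        Post(
            title=(item.findtext("title") or "").strip(),
            link=(item.findtext("link") or "").strip(),
            body=(item.findtext("description") or "").strip(),
        )
        for item in root.iter("item")
    ]


def has_sufficient_content(posts, min_chars=20):
    """Assumed heuristic: every item has a title, a link, and a non-trivial body."""
    return bool(posts) and all(
        p.title and p.link and len(p.body) >= min_chars for p in posts
    )
```

In this sketch, a feed that fails `has_sufficient_content` would be the case where the system falls back to screen scraping the weblog pages themselves. That fallback is not shown here.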

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.