Patent · US Active

Method and apparatus for retrieving image-text block from web page

US10755091B2 · kind B2 · utility

Cited by: 1
References: 0
Claims: 12
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 17, 2018
Grant date: Aug 25, 2020
Priority date
Expiry date: Oct 3, 2038

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V30/43
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method for retrieving an image-text block from a web page is provided, which comprises: retrieving an image node; filtering the image node to obtain candidate image nodes; traversing, for each of the candidate image nodes, a node in sequence toward an ancestor node of the candidate image node in a preset maximum traversal depth until an ancestor node with a text is visited, using the ancestor node with the text as a candidate image-text block; clustering the candidate image-text blocks based on hash values of the path information of the candidate image-text blocks; and determining, for each image-text block cluster, a common ancestor node of the candidate image-text blocks within the image-text block cluster based on the path information of the candidate image-text blocks, and determining path information of the image-text block cluster based on the common ancestor node.
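
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The `Node` class, the size-based filter (`min_side`), the use of MD5 over a slash-joined tag path as the "hash value of the path information", and the reading of "ancestor node with a text" as "ancestor whose subtree contains text" are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity-based equality: two nodes are equal only if they are the same object
class Node:
    """Hypothetical stand-in for a DOM node."""
    tag: str
    text: str = ""
    width: int = 0
    height: int = 0
    children: list = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, *kids):
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def iter_nodes(root):
    """Depth-first traversal of the subtree rooted at root."""
    yield root
    for child in root.children:
        yield from iter_nodes(child)


def has_text(node):
    # Assumption: a node "has text" if it or any descendant carries non-blank text.
    return any(n.text.strip() for n in iter_nodes(node))


def chain(node):
    """Nodes from the document root down to node, inclusive."""
    out = []
    while node is not None:
        out.append(node)
        node = node.parent
    return out[::-1]


def tag_path(node):
    return "/".join(n.tag for n in chain(node))


def find_blocks(root, min_side=50, max_depth=3):
    clusters = defaultdict(list)
    for img in iter_nodes(root):
        # Steps 1-2: retrieve image nodes, then filter (assumed rule: drop small images).
        if img.tag != "img" or min(img.width, img.height) < min_side:
            continue
        # Step 3: climb at most max_depth ancestors until one with text is found.
        node, block = img.parent, None
        for _ in range(max_depth):
            if node is None:
                break
            if has_text(node):
                block = node
                break
            node = node.parent
        if block is None:
            continue
        # Step 4: cluster candidate blocks by a hash of their path information.
        key = hashlib.md5(tag_path(block).encode()).hexdigest()
        if block not in clusters[key]:
            clusters[key].append(block)
    result = []
    for blocks in clusters.values():
        # Step 5: the common ancestor is the deepest node shared by every block's chain.
        common = None
        for level in zip(*(chain(b) for b in blocks)):
            if any(n is not level[0] for n in level):
                break
            common = level[0]
        result.append((tag_path(common), [tag_path(b) for b in blocks]))
    return result


def item(caption):
    return Node("li").add(
        Node("a").add(Node("img", width=120, height=90)),
        Node("p", text=caption),
    )


page = Node("html").add(Node("body").add(
    Node("div").add(Node("img", width=16, height=16)),  # small icon, filtered out
    Node("ul").add(item("Cat"), item("Dog"), item("Fox")),
))
clusters = find_blocks(page)
```

In this toy page, each large image climbs past its text-free `a` parent to the enclosing `li`, the three `li` blocks share one path hash, and their common ancestor `ul` supplies the cluster's path. A cluster containing a single block resolves to that block itself in this sketch.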

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.