Method and apparatus for retrieving image-text block from web page
US10755091B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 17, 2018 |
| Grant date | Aug 25, 2020 |
| Priority date | — |
| Expiry date | Oct 3, 2038 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V30/43
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method for retrieving an image-text block from a web page is provided, comprising: retrieving image nodes; filtering the image nodes to obtain candidate image nodes; for each candidate image node, traversing nodes in sequence toward its ancestors, within a preset maximum traversal depth, until an ancestor node containing text is visited, and using that ancestor node as a candidate image-text block; clustering the candidate image-text blocks based on hash values of their path information; and, for each image-text block cluster, determining a common ancestor node of the candidate image-text blocks within the cluster based on their path information, and determining path information of the cluster based on the common ancestor node.
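The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how image nodes are filtered, what the path information contains, or which hash is used. The sketch assumes a width-based filter (`MIN_WIDTH`), a depth limit of 3 (`MAX_DEPTH`), a tag-name path hashed with MD5 for clustering, and a longest-common-prefix rule for the common ancestor. All function and constant names are hypothetical, and `xml.etree.ElementTree` stands in for a real HTML parser, so the input must be well-formed markup.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import defaultdict

MAX_DEPTH = 3   # assumed "preset maximum traversal depth"
MIN_WIDTH = 50  # assumed filter: skip icons and tracking pixels

SAMPLE = """
<html><body>
  <ul>
    <li><a><img width="120"/></a><p>First story</p></li>
    <li><a><img width="120"/></a><p>Second story</p></li>
  </ul>
  <div><img width="1"/></div>
</body></html>
"""


def indexed_path(node, parents):
    """Root-to-node path as a list of (tag, child-index) steps."""
    steps = []
    while node in parents:
        parent = parents[node]
        steps.append((node.tag, list(parent).index(node)))
        node = parent
    steps.append((node.tag, 0))  # the root element itself
    return steps[::-1]


def find_block(img, parents):
    """Walk up from img, at most MAX_DEPTH steps, to the first ancestor containing text."""
    node = img
    for _ in range(MAX_DEPTH):
        node = parents.get(node)
        if node is None:
            return None
        if "".join(node.itertext()).strip():
            return node
    return None


def image_text_clusters(root):
    """Map each cluster's hash key to the tag path of its common ancestor."""
    parents = {child: parent for parent in root.iter() for child in parent}
    # Retrieve image nodes, then filter them (assumed rule: numeric width attribute).
    candidates = [img for img in root.iter("img")
                  if int(img.get("width", "0")) >= MIN_WIDTH]

    clusters = defaultdict(list)
    seen = set()
    for img in candidates:
        block = find_block(img, parents)
        if block is None or block in seen:
            continue
        seen.add(block)
        path = indexed_path(block, parents)
        # Cluster key: hash of the tag-only path, so sibling blocks share a key.
        tag_path = "/".join(tag for tag, _ in path)
        clusters[hashlib.md5(tag_path.encode()).hexdigest()].append(path)

    result = {}
    for key, paths in clusters.items():
        # Common ancestor: longest common prefix of the indexed paths.
        prefix = []
        for steps in zip(*paths):
            if len(set(steps)) != 1:
                break
            prefix.append(steps[0])
        result[key] = "/".join(tag for tag, _ in prefix)
    return result


print(sorted(image_text_clusters(ET.fromstring(SAMPLE)).values()))
# prints ['html/body/ul']
```

In the sample, both `li` elements share the tag path `html/body/ul/li`, so they hash to the same cluster, and their common ancestor is the `ul` element. The 1-pixel image is dropped by the filter. A cluster holding a single block would report that block's own path, since the longest common prefix of one path is the path itself.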
Source: USPTO / EPO open patent data (bibliographic data and citation counts).