Classifying parts of a markup language document, and applications thereof
US12013913B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Nov 17, 2021 |
| Grant date | Jun 18, 2024 |
| Priority date | — |
| Expiry date | Sep 24, 2042 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06F40/284
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
A link-analyzing system (LAS) extracts information from a markup language (ML) document associated with a web page link. In some implementations, the information that is extracted includes at least: a) address content that is part of the link's destination address; and b) text that is associated with the link but that is not part of the destination address itself. The LAS generates feature information based on the address content and the text, and then uses a classification model to make a classification assessment for the link based on the feature information. In some implementations, the LAS can control a crawling engine based on the classification assessment. In some implementations, the LAS can revise a low-confidence classification assessment based on an examination of the classification assessments of a group of similar links described by the ML document. Other implementations use the above-described functionality to classify other parts of an ML document.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.