Automatic genre classification determination of web content to which the web content belongs together with a corresponding genre probability
US10764353B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Oct 18, 2018 |
| Grant date | Sep 1, 2020 |
| Priority date | — |
| Expiry date | Oct 18, 2038 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N20/00
- WIPO field: Digital communication
- WIPO sector: Electrical engineering
Abstract
A mechanism is provided for automatic genre determination of web content. For each type of web content genre, a set of relevant feature types is extracted from collected training material, where genre features and non-genre features are represented by tokens and an integer count represents the frequency with which a token appears in both a first type of training material and a second type of training material. In a classification process, fixed-length tokens are extracted for the relevant feature types from different text and structural elements of the web content. For each relevant feature type, a corresponding feature probability is calculated. The feature probabilities are combined into an overall genre probability that the web content belongs to a specific trained web content genre. A genre classification result is then output comprising at least one specific trained web content genre to which the web content belongs, together with a corresponding genre probability.
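The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not state the token length, the smoothing, or the rule for combining feature probabilities, so the values and formulas below are assumptions. In particular, the token length of 4, the add-one smoothing, and the product-rule combination are placeholders, and the names `tokens`, `train`, `feature_probability`, and `genre_probability` are hypothetical. A page is modelled as a dictionary that maps a feature type (for example `title` or `body`) to its text, and the two training sets are assumed to be of similar size.

```python
from collections import Counter
from math import exp, log

TOKEN_LEN = 4  # assumed; the abstract only says tokens have a fixed length


def tokens(text, n=TOKEN_LEN):
    """Slide an n-character window over the text to get fixed-length tokens."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(genre_pages, other_pages):
    """Build, per feature type, token counts for genre and non-genre material."""
    model = {}
    for label_index, pages in enumerate((genre_pages, other_pages)):
        for page in pages:
            for feature_type, text in page.items():
                counts = model.setdefault(feature_type, (Counter(), Counter()))
                counts[label_index].update(tokens(text))
    return model


def feature_probability(text, genre_counts, other_counts):
    """Probability for one feature type from smoothed token count ratios."""
    toks = tokens(text)
    if not toks:
        return 0.5  # no evidence either way
    log_odds = 0.0
    for tok in toks:
        g = genre_counts[tok] + 1  # add-one smoothing (an assumption)
        o = other_counts[tok] + 1
        log_odds += log(g / o)
    # Average the log-odds so long and short texts are comparable.
    return 1.0 / (1.0 + exp(-log_odds / len(toks)))


def genre_probability(page, model):
    """Combine feature probabilities with a product rule (an assumption)."""
    log_yes = log_no = 0.0
    for feature_type, (genre_counts, other_counts) in model.items():
        p = feature_probability(page.get(feature_type, ""),
                                genre_counts, other_counts)
        log_yes += log(p)
        log_no += log(1.0 - p)
    # Equivalent to prod(p) / (prod(p) + prod(1 - p)), computed in log space.
    return 1.0 / (1.0 + exp(log_no - log_yes))


# Toy training material: one genre page ("news") and one non-genre page.
news_model = train(
    [{"title": "breaking news today", "body": "the minister said on monday"}],
    [{"title": "buy cheap shoes", "body": "add to cart free shipping"}],
)
page = {"title": "news today", "body": "the minister said"}
result = [("news", genre_probability(page, news_model))]
print(result)
```

The final `result` list mirrors the output described in the abstract: each entry pairs a trained genre with its probability. Supporting several genres would mean training one such model per genre and reporting the genres whose probability exceeds a chosen threshold, which is again an assumption rather than something the abstract specifies.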
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.