Models for classifying documents
US10489441B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Jul 27, 2017 |
| Grant date | Nov 26, 2019 |
| Priority date | — |
| Expiry date | Jul 27, 2037 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/353
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Some embodiments provide a method for defining a content relevance model for determining whether a content segment is relevant to a particular category. The method receives a first set of content segments that contain content relevant to the particular category and a second set of content segments that contain content not relevant to the particular category. The method identifies a set of key word sets more likely to appear in the first set of content segments than the second set of content segments. The method defines a content relevance model that comprises a set of groups of word sets and a score for each group, each of the groups of word sets comprising a key word set from the set of key word sets and at least one word set found in a context of the key word set in at least one of the received content segments.
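The steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that every "word set" is a single word, that "context" means a fixed token window around the key word, and that scores are smoothed log-ratios of segment frequencies. The abstract specifies none of these choices, and every function name, parameter, and threshold below is hypothetical.

```python
import math
from collections import Counter


def tokenize(segment):
    """Lowercase whitespace tokenizer (an assumption; the abstract names none)."""
    return segment.lower().split()


def log_ratio(pos_count, neg_count, n_pos, n_neg):
    """Smoothed log-ratio of segment frequencies (hypothetical scoring rule)."""
    p_pos = (pos_count + 1) / (n_pos + 2)
    p_neg = (neg_count + 1) / (n_neg + 2)
    return math.log(p_pos / p_neg)


def find_key_words(relevant, irrelevant, min_log_ratio=1.0):
    """Words noticeably more likely to appear in relevant than irrelevant segments."""
    pos = Counter(w for seg in relevant for w in set(tokenize(seg)))
    neg = Counter(w for seg in irrelevant for w in set(tokenize(seg)))
    return {
        w for w in pos
        if log_ratio(pos[w], neg[w], len(relevant), len(irrelevant)) >= min_log_ratio
    }


def groups_in(tokens, key_words, window):
    """(key word, context word) pairs found within `window` tokens of a key word."""
    found = set()
    for i, tok in enumerate(tokens):
        if tok in key_words:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            found.update((tok, tokens[j]) for j in range(lo, hi) if j != i)
    return found


def build_model(relevant, irrelevant, window=2, min_log_ratio=1.0):
    """Map each (key word, context word) group to a score."""
    keys = find_key_words(relevant, irrelevant, min_log_ratio)
    pos, neg = Counter(), Counter()
    for seg in relevant:
        pos.update(groups_in(tokenize(seg), keys, window))
    for seg in irrelevant:
        neg.update(groups_in(tokenize(seg), keys, window))
    # Groups seen in either set of segments receive a score.
    return {
        g: log_ratio(pos[g], neg[g], len(relevant), len(irrelevant))
        for g in pos.keys() | neg.keys()
    }


def score_segment(model, segment, window=2):
    """Sum the scores of all model groups that occur in a new segment."""
    key_words = {key for key, _ in model}
    groups = groups_in(tokenize(segment), key_words, window)
    return sum(model[g] for g in groups if g in model)


# Toy data, invented purely for illustration.
relevant = [
    "the company will acquire shares of the startup",
    "the firm agreed to acquire the rival company",
]
irrelevant = [
    "the weather was sunny today",
    "the team will play the match today",
]
model = build_model(relevant, irrelevant)
print(score_segment(model, "they plan to acquire the company"))
```

In this sketch a segment would be treated as relevant when its summed score exceeds some threshold. How scores are combined and thresholded is left open here because the abstract does not describe it.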
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.