Training procedure for N-gram-based statistical content classification
US7792846B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Jul 27, 2007 |
| Grant date | Sep 7, 2010 |
| Priority date | — |
| Expiry date | Jul 29, 2028 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/35
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A training procedure for N-gram-based statistical document classification is disclosed. In one embodiment, a set of N-grams is selected from a second set of N-grams, each N-gram being a sequence of N bytes, where N is an integer. A statistical content classification model is then generated based on occurrences of the N-grams, if any, in a set of training documents and a set of validation documents. The statistical content classification model is provided to content filters to classify content.
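The abstract names three steps: selecting N-grams from a candidate set, building a model from their occurrences in training and validation documents, and handing the model to a filter. The sketch below illustrates one way these steps could fit together. It is not the patented method: the abstract does not state the selection criterion, the model type, or how the validation documents are used, so the document-frequency gap, the smoothed log-odds weights, the validation-tuned threshold, and all function names here are assumptions chosen for illustration.

```python
from collections import Counter
import math


def byte_ngrams(data, n):
    """Return the distinct N-byte sequences that occur in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def class_doc_freq(docs, labels, n, keep=None):
    """Count, per class, how many documents contain each N-gram."""
    pos, neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        grams = byte_ngrams(doc, n)
        if keep is not None:
            grams &= keep
        (pos if label else neg).update(grams)
    return pos, neg


def select_ngrams(docs, labels, n, k):
    """Select k N-grams out of the candidate set seen in docs.

    Assumed criterion: largest gap in per-class document frequency.
    """
    pos, neg = class_doc_freq(docs, labels, n)
    n_pos = max(sum(labels), 1)
    n_neg = max(len(labels) - sum(labels), 1)

    def gap(g):
        return abs(pos[g] / n_pos - neg[g] / n_neg)

    candidates = set(pos) | set(neg)  # plays the role of the "second set"
    return sorted(candidates, key=lambda g: (-gap(g), g))[:k]


def train_model(docs, labels, selected, n):
    """Assumed model: one Laplace-smoothed log-odds weight per N-gram."""
    pos, neg = class_doc_freq(docs, labels, n, keep=set(selected))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return {
        g: math.log((pos[g] + 1) / (n_pos + 2))
        - math.log((neg[g] + 1) / (n_neg + 2))
        for g in selected
    }


def score(model, doc, n):
    """Sum the weights of the model's N-grams that occur in doc."""
    present = byte_ngrams(doc, n)
    return sum(w for g, w in model.items() if g in present)


def pick_threshold(model, val_docs, val_labels, n):
    """Assumed use of validation documents: tune the decision threshold."""
    scores = [score(model, d, n) for d in val_docs]
    best_t, best_acc = 0.0, -1.0
    for t in sorted(set(scores)):
        hits = sum((s >= t) == bool(y) for s, y in zip(scores, val_labels))
        acc = hits / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


# Toy data, invented for this example (label 1 = content to be flagged).
train_docs = [b"confidential merger plan", b"confidential salary data",
              b"lunch menu today", b"weather is nice today"]
train_labels = [1, 1, 0, 0]
val_docs = [b"confidential report", b"see you today"]
val_labels = [1, 0]

N = 4
selected = select_ngrams(train_docs, train_labels, N, k=5)
model = train_model(train_docs, train_labels, selected, N)
threshold = pick_threshold(model, val_docs, val_labels, N)
```

A content filter in this sketch would flag a document when `score(model, doc, N) >= threshold`. Working on raw bytes rather than decoded text matches the abstract's definition of an N-gram as a sequence of N bytes and keeps the example independent of character encoding.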
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.