Patent · US Active

Training procedure for N-gram-based statistical content classification

US7917522B1 · kind B1 · utility

Cited by: 5
References: 12
Claims: 6
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jun 24, 2010
Grant date: Mar 29, 2011
Priority date
Expiry date: Jun 24, 2030

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/35
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A training procedure for N-gram-based statistical document classification is disclosed. In one embodiment, a set of N-grams is selected from a second set of N-grams, where each N-gram is a sequence of N bytes and N is an integer. A statistical content classification model is then generated based on occurrences of the selected N-grams, if any, in a set of training documents and a set of validation documents. The model is provided to content filters, which use it to classify content.
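The abstract does not state how N-grams are selected or which statistical model is built, so the following is only a minimal sketch of the overall flow under loudly stated assumptions. It assumes byte N-grams, binary labels (1 = target class, 0 = other, with both classes present), a hypothetical selection rule that keeps the k N-grams whose document frequency differs most between classes, a multinomial naive Bayes model with Laplace smoothing standing in for the unspecified statistical model, and validation documents used to choose k. All function names (`byte_ngrams`, `select_ngrams`, `train_model`, `classify`, `fit`) are hypothetical.

```python
import math
from collections import Counter

def byte_ngrams(data: bytes, n: int) -> Counter:
    """Count every contiguous N-byte sequence in `data`."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

def select_ngrams(docs, labels, n, k):
    """Hypothetical selection rule (assumption): from the full candidate set,
    keep the k N-grams whose per-class document frequency differs most."""
    df = {0: Counter(), 1: Counter()}
    totals = Counter(labels)
    for doc, y in zip(docs, labels):
        df[y].update(set(byte_ngrams(doc, n)))  # count each N-gram once per document
    candidates = set(df[0]) | set(df[1])        # the larger "second set" of N-grams

    def score(g):
        return abs(df[1][g] / totals[1] - df[0][g] / totals[0])

    # Sort by descending score, breaking ties by byte order for determinism.
    return set(sorted(candidates, key=lambda g: (-score(g), g))[:k])

def train_model(docs, labels, selected, n):
    """Multinomial naive Bayes over the selected N-grams (a stand-in model)."""
    counts = {0: Counter(), 1: Counter()}
    priors = Counter(labels)
    for doc, y in zip(docs, labels):
        grams = byte_ngrams(doc, n)
        counts[y].update({g: c for g, c in grams.items() if g in selected})
    model = {"n": n, "selected": selected, "log_prior": {}, "log_lik": {}}
    for y in (0, 1):
        total = sum(counts[y].values()) + len(selected)  # Laplace smoothing denominator
        model["log_prior"][y] = math.log(priors[y] / len(labels))
        model["log_lik"][y] = {
            g: math.log((counts[y][g] + 1) / total) for g in selected
        }
    return model

def classify(model, doc):
    """What a content filter would call: score each class and return the best.
    N-grams outside the selected set are ignored; ties resolve to class 0."""
    grams = byte_ngrams(doc, model["n"])
    scores = {}
    for y in (0, 1):
        s = model["log_prior"][y]
        for g, c in grams.items():
            if g in model["selected"]:
                s += c * model["log_lik"][y][g]
        scores[y] = s
    return max(scores, key=scores.get)

def fit(train_docs, train_labels, val_docs, val_labels, n=3, k_values=(10, 50, 200)):
    """Build one model per candidate k on the training documents and keep the
    one with the best accuracy on the validation documents."""
    best = None
    for k in k_values:
        selected = select_ngrams(train_docs, train_labels, n, k)
        model = train_model(train_docs, train_labels, selected, n)
        correct = sum(classify(model, d) == y for d, y in zip(val_docs, val_labels))
        acc = correct / len(val_docs)
        if best is None or acc > best[0]:
            best = (acc, model)
    return best[1]
```

For example, `fit([b"aaaa", b"aaab", b"zzzz", b"zzzy"], [1, 1, 0, 0], [b"aaa", b"zzz"], [1, 0], n=2, k_values=(1, 2))` keeps both `b"aa"` and `b"zz"`, because the single-N-gram model misclassifies `b"aaa"` on the validation set. Working on raw bytes rather than words means no tokenizer or character-set decoding is needed, which is one plausible reason to define N-grams as byte sequences.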

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.