Patent · US Active

Method of feature extraction from noisy documents

US8655803B2 · kind B2 · utility

Cited by: 9
References: 0
Claims: 25
Family size: 0

Assignee

Inventors

Key dates

Filing date: Dec 17, 2008
Grant date: Feb 18, 2014
Priority date
Expiry date: Aug 18, 2031

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V30/10
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Aspects of the exemplary embodiment relate to a method and apparatus for automatically identifying features that are suitable for use by a classifier in assigning class labels to text sequences extracted from noisy documents. The exemplary method includes receiving a dataset of text sequences, automatically identifying a set of patterns in the text sequences, and filtering the patterns to generate a set of features. The filtering includes at least one of filtering out redundant patterns and filtering out irrelevant patterns. The method further includes outputting at least some of the features in the set of features, optionally after fusing features which are determined not to affect the classifier's accuracy if they are merged.
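
The following Python sketch illustrates the general shape of the pipeline the abstract describes (identify patterns, filter redundant ones, filter irrelevant ones). It is not the claimed method: the abstract does not say how patterns are mined or how redundancy and relevance are measured, so every concrete choice here is an assumption made for illustration. Patterns are assumed to be word n-grams, a pattern is treated as redundant when a longer pattern occurs in exactly the same sequences, and a pattern is treated as irrelevant when its occurrences are not dominated by one class. The function names, the thresholds (`min_support`, `min_purity`) and the sample data are hypothetical, and the optional fusing step is omitted.

```python
from collections import defaultdict


def extract_patterns(sequences, max_n=2, min_support=2):
    """Map each word n-gram (n <= max_n) to the set of sequence indices that
    contain it, keeping only n-grams found in at least min_support sequences.
    Assumption: patterns are word n-grams; the abstract does not specify this."""
    support = defaultdict(set)
    for idx, (text, _label) in enumerate(sequences):
        tokens = text.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                support[tuple(tokens[i:i + n])].add(idx)
    return {p: s for p, s in support.items() if len(s) >= min_support}


def filter_redundant(support):
    """Assumed redundancy test: among patterns that occur in exactly the same
    sequences, keep only the longest one."""
    best = {}
    for pattern, seqs in support.items():
        key = frozenset(seqs)
        if key not in best or len(pattern) > len(best[key]):
            best[key] = pattern
    return {p: support[p] for p in best.values()}


def filter_irrelevant(support, labels, min_purity=0.9):
    """Assumed relevance test: keep a pattern only if at least min_purity of
    the sequences containing it share one class label."""
    kept = {}
    for pattern, seqs in support.items():
        counts = defaultdict(int)
        for idx in seqs:
            counts[labels[idx]] += 1
        if max(counts.values()) / len(seqs) >= min_purity:
            kept[pattern] = seqs
    return kept


def select_features(sequences):
    """Run the three steps in order and return the surviving patterns, sorted."""
    labels = [label for _text, label in sequences]
    support = extract_patterns(sequences)
    support = filter_redundant(support)
    support = filter_irrelevant(support, labels)
    return sorted(support)


# Hypothetical labeled text sequences, standing in for lines of OCR output.
data = [
    ("invoice number 12345", "invoice"),
    ("invoice number 98765", "invoice"),
    ("dear sir or madam", "letter"),
    ("dear sir thank you", "letter"),
    ("number of pages", "letter"),
]

features = select_features(data)
print(features)
```

On this toy data, the unigram "invoice" is dropped as redundant because the bigram "invoice number" covers the same two sequences, and the unigram "number" is dropped as irrelevant because it appears under both labels. A real system would likely replace the purity threshold with a statistical relevance measure; that choice is left open here.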

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.