Method of feature extraction from noisy documents
US8655803B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Dec 17, 2008 |
| Grant date | Feb 18, 2014 |
| Priority date | — |
| Expiry date | Aug 18, 2031 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V30/10
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Aspects of the exemplary embodiment relate to a method and apparatus for automatically identifying features that are suitable for use by a classifier in assigning class labels to text sequences extracted from noisy documents. The exemplary method includes receiving a dataset of text sequences, automatically identifying a set of patterns in the text sequences, and filtering the patterns to generate a set of features. The filtering includes at least one of filtering out redundant patterns and filtering out irrelevant patterns. The method further includes outputting at least some of the features in the set of features, optionally after fusing features that are determined not to affect the classifier's accuracy if they are merged.
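The steps named in the abstract (identify patterns, filter out irrelevant ones, filter out redundant ones) can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say what a pattern is or how filtering is decided, so the use of token n-grams, the `min_support` and `min_purity` thresholds, the "same occurrence set" redundancy test, and all function names are assumptions made for illustration. The optional feature-fusing step is omitted.

```python
from collections import defaultdict


def identify_patterns(sequences, max_len=2):
    """Map each token n-gram (assumed pattern type) to the indices of sequences containing it."""
    occurrences = defaultdict(set)
    for idx, text in enumerate(sequences):
        tokens = text.lower().split()
        for n in range(1, max_len + 1):
            for start in range(len(tokens) - n + 1):
                occurrences[tuple(tokens[start:start + n])].add(idx)
    return occurrences


def filter_irrelevant(occurrences, labels, min_support=2, min_purity=0.8):
    """Keep patterns that occur often enough and mostly alongside a single class label.

    Both thresholds are hypothetical; they stand in for whatever relevance
    criterion an actual implementation would use.
    """
    kept = {}
    for pattern, ids in occurrences.items():
        if len(ids) < min_support:
            continue
        counts = defaultdict(int)
        for i in ids:
            counts[labels[i]] += 1
        if max(counts.values()) / len(ids) >= min_purity:
            kept[pattern] = ids
    return kept


def filter_redundant(occurrences):
    """Among patterns that match exactly the same sequences, keep only the shortest."""
    best = {}
    for pattern, ids in occurrences.items():
        key = frozenset(ids)
        # Ties on length are broken alphabetically so the result is deterministic.
        if key not in best or (len(pattern), pattern) < (len(best[key]), best[key]):
            best[key] = pattern
    return {pattern: set(ids) for ids, pattern in best.items()}


def extract_features(sequences, labels):
    """Run the three steps in order and return the surviving patterns, sorted."""
    patterns = identify_patterns(sequences)
    relevant = filter_irrelevant(patterns, labels)
    return sorted(filter_redundant(relevant))
```

Calling `extract_features(sequences, labels)` with a list of text sequences and a parallel list of class labels returns the retained n-grams as tuples of tokens. Treating two patterns as redundant only when they match exactly the same sequences is a deliberately conservative choice: it removes duplicates without discarding any information a classifier could use.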
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.