System and method for improving feature selection for a spam filtering model
US8417783B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | May 31, 2006 |
| Grant date | Apr 9, 2013 |
| Priority date | — |
| Expiry date | Jan 13, 2030 |
Classification
- Technology area (CPC H): Electricity
- CPC primary: H04L51/212
- WIPO field: Digital communication
- WIPO sector: Electrical engineering
Abstract
A system and method for removing ineffective features from a spam feature set. In particular, in one embodiment of the invention, an entropy value is calculated for the feature set based on the effectiveness of the feature set at differentiating between ham and spam. Features are then removed one at a time and the entropy is recalculated. Features which increase the overall entropy are removed and features which decrease the overall entropy are retained. In another embodiment of the invention, the value of certain types of time-consuming features (e.g., rules) is determined based on both the information gain associated with the features and the time consumed implementing the features. Those features which have relatively low information gain and which consume a significant amount of time to implement are removed from the feature set.
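The second embodiment (weighing information gain against time cost) can be illustrated with a minimal sketch. This is not the patented implementation: it assumes binary features that either fire or do not fire on a message, uses the textbook definition of information gain (label entropy minus label entropy conditioned on the feature), and treats the thresholds `min_gain` and `max_seconds`, the feature names, and the timing figures as hypothetical values chosen only for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(fired, labels):
    """H(label) - H(label | feature) for one binary feature."""
    hit = [y for f, y in zip(fired, labels) if f]
    miss = [y for f, y in zip(fired, labels) if not f]
    total = len(labels)
    conditional = ((len(hit) / total) * entropy(hit)
                   + (len(miss) / total) * entropy(miss))
    return entropy(labels) - conditional


def prune(features, labels, min_gain, max_seconds):
    """Drop features that are both low-gain and slow; keep the rest.

    `features` maps a name to (fired, seconds): a 0/1 list per message
    and an assumed per-message evaluation time.
    """
    kept = {}
    for name, (fired, seconds) in features.items():
        gain = information_gain(fired, labels)
        if gain < min_gain and seconds > max_seconds:
            continue  # low information gain and time-consuming: remove
        kept[name] = gain
    return kept


# Hypothetical toy corpus: two spam and two ham messages.
labels = ["spam", "spam", "ham", "ham"]
features = {
    "rule_perfect": ([1, 1, 0, 0], 0.50),     # separates classes, slow
    "rule_useless": ([1, 0, 1, 0], 0.50),     # no gain, slow
    "token_useless": ([1, 0, 1, 0], 0.001),   # no gain, but cheap
}
kept = prune(features, labels, min_gain=0.1, max_seconds=0.1)
```

In this toy data, `rule_useless` is the only feature dropped: it yields no information gain and exceeds the assumed time budget. `token_useless` is equally uninformative but survives because it is cheap, which mirrors the abstract's requirement that a feature be both low-gain and time-consuming before it is removed.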
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.