Linguistic nonsense detection for undesirable message classification
US7809795B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 26, 2006 |
| Grant date | Oct 5, 2010 |
| Priority date | — |
| Expiry date | Feb 1, 2028 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06Q10/107
- WIPO field: IT methods for management
- WIPO sector: Electrical engineering
Abstract
Nonsense words are removed from incoming emails and visually similar (look-alike) characters are replaced with the actual, corresponding characters, so that the emails can be more accurately analyzed to see if they are spam. More specifically, an incoming email stream is filtered, and the emails are normalized to enable more accurate spam detection. In some embodiments, the normalization comprises the removal of nonsense words and/or the replacement of look-alike characters according to a set of rules. In other embodiments, more and/or different normalization techniques are utilized. In some embodiments, the language in which an email is written is identified in order to aid in the normalization. Once incoming emails are normalized, they are then analyzed to detect spam or other forms of undesirable email, such as phishing emails.
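The two normalization steps named in the abstract, look-alike replacement and nonsense-word removal, might be sketched as follows. This is only an illustration under stated assumptions, not the patented method: the `LOOKALIKES` map, the `KNOWN_WORDS` vocabulary, and the function names are all hypothetical, and a plain vocabulary lookup stands in for whatever linguistic nonsense test the patent actually claims.

```python
# Hypothetical look-alike rules (assumption); a real rule set would be far larger
# and could depend on the identified language of the email.
LOOKALIKES = str.maketrans({"0": "o", "1": "i", "3": "e", "@": "a", "$": "s"})

# Hypothetical vocabulary (assumption); stands in for a real dictionary.
KNOWN_WORDS = {"buy", "cheap", "viagra", "now"}


def replace_lookalikes(token: str) -> str:
    """Swap look-alike characters, but only in tokens that contain a letter,
    so that plain numbers such as "2006" are left untouched."""
    if any(ch.isalpha() for ch in token):
        return token.translate(LOOKALIKES)
    return token


def normalize(text: str) -> str:
    """Lowercase, replace look-alikes, then drop alphabetic tokens
    that are not in the vocabulary (treated here as nonsense words)."""
    kept = []
    for raw in text.split():
        fixed = replace_lookalikes(raw.lower())
        core = fixed.strip(".,!?;:")
        if core.isalpha() and core not in KNOWN_WORDS:
            continue  # assumed nonsense: alphabetic but unknown
        kept.append(fixed)
    return " ".join(kept)


print(normalize("Buy ch3ap v1agra xqzvk now"))
```

The normalized string would then be passed to whatever spam or phishing classifier sits downstream. Replacing look-alikes before the vocabulary check matters in this sketch, since an obfuscated token such as `ch3ap` would otherwise never match a known word.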
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.