Decision-tree-based symbolic rule induction system for text categorization
US6519580B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Jun 8, 2000 |
| Grant date | Feb 11, 2003 |
| Priority date | — |
| Expiry date | Jul 31, 2021 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F18/24323
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method for automatically categorizing messages or documents containing text. The method fits within the general framework of supervised learning, in which one or more rules for categorizing data are constructed automatically by a computer from training data that has been categorized beforehand, i.e., each training data item has been labeled with the categories to which it belongs. More specifically, the rule induction method involves the novel combination of (1) inducing a decision tree for each category from the training data, (2) automatically constructing from each decision tree a simplified symbolic rule set that is logically equivalent overall to the decision tree and is used for categorization in place of the decision tree, and (3) determining a confidence level for each rule. The method covers both decision-tree-based symbolic rule induction and the use, for document categorization, of rules in the logical format generated by the rule induction procedure described herein.
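The three steps named in the abstract can be illustrated with a minimal Python sketch for a single category. This is not the patented procedure: the abstract does not say how the tree is induced, how the rule set is simplified, or how confidence is computed. The sketch assumes a toy inducer that splits on word presence using Gini impurity, emits one unsimplified rule per root-to-leaf path with a positive majority, and takes confidence to be the fraction of training documents at that leaf that belong to the category. All function names and the sample data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of booleans."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(docs, labels, depth=3):
    """Step 1 (assumed inducer): grow a small tree of word-presence tests."""
    pos, total = sum(labels), len(labels)
    leaf = {"pos": pos, "total": total}
    if depth == 0 or pos in (0, total):
        return leaf
    best = None
    for word in sorted(set().union(*docs)):
        yes = [l for d, l in zip(docs, labels) if word in d]
        no = [l for d, l in zip(docs, labels) if word not in d]
        if not yes or not no:
            continue  # this word does not split the data
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / total
        if best is None or score < best[0]:
            best = (score, word)
    if best is None:
        return leaf
    word = best[1]

    def split(keep):
        pairs = [(d, l) for d, l in zip(docs, labels) if (word in d) == keep]
        return [d for d, _ in pairs], [l for _, l in pairs]

    return {"word": word,
            "yes": build_tree(*split(True), depth - 1),
            "no": build_tree(*split(False), depth - 1)}

def extract_rules(tree, path=()):
    """Steps 2 and 3 (assumed): one rule per positive-majority leaf.

    Each rule is (conditions, confidence). Confidence here is the share of
    training documents at the leaf that are in the category; no rule
    simplification is attempted.
    """
    if "word" not in tree:
        conf = tree["pos"] / tree["total"]
        return [(path, conf)] if conf > 0.5 else []
    w = tree["word"]
    return (extract_rules(tree["yes"], path + ((w, True),)) +
            extract_rules(tree["no"], path + ((w, False),)))

def categorize(rules, doc):
    """Return the confidence of the best matching rule, or 0.0 if none match."""
    def matches(conditions):
        return all((word in doc) == present for word, present in conditions)
    return max((c for conds, c in rules if matches(conds)), default=0.0)

# Hypothetical training data for one category ("sports"); docs are word sets.
docs = [{"goal", "match"}, {"goal", "team"}, {"stock", "market"},
        {"team", "market"}, {"match", "win"}, {"stock", "win"}]
labels = [True, True, False, False, True, False]

rules = extract_rules(build_tree(docs, labels))
for conditions, confidence in rules:
    print(conditions, confidence)
```

Once the rules are extracted, categorization uses only the rule list and never consults the tree, which mirrors the abstract's point that the rule set replaces the decision tree at classification time.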
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.