Statistical stemming
US8352247B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Event | Date |
|---|---|
| Filing date | Apr 23, 2012 |
| Grant date | Jan 8, 2013 |
| Priority date | — |
| Expiry date | Apr 23, 2032 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/268
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating suffix rewriting rules. A method includes obtaining a plurality of canonical suffix-rewriting rules each associated with one or more words, generating a suffix tree from the words, selecting a minimum colored subset of the nodes and leaves in the suffix tree, and generating a plurality of final suffix-rewriting rules from the nodes in the minimum colored subset. Another method includes receiving applicable and non-applicable words for a suffix-rewriting rule, generating a suffix tree from the applicable words and the non-applicable words, selecting a minimum colored subset of the nodes and leaves in the suffix tree, and generating a plurality of suffix-rewriting rules, wherein each rule corresponds to a node in the minimum colored subset with a valid status.
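The abstract does not spell out how the "minimum colored subset" is chosen. The sketch below shows one possible reading and is not the patented method. It rests on several assumptions. The "suffix tree" is taken to be an uncompressed trie over reversed words, so each node spells a suffix. Each word's label (a canonical rule, or a valid/invalid status) is treated as its color. The minimum colored subset is taken to be the fewest nodes such that every word inherits its own color from its deepest selected suffix node, and it is found with a tree dynamic program. All names (`TrieNode`, `build_suffix_trie`, `min_colored_subset`) and the rule notation such as `"ies->y"` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Selection = Dict[str, str]  # selected suffix -> color (label) assigned at that node


class TrieNode:
    """Node of an uncompressed suffix trie built over reversed words (assumption)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.color: Optional[str] = None  # label of a word that ends exactly here


def build_suffix_trie(labeled_words: Dict[str, str]) -> TrieNode:
    """Insert each word backward, so every root-to-node path spells a suffix."""
    root = TrieNode()
    for word, color in labeled_words.items():
        node = root
        for ch in reversed(word):
            node = node.children.setdefault(ch, TrieNode())
        node.color = color
    return root


def min_colored_subset(root: TrieNode, palette: List[str]) -> Selection:
    """Hypothetical helper: pick the fewest nodes so each word gets its own
    color from its deepest selected suffix node (tree dynamic program)."""
    memo: Dict[Tuple[str, Optional[str]], Tuple[float, Selection]] = {}

    def solve(node: TrieNode, suffix: str,
              inherited: Optional[str]) -> Tuple[float, Selection]:
        key = (suffix, inherited)
        if key in memo:
            return memo[key]

        def children_cost(color: Optional[str]) -> Tuple[float, Selection]:
            total = 0.0
            chosen: Selection = {}
            for ch, child in node.children.items():
                cost, sel = solve(child, ch + suffix, color)
                total += cost
                chosen.update(sel)
            return total, chosen

        best: Tuple[float, Selection] = (math.inf, {})
        # Option 1: leave this node unselected; it keeps the inherited color.
        if node.color is None or node.color == inherited:
            best = children_cost(inherited)
        # Option 2: select this node and recolor its subtree. The strict "<"
        # keeps ties on option 1, so selections end up on longer suffixes.
        for color in palette:
            if node.color is not None and node.color != color:
                continue
            cost, sel = children_cost(color)
            if cost + 1 < best[0]:
                best = (cost + 1, {**sel, suffix: color})
        memo[key] = best
        return best

    return solve(root, "", None)[1]


# First method: words labeled with (hypothetical) canonical rules.
words = {"cats": "s->", "dogs": "s->", "flies": "ies->y", "tries": "ies->y"}
rules = min_colored_subset(build_suffix_trie(words), ["s->", "ies->y"])
print(sorted(rules.items()))  # [('ies', 'ies->y'), ('s', 's->')]

# Second method: applicable vs. non-applicable words for the rule "s->".
status = {"cats": "valid", "dogs": "valid", "glass": "invalid", "bus": "invalid"}
chosen = min_colored_subset(build_suffix_trie(status), ["valid", "invalid"])
print(sorted(s for s, c in chosen.items() if c == "valid"))  # ['s']
```

In this reading, the second method keeps only the selected nodes whose status is valid as rules ("strip `s`"), while the invalid nodes (`glass`, `bus`) act as exceptions that stop the rule from applying. Breaking ties toward longer suffixes is a design choice of this sketch. Preferring shorter suffixes would also give a minimum-size subset, but it can produce over-general rules such as one attached to the empty suffix.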
Source: USPTO / EPO open patent data (objective bibliographic data and citation counts).