Word detection
US7917355B2 · kind B2 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Aug 23, 2007 |
| Grant date | Mar 29, 2011 |
| Priority date | — |
| Expiry date | Jan 26, 2030 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/53
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Methods, systems, and apparatus, including computer program products, are provided in which data from web documents are partitioned into a training corpus and a development corpus. First word probabilities for words are determined for the training corpus, and second word probabilities for the words are determined for the development corpus. Uncertainty values based on the word probabilities for the training corpus and the development corpus are compared, and new words are identified based on the comparison.
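The abstract names the steps but not how the uncertainty values are defined or compared. The Python sketch below is one illustrative reading under stated assumptions, not the patented method: documents are whitespace-tokenized, word probabilities are unigram maximum-likelihood estimates, the uncertainty value of a word is its self-information (-log2 p, in bits), and an out-of-vocabulary word is flagged as new when its uncertainty in the two corpora differs by no more than a tolerance. The names `partition`, `word_probabilities`, `uncertainty`, `identify_new_words`, `known_vocab`, `dev_fraction` and `tolerance` are hypothetical.

```python
import math
import random
from collections import Counter


def partition(documents, dev_fraction=0.2, seed=0):
    """Shuffle documents and split them into (training, development) corpora."""
    docs = list(documents)
    random.Random(seed).shuffle(docs)
    n_dev = int(len(docs) * dev_fraction)
    return docs[: len(docs) - n_dev], docs[len(docs) - n_dev :]


def word_probabilities(corpus):
    """Maximum-likelihood unigram probabilities over whitespace-separated tokens."""
    counts = Counter(tok for doc in corpus for tok in doc.split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def uncertainty(p):
    """Assumed uncertainty value: self-information of a word, in bits."""
    return -math.log2(p)


def identify_new_words(train_docs, dev_docs, known_vocab, tolerance=1.0):
    """Flag out-of-vocabulary words whose uncertainty is stable across corpora."""
    p_train = word_probabilities(train_docs)  # first word probabilities
    p_dev = word_probabilities(dev_docs)      # second word probabilities
    new_words = []
    for word, p in p_train.items():
        # Assumption: only unknown words seen in both corpora are candidates.
        if word in known_vocab or word not in p_dev:
            continue
        # Compare the two uncertainty values; accept if they roughly agree.
        if abs(uncertainty(p) - uncertainty(p_dev[word])) <= tolerance:
            new_words.append(word)
    return sorted(new_words)


if __name__ == "__main__":
    train = ["webinar signup", "webinar today", "zzq fad test news"]
    dev = ["webinar fad fad", "webinar fad fad"]
    print(identify_new_words(train, dev, {"signup", "today", "test", "news"}))
    # ['webinar']
```

In this toy run, "webinar" has 2.0 bits of uncertainty in training and about 1.58 bits in development, so it is accepted; "fad" (3.0 vs. about 0.58 bits) is rejected, and "zzq" never appears in the development corpus. Requiring agreement between corpora is meant to filter out strings that look frequent only by chance in the training data. For languages written without spaces, such as Chinese, a word segmenter would have to replace `str.split`.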
Source: USPTO / EPO open patent data (objective bibliographic data and citation counts).