Method and apparatus for statistical text filtering
US6879722B2 · kind B2 · utility
Assignee
Inventor
Key dates
| Filing date | Jun 29, 2001 |
| Grant date | Apr 12, 2005 |
| Priority date | — |
| Expiry date | May 3, 2023 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/49
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Disclosed herein is a method for automatically filtering a corpus of documents containing textual and non-textual information in a natural language. In a first dividing step (101), the document corpus is divided into appropriate portions. In a subsequent determining step (105), a regularity value (VR) is determined for each portion of the document corpus; this value measures how well the portion conforms to character-sequence probabilities predetermined for the language considered. In a comparing step (107), each regularity value (VR) is compared with a threshold value (VT) to decide whether the conformity is sufficient. Finally, in a rejecting step (111), any portion of the document corpus whose conformity is not sufficient is rejected and removed from the corpus. An apparatus for carrying out such a method is also disclosed.
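The four steps in the abstract (divide, determine VR, compare with VT, reject) can be illustrated with a minimal Python sketch. The abstract does not say how portions are chosen, how the character-sequence probabilities are estimated, or how VR is computed, so everything specific here is an assumption made for illustration: portions are lines, the probabilities come from a character-bigram model with add-one smoothing, VR is the mean log-probability per character transition, and the names `train_bigram_model`, `regularity_value`, and `filter_corpus` are hypothetical.

```python
import math
from collections import Counter


def train_bigram_model(reference_text):
    """Estimate character-bigram log-probabilities from reference text.

    Assumption: add-one smoothing over the observed alphabet plus one
    extra slot for characters never seen in the reference text.
    """
    pair_counts = Counter(zip(reference_text, reference_text[1:]))
    first_counts = Counter(reference_text[:-1])
    vocab_size = len(set(reference_text)) + 1

    def log_prob(prev, cur):
        return math.log(
            (pair_counts[(prev, cur)] + 1) / (first_counts[prev] + vocab_size)
        )

    return log_prob


def regularity_value(portion, log_prob):
    """Assumed VR: mean log-probability of the portion's character bigrams."""
    if len(portion) < 2:
        return float("-inf")  # too short to score; treated as non-conforming
    total = sum(log_prob(a, b) for a, b in zip(portion, portion[1:]))
    return total / (len(portion) - 1)


def filter_corpus(corpus, log_prob, threshold):
    """Divide into lines, score each line, keep those with VR >= VT."""
    portions = [line for line in corpus.splitlines() if line.strip()]
    return [p for p in portions if regularity_value(p, log_prob) >= threshold]


# Usage: a toy reference text and an arbitrary illustrative threshold.
model = train_bigram_model("the quick brown fox jumps over the lazy dog " * 50)
corpus = "the quick brown fox\n@#9%&!7$$#@\nthe lazy dog\n"
print(filter_corpus(corpus, model, threshold=-2.5))
```

In this toy setup the line of symbols scores below the threshold and is dropped, while the two natural-language lines are kept. A real system would need a much larger reference text and a threshold tuned on held-out data.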
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.