Patent · US Expired

Method and apparatus for statistical text filtering

US6879722B2 · kind B2 · utility

Cited by: 1
References: 3
Claims: 28
Family size: 0

Key dates

Filing date: Jun 29, 2001
Grant date: Apr 12, 2005
Expiry date: May 3, 2023

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/49
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Disclosed herein is a method for automatically filtering a corpus of documents that contain both textual and non-textual information in a natural language. According to the method, in a first dividing step (101), the document corpus is divided into appropriate portions. In a following determining step (105), a regularity value (VR) is determined for each portion of the document corpus; this value measures how well the portion conforms to the character-sequence probabilities predetermined for the language considered. In a comparing step (107), each regularity value (VR) is then compared with a threshold value (VT) to decide whether the conformity is sufficient. Finally, in a rejecting step (111), any portion of the document corpus whose conformity is insufficient is rejected and removed from the corpus. An apparatus for carrying out the method is also disclosed.
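The four steps in the abstract (divide, determine VR, compare with VT, reject) can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that each non-empty line is one portion, that the "character-sequence probabilities predetermined for the language" are add-one smoothed character-bigram probabilities estimated from a reference sample, and that VR is the mean log-probability per character transition. The names `train_bigram_model`, `regularity_value`, `filter_corpus` and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter
from typing import Callable

# Hypothetical type: maps a (previous char, current char) pair to a log-probability.
LogProb = Callable[[str, str], float]


def train_bigram_model(reference_text: str, alpha: float = 1.0) -> LogProb:
    """Estimate add-alpha smoothed character-bigram log-probabilities.

    Assumption: the language's predetermined character-sequence probabilities
    are modelled here as bigram statistics of a reference text sample.
    """
    alphabet = set(reference_text)
    bigrams = Counter(zip(reference_text, reference_text[1:]))
    # Count each character only where it starts a bigram, so counts line up.
    unigrams = Counter(reference_text[:-1])
    vocab_size = len(alphabet) + 1  # +1 reserves mass for unseen characters

    def log_prob(prev: str, cur: str) -> float:
        return math.log((bigrams[(prev, cur)] + alpha)
                        / (unigrams[prev] + alpha * vocab_size))

    return log_prob


def regularity_value(portion: str, log_prob: LogProb) -> float:
    """Hypothetical VR: mean log-probability per character transition."""
    if len(portion) < 2:
        return float("-inf")  # too short to score; treated as non-conforming
    total = sum(log_prob(a, b) for a, b in zip(portion, portion[1:]))
    return total / (len(portion) - 1)


def filter_corpus(corpus: str, log_prob: LogProb, threshold: float) -> list[str]:
    """Keep portions whose VR reaches the threshold VT; reject the rest."""
    # Dividing step: each non-empty line is one portion (an assumption).
    portions = [line for line in corpus.splitlines() if line.strip()]
    kept = []
    for portion in portions:
        vr = regularity_value(portion, log_prob)  # determining step
        if vr >= threshold:                       # comparing step
            kept.append(portion)
        # else: rejecting step, the portion is dropped from the output
    return kept


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog. " * 20
    model = train_bigram_model(reference)
    corpus = "the lazy dog jumps over the fox\nqzx#@!9~~vv%%kk^^\n"
    print(filter_corpus(corpus, model, threshold=-2.5))
    # prints ['the lazy dog jumps over the fox']
```

The threshold of -2.5 was picked by hand for this toy reference sample. In practice, VT would likely need tuning on held-out text from the target language, and a longer reference corpus or a higher-order model would give more reliable regularity values.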

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.