Patent · US Active

System for estimating a distribution of message content categories in source data

US9189538B2 · kind B2 · utility

Cited by: 9
References: 6
Claims: 16
Family size: 0

Assignee

Inventors

Key dates

Filing date: Apr 12, 2012
Grant date: Nov 17, 2015
Priority date
Expiry date: Sep 18, 2033

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N20/00
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method of computerized content analysis that gives “approximately unbiased and statistically consistent estimates” of the distribution of elements of structured, unstructured, and partially structured source data among a set of categories. In one embodiment, this is done by analyzing the distribution of a small set of individually classified elements across a plurality of categories and then using the information determined from that analysis to extrapolate the distribution in a larger population set. The extrapolation is performed without constraining the distribution of the unlabeled elements to equal the distribution of the labeled elements, and without constraining the content distribution of elements in the labeled set (e.g., the distribution of words used by elements in the labeled set) to equal the content distribution of elements in the unlabeled set. Not being constrained in these ways allows the estimation techniques described herein to provide distinct advantages over conventional aggregation techniques.
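
The abstract does not spell out the estimator, so the following is only an illustrative sketch of the general idea, not the patented method. It assumes that the word-usage pattern within each category is learned from the labeled set, and that category shares in the unlabeled set are then chosen so that the resulting mixture of patterns matches what is observed there. The fitting step used here is a standard expectation-maximization update for mixture proportions, swapped in for simplicity. The function names, the word-presence "profile" representation, the smoothing constant, and the toy documents are all hypothetical.

```python
from collections import Counter


def profile_distribution(docs, vocab):
    """Empirical distribution of word-presence profiles (tuples of 0/1) over docs."""
    counts = Counter()
    for doc in docs:
        words = set(doc.split())
        counts[tuple(int(w in words) for w in vocab)] += 1
    total = sum(counts.values())
    return {profile: n / total for profile, n in counts.items()}


def estimate_category_shares(labeled, unlabeled, vocab, iters=200, smooth=1e-3):
    """Estimate category shares in `unlabeled` without classifying single documents.

    labeled   -- list of (text, category) pairs coded by hand
    unlabeled -- list of texts from the larger population
    smooth    -- hypothetical constant that keeps unseen profiles from getting zero weight
    """
    categories = sorted({label for _, label in labeled})
    # P(profile | category), learned from the labeled set only.
    cond = {
        c: profile_distribution([d for d, lab in labeled if lab == c], vocab)
        for c in categories
    }
    # P(profile) observed in the unlabeled population.
    target = profile_distribution(unlabeled, vocab)

    # Start from equal shares; the labeled-set shares are deliberately not used.
    shares = {c: 1.0 / len(categories) for c in categories}
    for _ in range(iters):
        updated = dict.fromkeys(categories, 0.0)
        for profile, p in target.items():
            weights = {
                c: shares[c] * (cond[c].get(profile, 0.0) + smooth)
                for c in categories
            }
            norm = sum(weights.values())
            for c in categories:
                updated[c] += p * weights[c] / norm
        shares = updated
    return shares


# Toy data: the labeled set is split 50/50, the unlabeled set is 80/20.
vocab = ["good", "bad"]
labeled = [("good service", "pos")] * 5 + [("bad service", "neg")] * 5
unlabeled = ["good product"] * 8 + ["bad product"] * 2

shares = estimate_category_shares(labeled, unlabeled, vocab)
print(shares)
```

In this toy case the estimated shares land close to the 80/20 split of the unlabeled set even though the labeled set is split evenly, which illustrates the point that the labeled and unlabeled category distributions are not forced to agree.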

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.