Patent · US Active

Techniques for estimating item frequencies in large data sets

US8489645B2 · kind B2 · utility

Cited by: 3
References: 4
Claims: 14
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 27, 2004
Grant date: Jul 16, 2013
Priority date
Expiry date: Feb 28, 2031

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06Q10/10
  • WIPO field: IT methods for management
  • WIPO sector: Electrical engineering

Abstract

Techniques for estimating the frequencies of items (e.g., data items or objects) in large data sets are disclosed. For example, a technique for determining items and their frequencies at multiple levels of interest in a collection of nested bags includes the following steps. A hierarchy comprising a plurality of levels of nested bags, together with the levels of interest, is input. Among the plurality of levels, a subset of bags is sampled from at least one level. At each level of interest, the frequency of each distinct item in the bags obtained in the sampling step is counted. At each level of interest, the item frequencies obtained in the counting step are extrapolated based on the sampling ratios associated with the sampling step. At each level of interest, the items are sorted according to the frequencies obtained from the extrapolating step, and the items with the highest frequencies are retained. A bag may refer to one or more subsets or groups of data items or objects. A bag may itself contain one or more other bags.
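
The steps in the abstract (sample, count, extrapolate, sort and retain) can be illustrated with a minimal Python sketch. It is not the patented method, only one possible reading under stated assumptions: a bag is modeled as a Python list whose elements are either items or further lists; bags are sampled uniformly at a single level; the frequency of an item at a level is taken to be the number of bags at that level that contain it; and extrapolation divides each count by the realized sampling ratio. The names estimate_top_items, bags_at_level, and items_in are hypothetical and do not come from the patent.

```python
import random
from collections import Counter


def bags_at_level(bag, level):
    """Return every bag located `level` steps below `bag` (level 0 is `bag` itself)."""
    if level == 0:
        return [bag]
    found = []
    for child in bag:
        if isinstance(child, list):  # assumption: nested bags are lists
            found.extend(bags_at_level(child, level - 1))
    return found


def items_in(bag):
    """Return the set of distinct items anywhere inside `bag`."""
    items = set()
    for child in bag:
        if isinstance(child, list):
            items |= items_in(child)
        else:
            items.add(child)
    return items


def estimate_top_items(root, sample_level, levels_of_interest, ratio, top_k, rng):
    """Estimate the top_k most frequent items at each level of interest.

    Assumption: every level of interest is at or below the sampled level,
    so a single sampling ratio applies to all of them.
    """
    candidates = bags_at_level(root, sample_level)
    if not candidates:
        raise ValueError("no bags at the sampled level")

    # Sampling step: pick a uniform random subset of bags at one level.
    n = max(1, round(ratio * len(candidates)))
    sampled = rng.sample(candidates, n)
    realized_ratio = n / len(candidates)

    result = {}
    for level in levels_of_interest:
        if level < sample_level:
            raise ValueError("level of interest is above the sampled level")

        # Counting step: number of sampled-subtree bags at this level
        # that contain each distinct item.
        counts = Counter()
        for top in sampled:
            for bag in bags_at_level(top, level - sample_level):
                counts.update(items_in(bag))

        # Extrapolating step: scale counts up by the sampling ratio.
        estimates = {item: c / realized_ratio for item, c in counts.items()}

        # Sorting step: highest estimate first, ties broken by item name.
        ranked = sorted(estimates.items(), key=lambda kv: (-kv[1], kv[0]))
        result[level] = ranked[:top_k]
    return result
```

Calling estimate_top_items(root, 1, [1, 2], 0.5, 10, random.Random(0)) on a hierarchy root would sample half of the level-1 bags and return, for levels 1 and 2, a list of (item, estimated frequency) pairs. With a ratio of 1.0 nothing is extrapolated and the estimates equal the exact counts, which is a convenient sanity check.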

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.