Patent · US Active

Identifying collocations in a corpus of text in a distributed computing environment

US9239827B2 · kind B2 · utility

Cited by: 3
References: 6
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jun 19, 2012
Grant date: Jan 19, 2016
Priority date
Expiry date: May 15, 2033

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06Q50/01
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Technologies pertaining to computing a metric that is indicative of whether an n-gram in a large corpus of text is a collocation are described herein. The metric is computed in connection with a distributed computing framework, wherein n-grams of varying lengths can be analyzed in a single input data pass, and wherein secondary sorting functionality of the distributed computing framework need not be invoked.
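The abstract does not name the collocation metric or the framework's API, so the following is only a minimal single-process sketch of the general idea under stated assumptions. It assumes pointwise mutual information (PMI) as the metric and simulates a MapReduce-style job in memory: a map step that emits n-grams of every length from 1 to a hypothetical `max_n` in one pass over each document, and a reduce step that sums counts per key. It does not reproduce the patented technique for avoiding secondary sorting; all function names (`map_document`, `reduce_counts`, `pmi_scores`) and the `min_count` threshold are illustrative.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

NGram = Tuple[str, ...]


def map_document(tokens: List[str], max_n: int) -> Iterable[Tuple[NGram, int]]:
    """Hypothetical map step: emit (n-gram, 1) for every n-gram of length 1..max_n.

    All lengths are produced in a single pass over the token list.
    """
    for i in range(len(tokens)):
        for n in range(1, max_n + 1):
            if i + n > len(tokens):
                break
            yield tuple(tokens[i:i + n]), 1


def reduce_counts(pairs: Iterable[Tuple[NGram, int]]) -> Counter:
    """Hypothetical reduce step: sum the emitted counts for each n-gram key."""
    counts: Counter = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def pmi_scores(counts: Counter, min_count: int = 1) -> Dict[NGram, float]:
    """PMI for every n-gram with n >= 2 (assumed metric, not stated in the patent).

    PMI = log(P(w1..wn) / (P(w1) * ... * P(wn))), where the joint probability
    is estimated over all n-grams of the same length n and the marginals over
    unigrams. Assumes every count fits in one in-memory Counter.
    """
    totals: Dict[int, int] = defaultdict(int)
    for gram, c in counts.items():
        totals[len(gram)] += c
    scores: Dict[NGram, float] = {}
    for gram, c in counts.items():
        n = len(gram)
        if n < 2 or c < min_count:
            continue
        joint = c / totals[n]
        independent = 1.0
        for word in gram:
            independent *= counts[(word,)] / totals[1]
        scores[gram] = math.log(joint / independent)
    return scores


# Usage on a toy corpus: n-grams of lengths 1..3 are counted in one pass.
corpus = [
    "new york is a big city".split(),
    "she moved to new york last year".split(),
    "the city is new".split(),
]
pairs = (pair for doc in corpus for pair in map_document(doc, max_n=3))
counts = reduce_counts(pairs)
scores = pmi_scores(counts, min_count=2)
print(scores)
```

In this toy corpus only the bigram `("new", "york")` occurs at least twice, so it is the only n-gram scored; its high PMI reflects that the two words co-occur far more often than their individual frequencies would predict.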

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.