Identifying collocations in a corpus of text in a distributed computing environment
US9239827B2 · kind B2 · utility
Cited by: 3
References: 6
Claims: 20
Family size: 0
Assignee
Inventors
Key dates
| Filing date | Jun 19, 2012 |
| Grant date | Jan 19, 2016 |
| Priority date | — |
| Expiry date | May 15, 2033 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06Q50/01
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Technologies pertaining to computing a metric that is indicative of whether an n-gram in a large corpus of text is a collocation are described herein. The metric is computed in connection with a distributed computing framework, wherein n-grams of varying lengths can be analyzed in a single input data pass, and wherein secondary sorting functionality of the distributed computing framework need not be invoked.
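To make the idea in the abstract concrete, the following is a minimal single-machine sketch, not the patented method. It assumes a map step that emits every n-gram of length 1 to `max_n` in one pass over the tokens, a reduce step that sums counts per n-gram, and pointwise mutual information (PMI) as a stand-in collocation metric; the abstract does not say which metric the patent uses. The names `map_ngrams`, `reduce_counts`, `pmi`, and `score_corpus` are hypothetical and chosen for illustration only.

```python
from collections import Counter
from math import log


def map_ngrams(tokens, max_n=3):
    """Map step (illustrative): emit (n-gram, 1) for all lengths 1..max_n in one pass."""
    for i in range(len(tokens)):
        for n in range(1, max_n + 1):
            if i + n <= len(tokens):
                yield tuple(tokens[i:i + n]), 1


def reduce_counts(pairs):
    """Reduce step (illustrative): sum the emitted values for each n-gram key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def pmi(ngram, counts, totals):
    """PMI of an n-gram versus independence of its words (assumed metric, not the patent's)."""
    p_ngram = counts[ngram] / totals[len(ngram)]
    p_independent = 1.0
    for word in ngram:
        p_independent *= counts[(word,)] / totals[1]
    return log(p_ngram / p_independent)


def score_corpus(documents, max_n=3):
    """Count all n-gram lengths in one pass over the input, then score every n-gram with n > 1."""
    pairs = (
        pair
        for doc in documents
        for pair in map_ngrams(doc.lower().split(), max_n)
    )
    counts = reduce_counts(pairs)
    # Total number of n-gram occurrences per length, used to normalize counts into probabilities.
    totals = Counter()
    for ngram, count in counts.items():
        totals[len(ngram)] += count
    return {g: pmi(g, counts, totals) for g in counts if len(g) > 1}


documents = ["new york is big", "new york is old", "the big old dog"]
scores = score_corpus(documents)
```

In this toy corpus, `("new", "york")` scores higher than `("big", "old")` because its words co-occur more often than their individual frequencies would predict. Each n-gram key is summed independently of the others, so the sketch needs no ordering of values within a key; how the patent itself avoids secondary sorting is not described in the abstract.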
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.