Methods, apparatus, systems and computer readable media for use in keyword extraction
US9384287B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jan 15, 2014 |
| Grant date | Jul 5, 2016 |
| Priority date | — |
| Expiry date | Mar 23, 2035 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/2468
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
In one embodiment, a method includes: receiving data representing a plurality of corpora, each of the plurality of corpora including a set of documents; receiving data representing terms that appear in the corpora; for each one of the terms, determining a plurality of inverse document frequency values each associated with a respective one of the plurality of corpora; receiving data representing a subset of the terms that also appear in a document; for each term in the subset, determining a term frequency for the term in the document; and for each term in the subset, determining an augmented term frequency-inverse document frequency value based on: (i) the term frequency, and (ii) the plurality of inverse document frequency values that were determined for the term in the subset.
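The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how the per-corpus inverse document frequency values are combined, so the averaging used here is an assumption, as are the smoothed IDF formula, the length-normalized term frequency, and the hypothetical names `idf_per_corpus` and `augmented_tfidf`.

```python
import math
from collections import Counter


def idf_per_corpus(term, corpora):
    """Return one IDF value per corpus for `term`.

    Each corpus is a list of documents; each document is a list of tokens.
    The smoothed formula log((1 + N) / (1 + df)) + 1 is an assumption,
    chosen so that a term absent from a corpus does not divide by zero.
    """
    values = []
    for docs in corpora:
        df = sum(1 for doc in docs if term in doc)  # document frequency
        values.append(math.log((1 + len(docs)) / (1 + df)) + 1.0)
    return values


def augmented_tfidf(document, corpora, combine=None):
    """Score each term of `document` that also appears in the corpora.

    `combine` reduces the list of per-corpus IDF values to one number.
    Averaging is the default here and is an assumption, not something
    stated in the abstract.
    """
    if combine is None:
        combine = lambda vals: sum(vals) / len(vals)

    # Terms that appear anywhere in the corpora.
    vocabulary = {t for docs in corpora for doc in docs for t in doc}

    counts = Counter(document)
    total = len(document)
    scores = {}
    for term, count in counts.items():
        if term not in vocabulary:
            continue  # only the subset of corpus terms is scored
        tf = count / total  # length-normalized term frequency (assumed)
        scores[term] = tf * combine(idf_per_corpus(term, corpora))
    return scores


# Toy data, invented for illustration only.
corpora = [
    [["patent", "claim", "method"], ["patent", "keyword"]],
    [["keyword", "search", "index"], ["search", "method"], ["index", "term"]],
]
document = ["keyword", "extraction", "keyword", "method"]
scores = augmented_tfidf(document, corpora)
```

Passing a different `combine` function (for example `max` or `min`) changes how the corpora are weighed against each other, which is the one design choice the abstract leaves open.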
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.