Identifying topics in a digital work
US9613003B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Mar 28, 2012 |
| Grant date | Apr 4, 2017 |
| Priority date | — |
| Expiry date | Mar 23, 2035 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06Q30/02
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
In some implementations, text is extracted from a digital work and a plurality of noun phrases are identified. The noun phrases are checked against a network accessible resource, such as an online encyclopedia, that includes a plurality of interlinked article entries. The noun phrases that have corresponding entries in the network accessible resource are included in a set of candidate topics. The candidate topics are ranked based, at least in part, on the links to and from each of the entries corresponding to the candidate topics. Candidate topics below a ranking threshold are removed from the set of candidate topics. Further, term frequency information for each candidate topic in relation to the digital work is compared against term frequency information for the candidate topic in a large corpus of textual works to remove candidate topics within a frequency difference threshold.
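The steps in the abstract can be sketched as a small filtering pipeline. This is a minimal illustration under stated assumptions, not the patented implementation. Noun phrases are assumed to be extracted already. The network accessible resource is stood in for by a hypothetical in-memory dictionary, `encyclopedia`, mapping each entry title to the set of titles it links to. The link score (a count of links to and from other candidates' entries), the `min_link_score` and `freq_diff_threshold` parameters, and the sample data are all invented for illustration.

```python
from collections import Counter


def identify_topics(noun_phrases, encyclopedia, corpus_freq,
                    min_link_score=1, freq_diff_threshold=0.05):
    """Return surviving candidate topics, highest link score first.

    encyclopedia: hypothetical stand-in for the network accessible
        resource, mapping entry title -> set of titles it links to.
    corpus_freq: assumed relative frequency of each phrase in a large corpus.
    """
    counts = Counter(p.lower() for p in noun_phrases)
    total = sum(counts.values())

    # Keep only noun phrases that have a corresponding entry.
    candidates = {p for p in counts if p in encyclopedia}

    # Assumed scoring: count links to and from other candidates' entries.
    def link_score(topic):
        outgoing = (encyclopedia[topic] & candidates) - {topic}
        incoming = {c for c in candidates
                    if c != topic and topic in encyclopedia[c]}
        return len(outgoing) + len(incoming)

    scores = {c: link_score(c) for c in candidates}

    # Remove candidates below the ranking threshold.
    ranked = [c for c in candidates if scores[c] >= min_link_score]

    # Remove candidates whose frequency in the work is within the
    # threshold of their frequency in the large corpus.
    kept = [c for c in ranked
            if abs(counts[c] / total - corpus_freq.get(c, 0.0))
            > freq_diff_threshold]

    return sorted(kept, key=lambda c: (-scores[c], c))


# Invented sample data for illustration only.
phrases = ["white whale", "captain ahab", "white whale", "the sea",
           "captain ahab", "white whale", "time", "time", "harpoon"]
encyclopedia = {
    "white whale": {"captain ahab", "whaling"},
    "captain ahab": {"white whale"},
    "time": {"physics"},
    "harpoon": {"white whale"},
}
corpus_freq = {"white whale": 0.0001, "captain ahab": 0.0001,
               "time": 0.2, "harpoon": 0.1}

topics = identify_topics(phrases, encyclopedia, corpus_freq)
print(topics)  # ['white whale', 'captain ahab']
```

In this sample, "the sea" is dropped because it has no entry, "time" because its entry has no links to or from other candidates, and "harpoon" because it appears in the work about as often as in the corpus. A real system would query the resource over the network and would likely weight links differently; the abstract says only that ranking is based "at least in part" on links.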
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.