Patent · US Active

Identifying topics in a digital work

US9613003B1 · kind B1 · utility

Cited by: 26
References: 53
Claims: 25
Family size: 0

Assignee

Inventors

Key dates

Filing date: Mar 28, 2012
Grant date: Apr 4, 2017
Priority date
Expiry date: Mar 23, 2035

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06Q30/02
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

In some implementations, text is extracted from a digital work and a plurality of noun phrases are identified. The noun phrases are checked against a network accessible resource, such as an online encyclopedia, that includes a plurality of interlinked article entries. The noun phrases that have corresponding entries in the network accessible resource are included in a set of candidate topics. The candidate topics are ranked based, at least in part, on the links to and from each of the entries corresponding to the candidate topics. Candidate topics below a ranking threshold are removed from the set of candidate topics. Further, term frequency information for each candidate topic in relation to the digital work is compared against term frequency information for the candidate topic in a large corpus of textual works to remove candidate topics within a frequency difference threshold.
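
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes noun phrases have already been extracted, stands in for the online encyclopedia with a small dictionary `links` that maps each entry to the entries it links to, and uses a simple count of links to and from other candidates as the ranking score. The function name `identify_topics`, the parameters `min_link_score` and `min_freq_diff`, and all sample data are hypothetical.

```python
from collections import Counter


def identify_topics(noun_phrases, links, corpus_freq, work_len,
                    min_link_score=1, min_freq_diff=0.01):
    """Toy sketch of the abstract's pipeline; names and thresholds are hypothetical."""
    counts = Counter(p.lower() for p in noun_phrases)

    # Keep only noun phrases that have an entry in the (toy) encyclopedia.
    candidates = {p for p in counts if p in links}

    # Score each candidate by links to and from the other candidates' entries.
    def link_score(p):
        outgoing = len((links[p] & candidates) - {p})
        incoming = sum(1 for q in candidates if q != p and p in links[q])
        return outgoing + incoming

    scores = {p: link_score(p) for p in candidates}

    # Rank by score (ties broken alphabetically) and drop low-ranked candidates.
    ranked = [p for p in sorted(candidates, key=lambda p: (-scores[p], p))
              if scores[p] >= min_link_score]

    # Drop candidates whose relative frequency in the work is within
    # min_freq_diff of their relative frequency in the large corpus.
    return [p for p in ranked
            if abs(counts[p] / work_len - corpus_freq.get(p, 0.0)) > min_freq_diff]


# Hypothetical sample data: entry -> set of entries it links to.
links = {
    "whale": {"ocean", "ship"},
    "ocean": {"whale"},
    "ship": {"ocean"},
    "time": set(),
}
noun_phrases = ["whale", "whale", "whale", "ocean", "ocean",
                "ship", "time", "harpooner"]
# Assumed relative frequencies of each term in a large reference corpus.
corpus_freq = {"whale": 0.001, "ocean": 0.002, "ship": 0.12, "time": 0.05}

# The noun-phrase count is used here as a stand-in for the work's length.
topics = identify_topics(noun_phrases, links, corpus_freq,
                         work_len=len(noun_phrases))
print(topics)
```

In this sample, "harpooner" is dropped because it has no entry, "time" because its entry has no links to or from the other candidates, and "ship" because it is about as frequent in the work as in the reference corpus.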

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.