Patent · US Active

Identifying key phrases within documents

US8423546B2 · kind B2 · utility

Cited by: 4
References: 3
Claims: 16
Family size: 0

Assignee

Inventors

Key dates

Filing date: Dec 3, 2010
Grant date: Apr 16, 2013
Priority date
Expiry date: Feb 20, 2031

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/258
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

The present invention extends to methods, systems, and computer program products for identifying key phrases within documents. Embodiments of the invention include using a tag index to determine what a document primarily relates to. For example, an integrated data flow and extract-transform-load pipeline crawls, parses, and word-breaks large corpuses of documents in database tables. Documents can be broken into tuples. The tuples can be sent to a heuristically based algorithm that uses statistical language models and weight+cross-entropy threshold functions to summarize the document into its “top N” most statistically significant phrases. Accordingly, embodiments of the invention scale efficiently (e.g., linearly), and potentially large numbers of documents can be characterized by salient and relevant key phrases (tags).
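
The flow described in the abstract (word-break a document, emit tuples, score phrases against a statistical language model, keep the "top N" above a threshold) might be sketched roughly as follows. This is an illustrative sketch only, not the patented method: the function names, the `(doc_id, phrase)` tuple layout, the background-probability dictionary, the `floor` probability for unseen phrases, and the scoring formula (each phrase's contribution to the divergence between document and background distributions, standing in for the abstract's cross-entropy threshold) are all assumptions introduced here.

```python
import math
import re
from collections import Counter


def word_break(text):
    """Lowercase the text and split it into word tokens (hypothetical word breaker)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def to_tuples(doc_id, tokens, max_len=2):
    """Yield (doc_id, phrase) tuples for every n-gram of 1..max_len words."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield (doc_id, " ".join(tokens[i:i + n]))


def top_n_key_phrases(doc_id, text, background, n=3, threshold=0.0,
                      max_len=2, floor=1e-6):
    """Return up to n phrases whose score exceeds the threshold.

    background maps a phrase to its probability in a general corpus
    (an assumed stand-in for a statistical language model); phrases
    missing from it get the small probability `floor`.
    """
    tuples = to_tuples(doc_id, word_break(text), max_len)
    counts = Counter(phrase for _, phrase in tuples)
    total = sum(counts.values())

    scored = []
    for phrase, count in counts.items():
        p_doc = count / total
        p_bg = background.get(phrase, floor)
        # Assumed score: how much more likely the phrase is in this
        # document than in the background, weighted by its frequency.
        score = p_doc * math.log(p_doc / p_bg)
        if score > threshold:
            scored.append((phrase, score))

    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [phrase for phrase, _ in scored[:n]]
```

Calling `top_n_key_phrases("d1", text, background, n=5)` with a small background dictionary of common words would return the phrases that are frequent in the document but rare in general text. Each document is processed in a single pass over its tuples, which is in the spirit of the linear scaling the abstract mentions, though the abstract does not specify how that scaling is achieved.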

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.