Identifying salient items in documents
US9251473B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Mar 13, 2013 |
| Grant date | Feb 2, 2016 |
| Priority date | — |
| Expiry date | Dec 9, 2033 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N20/00
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A set of representations of item-page pairs of items and respective web pages that include the respective items is obtained, each representation including feature function values indicating weights associated with features of associated web pages, the features including page classification features. An annotated set of labeled training data that is annotated with salience annotation values of items for respective web pages that include the items is obtained. The salience annotation values are determined based on a soft function, by determining a first count of a total number of user queries associated with corresponding visits to the respective web pages, and determining a ratio of a second count to the first count, the second count determined as a cardinality of a subset of the corresponding visits that are associated with user queries that include the item, the subset included in the corresponding visits. Models are trained using the annotated set.
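The soft salience annotation described in the abstract reduces to a ratio of two counts, which the following minimal Python sketch illustrates. It is not the patented implementation. The function name `soft_salience`, the assumption of one query string per visit, and the case-insensitive substring test for "query includes the item" are all hypothetical choices made for illustration; the abstract does not specify how query-item matching is performed.

```python
from typing import Iterable


def soft_salience(item: str, visit_queries: Iterable[str]) -> float:
    """Return the fraction of page visits whose query mentions `item`.

    Assumption: each element of `visit_queries` is the user query
    associated with one visit to the page (hypothetical data layout).
    """
    queries = [q.lower() for q in visit_queries]

    # First count: total number of user queries tied to visits to the page.
    first_count = len(queries)
    if first_count == 0:
        # No visits observed; treating salience as 0.0 is an assumption.
        return 0.0

    # Second count: size of the subset of visits whose query includes the item.
    # Assumption: "includes" is approximated by a case-insensitive substring match.
    needle = item.lower()
    second_count = sum(1 for q in queries if needle in q)

    # Soft label in [0, 1] rather than a hard salient / not-salient decision.
    return second_count / first_count


if __name__ == "__main__":
    page_queries = [
        "seattle weather",
        "space needle tickets",
        "Seattle hotels",
        "pike place market",
    ]
    print(soft_salience("Seattle", page_queries))
```

Under these assumptions, the resulting values could serve as labels for the item-page pairs in the annotated training set, with an item mentioned in most of the queries leading to a page scoring close to 1.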
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.