System and engine for seeded clustering of news events
US11663254B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jan 29, 2017 |
| Grant date | May 30, 2023 |
| Priority date | — |
| Expiry date | Jan 29, 2037 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06F40/295
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
The present invention provides a seeded news event clustering and retrieval system configured to first create a candidate data set of documents, second create a set of initial clusters based on nearness or duplicate similarity status, and third create an aggregate cluster by merging initial clusters with seed documents. The invention generates top-level clusters for news events based on an editorially supplied topical label or “seed” component and generates sub-topic-focused clusters based on algorithm. The system uses an agglomerative clustering algorithm to gather and structure documents into distinct result sets. Decisions on whether to merge related documents or clusters are made according to similarity of evidence derived from two distinct sources, one, relying on a digital signature based on the unstructured text in the document, the other based on the presence of named entity tags that have been assigned to the document by an event or named entity tagger such as the Thomson Reuters Calais engine/web service.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.