Patent · US Active

System and engine for seeded clustering of news events

US11663254B2 · kind B2 · utility

Cited by: 1
References: 3
Claims: 22
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jan 29, 2017
Grant date: May 30, 2023
Priority date
Expiry date: Jan 29, 2037

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/295
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

The present invention provides a seeded news event clustering and retrieval system configured to, first, create a candidate data set of documents; second, create a set of initial clusters based on nearness or duplicate similarity status; and third, create an aggregate cluster by merging initial clusters with seed documents. The invention generates top-level clusters for news events based on an editorially supplied topical label or “seed” component, and generates sub-topic-focused clusters algorithmically. The system uses an agglomerative clustering algorithm to gather and structure documents into distinct result sets. Decisions on whether to merge related documents or clusters are made according to the similarity of evidence derived from two distinct sources: one relies on a digital signature based on the unstructured text in the document; the other is based on the presence of named entity tags that have been assigned to the document by an event or named entity tagger such as the Thomson Reuters Calais engine/web service.
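
The three stages named in the abstract (candidate set, initial clusters, aggregate cluster merged with a seed) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the `Doc` type, the use of word-bigram shingles as a stand-in for the "digital signature", Jaccard overlap as the similarity measure, the equal weighting of text and entity evidence, the shared-entity candidate filter, single-link merging, and both thresholds. Entity tags are supplied by hand rather than by a tagging service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical document record; entity tags are assumed to be precomputed."""
    doc_id: str
    text: str
    entities: frozenset = frozenset()


def shingles(text, k=2):
    """Word k-shingles, an assumed stand-in for the text-based digital signature."""
    words = text.lower().split()
    if len(words) < k:
        return frozenset([" ".join(words)]) if words else frozenset()
    return frozenset(" ".join(words[i:i + k]) for i in range(len(words) - k + 1))


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(d1, d2, w_text=0.5, w_ent=0.5):
    """Combine the two evidence sources: text signature and named entity tags."""
    text_sim = jaccard(shingles(d1.text), shingles(d2.text))
    ent_sim = jaccard(d1.entities, d2.entities)
    return w_text * text_sim + w_ent * ent_sim


def linkage(c1, c2):
    """Single-link score: best pairwise similarity between two clusters."""
    return max(similarity(a, b) for a in c1 for b in c2)


def candidate_set(seed, docs):
    """Stage 1 (assumed filter): keep documents sharing an entity tag with the seed."""
    return [d for d in docs if d.entities & seed.entities]


def initial_clusters(docs, threshold=0.8):
    """Stage 2: agglomerative merging of near-duplicate documents."""
    clusters = [[d] for d in docs]
    while True:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = linkage(clusters[i], clusters[j])
                if score >= best:
                    best, pair = score, (i, j)
        if pair is None:
            return clusters
        i, j = pair
        clusters[i] = clusters[i] + clusters.pop(j)


def aggregate_cluster(seed, clusters, threshold=0.5):
    """Stage 3: merge each initial cluster that is close enough to the seed."""
    merged = [seed]
    for cluster in clusters:
        if linkage([seed], cluster) >= threshold:
            merged.extend(cluster)
    return merged
```

With a seed document and a handful of tagged documents, the stages would be chained as `aggregate_cluster(seed, initial_clusters(candidate_set(seed, docs)))`. Single-link scoring is chosen here only for brevity; the abstract does not state which linkage criterion or thresholds the patented system uses.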

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.