Patent · US Active

Methods and systems for clustering documents based on semantic similarity

US12346383B1 · kind B1 · utility

Cited by: 0
References: 17
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 11, 2024
Grant date: Jul 1, 2025
Priority date
Expiry date: Sep 11, 2044

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/93
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Methods and systems for clustering documents according to semantic similarity are disclosed. The method includes generating an embedding for each of a plurality of documents to form a plurality of embeddings. Each embedding is indicative of a semantic representation for the corresponding document. The method includes segregating the plurality of embeddings into a plurality of shards. The method includes clustering one or more embeddings within each shard of the plurality of shards into one or more first clusters. The one or more first clusters for each shard collectively constitute a plurality of first clusters. The method includes generating a plurality of second clusters across the plurality of shards, based, at least in part, on semantic similarity between the plurality of embeddings and the plurality of first clusters.
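
The shard-then-merge flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not name a clustering algorithm, a similarity measure, or a sharding rule, so everything here is an assumption chosen for illustration: round-robin sharding, greedy "leader" clustering, cosine similarity, the 0.9 threshold, and all function names (`make_shards`, `greedy_cluster`, `cluster_documents`). Embeddings are taken as precomputed vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def make_shards(n_items, n_shards):
    """Assumed sharding rule: round-robin split of document indices."""
    return [list(range(s, n_items, n_shards)) for s in range(n_shards)]


def greedy_cluster(vectors, threshold):
    """Assumed clustering rule: a vector joins the first group whose
    centroid is at least `threshold` similar, else it starts a new group.
    Returns groups of positions into `vectors`."""
    groups = []
    for i, v in enumerate(vectors):
        for g in groups:
            if cosine(v, centroid([vectors[j] for j in g])) >= threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups


def cluster_documents(embeddings, n_shards=2, threshold=0.9):
    """Return one second-cluster label per embedding."""
    # Step 1: cluster inside each shard to obtain the first clusters,
    # keeping one centroid per first cluster.
    first_centroids = []
    for shard in make_shards(len(embeddings), n_shards):
        shard_vecs = [embeddings[i] for i in shard]
        for g in greedy_cluster(shard_vecs, threshold):
            first_centroids.append(centroid([shard_vecs[j] for j in g]))

    # Step 2: group similar first-cluster centroids across shards.
    groups = greedy_cluster(first_centroids, threshold)
    label_of_first = {c: label for label, g in enumerate(groups) for c in g}

    # Step 3: assign each embedding to the second cluster that contains
    # its most similar first cluster.
    labels = []
    for e in embeddings:
        best = max(range(len(first_centroids)),
                   key=lambda c: cosine(e, first_centroids[c]))
        labels.append(label_of_first[best])
    return labels


if __name__ == "__main__":
    # Toy 2-D "embeddings": three near the x-axis, three near the y-axis.
    docs = [[1, 0], [0.99, 0.1], [0, 1], [0.1, 0.99], [1, 0.05], [0.05, 1]]
    print(cluster_documents(docs, n_shards=2, threshold=0.9))
```

Sharding keeps each clustering pass small, and the cross-shard step compares only first-cluster centroids rather than every pair of documents. A production system would presumably use a learned embedding model and a more robust clustering method in place of these stand-ins.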

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.