Methods and systems for clustering documents based on semantic similarity
US12346383B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 11, 2024 |
| Grant date | Jul 1, 2025 |
| Priority date | — |
| Expiry date | Sep 11, 2044 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/93
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Methods and systems for clustering documents according to semantic similarity are disclosed. The method includes generating an embedding for each of a plurality of documents to form a plurality of embeddings. Each embedding is indicative of a semantic representation of the corresponding document. The method includes segregating the plurality of embeddings into a plurality of shards. The method includes clustering one or more embeddings within each shard of the plurality of shards into one or more first clusters. The one or more first clusters for each shard collectively constitute a plurality of first clusters. The method includes generating a plurality of second clusters across the plurality of shards, based, at least in part, on semantic similarity between the plurality of embeddings and the plurality of first clusters.
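The four steps in the abstract (embed, shard, cluster within each shard, then form clusters across shards) can be illustrated with a minimal Python sketch. The abstract does not name an embedding model, a sharding rule, or a clustering algorithm, so everything below is an assumption chosen for illustration: a bag-of-words vector stands in for a semantic embedding, sharding is round-robin, first clusters come from a greedy threshold ("leader") clustering, and second clusters are formed by merging first-cluster centroids across shards and reassigning each embedding to its most similar merged centroid. The function names `embed`, `leader_cluster`, and `cluster_documents` and the parameters `n_shards` and `threshold` are hypothetical and do not come from the patent.

```python
import math


def embed(text, vocab):
    """Toy stand-in for a semantic embedding: L2-normalized bag of words."""
    vec = [0.0] * len(vocab)
    for word in text.lower().split():
        vec[vocab[word]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def leader_cluster(vectors, threshold):
    """Greedy clustering: join the most similar centroid at or above
    `threshold`, otherwise start a new cluster. Returns the centroids."""
    members, centers = [], []
    for v in vectors:
        best, best_sim = None, threshold
        for c, center in enumerate(centers):
            sim = cosine(v, center)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            members.append([v])
            centers.append(v)
        else:
            members[best].append(v)
            centers[best] = centroid(members[best])
    return centers


def cluster_documents(docs, n_shards=2, threshold=0.5):
    # Step 1: one embedding per document (vocabulary built from the corpus).
    words = sorted({w for d in docs for w in d.lower().split()})
    vocab = {w: i for i, w in enumerate(words)}
    embeddings = [embed(d, vocab) for d in docs]

    # Step 2: segregate embeddings into shards (assumed round-robin rule).
    shards = [list(range(s, len(docs), n_shards)) for s in range(n_shards)]

    # Step 3: first clusters, computed independently inside each shard.
    first_centers = []
    for shard in shards:
        if shard:
            first_centers.extend(
                leader_cluster([embeddings[i] for i in shard], threshold))

    # Step 4: second clusters across shards. Merge similar first-cluster
    # centroids, then assign every embedding to its most similar merged one.
    seeds = leader_cluster(first_centers, threshold)
    second = {}
    for i, e in enumerate(embeddings):
        k = max(range(len(seeds)), key=lambda s: cosine(e, seeds[s]))
        second.setdefault(k, []).append(i)
    return sorted(second.values())


if __name__ == "__main__":
    docs = [
        "cat dog pet animal",
        "dog cat animal pet",
        "stock market price trade",
        "market stock trade price",
    ]
    print(cluster_documents(docs))  # prints [[0, 1], [2, 3]]
```

In this toy run each shard holds one pet document and one finance document, so the shard-local pass cannot group related documents on its own; only the cross-shard step brings documents 0 and 1 together and documents 2 and 3 together. A real system would presumably substitute a learned embedding model and a scalable clustering method for the stand-ins used here.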
Source: USPTO / EPO open patent data (bibliographic data and citation counts).