Systems and methods for generating labeled short text sequences
US11797594B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Nov 10, 2020 |
| Grant date | Oct 24, 2023 |
| Priority date | — |
| Expiry date | Jan 11, 2042 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/30
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A set of documents related to a particular topic, industry, or entity is received. Sentences are extracted from each document. The sentences are grouped into tuples of one, two, or three consecutive sentences (i.e., short text sequences). The sentence tuples are clustered based on vector representations of the sentences. For each cluster, a set of tuples that best represents or best fits the cluster is selected. These sentence tuples are fed to an ontology to determine the ontological entities associated with each tuple. The determined ontological entities are then associated with the cluster to which each tuple belongs. The sentence tuples in each cluster are labeled based on the ontological entities associated with that cluster. The labeled sentence tuples may then be used for a variety of purposes, such as training a model to determine the topic of short text sequences.
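The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the bag-of-words vectors, the greedy threshold clustering, and the keyword dictionary `TOY_ONTOLOGY` are simplifying assumptions standing in for whatever vector representation, clustering method, and ontology a real system would use, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a real ontology: keyword -> ontological entity.
TOY_ONTOLOGY = {"bank": "Finance", "loan": "Finance", "interest": "Finance",
                "clinic": "Healthcare", "patients": "Healthcare", "vaccine": "Healthcare"}

def split_sentences(text):
    # Naive splitter: break on whitespace that follows '.', '!' or '?'.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def sentence_tuples(sentences, max_len=3):
    # Every run of 1..max_len consecutive sentences (the "short text sequences").
    return [tuple(sentences[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(sentences) - n + 1)]

def embed(tup):
    # Assumption: word counts substitute for a learned vector representation.
    return Counter(re.findall(r"[a-z]+", " ".join(tup).lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def cluster(tuples, threshold=0.3):
    # Assumption: greedy single-pass clustering; a tuple joins the most similar
    # existing cluster if similarity >= threshold, otherwise it starts a new one.
    clusters = []
    for t in tuples:
        v = embed(t)
        best = max(clusters, key=lambda c: cosine(v, c["centroid"]), default=None)
        if best is not None and cosine(v, best["centroid"]) >= threshold:
            best["members"].append(t)
            best["centroid"] += v
        else:
            clusters.append({"centroid": Counter(v), "members": [t]})
    return clusters

def representatives(c, k=2):
    # The k member tuples closest to the cluster centroid.
    return sorted(c["members"],
                  key=lambda t: cosine(embed(t), c["centroid"]),
                  reverse=True)[:k]

def ontology_entities(tup):
    # Look up each distinct word of the tuple in the toy ontology.
    return Counter(TOY_ONTOLOGY[w] for w in embed(tup) if w in TOY_ONTOLOGY)

def label_tuples(documents, threshold=0.3, k=2):
    # Tuples never span two documents; labels come from each cluster's representatives.
    tuples = [t for doc in documents for t in sentence_tuples(split_sentences(doc))]
    labeled = []
    for c in cluster(tuples, threshold):
        votes = Counter()
        for rep in representatives(c, k):
            votes += ontology_entities(rep)
        label = votes.most_common(1)[0][0] if votes else None
        labeled.extend((t, label) for t in c["members"])
    return labeled
```

Calling `label_tuples` on a list of document strings returns `(tuple, label)` pairs, where the label is the entity most often matched by the cluster's representative tuples, or `None` if the toy ontology matches nothing. Such pairs could then serve as training data for a short-text topic classifier, as the abstract suggests.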
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.