Systems and methods for querying large data repositories
US12189590B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 19, 2023 |
| Grant date | Jan 7, 2025 |
| Priority date | — |
| Expiry date | Sep 19, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/243
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Disclosed embodiments relate to systems, methods, and computer-readable storage media for performing dataset discovery. Some embodiments may include accessing a data repository having a plurality of tables having cell values arranged in one or more columns and one or more rows, generating serialized sequences of the cell values that correspond to particular columns of the plurality of tables, inputting the serialized sequences into a natural language model, converting, using the natural language model, the serialized sequences into contextualized embeddings associated with the plurality of tables, storing the contextualized embeddings associated with the plurality of tables in one or more vector indices, receiving a query table, and generating an output of one or more candidate tables from the plurality of tables that are unionable with the received query table.
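The steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything named here is an assumption made for illustration: `toy_embed` is a hashed bag-of-tokens stand-in for the natural language model (it is not contextualized), a plain list scanned by brute force stands in for the vector indices, and the scoring rule (mean of each query column's best cosine match) is one simple way to rank unionable tables, not one taken from the source.

```python
import hashlib
import math
import re
from collections import defaultdict

DIM = 256  # embedding width; an arbitrary choice for this sketch


def serialize_column(header, values):
    """Flatten one column (header plus cell values) into a single text sequence."""
    return f"{header} : " + " | ".join(str(v) for v in values)


def toy_embed(text):
    """Hypothetical stand-in for the language model: hashed bag of tokens, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"\w+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


def build_index(repository):
    """Embed every column of every table; a flat list plays the role of a vector index."""
    index = []
    for table, columns in repository.items():
        for header, values in columns.items():
            index.append((table, header, toy_embed(serialize_column(header, values))))
    return index


def unionable_candidates(query_table, index, top_k=2):
    """Rank tables by the mean, over query columns, of the best-matching column similarity."""
    scores = defaultdict(float)
    for header, values in query_table.items():
        q = toy_embed(serialize_column(header, values))
        best = defaultdict(float)
        for table, _, emb in index:
            best[table] = max(best[table], cosine(q, emb))
        for table, s in best.items():
            scores[table] += s / len(query_table)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Made-up example data: table name -> {column header: cell values}
repository = {
    "cities_eu": {"city": ["Paris", "Berlin", "Madrid"],
                  "country": ["France", "Germany", "Spain"]},
    "products": {"sku": ["A1", "B2", "C3"],
                 "price": [9.99, 14.5, 3.25]},
}
query = {"city": ["Paris", "Rome"], "country": ["France", "Italy"]}

candidates = unionable_candidates(query, build_index(repository))
```

In this toy setup, `candidates` is a list of `(table_name, score)` pairs, and the table whose columns share headers and values with the query table would be expected to rank first. A real system would replace `toy_embed` with a trained language model and the list with an approximate nearest-neighbor index.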
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.