Patent · US · Active

Systems and methods for querying large data repositories

US12189590B1 · kind B1 · utility

Cited by: 0
References: 3
Claims: 20
Family size: 0

Key dates

Filing date: Sep 19, 2023
Grant date: Jan 7, 2025
Expiry date: Sep 19, 2043

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/243
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Disclosed embodiments relate to systems, methods, and computer-readable storage media for performing dataset discovery. Some embodiments may include accessing a data repository having a plurality of tables with cell values arranged in one or more columns and one or more rows, generating serialized sequences of the cell values that correspond to particular columns of the plurality of tables, inputting the serialized sequences into a natural language model, converting, using the natural language model, the serialized sequences into contextualized embeddings associated with the plurality of tables, storing the contextualized embeddings in one or more vector indices, receiving a query table, and generating an output of one or more candidate tables from the plurality of tables that are unionable with the received query table.
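
The pipeline the abstract outlines (serialize each column, embed it, index the embeddings, then match a query table's columns against the index) can be sketched in Python as below. This is a minimal, hypothetical illustration under stated assumptions, not the patented implementation. `serialize_column`, `embed`, `VectorIndex`, `build_index`, and `unionable_tables` are invented names. `embed` is a bag-of-tokens stand-in for the natural language model and produces no contextualized embeddings. The brute-force index replaces a real vector index, and scoring a table by the mean of each query column's best match is an assumed unionability rule.

```python
import math
from collections import Counter, defaultdict

# Assumption: a table is a mapping from column name to its list of cell values.
Table = dict[str, list[str]]


def serialize_column(column: str, cells: list[str], max_cells: int = 20) -> str:
    """Flatten one column into a text sequence, e.g. 'city: Paris | Oslo'."""
    return f"{column}: " + " | ".join(str(c) for c in cells[:max_cells])


def embed(text: str) -> dict[str, float]:
    """Hypothetical stand-in for the natural language model.

    Returns an L2-normalized bag of tokens. Unlike a real language model,
    it is NOT contextualized; it only lets the pipeline run without ML libraries.
    """
    tokens = text.lower().replace("|", " ").replace(":", " ").split()
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


class VectorIndex:
    """Brute-force in-memory index of (table, column, vector) entries."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str, dict[str, float]]] = []

    def add(self, table: str, column: str, vector: dict[str, float]) -> None:
        self.entries.append((table, column, vector))

    def search(self, query: dict[str, float], k: int) -> list[tuple[str, str, float]]:
        """Return the k entries most similar to the query vector."""
        scored = [(t, c, cosine(query, v)) for t, c, v in self.entries]
        scored.sort(key=lambda entry: entry[2], reverse=True)
        return scored[:k]


def build_index(repo: dict[str, Table]) -> VectorIndex:
    """Serialize and embed every column of every table in the repository."""
    index = VectorIndex()
    for name, table in repo.items():
        for column, cells in table.items():
            index.add(name, column, embed(serialize_column(column, cells)))
    return index


def unionable_tables(index: VectorIndex, query: Table, k: int = 10,
                     threshold: float = 0.1) -> list[tuple[str, float]]:
    """Rank tables by the mean, over query columns, of their best column match.

    Assumption: this scoring rule is illustrative, not the patented method.
    """
    best: dict[str, dict[str, float]] = defaultdict(dict)
    for qcol, cells in query.items():
        qvec = embed(serialize_column(qcol, cells))
        for table, _column, score in index.search(qvec, k):
            best[table][qcol] = max(best[table].get(qcol, 0.0), score)
    ranked = [(t, sum(s.values()) / len(query)) for t, s in best.items()]
    ranked = [(t, score) for t, score in ranked if score > threshold]
    ranked.sort(key=lambda entry: entry[1], reverse=True)
    return ranked


# Usage with a tiny, made-up repository.
repo = {
    "cities_eu": {"city": ["Paris", "Oslo", "Rome"], "country": ["France", "Norway", "Italy"]},
    "cities_asia": {"city": ["Tokyo", "Seoul"], "country": ["Japan", "Korea"]},
    "products": {"sku": ["A1", "B2"], "price": ["9.99", "4.50"]},
}
index = build_index(repo)
result = unionable_tables(index, {"city": ["Paris", "Berlin"], "country": ["France", "Germany"]})
print(result)
```

With this sample repository, both city/country tables are returned with the European table ranked first, while the products table scores below the threshold and is dropped. Indexing at the column level is what lets each query column be matched independently; a fuller version would swap `embed` for a language-model encoder and the brute-force index for an approximate nearest-neighbor library.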

Source: USPTO / EPO open patent data (bibliographic data and citation counts).