Joining web data with spreadsheet data using examples
US10713429B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jun 27, 2017 |
| Grant date | Jul 14, 2020 |
| Priority date | — |
| Expiry date | Jun 27, 2037 |
Classification
- Technology area (CPC H)Electricity
- CPC primaryH04L67/75
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
Provided are methods and systems for joining semi-structured data from the web with relational data in a spreadsheet table using input-output examples. A first sub-task performed by the system learns a string transformation program to transform input rows of a table to URL strings that correspond to the webpages where the relevant data is present. A second sub-task learns a program in a rich web data extraction language to extract desired data from the webpage given the example extractions. Hierarchical search and input-driven ranking are used to efficiently learn the programs using few input-output examples. The learnt programs are then run on the remaining spreadsheet entries to join desired data from the corresponding web pages.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.