Data analytics on distributed databases
US10614087B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jan 17, 2017 |
| Grant date | Apr 7, 2020 |
| Priority date | — |
| Expiry date | Jul 12, 2038 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06F16/2471
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
Data analytics is performed on a distributed document storage database by receiving a request for initiating a data analytics job; collecting statistics from the database in response to the request; using the statistics to estimate a first cost for merging an incremental data update for the job into a first resilient distributed dataset; using the statistics to estimate a second cost for newly creating a second resilient distributed dataset for the job; when the first cost is less than the second cost, reading data updates from the database and merging the data updates into the first resilient distributed dataset; and when the first cost is not less than the second cost, newly creating the second resilient distributed dataset by reading all documents from the database.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.