Patent · US Active

Data analytics on distributed databases

US10614087B2 · kind B2 · utility

Cited by: 0
References: 1
Claims: 16
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jan 17, 2017
Grant date: Apr 7, 2020
Priority date
Expiry date: Jul 12, 2038

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/2471
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Data analytics is performed on a distributed document storage database by receiving a request for initiating a data analytics job; collecting statistics from the database in response to the request; using the statistics to estimate a first cost for merging an incremental data update for the job into a first resilient distributed dataset; using the statistics to estimate a second cost for newly creating a second resilient distributed dataset for the job; when the first cost is less than the second cost, reading data updates from the database and merging the data updates into the first resilient distributed dataset; and when the first cost is not less than the second cost, newly creating the second resilient distributed dataset by reading all documents from the database.
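
The decision described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract gives no cost model, so the `Stats` fields, the `READ_COST` and `MERGE_COST` weights, and the `read_updates` / `read_all` callables are all hypothetical, and a plain `dict` keyed by document ID stands in for a resilient distributed dataset. Only the comparison rule (merge when the first cost is less than the second, otherwise rebuild from all documents) is taken from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Dataset = Dict[str, dict]  # stand-in for an RDD: document ID -> document body


@dataclass(frozen=True)
class Stats:
    """Hypothetical statistics collected from the database for one job."""
    total_docs: int    # documents currently stored in the database
    changed_docs: int  # documents updated since the cached dataset was built


# Assumed per-document weights; the abstract does not specify them.
READ_COST = 1.0   # reading one document during a full rebuild
MERGE_COST = 3.0  # reading one update and merging it into the cached dataset


def merge_cost(stats: Stats) -> float:
    """First cost: merge the incremental update into the existing dataset."""
    return stats.changed_docs * MERGE_COST


def rebuild_cost(stats: Stats) -> float:
    """Second cost: create a new dataset by reading every document."""
    return stats.total_docs * READ_COST


def choose_plan(stats: Stats) -> str:
    """Merge only when the first cost is strictly less than the second."""
    return "merge" if merge_cost(stats) < rebuild_cost(stats) else "rebuild"


def refresh(
    cached: Dataset,
    stats: Stats,
    read_updates: Callable[[], Dataset],
    read_all: Callable[[], Dataset],
) -> Dataset:
    """Return the dataset for the job, following the cheaper plan."""
    if choose_plan(stats) == "merge":
        merged = dict(cached)          # datasets are treated as immutable
        merged.update(read_updates())  # updated documents replace old versions
        return merged                  # (deletions are ignored in this sketch)
    return dict(read_all())
```

With these assumed weights, a job that sees 10 changed documents out of 1,000 takes the merge path, while one that sees 500 changed documents rebuilds from scratch. A tie also rebuilds, matching the abstract's "not less than" wording.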

Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.