Apparatus and method for sampling large data sets in a distributed data storage system
US10866874B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Jun 27, 2019 |
| Grant date | Dec 15, 2020 |
| Priority date | — |
| Expiry date | Jun 27, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F3/067
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A system includes a distributed data storage system disseminated across worker machines connected by a network. A distributed data storage management module has instructions executed by a processor to utilize data block identifiers to track data block accesses to the distributed data storage system. A sampling module with instructions executed by the processor receives a new sample request from a client machine connected to the network. Initial data block samples are gathered from the distributed data storage system during a first time period. A revised sample request is received from the client machine during the first time period. The initial data block samples are gathered. New data block samples are collected from the distributed data storage system. The initial data block samples and the new data block samples are combined to form cumulative data block sample results. The cumulative data block sample results are supplied to the client machine.
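The flow in the abstract (gather initial block samples, accept a revised request, collect only the additional blocks, and return the combined result) can be illustrated with a minimal sketch. This is not the patented implementation: the `BlockSampler` class, its `gather` method, the in-memory `blocks` dictionary standing in for the distributed store, and the `block-N` identifiers are all hypothetical names chosen for illustration. The sketch also handles the two requests one after the other, whereas the abstract describes the revised request arriving during the first time period.

```python
import random
from dataclasses import dataclass, field


@dataclass
class BlockSampler:
    """Hypothetical sampler; `blocks` is an in-memory stand-in for distributed storage."""

    blocks: dict[str, list[int]]  # block ID -> rows held in that block
    seed: int = 0
    # Cumulative sample results, keyed by block ID so earlier work is reused.
    samples: dict[str, list[int]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self._rng = random.Random(self.seed)

    def gather(self, num_blocks: int) -> dict[str, list[int]]:
        """Sample until `num_blocks` distinct blocks are held, keeping earlier samples."""
        # Only blocks that have not been sampled yet are candidates.
        remaining = sorted(set(self.blocks) - set(self.samples))
        needed = max(0, min(num_blocks - len(self.samples), len(remaining)))
        for block_id in self._rng.sample(remaining, needed):
            self.samples[block_id] = self.blocks[block_id]
        # Return a copy so callers cannot mutate the sampler's state.
        return dict(self.samples)


# Eight illustrative blocks of three rows each.
store = {f"block-{i}": [i * 10 + j for j in range(3)] for i in range(8)}
sampler = BlockSampler(store, seed=42)

initial = sampler.gather(2)     # new sample request: two blocks
cumulative = sampler.gather(5)  # revised request: five blocks in total
```

In this sketch the revised request reads only three further blocks, because the two initial samples are retained and merged into the cumulative result rather than gathered again.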
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.