Patent · US Active

Efficient caching and data access to a remote data lake in a large scale data processing environment

US11797447B2 · kind B2 · utility

Cited by: 0
References: 9
Claims: 19
Family size: 0

Assignee

Inventors

Key dates

Filing date: Mar 17, 2021
Grant date: Oct 24, 2023
Priority date
Expiry date: Mar 17, 2041

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F2212/1021
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Embodiments described herein are generally directed to caching and data access improvements in a large scale data processing environment. According to an example, an agent running on a first worker node of a cluster receives a read request from a task. The worker node of the cluster to which the data at issue is mapped is identified. When the first worker node is the identified worker node, it is determined whether its cache contains the data; if not, the data is fetched from a remote data lake and the agent locally caches the data; otherwise, when the identified worker node is another worker node of the compute cluster, the data is fetched from a remote agent of that worker node. The agent responds to the read request with cached data, data returned by the remote data lake, or data returned by the remote data agent as the case may be.
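
The read path described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation. The names `Agent`, `owner_of`, `build_cluster`, and `fetch_from_lake` are hypothetical, and mapping data to a worker node by hashing a key modulo the cluster size is an assumption, since the abstract does not say how the mapping is computed. Remote agents are modeled as in-process objects rather than network peers.

```python
import hashlib
from typing import Callable, Dict, List


class Agent:
    """Hypothetical per-worker-node agent (illustrative names only)."""

    def __init__(self, node_id: int, num_nodes: int,
                 fetch_from_lake: Callable[[str], bytes]) -> None:
        self.node_id = node_id
        self.num_nodes = num_nodes
        self.fetch_from_lake = fetch_from_lake  # stand-in for the remote data lake
        self.cache: Dict[str, bytes] = {}       # local cache of this worker node
        self.peers: Dict[int, "Agent"] = {}     # agents on the other worker nodes

    def owner_of(self, key: str) -> int:
        # Assumed mapping: stable hash of the key modulo the cluster size.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_nodes

    def read(self, key: str) -> bytes:
        owner = self.owner_of(key)
        if owner != self.node_id:
            # Data is mapped to another worker node: ask that node's agent.
            return self.peers[owner].read(key)
        if key in self.cache:
            # This node owns the data and already holds it in its cache.
            return self.cache[key]
        # This node owns the data but has not cached it yet:
        # fetch from the remote data lake, then cache locally.
        data = self.fetch_from_lake(key)
        self.cache[key] = data
        return data


def build_cluster(num_nodes: int,
                  fetch_from_lake: Callable[[str], bytes]) -> List[Agent]:
    """Create one agent per worker node and wire each to its peers."""
    agents = [Agent(i, num_nodes, fetch_from_lake) for i in range(num_nodes)]
    for agent in agents:
        agent.peers = {p.node_id: p for p in agents if p is not agent}
    return agents
```

In this sketch every agent computes the same owner for a given key, so a forwarded request is always served by the owning node's cache or by a single data lake fetch, and each piece of data is cached on exactly one worker node.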

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.