Patent · US Active

High-performance data lake system and data storage method

US11789899B2 · kind B2 · utility

Cited by: 0
References: 5
Claims: 10
Family size: 0

Assignees

Inventors

Key dates

Filing date: Nov 17, 2022
Grant date: Oct 17, 2023
Priority date
Expiry date: Nov 17, 2042

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/258
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

The present disclosure provides a high-performance data lake system and a data storage method. The data storage method includes the following steps: S1: converting a file into a file stream; S2: converting the file stream into an array in which multiple subarrays are nested; and S3: converting the array into a resilient distributed dataset (RDD) and storing the RDD in a storage layer of a data lake. The nested field structure lays the foundation for parallel processing during reads and thereby improves read performance. Furthermore, the number of nested subarrays is set according to the number of hardware cores, so that the data lake scales better and can maintain optimal write efficiency for different users.
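
Steps S1 and S2 can be illustrated with a minimal sketch. This is not the patented implementation. It assumes line-oriented records and uses os.cpu_count() as a stand-in for "hardware cores". The function names file_to_stream and stream_to_nested_array are hypothetical. Step S3 is only indicated in a comment, since it needs a running Spark context.

```python
import io
import os


def file_to_stream(data: bytes) -> io.BytesIO:
    """S1 (sketch): expose file contents as a byte stream.

    A real system would open a file; raw bytes are wrapped here so the
    example stays self-contained.
    """
    return io.BytesIO(data)


def stream_to_nested_array(stream, num_subarrays=None):
    """S2 (sketch): group line records into an array of nested subarrays.

    Assumption: the subarray count defaults to the local CPU core count.
    """
    if num_subarrays is None:
        num_subarrays = os.cpu_count() or 1
    # Assumption: one record per line; trailing newline bytes are dropped.
    records = [line.rstrip(b"\r\n") for line in stream]
    # Never create more subarrays than records, and always at least one.
    k = max(1, min(num_subarrays, len(records)))
    base, extra = divmod(len(records), k)
    nested, start = [], 0
    for i in range(k):
        # The first `extra` subarrays take one additional record each.
        end = start + base + (1 if i < extra else 0)
        nested.append(records[start:end])
        start = end
    return nested


nested = stream_to_nested_array(file_to_stream(b"a\nb\nc\nd\ne\n"), num_subarrays=2)

# S3 (not executed here): with PySpark available, one option would be
# sc.parallelize(nested, numSlices=len(nested)), giving one subarray per
# partition before writing to the storage layer. Assumption, not the
# patent's stated code.
```

Splitting the records into contiguous, near-equal subarrays keeps each nested group independent, which is what makes reading the groups in parallel possible.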

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.