High-performance data lake system and data storage method
US11789899B2 · kind B2 · utility
Assignees
Inventors
Key dates
| Filing date | Nov 17, 2022 |
| Grant date | Oct 17, 2023 |
| Priority date | — |
| Expiry date | Nov 17, 2042 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/258
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
The present disclosure provides a high-performance data lake system and a data storage method. The data storage method includes the following steps: S1: converting a file into a file stream; S2: converting the file stream into an array in which multiple subarrays are nested; and S3: converting the array into a resilient distributed dataset (RDD), and storing the RDD to a storage layer of a data lake. The present disclosure provides a nested field structure, which lays the foundation for parallel processing in reading, and effectively improves read performance. Furthermore, the present disclosure flexibly generates a number of nested subarrays according to hardware cores, such that the data lake achieves better extension performance, and can keep optimal writing efficiency for different users.
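The three steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The function names `file_to_stream` and `stream_to_nested_array` are hypothetical, the file is assumed to be line-oriented UTF-8 text, and the number of subarrays is assumed to default to the CPU core count reported by `os.cpu_count()`. Step S3 is only indicated in a comment, because it needs a Spark installation and the abstract does not specify the storage layer.

```python
import io
import os


def file_to_stream(data: bytes) -> io.BytesIO:
    """S1 (sketch): wrap file contents in a byte stream.

    A real system would open the file itself; raw bytes are used here
    so the example stays self-contained.
    """
    return io.BytesIO(data)


def stream_to_nested_array(stream, num_subarrays=None):
    """S2 (sketch): split the stream's lines into nested subarrays.

    Assumption: one subarray per hardware core unless told otherwise.
    """
    if num_subarrays is None:
        num_subarrays = os.cpu_count() or 1

    # Read every line as one record, dropping the line terminator.
    records = [line.rstrip(b"\r\n").decode("utf-8") for line in stream]

    # Never create more subarrays than there are records (at least one).
    num_subarrays = max(1, min(num_subarrays, len(records) or 1))

    # Distribute records as evenly as possible, preserving their order.
    base, extra = divmod(len(records), num_subarrays)
    nested, start = [], 0
    for i in range(num_subarrays):
        size = base + (1 if i < extra else 0)
        nested.append(records[start:start + size])
        start += size
    return nested


# S3 (not executed here): with PySpark installed and a SparkContext `sc`,
# the nested array could be turned into an RDD with one subarray per
# partition, for example:
#     rdd = sc.parallelize(nested, numSlices=len(nested))
# How the RDD is written to the data lake's storage layer is not described
# in the abstract, so that part is left out.
```

For example, `stream_to_nested_array(file_to_stream(b"a\nb\nc\nd\ne\n"), 2)` returns `[["a", "b", "c"], ["d", "e"]]`. Tying the number of subarrays to the core count reflects the abstract's claim that the nested structure prepares the data for parallel reads.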
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.