Data transformation of Cassandra files for improved deduplication during backup
US10769111B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Apr 24, 2018 |
| Grant date | Sep 8, 2020 |
| Priority date | — |
| Expiry date | Oct 12, 2038 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06F40/205
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
Cassandra SSTable data is transformed to provide data rows that are a consistent size such that data in each row has a length that is contained within a selected fixed sized kilobyte segment for deduplication. Tables of a Cassandra cluster node are translated in parallel to JSON format using Cassandra SSTableDump and the table rows are parsed to provide data rows corresponding to the data in each table row. Each row of data is padded with a predictable pattern of bits such that the data row has a length corresponding to the selected fixed segment size and has boundary locations that correspond to multiple of the selected segment size. Since each row of data starts on a segment boundary, duplicate rows of data will be identified wherever they move within a table.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.