Patent · US Active

Data transformation of Cassandra files for improved deduplication during backup

US10769111B2 · kind B2 · utility

Cited by: 1
References: 1
Claims: 14
Family size: 0

Key dates

Filing date: Apr 24, 2018
Grant date: Sep 8, 2020
Expiry date: Oct 12, 2038

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/205
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Cassandra SSTable data is transformed to provide data rows of a consistent size, such that the data in each row has a length contained within a selected fixed-size kilobyte segment for deduplication. Tables of a Cassandra cluster node are translated in parallel to JSON format using Cassandra SSTableDump, and the table rows are parsed to provide data rows corresponding to the data in each table row. Each row of data is padded with a predictable pattern of bits such that the data row has a length corresponding to the selected fixed segment size and has boundary locations that correspond to multiples of the selected segment size. Since each row of data starts on a segment boundary, duplicate rows of data will be identified wherever they move within a table.
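The effect of aligning rows to segment boundaries can be illustrated with a minimal sketch, assuming a fixed-block deduplicator that fingerprints every segment with SHA-256. Everything here is hypothetical and not taken from the patent: the 64-byte `SEGMENT_SIZE` (chosen only to keep the demo small; the abstract specifies a kilobyte-sized segment), the zero byte standing in for the "predictable pattern of bits", and the helpers `serialize_row`, `pad_row`, `build_stream`, `segment_hashes` and `shared_segments`. The sketch does not parse real SSTableDump output; `serialize_row` simply emits deterministic JSON for an already-parsed row.

```python
import hashlib
import json

# Assumption: 64-byte segments keep the demo small; the abstract only says
# "a selected fixed sized kilobyte segment", so a real system would use KiB.
SEGMENT_SIZE = 64
# Assumption: a zero byte stands in for the "predictable pattern of bits".
PAD_BYTE = b"\x00"


def serialize_row(row: dict) -> bytes:
    """Serialize one parsed row deterministically (hypothetical stand-in for SSTableDump JSON)."""
    return json.dumps(row, sort_keys=True, separators=(",", ":")).encode("utf-8")


def pad_row(data: bytes, segment_size: int = SEGMENT_SIZE) -> bytes:
    """Pad data with PAD_BYTE so its length is a nonzero multiple of segment_size."""
    remainder = len(data) % segment_size
    if data and remainder == 0:
        return data
    return data + PAD_BYTE * (segment_size - remainder)


def build_stream(rows: list[dict], pad: bool, segment_size: int = SEGMENT_SIZE) -> bytes:
    """Concatenate serialized rows, optionally aligning each row to a segment boundary."""
    parts = (serialize_row(r) for r in rows)
    if pad:
        parts = (pad_row(p, segment_size) for p in parts)
    return b"".join(parts)


def segment_hashes(stream: bytes, segment_size: int = SEGMENT_SIZE) -> list[str]:
    """Split the stream into fixed-size segments and fingerprint each one."""
    return [
        hashlib.sha256(stream[i:i + segment_size]).hexdigest()
        for i in range(0, len(stream), segment_size)
    ]


def shared_segments(old_rows: list[dict], new_rows: list[dict], pad: bool) -> int:
    """Count distinct segments of the old backup that reappear unchanged in the new one."""
    old = set(segment_hashes(build_stream(old_rows, pad)))
    new = set(segment_hashes(build_stream(new_rows, pad)))
    return len(old & new)


if __name__ == "__main__":
    rows_v1 = [{"id": i, "name": f"user{i}"} for i in range(5)]
    # Insert one row ahead of the others, shifting every byte offset after it.
    rows_v2 = [{"id": 99, "name": "new"}] + rows_v1
    print("unpadded shared segments:", shared_segments(rows_v1, rows_v2, pad=False))
    print("padded shared segments:", shared_segments(rows_v1, rows_v2, pad=True))
```

With these inputs the unpadded run shares 0 segments, because the inserted row shifts every later boundary, while the padded run shares all 5 original segments: each row begins on a segment boundary, so an unchanged row hashes identically wherever it moves within the table. The trade-off is the space spent on padding, which a deduplicating target would largely recover because the pad pattern is predictable.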

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.