Patent · US Active

Data transformation of Cassandra files for improved deduplication during backup

US10769111B2 · kind B2 · utility

Cited by	1

References	1

Claims	14

Family size	0

Assignee

EMC Corporation · US

Inventors

Charles Christopher Bailey · Durham, US
Donna Barry Lewis · Sunrise, US
Jeffrey Ford · Gilbert, US
Frederick Douglis · Berkeley Heights, US

Key dates

Filing date	Apr 24, 2018
Grant date	Sep 8, 2020
Priority date	—
Expiry date	Oct 12, 2038

Classification

Technology area (CPC G)	Physics
CPC primary	G06F40/205
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Cassandra SSTable data is transformed into data rows of a consistent size, such that the data in each row fits within a selected fixed-size segment (measured in kilobytes) used for deduplication. Tables of a Cassandra cluster node are translated in parallel to JSON format using Cassandra SSTableDump, and the table rows are parsed to produce data rows corresponding to the data in each table row. Each data row is padded with a predictable pattern of bits so that its length corresponds to the selected fixed segment size and its boundaries fall on multiples of that segment size. Because each data row starts on a segment boundary, duplicate rows of data are identified wherever they move within a table.
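
The padding step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The 4 KB segment size, the zero-byte padding pattern, the SHA-256 segment fingerprint, and the function names pad_row, transform_rows, and segment_hashes are all assumptions made for illustration. The input is a simplified JSON array of row objects, not the actual SSTableDump output schema.

```python
import hashlib
import json

SEGMENT_SIZE = 4096   # assumed segment size; the abstract only says "selected"
PAD_BYTE = b"\x00"    # assumed "predictable pattern of bits"


def pad_row(row_bytes, segment_size=SEGMENT_SIZE):
    """Pad a row so its length is a non-zero multiple of segment_size."""
    remainder = len(row_bytes) % segment_size
    if remainder == 0 and row_bytes:
        return row_bytes
    return row_bytes + PAD_BYTE * (segment_size - remainder)


def transform_rows(json_text):
    """Parse a JSON array of rows and emit them padded and concatenated.

    Each row starts on a segment boundary in the returned byte string.
    """
    out = bytearray()
    for row in json.loads(json_text):
        # sort_keys gives identical rows an identical byte encoding
        encoded = json.dumps(row, sort_keys=True).encode("utf-8")
        out += pad_row(encoded)
    return bytes(out)


def segment_hashes(blob, segment_size=SEGMENT_SIZE):
    """Fingerprint each fixed-size segment, as a deduplicating store might."""
    return [
        hashlib.sha256(blob[i:i + segment_size]).hexdigest()
        for i in range(0, len(blob), segment_size)
    ]


# Two hypothetical table dumps holding the same rows in a different order.
dump_a = '[{"id": 1, "v": "x"}, {"id": 2, "v": "y"}]'
dump_b = '[{"id": 2, "v": "y"}, {"id": 1, "v": "x"}]'
blob_a = transform_rows(dump_a)
blob_b = transform_rows(dump_b)
same_segments = set(segment_hashes(blob_a)) == set(segment_hashes(blob_b))
```

In this sketch same_segments is true: because every row occupies whole segments, a row that moves within the table still produces the same segment fingerprint, which is the property the abstract relies on.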

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.