Patent · US Active

Processing data of a file using multiple threads during a deduplication gathering phase

US8234250B1 · kind B1 · utility

Cited by: 14
References: 1
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 17, 2009
Grant date: Jul 31, 2012
Priority date:
Expiry date: Jan 5, 2031

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/174
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method and apparatus for deduplication of files of a storage system is described. During a gathering phase, a file may be simultaneously processed by two or more threads to produce and store content identifiers for data blocks of the file. Each file may be sub-divided into multiple file sub-portions, each file sub-portion comprising a predetermined number of data blocks. A thread may be assigned to each sub-portion of a file for processing the data blocks. The currently assigned sub-portion for each thread may be recorded and used upon a system crash to restart each scanner thread at the currently assigned sub-portion to minimize the data blocks that are re-processed. The size of a file sub-portion may be predetermined based on the organization of inode data structures representing the files (e.g., based on the maximum number of pointers that an indirect block in the inode data structure may contain).
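
The gathering phase described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function name `gather`, the SHA-256 content identifier, the 4 KB block size, the 1024-block sub-portion size, and the in-memory `checkpoint` dictionary (standing in for a persistently stored record) are all assumptions made for illustration. The sketch also assumes that a restart uses the same number of threads as the interrupted run.

```python
import hashlib
import threading

BLOCK_SIZE = 4096  # assumed block size in bytes (not stated in the abstract)
# Assumed sub-portion size: a hypothetical 4 KB indirect block holding
# 4-byte pointers would reference 4096 / 4 = 1024 data blocks.
BLOCKS_PER_SUBPORTION = 1024


def gather(data, blocks_per_subportion, num_threads,
           checkpoint=None, block_size=BLOCK_SIZE):
    """Fingerprint the blocks of `data` using several threads.

    `checkpoint` records the next unassigned sub-portion and the
    sub-portion each thread is currently working on. Passing a saved
    checkpoint resumes each thread at its recorded sub-portion.
    """
    num_blocks = (len(data) + block_size - 1) // block_size
    num_subportions = ((num_blocks + blocks_per_subportion - 1)
                       // blocks_per_subportion)
    if checkpoint is None:
        checkpoint = {"next": 0, "assigned": {}}
    lock = threading.Lock()
    fingerprints = {}  # block index -> content identifier

    def process(sp):
        # Hash every block that belongs to sub-portion `sp`.
        first = sp * blocks_per_subportion
        last = min(first + blocks_per_subportion, num_blocks)
        for b in range(first, last):
            block = data[b * block_size:(b + 1) * block_size]
            digest = hashlib.sha256(block).hexdigest()  # assumed hash
            with lock:
                fingerprints[b] = digest

    def worker(tid):
        with lock:
            # After a crash, restart at the recorded sub-portion, if any.
            sp = checkpoint["assigned"].get(tid)
        while True:
            if sp is None:
                with lock:
                    if checkpoint["next"] >= num_subportions:
                        return
                    sp = checkpoint["next"]
                    checkpoint["next"] += 1
                    checkpoint["assigned"][tid] = sp  # record assignment
            process(sp)
            with lock:
                del checkpoint["assigned"][tid]  # sub-portion finished
            sp = None

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return fingerprints, checkpoint
```

Calling `gather(data, BLOCKS_PER_SUBPORTION, 4)` processes the whole file from the start. Passing a saved checkpoint such as `{"next": 3, "assigned": {0: 1, 1: 2}}` makes threads 0 and 1 redo only sub-portions 1 and 2 before continuing with sub-portion 3, so blocks in already completed sub-portions are not re-processed.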

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.