Processing data of a file using multiple threads during a deduplication gathering phase
US8234250B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 17, 2009 |
| Grant date | Jul 31, 2012 |
| Priority date | — |
| Expiry date | Jan 5, 2031 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/174
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method and apparatus for deduplication of files of a storage system is described. During a gathering phase, a file may be simultaneously processed by two or more threads to produce and store content identifiers for data blocks of the file. Each file may be sub-divided into multiple file sub-portions, each file sub-portion comprising a predetermined number of data blocks. A thread may be assigned to each sub-portion of a file for processing the data blocks. The currently assigned sub-portion for each thread may be recorded and used upon a system crash to restart each scanner thread at the currently assigned sub-portion to minimize the data blocks that are re-processed. The size of a file sub-portion may be predetermined based on the organization of inode data structures representing the files (e.g., based on the maximum number of pointers that an indirect block in the inode data structure may contain).
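The gathering phase described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the block size, the sub-portion size, the use of SHA-256 as the content identifier, the in-memory `checkpoint` dictionary, and the names `split_into_subportions` and `gather` are all assumptions made for illustration. The sketch splits a file into fixed-size sub-portions, lets several scanner threads claim one sub-portion at a time, records each thread's current assignment, and on restart re-processes only the sub-portions that were in flight.

```python
import hashlib
import threading

BLOCK_SIZE = 4096             # assumed data block size in bytes
BLOCKS_PER_SUBPORTION = 1024  # assumed, e.g. pointers held by one indirect block


def split_into_subportions(num_blocks, blocks_per_subportion=BLOCKS_PER_SUBPORTION):
    """Return [start, end) block ranges, one per file sub-portion."""
    return [
        (start, min(start + blocks_per_subportion, num_blocks))
        for start in range(0, num_blocks, blocks_per_subportion)
    ]


def gather(data, num_threads=2, block_size=BLOCK_SIZE,
           blocks_per_subportion=BLOCKS_PER_SUBPORTION, checkpoint=None):
    """Fingerprint blocks of `data` with several scanner threads.

    Returns (fingerprints, state). `state["next"]` is the first unassigned
    sub-portion; `state["assigned"]` maps thread id -> sub-portion in progress.
    Passing a saved state as `checkpoint` resumes from it.
    """
    num_blocks = (len(data) + block_size - 1) // block_size
    subportions = split_into_subportions(num_blocks, blocks_per_subportion)
    prior = checkpoint or {"next": 0, "assigned": {}}
    # After a crash, only sub-portions that were in flight need to be redone.
    redo = sorted(prior["assigned"].values())
    state = {"next": prior["next"], "assigned": {}}
    fingerprints = {}
    lock = threading.Lock()

    def scanner(thread_id):
        while True:
            with lock:
                if redo:
                    index = redo.pop(0)
                elif state["next"] < len(subportions):
                    index = state["next"]
                    state["next"] += 1
                else:
                    state["assigned"].pop(thread_id, None)
                    return
                # Record the currently assigned sub-portion for this thread.
                state["assigned"][thread_id] = index
            start, end = subportions[index]
            local = {}
            for block_no in range(start, end):
                block = data[block_no * block_size:(block_no + 1) * block_size]
                local[block_no] = hashlib.sha256(block).hexdigest()
            with lock:
                fingerprints.update(local)

    threads = [threading.Thread(target=scanner, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return fingerprints, state
```

In a real storage system the checkpoint would have to be written to persistent storage to survive a crash; here it is an ordinary dictionary so the control flow stays visible. Calling `gather(data, checkpoint={"next": 2, "assigned": {0: 0}})` re-processes sub-portion 0 and then continues from sub-portion 2, skipping sub-portion 1, which the saved state implies was already completed.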
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.