Patent · US Active

Re-identification risk in de-identified databases containing personal information

US8316054B2 · kind B2 · utility

Cited by: 7
References: 3
Claims: 18
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 22, 2009
Grant date: Nov 20, 2012
Priority date
Expiry date: Apr 18, 2030

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/284
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A system and method of performing risk assessment of a dataset de-identified from a source database containing information identifiable to individuals is provided. The de-identified dataset, comprising a plurality of records, is retrieved from a storage device. A selection of variables is received from a user, the selection made from a plurality of variables present in the dataset, wherein the variables are potential identifiers of personal information. A selection of a risk threshold acceptable for the dataset is received from a user. A selection of a sampling fraction is received, wherein the sampling fraction defines the relative size of the dataset to an entire population. A number of records from the plurality of records is determined for each equivalence class in the de-identified dataset for each of the selected variables. A re-identification risk is calculated using the selected sampling fraction. Whether the re-identification risk meets the selected risk threshold is determined.
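
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not state a risk formula, so the sketch assumes a simple worst-case measure in which the population size of an equivalence class is approximated as its sample size divided by the sampling fraction, and the risk is the reciprocal of the smallest such class. The function names and record fields are hypothetical.

```python
from collections import Counter

def equivalence_class_sizes(records, variables):
    """Count the records that share the same values on the selected variables."""
    return Counter(tuple(rec[v] for v in variables) for rec in records)

def max_reidentification_risk(records, variables, sampling_fraction):
    """Worst-case risk under an ASSUMED model (not taken from the patent):
    population class size ~= sample class size / sampling fraction,
    risk = 1 / (smallest estimated population class size), capped at 1."""
    if not records:
        raise ValueError("dataset is empty")
    if not 0 < sampling_fraction <= 1:
        raise ValueError("sampling_fraction must be in (0, 1]")
    sizes = equivalence_class_sizes(records, variables)
    smallest = min(sizes.values())
    return min(1.0, sampling_fraction / smallest)

def meets_threshold(records, variables, sampling_fraction, risk_threshold):
    """True if the assumed worst-case risk does not exceed the threshold."""
    risk = max_reidentification_risk(records, variables, sampling_fraction)
    return risk <= risk_threshold

# Hypothetical de-identified records; "age" and "region" are the
# user-selected potential identifiers.
records = [
    {"age": "30-39", "region": "A", "diagnosis": "flu"},
    {"age": "30-39", "region": "A", "diagnosis": "asthma"},
    {"age": "40-49", "region": "B", "diagnosis": "flu"},
    {"age": "40-49", "region": "B", "diagnosis": "flu"},
    {"age": "40-49", "region": "B", "diagnosis": "cold"},
]
selected = ["age", "region"]
risk = max_reidentification_risk(records, selected, sampling_fraction=0.5)
acceptable = meets_threshold(records, selected, 0.5, risk_threshold=0.3)
```

Here the smallest equivalence class has two records, so under the assumed model a sampling fraction of 0.5 yields a risk of 0.25, which is acceptable for a threshold of 0.3 but not for 0.2.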

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.