Re-identification risk in de-identified databases containing personal information
US8316054B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Event | Date |
|---|---|
| Filing date | Sep 22, 2009 |
| Grant date | Nov 20, 2012 |
| Priority date | Not listed |
| Expiry date | Apr 18, 2030 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/284
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A system and method are provided for assessing the re-identification risk of a dataset that has been de-identified from a source database containing information identifiable to individuals. The de-identified dataset, comprising a plurality of records, is retrieved from a storage device. A user's selection of variables is received; the selection is made from the variables present in the dataset, which are potential identifiers of personal information. A user's selection of a risk threshold acceptable for the dataset is received. A selection of a sampling fraction, which defines the relative size of the dataset with respect to the entire population, is received. For each of the selected variables, the number of records in each equivalence class of the identification dataset is determined. A re-identification risk is calculated using the selected sampling fraction, and it is determined whether the re-identification risk meets the selected risk threshold.
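The abstract's steps (grouping records into equivalence classes on the selected variables, estimating a risk with the sampling fraction, and comparing it with the threshold) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented method: the function names, the dict-based record layout, the sample data, and the risk estimator are all hypothetical. In particular, it assumes each sample equivalence class of size f stands for roughly f / sampling_fraction people in the population and reports the largest per-class risk, sampling_fraction / f; the patent's own estimator may differ.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence

# Hypothetical record layout: each record maps a variable name to its value.
Record = Mapping[str, object]


def equivalence_class_sizes(records: Iterable[Record], variables: Sequence[str]) -> Counter:
    """Count records per equivalence class, i.e. per distinct combination
    of values on the selected (potentially identifying) variables."""
    return Counter(tuple(r[v] for v in variables) for r in records)


def reidentification_risk(
    records: Iterable[Record], variables: Sequence[str], sampling_fraction: float
) -> float:
    """Hypothetical estimator returning the maximum per-class risk.

    ASSUMPTION (not taken from the patent): a sample class of size f is
    treated as representing about f / sampling_fraction individuals in the
    population, so the chance of correctly re-identifying one of its
    records is sampling_fraction / f.
    """
    if not 0 < sampling_fraction <= 1:
        raise ValueError("sampling_fraction must be in (0, 1]")
    sizes = equivalence_class_sizes(records, variables)
    if not sizes:
        return 0.0  # an empty dataset exposes no one
    return max(sampling_fraction / f for f in sizes.values())


def meets_threshold(
    records: Iterable[Record],
    variables: Sequence[str],
    sampling_fraction: float,
    risk_threshold: float,
) -> bool:
    """True when the estimated risk does not exceed the user's threshold."""
    return reidentification_risk(records, variables, sampling_fraction) <= risk_threshold


# Hypothetical de-identified sample covering 10% of the population.
dataset = [
    {"age": 34, "sex": "F", "zip3": "K1A"},
    {"age": 34, "sex": "F", "zip3": "K1A"},
    {"age": 51, "sex": "M", "zip3": "K2P"},
    {"age": 51, "sex": "M", "zip3": "K2P"},
    {"age": 51, "sex": "M", "zip3": "K2P"},
    {"age": 29, "sex": "F", "zip3": "K1A"},
]

if __name__ == "__main__":
    selected = ["age", "sex", "zip3"]
    print(equivalence_class_sizes(dataset, selected))
    print(reidentification_risk(dataset, selected, sampling_fraction=0.1))
    print(meets_threshold(dataset, selected, sampling_fraction=0.1, risk_threshold=0.2))
```

With all three variables selected, the singleton class (29, F, K1A) dominates and the estimated risk is 0.1, which passes a 0.2 threshold but fails a 0.05 one. Selecting only `sex` merges the records into two classes of three, lowering the estimate to about 0.033; this shows why the choice of variables matters as much as the threshold itself.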
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.