Systems and methods for condensation-based privacy in strings
US8010541B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 30, 2006 |
| Grant date | Aug 30, 2011 |
| Priority date | — |
| Expiry date | Oct 28, 2028 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F21/6245
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Novel methods and systems are presented for the privacy-preserving mining of string data using simple template-based models. Such template-based models are effective in practice and preserve important statistical characteristics of the strings, such as intra-record distances. Discussed herein is the condensation model for anonymization of string data. Summary statistics are created for groups of strings, and these statistics are used to generate pseudo-strings. It will be seen that the aggregate behavior of the new set of strings maintains key characteristics such as composition, the order of the intra-string distances, and the accuracy of data mining algorithms such as classification. The preservation of intra-string distances is a key goal in many string and biological applications, which are deeply dependent upon the computation of such distances, while it can be shown that the accuracy of applications such as classification is not affected by the anonymization process.
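The condensation idea in the abstract (group the strings, keep only summary statistics per group, then generate pseudo-strings from those statistics) can be illustrated with a minimal sketch. This is not the patented method. It assumes equal-length strings, greedy grouping by Hamming distance, and per-position symbol counts as the "template" statistics; the abstract specifies none of these choices. The names `condense`, `summarize`, `generate`, and `anonymize` are hypothetical.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def condense(strings, k):
    """Greedily partition strings into groups of nearby strings.

    Assumption: greedy nearest-neighbour grouping stands in for whatever
    grouping the patent actually uses. Every group has at least k members
    provided the input holds at least k strings.
    """
    remaining = list(strings)
    groups = []
    while len(remaining) >= 2 * k:
        seed = remaining.pop(0)
        remaining.sort(key=lambda s: hamming(seed, s))
        groups.append([seed] + remaining[:k - 1])
        remaining = remaining[k - 1:]
    if remaining:
        groups.append(remaining)  # leftover group of up to 2k - 1 strings
    return groups


def summarize(group):
    """Per-position symbol counts: the only statistics kept for a group."""
    return [Counter(column) for column in zip(*group)]


def generate(stats, n, rng):
    """Sample n pseudo-strings, drawing each position independently."""
    pseudo = []
    for _ in range(n):
        chars = [
            rng.choices(list(counts.keys()), weights=list(counts.values()))[0]
            for counts in stats
        ]
        pseudo.append("".join(chars))
    return pseudo


def anonymize(strings, k, seed=0):
    """Replace each group of strings with as many sampled pseudo-strings."""
    rng = random.Random(seed)
    result = []
    for group in condense(strings, k):
        result.extend(generate(summarize(group), len(group), rng))
    return result


if __name__ == "__main__":
    data = ["ACGT", "ACGA", "TTGA", "TTGC", "ACCT"]
    print(anonymize(data, k=2))
```

Because each pseudo-string is drawn from statistics shared by a whole group, no output string has to correspond to any single original record. Sampling positions independently is the simplest possible choice and ignores correlations between positions, so how well it preserves the distance ordering on real data would need to be checked empirically.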
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.