Patent · US Active

Systems and methods for condensation-based privacy in strings

US8010541B2 · kind B2 · utility

Cited by: 2
References: 1
Claims: 23
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 30, 2006
Grant date: Aug 30, 2011
Priority date
Expiry date: Oct 28, 2028

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F21/6245
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Novel methods and systems for the privacy-preserving mining of string data with the use of simple template-based models. Such template-based models are effective in practice, and preserve important statistical characteristics of the strings such as intra-record distances. Discussed herein is the condensation model for anonymization of string data. Summary statistics are created for groups of strings, and these statistics are used to generate pseudo-strings. It will be seen that the aggregate behavior of the new set of strings maintains key characteristics such as composition, the order of the intra-string distances, and the accuracy of data mining algorithms such as classification. The preservation of intra-string distances is a key goal in many string and biological applications, which are deeply dependent upon the computation of such distances, and it can be shown that the accuracy of applications such as classification is not affected by the anonymization process.
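
The condensation idea in the abstract (group the strings, keep only summary statistics per group, then generate pseudo-strings from those statistics) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the function names condense and generate_pseudo_strings, the restriction to equal-length strings, the grouping by sorted order, and the use of per-position symbol counts as a stand-in for the template-based model.

```python
import random
from collections import Counter


def condense(strings, k):
    """Group equal-length strings and keep only per-group summary statistics.

    Assumption: groups are formed by sorting and chunking into runs of k;
    a short final chunk is merged into the previous group so that every
    group holds at least k strings (when at least k strings are given).
    """
    ordered = sorted(strings)
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())

    stats = []
    for group in groups:
        # One Counter per string position: how often each symbol occurs there.
        position_counts = [Counter(column) for column in zip(*group)]
        stats.append((len(group), position_counts))
    return stats


def generate_pseudo_strings(stats, rng):
    """Sample as many pseudo-strings per group as the group originally held."""
    pseudo = []
    for size, position_counts in stats:
        for _ in range(size):
            chars = []
            for counts in position_counts:
                symbols = sorted(counts)
                weights = [counts[s] for s in symbols]
                chars.append(rng.choices(symbols, weights=weights, k=1)[0])
            pseudo.append("".join(chars))
    return pseudo


if __name__ == "__main__":
    records = ["ACGT", "ACGA", "TCGT", "ACTT", "GCGT"]
    summary = condense(records, k=2)
    print(generate_pseudo_strings(summary, random.Random(0)))
```

Only the group sizes and per-position counts survive the condensation step in this sketch, so the pseudo-strings are drawn from group-level statistics rather than copied from any single record. Sampling each position independently is a simplification chosen here and may not preserve the distance properties the abstract describes.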

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.