Utility-preserving text de-identification with privacy guarantees
US11449674B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Apr 28, 2020 |
| Grant date | Sep 20, 2022 |
| Priority date | — |
| Expiry date | Apr 28, 2040 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06F40/30
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
One embodiment of the invention provides a method for utility-preserving text de-identification. The method comprises generating corresponding processed text for each text document by applying at least one natural language processor (NLP) annotator to the text document to recognize and tag privacy-sensitive personal information corresponding to an individual, and replacing some words in the text document with some replacement values. The method further comprises determining infrequent terms occurring across all processed texts, filtering out the infrequent terms from the processed texts, and selectively reinstating to the processed texts at least one of the infrequent terms that is innocuous. The method further comprises generating a corresponding de-identified text document for each processed text by anonymizing privacy-sensitive personal information corresponding to an individual in the processed text to an extent that preserves data utility of the processed text and conceals the individual's personal identity.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.