Detecting and obfuscating sensitive data in unstructured text
US11347891B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jun 19, 2019 |
| Grant date | May 31, 2022 |
| Priority date | — |
| Expiry date | Oct 8, 2040 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F21/602
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Disclosed is a computer-implemented method to identify and anonymize personal information. The method comprises analyzing a first corpus with a personal information sniffer, wherein the first corpus includes unstructured text, wherein the personal information sniffer is configured to detect a set of types of personal information, and wherein the personal information sniffer produces a first set of results. The method comprises analyzing the first corpus with a set of annotators, wherein each annotator is configured to identify all instances of a type of personal information in the corpus, and wherein the set of annotators produces a second set of results. The method comprises comparing the first set of results and the second set of results, determining that the first set of results does not match the second set of results, and updating the personal information sniffer based on that determination.
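The compare-and-update loop in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the regex-based annotators, the `Sniffer` class, the `reconcile` function, and the update rule (copying the pattern for any missed type into the sniffer) are all hypothetical stand-ins, since the abstract does not say how the sniffer or the annotators work internally.

```python
import re
from typing import Dict, Pattern, Set, Tuple

# A detection result: (personal-information type, start offset, end offset).
Span = Tuple[str, int, int]

# Hypothetical annotators: one pattern per type of personal information.
# Real annotators would likely be far more thorough than these regexes.
ANNOTATOR_PATTERNS: Dict[str, Pattern[str]] = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def find_spans(patterns: Dict[str, Pattern[str]], corpus: str) -> Set[Span]:
    """Return every match of every pattern as a (type, start, end) span."""
    return {
        (pi_type, m.start(), m.end())
        for pi_type, pattern in patterns.items()
        for m in pattern.finditer(corpus)
    }


class Sniffer:
    """Hypothetical sniffer that only knows a subset of the types."""

    def __init__(self, patterns: Dict[str, Pattern[str]]) -> None:
        self.patterns = dict(patterns)

    def analyze(self, corpus: str) -> Set[Span]:
        return find_spans(self.patterns, corpus)

    def update(self, missed_types: Set[str]) -> None:
        # Assumed update rule: adopt the annotator pattern for each missed type.
        for pi_type in missed_types:
            self.patterns[pi_type] = ANNOTATOR_PATTERNS[pi_type]


def reconcile(sniffer: Sniffer, corpus: str) -> bool:
    """Compare sniffer and annotator results; update the sniffer on mismatch.

    Returns True if an update was made.
    """
    first = sniffer.analyze(corpus)                    # first set of results
    second = find_spans(ANNOTATOR_PATTERNS, corpus)    # second set of results
    if first == second:
        return False
    sniffer.update({pi_type for pi_type, _, _ in second - first})
    return True


corpus = "Contact jane@example.com or call 555-123-4567 after noon."
sniffer = Sniffer({"email": ANNOTATOR_PATTERNS["email"]})
updated = reconcile(sniffer, corpus)
```

In this sketch the sniffer starts out knowing only the email pattern, so the first call to `reconcile` finds a mismatch on the phone number and updates the sniffer; a second call on the same corpus finds matching result sets and makes no change.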
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.