Systems and methods for detecting sensitive information using pattern recognition
US10878124B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Event | Date |
|---|---|
| Filing date | Dec 5, 2018 |
| Grant date | Dec 29, 2020 |
| Priority date | — |
| Expiry date | Apr 10, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N3/044
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Methods and systems for identifying sensitive information are provided. The method includes tokenizing labeled data into first word sequences, the labeled data including sensitive information. The method includes associating the labeled sensitive information with tags. The method includes determining that the first word sequences and the tags satisfy conditions defined by feature functions. The method includes calculating a local maximum of a likelihood function to determine a weight for each feature function. The method includes tokenizing unlabeled data into second word sequences, the unlabeled data including sensitive information. The method includes executing each feature function based on its weight, the second word sequences, and tag sequences. The method includes selecting the tag sequences that maximize the probabilities of the second word sequences based on the likelihood function. The method includes identifying sensitive information in the unlabeled data based on the selected tag sequences.
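The abstract does not name a model. However, its ingredients (feature functions over word and tag sequences, weights fitted by maximizing a likelihood, and selection of the highest-probability tag sequence) match a linear-chain conditional random field (CRF). The following is a minimal sketch that assumes that reading and is not the patented implementation. The tag set, regex patterns, cue words, feature functions, training sentences, and helper names (`tokenize`, `train`, `viterbi`, `identify_sensitive`) are all hypothetical. To keep the code short, training computes the normalizer by brute-force enumeration of tag sequences, so it only works for very short sentences.

```python
import itertools
import math
import re

# Hypothetical tag set: "SENSITIVE" marks a token to flag, "O" marks any other token.
TAGS = ("O", "SENSITIVE")
# Hypothetical surface patterns: SSN-like and card-number-like tokens.
ID_PATTERNS = (re.compile(r"\d{3}-\d{2}-\d{4}"), re.compile(r"\d{4}(?:-\d{4}){3}"))
CUE_WORDS = {"ssn", "card"}  # hypothetical context words


def tokenize(text):
    """Split raw text into a word sequence (plain whitespace tokenizer)."""
    return text.split()


def looks_like_id(word):
    return any(p.fullmatch(word) for p in ID_PATTERNS)


# Binary feature functions f_k(prev_tag, tag, words, i); prev_tag is None at i == 0.
FEATURES = [
    lambda prev, tag, w, i: tag == "SENSITIVE" and looks_like_id(w[i]),
    lambda prev, tag, w, i: tag == "O" and looks_like_id(w[i]),
    lambda prev, tag, w, i: tag == "O" and not looks_like_id(w[i]),
    lambda prev, tag, w, i: tag == "SENSITIVE" and not looks_like_id(w[i]),
    lambda prev, tag, w, i: (
        tag == "SENSITIVE" and i > 0 and w[i - 1].lower().rstrip(":") in CUE_WORDS
    ),
    lambda prev, tag, w, i: prev == "SENSITIVE" and tag == "SENSITIVE",
]


def local_score(weights, prev, tag, words, i):
    """Weighted sum of the feature functions that fire at position i."""
    return sum(wk for wk, f in zip(weights, FEATURES) if f(prev, tag, words, i))


def feature_counts(words, tags):
    """Number of times each feature function fires over a whole tag sequence."""
    counts = [0.0] * len(FEATURES)
    for i in range(len(words)):
        prev = tags[i - 1] if i > 0 else None
        for k, f in enumerate(FEATURES):
            if f(prev, tags[i], words, i):
                counts[k] += 1.0
    return counts


def train(examples, steps=200, lr=0.05, l2=0.01):
    """Gradient ascent on the L2-regularized conditional log-likelihood."""
    weights = [0.0] * len(FEATURES)
    for _ in range(steps):
        grad = [-l2 * wk for wk in weights]
        for words, gold in examples:
            for k, c in enumerate(feature_counts(words, gold)):
                grad[k] += c  # empirical feature counts
            seqs = list(itertools.product(TAGS, repeat=len(words)))  # brute force
            counts = [feature_counts(words, y) for y in seqs]
            scores = [sum(wk * c for wk, c in zip(weights, cs)) for cs in counts]
            top = max(scores)
            unnorm = [math.exp(s - top) for s in scores]  # shifted for stability
            z = sum(unnorm)
            for u, cs in zip(unnorm, counts):
                for k, c in enumerate(cs):
                    grad[k] -= (u / z) * c  # minus expected counts under the model
        weights = [wk + lr * g for wk, g in zip(weights, grad)]
    return weights


def viterbi(words, weights):
    """Return the highest-scoring (hence most probable) tag sequence."""
    if not words:
        return []
    # best[i][t]: highest score of any tag prefix that ends with tag t at position i.
    best = [{t: local_score(weights, None, t, words, 0) for t in TAGS}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in TAGS:
            p = max(TAGS, key=lambda q: best[i - 1][q] + local_score(weights, q, t, words, i))
            best[i][t] = best[i - 1][p] + local_score(weights, p, t, words, i)
            back[i][t] = p
    tags = [max(TAGS, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        tags.append(back[i][tags[-1]])
    return tags[::-1]


def identify_sensitive(text, weights):
    words = tokenize(text)
    return [w for w, t in zip(words, viterbi(words, weights)) if t == "SENSITIVE"]


# Hypothetical labeled data: (text, one tag per whitespace token).
LABELED = [
    ("ssn 123-45-6789 on file", ("O", "SENSITIVE", "O", "O")),
    ("card 4111-1111-1111-1111 expires soon", ("O", "SENSITIVE", "O", "O")),
    ("order 42 shipped today", ("O", "O", "O", "O")),
    ("call 555-0100 now", ("O", "O", "O")),
]

if __name__ == "__main__":
    weights = train([(tokenize(text), tags) for text, tags in LABELED])
    print(identify_sensitive("ssn: 987-65-4321 on file", weights))
```

In this sketch, Viterbi decoding can maximize the unnormalized score because the normalizer does not depend on the tag sequence. The L2 term makes the objective strictly concave, so the local maximum the abstract refers to is also the global one here. A practical version would replace the brute-force enumeration with the forward-backward algorithm.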
Source: USPTO / EPO open patent data.