Automated sensitive data classification in computerized databases
US11941135B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Aug 23, 2019 |
| Grant date | Mar 26, 2024 |
| Priority date | — |
| Expiry date | Sep 28, 2041 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F21/6245
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Automated classification of sensitive data in a database, which includes: retrieving a catalog of the database; sampling record values from at least some of the columns; generating a map of probable associations between different columns of tables of the database; applying a machine learning classifier to the sampled record values, to classify the columns of the sampled records into multiple data classes, some being sensitive data classes; classifying columns of non-sampled record values according to the classification of the sampled record values, based on the map; searching all objects of the database for existence of record values of the classified columns, to output value and field name pairs; scoring the pairs according to a measure of their repetitiveness in the output; increasing the score of the pairs whose field names are similar; and, based on the scores, indicating which fields of the database are likely to include sensitive data.
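The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the regex rules are a hypothetical stand-in for the machine learning classifier, the association map is supplied by hand, and the similarity measure (`difflib.SequenceMatcher`), the `similarity_threshold` of 0.8, and the `boost` factor of 1.5 are all assumptions, since the abstract does not specify them.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical stand-in for the machine learning classifier: simple regex rules.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

def classify_column(samples):
    """Return the majority data class of the sampled values, or 'other'."""
    votes = Counter()
    for value in samples:
        label = next((name for name, rx in PATTERNS.items() if rx.match(value)), "other")
        votes[label] += 1
    return votes.most_common(1)[0][0] if votes else "other"

def propagate(classes, association_map):
    """Copy each sampled column's class to its associated non-sampled columns."""
    result = dict(classes)
    for source, targets in association_map.items():
        if source in classes:
            for target in targets:
                result.setdefault(target, classes[source])
    return result

def score_pairs(pairs, similarity_threshold=0.8, boost=1.5):
    """Score (value, field_name) pairs by repetition; boost pairs whose
    field name is similar to another field name seen in the output."""
    counts = Counter(pairs)
    scores = {pair: float(n) for pair, n in counts.items()}
    names = {name for _, name in counts}
    for value, name in counts:
        for other in names:
            if other == name:
                continue
            ratio = SequenceMatcher(None, name.lower(), other.lower()).ratio()
            if ratio >= similarity_threshold:
                scores[(value, name)] *= boost  # assumed boost rule
                break
    return scores

# Illustrative data only; column and field names are made up.
sampled = {"users.email": ["a@x.org", "b@y.com"], "users.note": ["hi", "ok"]}
classes = {col: classify_column(vals) for col, vals in sampled.items()}
all_classes = propagate(classes, {"users.email": ["orders.customer_email"]})
pairs = [("a@x.org", "email"), ("a@x.org", "email"),
         ("a@x.org", "e_mail"), ("hi", "note")]
scores = score_pairs(pairs)
```

In this sketch a field would be flagged as likely sensitive when its score exceeds some cutoff chosen by the operator; the abstract leaves that decision rule open, so no cutoff is fixed here.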
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.