Synthetic deidentified test data
US11392487B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Event | Date |
| --- | --- |
| Filing date | Nov 16, 2020 |
| Grant date | Jul 19, 2022 |
| Priority date | — |
| Expiry date | Nov 16, 2040 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F11/3688
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Embodiments include a method for one or more processors to receive an organic dataset and a domain knowledge base. The one or more processors identify private data entities present within the organic dataset. The one or more processors determine statistical properties of the private data entities identified within the organic dataset. The one or more processors create a plurality of test data templates by removing the private data entities from the organic dataset. The one or more processors select from the domain knowledge base, synthetic data entities that match a data type of the removed private data entities, respectively, and align with the statistical properties of the private data entities, and the one or more processors generate synthetic test data by inserting, respectively, the synthetic data entities of the matching data type for the removed private data entities in the test data templates.
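The steps in the abstract (identify private entities, record their statistical properties, strip them out to form templates, then refill the templates from a knowledge base) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the regex-based detector, the `<<TYPE>>` placeholder syntax, the in-memory `KNOWLEDGE_BASE`, the use of string length as the only statistical property, and all function names are hypothetical. It also assumes detected entities never overlap.

```python
import random
import re
from collections import defaultdict

# Hypothetical detectors: one regex per private entity type (assumption).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

# Hypothetical domain knowledge base: synthetic values grouped by data type.
KNOWLEDGE_BASE = {
    "EMAIL": ["ann@example.com", "bob.stone@example.org", "kim@example.net"],
    "PHONE": ["555-010-0001", "555-010-0002"],
}

PLACEHOLDER = re.compile(r"<<(\w+)>>")


def identify_entities(record):
    """Return (start, end, type, text) spans sorted by position."""
    spans = []
    for etype, pattern in PATTERNS.items():
        for m in pattern.finditer(record):
            spans.append((m.start(), m.end(), etype, m.group()))
    return sorted(spans)


def length_stats(all_spans):
    """Stand-in statistical property: observed entity lengths per type."""
    lengths = defaultdict(list)
    for _, _, etype, text in all_spans:
        lengths[etype].append(len(text))
    return lengths


def make_template(record, spans):
    """Remove each private entity, leaving a typed placeholder."""
    parts, pos = [], 0
    for start, end, etype, _ in spans:
        parts.append(record[pos:start])
        parts.append(f"<<{etype}>>")
        pos = end
    parts.append(record[pos:])
    return "".join(parts)


def select_synthetic(etype, stats, rng):
    """Pick a same-type synthetic value whose length is closest to a
    length sampled from the observed distribution."""
    target = rng.choice(stats[etype])
    candidates = KNOWLEDGE_BASE[etype]
    best = min(abs(len(c) - target) for c in candidates)
    return rng.choice([c for c in candidates if abs(len(c) - target) == best])


def generate_synthetic(records, seed=0):
    """Return (templates, synthetic_records) for a list of text records."""
    rng = random.Random(seed)
    spans_per_record = [identify_entities(r) for r in records]
    stats = length_stats(s for spans in spans_per_record for s in spans)
    templates = [make_template(r, s) for r, s in zip(records, spans_per_record)]
    synthetic = [
        PLACEHOLDER.sub(lambda m: select_synthetic(m.group(1), stats, rng), t)
        for t in templates
    ]
    return templates, synthetic
```

Calling `generate_synthetic(["Contact jdoe@corp.example or call 212-555-0147."])` would yield the template `Contact <<EMAIL>> or call <<PHONE>>.` together with a filled-in record whose email and phone come from the knowledge base. A real system would presumably rely on a much richer entity recognizer and on statistics beyond length; the sketch only shows how the stages could fit together.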
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.