System and method for identifying poisoned data during data curation using data source characteristics
US12405930B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jun 29, 2023 |
| Grant date | Sep 2, 2025 |
| Priority date | — |
| Expiry date | Jun 5, 2044 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F2221/034
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Methods and systems for curating data from data sources are disclosed. Data may be curated from various data sources before being supplied to downstream consumers that may rely on the trustworthiness of the curated data to facilitate desired computer-implemented services. During data curation, collected data may undergo anomaly detection to identify anomalies in the data. Data anomalies may indicate the presence of poisoned data that, if provided to downstream consumers, may negatively impact the desired computer-implemented services. When poisoned data is detected among the data, a poisoned portion of the data may be identified using an optimization process. The optimization process may consider the degree of anomalousness of the data (e.g., using statistical representations of the anomaly) and/or characteristics of the data source that supplied the anomalous data to identify the poisoned portion. Remedial actions may be identified and/or performed in order to reduce an impact of the poisoned data.
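The idea of weighing how anomalous the data is against characteristics of the source that supplied it can be illustrated with a minimal Python sketch. It is not the patented method: a simple threshold on a weighted score stands in for the optimization process, and the anomaly statistic (a robust z-score based on the median absolute deviation) is an assumed choice. The names `Reading`, `source_risk`, `flag_poisoned`, and the threshold of 3.5 are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Reading:
    source_id: str  # hypothetical identifier of the data source
    value: float    # the collected data point


def robust_z_scores(values):
    """Score each value by its distance from the median, scaled by the MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All values (or most of them) are identical; nothing stands out.
        return [0.0 for _ in values]
    # 0.6745 makes the score comparable to a standard z-score for normal data.
    return [0.6745 * abs(v - med) / mad for v in values]


def flag_poisoned(readings, source_risk, threshold=3.5):
    """Return readings whose risk-weighted anomaly score exceeds the threshold.

    source_risk maps source_id -> multiplier (an assumed source
    characteristic, e.g. above 1.0 for a source with a poor track record).
    Sources missing from the map get a neutral multiplier of 1.0.
    """
    scores = robust_z_scores([r.value for r in readings])
    flagged = []
    for reading, score in zip(readings, scores):
        weighted = score * source_risk.get(reading.source_id, 1.0)
        if weighted > threshold:
            flagged.append(reading)
    return flagged


readings = [
    Reading("a", 10.0), Reading("a", 10.2), Reading("b", 9.9),
    Reading("b", 10.1), Reading("c", 25.0), Reading("a", 10.0),
]
suspect = flag_poisoned(readings, source_risk={})
```

With neutral source weights, only the reading of 25.0 from source `c` is flagged. Raising the multiplier for a source makes its mildly unusual readings more likely to be flagged, which is one simple way to let source characteristics influence which portion of the data is treated as poisoned.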
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.