Patent · US Active

Clustering analysis for deduplication of training set samples for machine learning based computer threat analysis

US11620471B2 · kind B2 · utility

Cited by: 2
References: 7
Claims: 14
Family size: 0

Key dates

Filing date: Nov 1, 2017
Grant date: Apr 4, 2023
Expiry date: Dec 22, 2040

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F21/564
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method, a system, and a computer program product for performing analysis of data to detect presence of malicious code are disclosed. Reduced dimensionality vectors are generated from a plurality of original dimensionality vectors representing features in a plurality of samples. The reduced dimensionality vectors have a lower dimensionality than an original dimensionality of the plurality of original dimensionality vectors. A first plurality of clusters is determined by applying a first clustering algorithm to the reduced dimensionality vectors. A second plurality of clusters is determined by applying a second clustering algorithm to one or more clusters in the first plurality of clusters using the original dimensionality. An exemplar for a cluster in the second plurality of clusters is added to a training set, which is used to train a machine learning model for identifying a file containing malicious code.
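The abstract does not name the dimensionality-reduction technique, either clustering algorithm, or how an exemplar is picked. The Python sketch below fills those gaps with stand-ins that are assumptions, not the patented method: signed feature hashing with a trivial index hash produces the reduced vectors, Lloyd's k-means is the first clustering, greedy leader (radius) clustering over the original vectors is the second, and the cluster medoid serves as the exemplar. All function names (`hash_reduce`, `kmeans`, `leader_clusters`, `medoid`, `select_exemplars`) and parameters are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def hash_reduce(v: Vector, out_dim: int) -> List[float]:
    """Signed feature hashing with a trivial index hash (illustrative assumption)."""
    out = [0.0] * out_dim
    for i, x in enumerate(v):
        sign = 1.0 if (i // out_dim) % 2 == 0 else -1.0
        out[i % out_dim] += sign * x
    return out


def kmeans(vectors: List[Vector], k: int, iters: int = 20) -> List[int]:
    """First-stage clustering on the reduced vectors (plain Lloyd's k-means)."""
    # Deterministic farthest-point initialisation so the sketch is reproducible.
    centroids = [list(vectors[0])]
    while len(centroids) < min(k, len(vectors)):
        far = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def leader_clusters(indices: List[int], vectors: List[Vector], radius: float) -> List[List[int]]:
    """Second-stage clustering in the original space: join the first leader within radius."""
    clusters: List[List[int]] = []
    for i in indices:
        for members in clusters:
            if math.dist(vectors[i], vectors[members[0]]) <= radius:
                members.append(i)
                break
        else:
            clusters.append([i])  # no leader close enough: i starts a new cluster
    return clusters


def medoid(members: List[int], vectors: List[Vector]) -> int:
    """Exemplar = member with the smallest total distance to the rest of its cluster."""
    return min(members, key=lambda i: sum(math.dist(vectors[i], vectors[j]) for j in members))


def select_exemplars(vectors: List[Vector], reduced_dim: int, k: int, radius: float) -> List[int]:
    """Return indices of the samples kept for the training set."""
    reduced = [hash_reduce(v, reduced_dim) for v in vectors]
    coarse = kmeans(reduced, k)
    groups: Dict[int, List[int]] = {}
    for idx, label in enumerate(coarse):
        groups.setdefault(label, []).append(idx)
    exemplars = []
    for members in groups.values():
        # Second clustering runs only inside each first-stage cluster, on full vectors.
        for fine in leader_clusters(members, vectors, radius):
            exemplars.append(medoid(fine, vectors))
    return sorted(exemplars)


samples = [
    [1.0, 0.0, 0.0, 0.0], [1.01, 0.0, 0.0, 0.0], [0.99, 0.01, 0.0, 0.0],  # near-duplicate family A
    [0.0, 0.0, 5.0, 5.0], [0.0, 0.0, 5.02, 4.99],                          # near-duplicate family B
]
training_indices = select_exemplars(samples, reduced_dim=2, k=2, radius=0.1)
print(training_indices)  # [0, 3]
```

In this sketch the first pass works on the cheaper 2-dimensional vectors, while the second pass compares full-dimensional vectors only within each coarse cluster. Samples 1, 2 and 4 are treated as near-duplicates of samples 0 and 3 and are left out of the training set.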

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.