Patent · US Active

Dictionary based deduplication of training set samples for machine learning based computer threat analysis

US11373065B2 · kind B2 · utility

Cited by	1

References	0

Claims	16

Family size	0

Assignee

Cylance Inc. · US

Inventor

Andrew Davis · Williston, US

Key dates

Filing date	Jan 17, 2018
Grant date	Jun 28, 2022
Priority date	—
Expiry date	Sep 19, 2040

Classification

Technology area (CPC G)	Physics
CPC primary	G06N20/20
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Presence of malicious code can be identified in one or more data samples. A feature set extracted from a sample is vectorized to generate a sparse vector. A reduced dimension vector representing the sparse vector can be generated. A binary representation vector of the reduced dimension vector can be created by converting each value of a plurality of values in the reduced dimension vector to a binary representation. The binary representation vector can be added as a new element in a dictionary structure if the binary representation is not equal to an existing element in the dictionary structure. A training set for use in training a machine learning model can be created to include one vector whose binary representation corresponds to each of a plurality of elements in the dictionary structure.
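
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and the abstract does not name specific techniques, so every concrete choice here is an assumption made for illustration: feature hashing for vectorization, a seeded Gaussian random projection for dimension reduction, sign thresholding for the binary representation, and the constants SPARSE_DIM and REDUCED_DIM. All function names are hypothetical.

```python
import hashlib
import random

SPARSE_DIM = 1024   # assumed size of the hashed sparse feature space
REDUCED_DIM = 16    # assumed length of the reduced dimension vector


def vectorize(features):
    """Map a feature set to a sparse vector stored as {index: count}.

    Assumption: features are strings and are placed by feature hashing.
    """
    sparse = {}
    for feat in features:
        digest = hashlib.sha256(feat.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:4], "big") % SPARSE_DIM
        sparse[idx] = sparse.get(idx, 0.0) + 1.0
    return sparse


def make_projection(seed=0):
    """Build a fixed random projection matrix (REDUCED_DIM x SPARSE_DIM)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(SPARSE_DIM)]
            for _ in range(REDUCED_DIM)]


def reduce_dimension(sparse, projection):
    """Project the sparse vector down to REDUCED_DIM values.

    Indices are visited in sorted order so the result does not depend
    on the order in which features were seen.
    """
    items = sorted(sparse.items())
    return [sum(row[i] * v for i, v in items) for row in projection]


def binarize(reduced):
    """Convert each reduced value to one bit (assumed rule: 1 if positive)."""
    return tuple(1 if x > 0 else 0 for x in reduced)


def deduplicate(samples, projection):
    """Keep one sparse vector per distinct binary representation."""
    dictionary = {}
    for features in samples:
        sparse = vectorize(features)
        key = binarize(reduce_dimension(sparse, projection))
        if key not in dictionary:   # add only if no equal element exists
            dictionary[key] = sparse
    return list(dictionary.values())


if __name__ == "__main__":
    projection = make_projection(seed=0)
    samples = [
        ["imports:kernel32", "section:.text"],
        ["section:.text", "imports:kernel32"],   # same features, reordered
        ["imports:ws2_32", "section:.rsrc", "packed"],
    ]
    training_set = deduplicate(samples, projection)
    print(len(samples), "samples ->", len(training_set), "training vectors")
```

In this sketch the dictionary key is the binary representation vector and the stored value is the first sparse vector that produced it, so the resulting training set holds one vector per dictionary element. Samples with identical feature sets always collapse to a single entry; whether merely similar samples also collapse depends on the projection and on REDUCED_DIM, which is a tuning choice not specified in the abstract.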

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.