Apparatus and method for identifying similarity via dynamic decimation of token sequence N-grams
US9910985B2 · kind B2 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Jun 30, 2015 |
| Grant date | Mar 6, 2018 |
| Priority date | — |
| Expiry date | Aug 6, 2035 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/284
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
An apparatus for identifying related code variants or text samples includes processing circuitry configured to execute instructions for receiving query binary code, processing the query binary code to generate one or more query code fingerprints comprising compressed representations of respective functional components of the query binary code, generating token sequence n-grams of the fingerprints, hashing the n-grams, partitioning samples by length to compare selected samples based on length, and identifying similarity via dynamic decimation of token sequence n-grams.
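The abstract names the pipeline steps but does not fix their parameters. The Python sketch below is a minimal, hypothetical illustration of how those steps could fit together. It is not the patented implementation. It makes four assumptions. First, each fingerprint is already a list of string tokens, such as normalized instruction mnemonics. Second, n-grams are 4-grams hashed to 64-bit BLAKE2b values. Third, "dynamic decimation" is read as length-adaptive 1-in-2^k hash sampling, a substitute technique similar to the modulo sampling used in document fingerprinting. Fourth, samples are partitioned into power-of-two length buckets, and a query is compared only against its own bucket and the adjacent ones. All function names and parameters (`ngram_hashes`, `decimation_level`, `target`, and so on) are illustrative.

```python
import hashlib
import math
from collections import defaultdict

# All names and parameters below are hypothetical; the abstract does not fix n,
# the hash function, the decimation rule, or the length buckets.


def ngram_hashes(tokens, n=4):
    """Hash each contiguous n-gram of a fingerprint's token list to a 64-bit int."""
    hashes = set()
    for i in range(len(tokens) - n + 1):
        # Join with an ASCII unit separator so ("ab", "c") and ("a", "bc") differ.
        gram = "\x1f".join(tokens[i:i + n]).encode("utf-8")
        digest = hashlib.blake2b(gram, digest_size=8).digest()
        hashes.add(int.from_bytes(digest, "big"))
    return hashes


def decimation_level(num_hashes, target=64):
    """Assumed rule: choose k so that roughly `target` hashes survive 1-in-2**k sampling."""
    if num_hashes <= target:
        return 0
    return math.ceil(math.log2(num_hashes / target))


def decimate(hashes, level):
    """Keep only hashes whose low `level` bits are all zero."""
    mask = (1 << level) - 1
    return {h for h in hashes if h & mask == 0}


def make_sample(name, tokens, n=4, target=64):
    """Build a sample record whose decimation level adapts to its n-gram count."""
    hashes = ngram_hashes(tokens, n)
    level = decimation_level(len(hashes), target)
    return {"name": name, "length": len(tokens), "level": level,
            "hashes": decimate(hashes, level)}


def similarity(a, b):
    """Jaccard similarity after bringing both samples to the coarser decimation level."""
    level = max(a["level"], b["level"])
    ha, hb = decimate(a["hashes"], level), decimate(b["hashes"], level)
    union = ha | hb
    return len(ha & hb) / len(union) if union else 0.0


def length_bucket(length):
    """Power-of-two length bucket: 1 -> 1, 2-3 -> 2, 4-7 -> 3, and so on."""
    return max(length, 1).bit_length()


class LengthPartitionedIndex:
    """Stores samples by length bucket so a query only scans similar-length samples."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def add(self, sample):
        self.buckets[length_bucket(sample["length"])].append(sample)

    def query(self, sample, threshold=0.5):
        """Return (name, score) pairs at or above `threshold`, best first."""
        center = length_bucket(sample["length"])
        results = []
        for bucket in (center - 1, center, center + 1):
            for candidate in self.buckets.get(bucket, []):
                score = similarity(sample, candidate)
                if score >= threshold:
                    results.append((candidate["name"], score))
        return sorted(results, key=lambda r: r[1], reverse=True)
```

A hash kept at level k is also kept at every lower level, so the decimated sets are nested. Two samples decimated at different levels can therefore still be compared: `similarity` further decimates the finer sample to the coarser level. For example, `idx.query(make_sample("q", tokens))` on a populated `LengthPartitionedIndex` returns `(name, score)` pairs for stored samples of comparable length.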
Source: USPTO / EPO open patent data (bibliographic data and citation counts).