Apparatus and method for identifying similarity via dynamic decimation of token sequence n-grams
US9111095B2 · kind B2 · utility
Assignee
Inventor
Key dates
| Filing date | Apr 9, 2014 |
| Grant date | Aug 18, 2015 |
| Priority date | — |
| Expiry date | Apr 19, 2034 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/284
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
An apparatus for identifying related code variants or text samples includes processing circuitry configured to execute instructions for receiving query binary code, processing the query binary code to generate one or more query code fingerprints comprising compressed representations of respective functional components of the query binary code, generating token sequence n-grams of the fingerprints, hashing the n-grams, partitioning samples by length to compare selected samples based on length, and identifying similarity via dynamic decimation of token sequence n-grams.
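The pipeline named in the abstract (token sequence n-grams, hashing, length partitioning, decimation) can be illustrated with a minimal Python sketch. This is not the patented method: the abstract does not define the decimation rule, the length partitions, or the similarity score, so the sketch assumes a keep-if-hash-is-divisible-by-factor rule whose factor grows with sample length, power-of-two length buckets, and Jaccard similarity. All function names and the `target` parameter are hypothetical, and fingerprint generation from binary code is skipped; inputs are taken to be token lists already.

```python
import hashlib


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token sequence as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def hash_ngram(gram):
    """Deterministic 64-bit hash of one n-gram (built-in hash() is salted per process)."""
    data = "\x1f".join(gram).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def decimation_factor(num_tokens, target=64):
    """Assumed rule: decimate longer samples more heavily, aiming at about `target` kept hashes."""
    return max(1, num_tokens // target)


def decimated_hashes(tokens, n=3, factor=1):
    """Keep only the n-gram hashes divisible by `factor` (assumed decimation rule)."""
    return {h for h in map(hash_ngram, ngrams(tokens, n)) if h % factor == 0}


def length_bucket(num_tokens):
    """Assumed partitioning: bucket samples by the bit length of their token count."""
    return max(num_tokens, 1).bit_length()


def candidates(query, corpus):
    """Return names of corpus samples whose length bucket is within one of the query's."""
    qb = length_bucket(len(query))
    return [name for name, toks in corpus.items()
            if abs(length_bucket(len(toks)) - qb) <= 1]


def similarity(a, b, n=3, target=64):
    """Jaccard similarity of two samples decimated with one shared factor."""
    # Both samples use the larger factor so their kept hashes are comparable.
    factor = max(decimation_factor(len(a), target), decimation_factor(len(b), target))
    ha = decimated_hashes(a, n, factor)
    hb = decimated_hashes(b, n, factor)
    if not ha and not hb:
        return 0.0
    return len(ha & hb) / len(ha | hb)
```

In this sketch a query would first be narrowed with `candidates(query, corpus)` and then scored against each surviving sample with `similarity`. Using one shared factor per pair is a design choice of the sketch: hashes kept under two different factors would not be drawn from the same subset of n-grams, which would bias the overlap.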
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.