Generating vector representations of code capturing semantic similarity
US11238306B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 27, 2018 |
| Grant date | Feb 1, 2022 |
| Priority date | — |
| Expiry date | Dec 3, 2040 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N20/00
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method, system and computer program product for obtaining vector representations of code snippets capturing semantic similarity. A first and second training set of code snippets are collected, where the first training set of code snippets implements the same function representing semantic similarity and the second training set of code snippets implements a different function representing semantic dissimilarity. A vector representation of a first and second code snippet from either the first or second training set of code snippets is generated using a machine learning model. A loss value is generated utilizing a loss function that is proportional or inverse to the distance between the first and second vectors in response to receiving the first and second code snippets from the first or second training set of code snippets, respectively. The machine learning model is trained to capture the semantic similarity in the code snippets by minimizing the loss value.
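The loss described in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation: the function names `euclidean_distance` and `pair_loss`, the choice of Euclidean distance, and the `margin` parameter are all assumptions made for illustration. The sketch uses a common margin-based contrastive form, in which the loss grows with distance for a pair drawn from the "same function" set and shrinks with distance for a pair drawn from the "different function" set. The abstract itself only says the loss is "proportional or inverse" to the distance, so other formulations would fit equally well.

```python
import math
from typing import Sequence


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pair_loss(
    u: Sequence[float],
    v: Sequence[float],
    same_function: bool,
    margin: float = 1.0,  # assumed hyperparameter, not taken from the patent
) -> float:
    """Loss for one pair of snippet vectors.

    same_function=True  -> pair from the 'similar' training set:
                           loss grows with the distance (d squared).
    same_function=False -> pair from the 'dissimilar' training set:
                           loss shrinks as the distance grows and is
                           zero once the vectors are at least `margin` apart.
    """
    d = euclidean_distance(u, v)
    if same_function:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical vectors standing in for the model's output on two snippets.
vec_a = [0.0, 0.0]
vec_b = [0.0, 0.5]

loss_if_similar = pair_loss(vec_a, vec_b, same_function=True)
loss_if_dissimilar = pair_loss(vec_a, vec_b, same_function=False)
```

Minimizing this quantity over many pairs pulls vectors of functionally equivalent snippets together and pushes vectors of different snippets apart, which is the training objective the abstract describes. In practice the vectors would come from a trainable model and the loss would be minimized with a gradient-based optimizer.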
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.