Representing source code in vector space to detect errors
US11334467B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Event | Date |
| --- | --- |
| Filing date | May 3, 2019 |
| Grant date | May 17, 2022 |
| Priority date | — |
| Expiry date | Feb 4, 2041 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N3/0895
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A computer-implemented method, system and computer program product for representing source code in vector space. The source code is parsed into an abstract syntax tree, which is then traversed to produce a sequence of tokens. Token embeddings may then be constructed for a subset of the sequence of tokens, which are inputted into an encoder artificial neural network (“encoder”) for encoding the token embeddings. A decoder artificial neural network (“decoder”) is initialized with a final internal cell state of the encoder. The decoder is run the same number of steps as the encoding performed by the encoder. After running the decoder and completing the training of the decoder to learn the inputted token embeddings, the final internal cell state of the encoder is used as the code representation vector which may be used to detect errors in the source code.
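The data flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it uses Python's standard `ast` module as the parser, a pre-order traversal with a tokenization scheme chosen here for illustration, random untrained weights, and a hand-written LSTM cell. The function names (`ast_token_sequence`, `make_lstm`, `lstm_step`, `code_vector`), the `readout` projection, the zero-initialized decoder hidden state, and the per-snippet vocabulary are all hypothetical. The sketch embeds every token rather than a subset, and it omits the training loop entirely; it only computes the reconstruction loss that such training would presumably minimize.

```python
import ast
import math
import random


def ast_token_sequence(source):
    """Parse source into an AST and flatten it, pre-order, into tokens."""
    tokens = []

    def visit(node):
        tokens.append(type(node).__name__)       # node type, e.g. "Assign"
        if isinstance(node, ast.Name):
            tokens.append(node.id)               # identifier text
        elif isinstance(node, ast.Constant):
            tokens.append(repr(node.value))      # literal text
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def make_lstm(input_dim, hidden_dim, rng):
    """Random (untrained) weights: four gate matrices over [x, h, bias]."""
    cols = input_dim + hidden_dim + 1
    return [[[rng.uniform(-0.1, 0.1) for _ in range(cols)]
             for _ in range(hidden_dim)] for _ in range(4)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(weights, x, h, c):
    """One LSTM step; returns the new hidden state and the new cell state."""
    v = x + h + [1.0]
    pre = [[sum(w * a for w, a in zip(row, v)) for row in gate]
           for gate in weights]
    i = [sigmoid(z) for z in pre[0]]    # input gate
    f = [sigmoid(z) for z in pre[1]]    # forget gate
    o = [sigmoid(z) for z in pre[2]]    # output gate
    g = [math.tanh(z) for z in pre[3]]  # candidate cell values
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def code_vector(source, embed_dim=8, hidden_dim=16, seed=0):
    """Return (encoder final cell state, decoder reconstruction loss)."""
    rng = random.Random(seed)
    tokens = ast_token_sequence(source)

    # Assumption: vocabulary built per snippet; a real system would share one.
    vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
    table = [[rng.uniform(-1.0, 1.0) for _ in range(embed_dim)] for _ in vocab]
    embeddings = [table[vocab[tok]] for tok in tokens]

    encoder = make_lstm(embed_dim, hidden_dim, rng)
    decoder = make_lstm(embed_dim, hidden_dim, rng)
    # Hypothetical linear read-out from decoder hidden state to embedding space.
    readout = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
               for _ in range(embed_dim)]

    # Encoder: consume the token embeddings one step at a time.
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    for x in embeddings:
        h, c = lstm_step(encoder, x, h, c)
    final_cell_state = list(c)

    # Decoder: starts from the encoder's final cell state and runs the
    # same number of steps as the encoder, feeding back its own output.
    dh, dc = [0.0] * hidden_dim, list(final_cell_state)
    prev = [0.0] * embed_dim
    loss = 0.0
    for target in embeddings:
        dh, dc = lstm_step(decoder, prev, dh, dc)
        prev = [sum(w * a for w, a in zip(row, dh)) for row in readout]
        loss += sum((p - t) ** 2 for p, t in zip(prev, target))
    loss /= len(embeddings)

    return final_cell_state, loss
```

Calling `code_vector("def f(a):\n    return a + 1\n")` returns a 16-element list and a non-negative loss. With untrained weights the vector carries no useful signal; in the scheme the abstract describes, the encoder's final cell state would be taken as the code representation only after the decoder has been trained to reproduce the input embeddings.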
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.