Adaptive cycle consistency multimodal image captioning
US11651522B2 · kind B2 · utility
Key dates
| Filing date | Jul 8, 2020 |
| Grant date | May 16, 2023 |
| Priority date | — |
| Expiry date | Sep 8, 2041 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N5/01
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
In an approach to improving the image captioning performance of low-resource languages by leveraging multimodal inputs, one or more computer processors encode an image utilizing an image encoder, wherein the image is contained within a triplet comprising the image, one or more high-resource captions, and one or more low-resource captions. The one or more computer processors generate one or more high-resource captions by inputting the encoded image and the triplet into a high-resource decoder. The one or more computer processors encode the one or more generated high-resource captions utilizing a high-resource encoder. The one or more computer processors add adaptive cycle consistency constraints on a set of calculated attention weights associated with the triplet. The one or more computer processors generate one or more low-resource captions by simultaneously inputting the encoded image, the encoded high-resource captions, and the triplet into a trained low-resource decoder.
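The abstract does not say how the adaptive cycle consistency constraint on the attention weights is computed. The sketch below is one hypothetical interpretation, not the patented method: attention from the low-resource caption tokens to the high-resource caption tokens, followed by attention back, should ideally return each token to itself, so the penalty measures how far that round trip is from the identity. The "adaptive" part is also an assumption here: each low-resource token is weighted by its strongest attention weight, so tokens with no clear high-resource counterpart are penalized less. All function names (`attention`, `matmul`, `adaptive_cycle_loss`) are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix) -> Matrix:
    """Scaled dot-product attention weights (rows: queries, columns: keys)."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    return [
        softmax([scale * sum(q * k for q, k in zip(qv, kv)) for kv in keys])
        for qv in queries
    ]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def adaptive_cycle_loss(low: Matrix, high: Matrix) -> float:
    """Hypothetical adaptive cycle-consistency penalty on attention weights.

    low:  encoded low-resource caption tokens (n_low x d)
    high: encoded high-resource caption tokens (n_high x d)
    """
    forward = attention(low, high)          # low -> high, n_low x n_high
    backward = attention(high, low)         # high -> low, n_high x n_low
    round_trip = matmul(forward, backward)  # low -> high -> low, n_low x n_low
    total, norm = 0.0, 0.0
    for i, row in enumerate(round_trip):
        # Assumed adaptive weight: how confidently token i aligns to any
        # high-resource token; weakly aligned tokens are penalized less.
        w = max(forward[i])
        # Squared distance of this round-trip row from the identity row.
        err = sum((p - (1.0 if j == i else 0.0)) ** 2 for j, p in enumerate(row))
        total += w * err
        norm += w
    return total / norm


if __name__ == "__main__":
    aligned = [[10.0, 0.0], [0.0, 10.0]]
    print(adaptive_cycle_loss(aligned, aligned))  # near 0
    print(adaptive_cycle_loss([[0.0, 0.0], [0.0, 0.0]], aligned))  # 0.5
```

For identical, sharply aligned encodings the round trip is almost exactly the identity and the penalty is close to zero; when the low-resource encodings carry no alignment signal, every attention row is uniform and the penalty is 0.5. A trained system would presumably add a term like this to the captioning loss, but the abstract does not describe that weighting or combination.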
Source: USPTO / EPO open patent data.