Multimodal data heterogeneous transformer-based asset recognition method, system, and device
US12236699B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Event | Date |
|---|---|
| Filing date | Nov 22, 2024 |
| Grant date | Feb 25, 2025 |
| Priority date | — |
| Expiry date | Nov 22, 2044 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V30/19173
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
This invention discloses a multimodal data heterogeneous Transformer-based asset recognition method, system, and device. The method includes: collecting information about an asset in multiple modalities, including text information and image information; building an ALBERT model, a ViT model, and a CLIP model; extracting text information features with the ALBERT model, image information features with the ViT model, and image-text matching features with the CLIP model; performing asset type recognition on the information in each modality through a separate channel and outputting classification information from each channel; generating asset void information with the CLIP model; and discriminatively fusing the classification information from the different channels with the image-text matching degree obtained by the CLIP model to output the final asset class information. By drawing on multiple modalities for comprehensive discrimination, the invention improves the accuracy of asset recognition.
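The abstract describes the final fusion step only in outline. The following is a minimal sketch under stated assumptions, not the patented method: it takes per-class logits from the text (ALBERT) and image (ViT) channels and a pair of CLIP-style text and image embeddings as plain Python lists (no real models are called), computes a matching degree as cosine similarity rescaled to [0, 1], and uses that degree to weight the text channel against the image channel. The function names, the weighting rule, and the example values are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over per-class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def fuse_channels(
    text_logits: Sequence[float],
    image_logits: Sequence[float],
    text_emb: Sequence[float],
    image_emb: Sequence[float],
    classes: Sequence[str],
) -> Tuple[str, List[float], float]:
    """Hypothetical decision-level fusion of the text and image channels.

    Returns (predicted class, fused probabilities, matching degree).
    """
    p_text = softmax(text_logits)    # stand-in for the ALBERT channel output
    p_image = softmax(image_logits)  # stand-in for the ViT channel output
    # Assumed CLIP-style matching degree: cosine similarity mapped from
    # [-1, 1] to [0, 1], clamped to guard against float rounding.
    match = (cosine_similarity(text_emb, image_emb) + 1.0) / 2.0
    match = min(1.0, max(0.0, match))
    # Assumed weighting rule (arbitrary, for illustration): a perfect match
    # weights both channels equally; a total mismatch gives all weight to
    # the image channel. The two weights always sum to 1.
    w_text = 0.5 * match
    w_image = 1.0 - w_text
    fused = [w_text * t + w_image * i for t, i in zip(p_text, p_image)]
    best = max(range(len(classes)), key=lambda k: fused[k])
    return classes[best], fused, match


if __name__ == "__main__":
    labels = ["server", "laptop", "printer"]  # hypothetical asset classes
    label, probs, degree = fuse_channels(
        text_logits=[2.0, 0.5, 0.1],
        image_logits=[1.5, 1.0, 0.2],
        text_emb=[0.9, 0.1, 0.3],
        image_emb=[0.8, 0.2, 0.25],
        classes=labels,
    )
    print(label, [round(p, 3) for p in probs], round(degree, 3))
```

Late (decision-level) fusion is used here because the abstract fuses channel outputs rather than intermediate features; a learned fusion layer, or a rule that down-weights the image channel instead, would be equally plausible readings of the same text.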
Source: USPTO / EPO open patent data.