Multimodal perception decision-making method and apparatus for autonomous driving based on large language model
US12354375B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Jan 17, 2025 |
| Grant date | Jul 8, 2025 |
| Priority date | — |
| Expiry date | Jan 17, 2045 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V2201/07
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A multimodal perception decision-making method for autonomous driving based on a large language model includes: acquiring an RGB image and an infrared image of a target area at a current time; processing the RGB image using a target detection model to obtain a predicted bounding box and a corresponding target detection category; processing the infrared image, the predicted bounding box, and the corresponding target detection category using a segmentation model to obtain a target mask image; fusing the RGB image, the target mask image, and the infrared image using a fusion model to obtain a fused feature map; performing fusion processing on first prompt information representing a user intent, second prompt information representing target detection category priorities, and the fused feature map using a large vision-language model to obtain textual information; and processing the textual information using a large natural language model to obtain a perception decision-making result.
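The order of the stages in the abstract can be sketched as a small data-flow skeleton. This is only an illustration under stated assumptions, not the patented implementation: the `Pipeline` class, the `Detection` type, and every `toy_*` function are hypothetical stand-ins invented here, images are simplified to single-channel grids of floats, and the thresholds and output strings are arbitrary. Only the sequence of stages and what each one consumes and produces is taken from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]        # assumption: a 2-D grid of intensities in [0, 1]
Box = Tuple[int, int, int, int]  # (row0, col0, row1, col1), end-exclusive


@dataclass
class Detection:
    box: Box
    category: str


@dataclass
class Pipeline:
    """Wires the five stages together; each stage is a pluggable callable."""
    detect: Callable[[Image], List[Detection]]
    segment: Callable[[Image, List[Detection]], Image]
    fuse: Callable[[Image, Image, Image], Image]
    vlm: Callable[[str, str, Image], str]
    llm: Callable[[str], str]

    def run(self, rgb: Image, infrared: Image,
            user_intent: str, category_priorities: Sequence[str]) -> str:
        detections = self.detect(rgb)               # boxes + categories from RGB
        mask = self.segment(infrared, detections)   # target mask from infrared + boxes
        fused = self.fuse(rgb, mask, infrared)      # fused feature map
        priority_prompt = "Priority: " + " > ".join(category_priorities)
        text = self.vlm(user_intent, priority_prompt, fused)  # textual information
        return self.llm(text)                       # perception decision


# --- Toy stand-ins (hypothetical; real models would replace these) ---

def toy_detect(rgb: Image) -> List[Detection]:
    # Treat all pixels brighter than 0.5 as one object and box them.
    hits = [(r, c) for r, row in enumerate(rgb)
            for c, v in enumerate(row) if v > 0.5]
    if not hits:
        return []
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    box = (min(rows), min(cols), max(rows) + 1, max(cols) + 1)
    return [Detection(box, "pedestrian")]


def toy_segment(infrared: Image, detections: List[Detection]) -> Image:
    # Inside each box, mark pixels whose infrared response exceeds 0.5.
    mask = [[0.0] * len(row) for row in infrared]
    for det in detections:
        r0, c0, r1, c1 = det.box
        for r in range(r0, r1):
            for c in range(c0, c1):
                if infrared[r][c] > 0.5:
                    mask[r][c] = 1.0
    return mask


def toy_fuse(rgb: Image, mask: Image, infrared: Image) -> Image:
    # Per-pixel average of the three inputs.
    return [[(a + b + c) / 3 for a, b, c in zip(ra, rb, rc)]
            for ra, rb, rc in zip(rgb, mask, infrared)]


def toy_vlm(intent: str, priorities: str, fused: Image) -> str:
    peak = max(max(row) for row in fused)
    scene = "warm object ahead" if peak > 0.5 else "road clear"
    return f"{intent} | {priorities} | {scene}"


def toy_llm(text: str) -> str:
    return "slow down" if "warm object ahead" in text else "maintain speed"


pipeline = Pipeline(toy_detect, toy_segment, toy_fuse, toy_vlm, toy_llm)
```

Calling `pipeline.run(rgb, infrared, "drive to the next junction", ["pedestrian", "vehicle"])` passes the inputs through the stages in the order the abstract lists them. Keeping each stage as a separate callable is a design choice of this sketch, so that any one stand-in can be swapped for a real model without touching the others.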
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.