Patent · US Active

Multimodal perception decision-making method and apparatus for autonomous driving based on large language model

US12354375B1 · kind B1 · utility

Cited by: 0
References: 0
Claims: 15
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jan 17, 2025
Grant date: Jul 8, 2025
Priority date
Expiry date: Jan 17, 2045

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V2201/07
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A multimodal perception decision-making method for autonomous driving based on a large language model includes: acquiring an RGB image and an infrared image of a target area at a current time; processing the RGB image using a target detection model to obtain a predicted bounding box and a corresponding target detection category; processing the infrared image, the predicted bounding box, and the corresponding target detection category using a segmentation model to obtain a target mask image; fusing the RGB image, the target mask image, and the infrared image using a fusion model to obtain a fused feature map; performing fusion processing on first prompt information representing a user intent, second prompt information representing target detection category priorities, and the fused feature map using a large vision-language model to obtain textual information; and processing the textual information using a large natural language model to obtain a perception decision-making result.
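
The data flow described in the abstract can be sketched as a chain of five stages. The following is only an illustrative outline, not the patented implementation: every name in it (`Detection`, `Pipeline`, and the `toy_*` functions) is hypothetical, the images are simplified to single-channel 2D grids, and each model is replaced by a trivial stand-in so that the order of the steps and the inputs each one consumes are visible.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-in: an image is a 2D grid of intensities in [0, 1].
Image = List[List[float]]


@dataclass
class Detection:
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive
    category: str


@dataclass
class Pipeline:
    """Wires the five stages named in the abstract; all models are injected."""
    detect: Callable[[Image], List[Detection]]
    segment: Callable[[Image, List[Detection]], Image]
    fuse: Callable[[Image, Image, Image], Image]
    vlm: Callable[[str, str, Image], str]
    llm: Callable[[str], str]

    def run(self, rgb: Image, infrared: Image,
            user_intent: str, category_priorities: str) -> str:
        detections = self.detect(rgb)              # boxes + categories from RGB
        mask = self.segment(infrared, detections)  # target mask from infrared + boxes
        fused = self.fuse(rgb, mask, infrared)     # fused feature map
        text = self.vlm(user_intent, category_priorities, fused)  # two prompts + features
        return self.llm(text)                      # perception decision-making result


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_detect(rgb: Image) -> List[Detection]:
    return [Detection((0, 0, 1, 1), "pedestrian")]


def toy_segment(infrared: Image, detections: List[Detection]) -> Image:
    mask = [[0.0] * len(row) for row in infrared]
    for det in detections:
        x0, y0, x1, y1 = det.box
        for y in range(y0, y1):
            for x in range(x0, x1):
                mask[y][x] = 1.0
    return mask


def toy_fuse(rgb: Image, mask: Image, infrared: Image) -> Image:
    # Average the two modalities and double the weight of masked pixels.
    return [[(r + i) * (1.0 + m) / 2.0 for r, m, i in zip(rr, mr, ir)]
            for rr, mr, ir in zip(rgb, mask, infrared)]


def toy_vlm(intent: str, priorities: str, fused: Image) -> str:
    peak = max(max(row) for row in fused)
    return f"intent={intent}; priorities={priorities}; peak={peak:.2f}"


def toy_llm(text: str) -> str:
    return "decision based on: " + text


toy_pipeline = Pipeline(toy_detect, toy_segment, toy_fuse, toy_vlm, toy_llm)
```

Injecting the stages as callables keeps the sketch independent of any particular detector, segmenter, or language model; a real system would substitute trained models for each `toy_*` function.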

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.