Patent · US Active

Multimodal perception decision-making method and apparatus for autonomous driving based on large language model

US12354375B1 · kind B1 · utility

Cited by	0
References	0
Claims	15
Family size	0

Assignee

Beijing University of Chemical Technology · CN

Inventors

Zhiwei Li · Lo Wu, CN
Tingzhen Zhang · Beijing, CN
Haohan Wu · Beijing, CN
Weizheng Zhang · Beijing, CN
Weiye Xiao · Beijing, CN
Kunfeng Wang · Tangxia, CN
Wei Zhang · Shanghai, CN
Tianyu Shen · Beijing, CN
Li Wang · Beijing, CN
Qifan Tan · Beijing, CN

Key dates

Filing date	Jan 17, 2025
Grant date	Jul 8, 2025
Priority date	—
Expiry date	Jan 17, 2045

Classification

Technology area (CPC G)	Physics
CPC primary	G06V2201/07
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A multimodal perception decision-making method for autonomous driving based on a large language model includes: acquiring an RGB image and an infrared image of a target area at a current time; processing the RGB image using a target detection model to obtain a predicted bounding box and a corresponding target detection category; processing the infrared image, the predicted bounding box, and the corresponding target detection category using a segmentation model to obtain a target mask image; fusing the RGB image, the target mask image, and the infrared image using a fusion model to obtain a fused feature map; performing fusion processing on first prompt information representing a user intent, second prompt information representing target detection category priorities, and the fused feature map, using a large vision-language model to obtain textual information; and processing the textual information using a large natural language model to obtain a perception decision-making result.
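
The data flow described in the abstract can be sketched as a chain of five stages. This is only an illustrative outline, not the patented implementation: the abstract does not specify any interfaces, so every name here (`Detection`, `PerceptionPipeline`, `run`, the box format, and the way the priority prompt is worded) is a hypothetical assumption, and each model is represented by a caller-supplied function. The stand-in functions at the end exist only to show how the stages connect.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# Hypothetical box format (x1, y1, x2, y2); the abstract does not define one.
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    box: Box
    category: str


@dataclass
class PerceptionPipeline:
    """Wires together the five stages named in the abstract (all assumed interfaces)."""
    detect: Callable[[Any], List[Detection]]       # target detection model (RGB only)
    segment: Callable[[Any, List[Detection]], Any]  # segmentation model (infrared + detections)
    fuse: Callable[[Any, Any, Any], Any]            # fusion model (RGB, mask, infrared)
    vlm: Callable[[str, str, Any], str]             # large vision-language model
    llm: Callable[[str], str]                       # large natural language model

    def run(self, rgb: Any, infrared: Any, user_intent: str,
            category_priorities: List[str]) -> str:
        # 1. Detect targets in the RGB image.
        detections = self.detect(rgb)
        # 2. Segment the infrared image, guided by the boxes and categories.
        mask = self.segment(infrared, detections)
        # 3. Fuse RGB image, target mask, and infrared image into one feature map.
        fused = self.fuse(rgb, mask, infrared)
        # 4. Combine both prompts with the fused features to get text.
        #    The prompt wording below is an assumption made for illustration.
        priority_prompt = "Category priorities: " + " > ".join(category_priorities)
        text = self.vlm(user_intent, priority_prompt, fused)
        # 5. Turn the textual description into a decision-making result.
        return self.llm(text)


# Stand-in functions so the sketch runs end to end; real models would replace them.
pipeline = PerceptionPipeline(
    detect=lambda rgb: [Detection((0.0, 0.0, 10.0, 10.0), "pedestrian")],
    segment=lambda ir, dets: {"mask_for": [d.category for d in dets]},
    fuse=lambda rgb, mask, ir: {"rgb": rgb, "mask": mask, "ir": ir},
    vlm=lambda intent, prio, fused: f"{intent} | {prio} | objects={fused['mask']['mask_for']}",
    llm=lambda text: "DECISION<" + text + ">",
)

decision = pipeline.run(
    "rgb_frame", "ir_frame",
    "Drive safely to the next intersection",
    ["pedestrian", "vehicle"],
)
```

Passing each model in as a plain callable keeps the stage ordering visible and makes it clear that the detection step sees only the RGB image, while the segmentation step operates on the infrared image using the detector's output.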

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.