Patent · US Active

Systems and methods for learning unified representations of language, image, and point cloud for three-dimensional recognition

US12417385B2 · kind B2 · utility

Cited by: 0
References: 1
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Mar 13, 2023
Grant date: Sep 16, 2025
Priority date
Expiry date: Jun 4, 2044

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06T2219/2004
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Systems and methods for training a neural network based three-dimensional (3D) encoder for 3D classification are provided. A training dataset including a plurality of samples is received, wherein a first sample includes an image, a text, and a point cloud. An image encoder of a pretrained vision and language model is used to generate image representations for the image of the first sample. A text encoder of the pretrained vision and language model is used to generate text representations for the text of the first sample. The neural network based 3D encoder is used to generate 3D representations for the point cloud of the first sample. A loss objective is computed based on the image representations, text representations, and 3D representations. Parameters of the neural network based 3D encoder are updated based on the computed loss objective via backpropagation.
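The loss objective in the abstract can be illustrated with a contrastive sketch. This is a minimal sketch under stated assumptions, not the patented implementation. It assumes the loss is a sum of two symmetric InfoNCE terms: one aligns the 3D representations with the text representations, and the other aligns them with the image representations. The abstract says only that the loss is computed from all three kinds of representation. The function names, the temperature of 0.07, the toy embeddings, and the use of plain Python lists in place of tensors are hypothetical choices made for illustration. Backpropagation and the parameter update are left to an autodiff framework.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _mean_cross_entropy(logits: List[Vector]) -> float:
    """Mean cross-entropy where row i's correct class is index i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(a: List[Vector], b: List[Vector], temperature: float = 0.07) -> float:
    """Contrastive loss in which a[i] and b[i] form the only positive pair."""
    a_n = [l2_normalize(v) for v in a]
    b_n = [l2_normalize(v) for v in b]
    # logits[i][j] = cosine(a[i], b[j]) / temperature
    logits = [[sum(x * y for x, y in zip(u, w)) / temperature for w in b_n] for u in a_n]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_mean_cross_entropy(logits) + _mean_cross_entropy(transposed))


def tri_modal_loss(pc_feats, img_feats, txt_feats, temperature=0.07):
    """Hypothetical objective: align 3D features with text and with image features.

    pc_feats would come from the trainable 3D encoder; img_feats and txt_feats
    would come from the pretrained vision-and-language model's image and text encoders.
    """
    return (symmetric_info_nce(pc_feats, txt_feats, temperature)
            + symmetric_info_nce(pc_feats, img_feats, temperature))


# Toy batch of three samples with 3-dimensional embeddings (made-up numbers).
img = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
txt = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]]
pc_aligned = [[0.8, 0.2, 0.0], [0.1, 0.9, 0.0], [0.0, 0.2, 0.8]]
pc_shuffled = [pc_aligned[1], pc_aligned[2], pc_aligned[0]]

print(tri_modal_loss(pc_aligned, img, txt))   # lower: each point cloud matches its own image/text
print(tri_modal_loss(pc_shuffled, img, txt))  # higher: the positive pairs are mismatched
```

In a real training loop, the gradient of this loss would be applied only to the 3D encoder's parameters. This matches the abstract, which mentions updating only the 3D encoder. The image and text features would be produced by the pretrained model's encoders.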

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.