Patent · US Active

Systems and methods for learning unified representations of language, image, and point cloud for three-dimensional recognition

US12417385B2 · kind B2 · utility

Cited by	0
References	1
Claims	20
Family size	0

Assignee

Salesforce, Inc. · US

Inventors

Le Xue · Mountain View, US
Chen Xing · Sunnyvale, US
Juan Carlos Niebles Duque · Mountain View, US
Caiming Xiong · Menlo Park, US
Ran Xu · Beijing, CN
Silvio Savarese · Palo Alto, US

Key dates

Filing date	Mar 13, 2023
Grant date	Sep 16, 2025
Priority date	—
Expiry date	Jun 4, 2044

Classification

Technology area (CPC G)	Physics
CPC primary	G06T2219/2004
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Systems and methods for training a neural network based three-dimensional (3D) encoder for 3D classification are provided. A training dataset including a plurality of samples is received, wherein a first sample includes an image, a text, and a point cloud. An image encoder of a pretrained vision and language model is used to generate image representations for the image of the first sample. A text encoder of the pretrained vision and language model is used to generate text representations for the text of the first sample. The neural network based 3D encoder is used to generate 3D representations for the point cloud of the first sample. A loss objective is computed based on the image representations, text representations, and 3D representations. Parameters of the neural network based 3D encoder are updated based on the computed loss objective via backpropagation.
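
The training step described in the abstract can be illustrated with a toy sketch in plain Python. Everything here is an assumption made for illustration and is not taken from the patent: the image and text embeddings are hard-coded stand-ins for the outputs of the frozen pretrained vision and language model, the 3D encoder is a single linear layer over a mean-pooled point cloud, and the loss is a simple squared-distance alignment (the abstract does not specify the form of the loss objective). The names `Linear3DEncoder`, `alignment_loss`, and `train_step` are hypothetical.

```python
import random


def mean_pool(points):
    """Average the xyz coordinates of a point cloud into one 3-vector."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(3)]


class Linear3DEncoder:
    """Toy stand-in for the trainable 3D encoder: z = W @ mean_pool(points)."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(dim)]

    def forward(self, points):
        x = mean_pool(points)
        z = [sum(w * xi for w, xi in zip(row, x)) for row in self.W]
        return x, z


def alignment_loss(z, img, txt):
    """Assumed loss: squared distance from the 3D embedding to both
    the image embedding and the text embedding."""
    return sum((a - b) ** 2 + (a - c) ** 2 for a, b, c in zip(z, img, txt))


def train_step(encoder, sample, lr=0.1):
    """One forward/backward pass; only the 3D encoder's weights change."""
    x, z = encoder.forward(sample["points"])
    # Stand-ins for frozen image/text encoder outputs (never updated).
    img, txt = sample["image_emb"], sample["text_emb"]
    loss = alignment_loss(z, img, txt)
    # Backward pass by the chain rule:
    #   dL/dz_i  = 2 (z_i - img_i) + 2 (z_i - txt_i)
    #   dL/dW_ij = dL/dz_i * x_j
    for i, row in enumerate(encoder.W):
        dz = 2 * (z[i] - img[i]) + 2 * (z[i] - txt[i])
        for j in range(3):
            row[j] -= lr * dz * x[j]
    return loss


# One (image, text, point cloud) sample with made-up embeddings.
sample = {
    "points": [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)],
    "image_emb": [0.6, -0.2],
    "text_emb": [0.4, -0.4],
}
encoder = Linear3DEncoder(dim=2, seed=0)
losses = [train_step(encoder, sample) for _ in range(20)]
print("first loss:", losses[0], "last loss:", losses[-1])
```

In this sketch the image and text embeddings are treated as fixed targets, so gradient updates reach only the 3D encoder, mirroring the abstract's statement that the parameters of the 3D encoder are the ones updated. A realistic system would replace the linear layer with a point cloud network and would likely batch many samples per step.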

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.