Patent · US Active

Systems and methods for learning unified representations of language, image, and point cloud for three-dimensional recognition

US12417384B2 · kind B2 · utility

Cited by	0

References	1

Claims	20

Family size	0

Assignee

Salesforce, Inc. · US

Inventors

Le Xue · Mountain View, US
Chen Xing · Sunnyvale, US
Juan Carlos Niebles Duque · Mountain View, US
Caiming Xiong · Menlo Park, US
Ran Xu · Beijing, CN
Silvio Savarese · Palo Alto, US

Key dates

Filing date	Mar 13, 2023
Grant date	Sep 16, 2025
Priority date	—
Expiry date	Mar 12, 2044

Classification

Technology area (CPC G)	Physics
CPC primary	G06T2219/2004
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method of training a neural-network-based three-dimensional (3D) encoder is provided. A training dataset is generated using a plurality of 3D models of a 3D model dataset. To generate a first sample of the training dataset, an image generator with multi-view rendering is used to generate a plurality of image candidates of a first 3D model. A word is selected from metadata associated with the first 3D model. A language model is used to generate one or more text descriptions using the selected word and a plurality of prompts. A point cloud is generated by randomly sampling points in the first 3D model. The first sample is generated to include a first image randomly selected from the plurality of image candidates, the one or more text descriptions, and the generated point cloud. The 3D encoder is trained using the training dataset including the first sample.
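
The sample-generation step described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the names `Sample` and `build_sample` are hypothetical, the rendered views are stood in for by file-name strings, the 3D model is assumed to be available as a list of (x, y, z) points, the metadata word is picked at random (the abstract does not say how it is selected), and plain template filling stands in for the language model that produces the text descriptions.

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    image: str    # one rendered view, chosen at random from the candidates
    texts: list   # one text description per prompt
    points: list  # point cloud: randomly sampled (x, y, z) tuples


def build_sample(image_candidates, metadata_words, prompts,
                 model_points, num_points, rng):
    """Assemble one (image, text, point cloud) training sample.

    Assumptions for illustration only: random word choice, template
    filling in place of a language model, and a 3D model given as a
    flat list of points.
    """
    # Randomly pick one image among the multi-view renderings.
    image = rng.choice(image_candidates)
    # Pick a word from the model's metadata (random choice is an assumption).
    word = rng.choice(metadata_words)
    # Stand-in for the language model: fill each prompt with the word.
    texts = [prompt.format(word) for prompt in prompts]
    # Randomly sample points of the model, without replacement.
    points = rng.sample(model_points, num_points)
    return Sample(image=image, texts=texts, points=points)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = build_sample(
        image_candidates=["view_000.png", "view_030.png", "view_060.png"],
        metadata_words=["chair", "furniture"],
        prompts=["a picture of {}.", "a point cloud model of {}."],
        model_points=[(float(i), 0.0, 0.0) for i in range(100)],
        num_points=8,
        rng=rng,
    )
    print(sample.image, sample.texts, len(sample.points))
```

Passing an explicit `random.Random` instance keeps the random image selection and point sampling reproducible, which helps when regenerating a dataset.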

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.