Patent · US Active

Systems and methods for multimodal pretraining for three-dimensional understanding models

US12430849B2 · kind B2 · utility

Cited by	0
References	3
Claims	20
Family size	0

Assignee

Salesforce, Inc. · US

Inventors

Le Xue · Mountain View, US
Ning Yu · Beijing, CN
Shu Zhang · Zhuzhou, CN
Junnan Li · Singapore, SG
Caiming Xiong · Menlo Park, US
Silvio Savarese · Palo Alto, US
Juan Carlos Niebles Duque · Mountain View, US
Ran Xu · Beijing, CN

Key dates

Filing date	Oct 24, 2023
Grant date	Sep 30, 2025
Priority date	—
Expiry date	Feb 11, 2044

Classification

Technology area (CPC G)	Physics
CPC primary	G06T2210/56
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method of training a neural network based three-dimensional (3D) encoder is provided. A first plurality of samples of a training dataset are generated using a first 3D model. An image generator with multi-view rendering is used to generate a plurality of two-dimensional (2D) images having different viewpoints of the first 3D model. A first language model is used to generate a plurality of texts corresponding to the plurality of 2D images respectively. A first text for a first image is generated by using one or more text descriptions generated by the first language model. A point cloud is generated by randomly sampling points in the 3D model. The first plurality of samples are generated using the plurality of 2D images, the corresponding plurality of texts, and the point cloud. The neural network based 3D encoder is trained using the training dataset including the first plurality of samples.
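
The sample-generation steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the 3D model is a triangle mesh and that "randomly sampling points" means area-weighted sampling on the mesh surface. The names `Sample`, `sample_point_cloud`, `build_samples`, `render_view`, and `describe_image` are hypothetical. The renderer and the language model are replaced by stub functions, and joining the generated descriptions into one text per image is also an assumption.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable

Vec3 = tuple[float, float, float]
Triangle = tuple[Vec3, Vec3, Vec3]


def triangle_area(tri: Triangle) -> float:
    """Area of a triangle: half the norm of the cross product of two edges."""
    a, b, c = tri
    ab = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    ac = (c[0] - a[0], c[1] - a[1], c[2] - a[2])
    cross = (
        ab[1] * ac[2] - ab[2] * ac[1],
        ab[2] * ac[0] - ab[0] * ac[2],
        ab[0] * ac[1] - ab[1] * ac[0],
    )
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def sample_point_cloud(
    triangles: list[Triangle], num_points: int, rng: random.Random
) -> list[Vec3]:
    """Randomly sample points on the mesh surface (assumed sampling scheme)."""
    weights = [triangle_area(t) for t in triangles]
    chosen = rng.choices(triangles, weights=weights, k=num_points)
    points = []
    for a, b, c in chosen:
        u, v = rng.random(), rng.random()
        if u + v > 1.0:  # reflect so the point stays inside the triangle
            u, v = 1.0 - u, 1.0 - v
        w = 1.0 - u - v
        points.append(tuple(w * a[i] + u * b[i] + v * c[i] for i in range(3)))
    return points


@dataclass
class Sample:
    image: str  # stand-in for a rendered 2D image
    text: str  # text built from the language model's descriptions
    point_cloud: list[Vec3]


def build_samples(
    triangles: list[Triangle],
    view_angles: list[int],
    render_view: Callable[[list[Triangle], int], str],
    describe_image: Callable[[str], list[str]],
    num_points: int,
    seed: int = 0,
) -> list[Sample]:
    """One sample per viewpoint: (2D image, corresponding text, point cloud)."""
    rng = random.Random(seed)
    cloud = sample_point_cloud(triangles, num_points, rng)
    samples = []
    for angle in view_angles:
        image = render_view(triangles, angle)
        descriptions = describe_image(image)  # one or more descriptions
        text = " ".join(descriptions)  # assumption: join them into one text
        samples.append(Sample(image, text, cloud))
    return samples


# Toy usage: a unit square in the z = 0 plane, split into two triangles.
square = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)),
    ((0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)),
]
samples = build_samples(
    square,
    view_angles=[0, 120, 240],
    render_view=lambda mesh, angle: f"view@{angle}deg",  # stub renderer
    describe_image=lambda img: [f"a flat square seen from {img}"],  # stub model
    num_points=64,
)
```

In a real pipeline the two stubs would be replaced by a multi-view renderer and an image-captioning language model, and the resulting samples would form the training dataset for the 3D encoder.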

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.