Patent · US Active

Systems and methods for learning unified representations of language, image, and point cloud for three-dimensional recognition

US12417384B2 · kind B2 · utility

Cited by: 0
References: 1
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Mar 13, 2023
Grant date: Sep 16, 2025
Priority date: not listed
Expiry date: Mar 12, 2044

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06T2219/2004
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method of training a neural-network-based three-dimensional (3D) encoder is provided. A training dataset is generated from a plurality of 3D models in a 3D model dataset. To generate a first sample of the training dataset, an image generator with multi-view rendering produces a plurality of image candidates of a first 3D model. A word is selected from metadata associated with the first 3D model. A language model then generates one or more text descriptions from the selected word and a plurality of prompts. A point cloud is generated by randomly sampling points on the first 3D model. The first sample is assembled to include a first image randomly selected from the plurality of image candidates, the one or more text descriptions, and the point cloud. The 3D encoder is trained using the training dataset, which includes the first sample.
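The sample-generation steps in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes each 3D model is a triangle mesh and samples points uniformly over its surface (area-weighted), and it stands in for the multi-view image generator and the language model with hypothetical callables (`render_views`, `language_model`). The prompt templates in `PROMPTS` and all function names are likewise hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Triangle = Tuple[Point, Point, Point]

# Hypothetical prompt templates; the abstract does not list the actual prompts.
PROMPTS = ["a photo of a {}.", "a 3D model of a {}.", "a rendering of a {}."]


@dataclass
class Sample:
    image: str           # one image chosen at random from the rendered candidates
    texts: List[str]     # text descriptions built from the selected metadata word
    points: List[Point]  # point cloud sampled from the 3D model


def triangle_area(t: Triangle) -> float:
    """Area of a 3D triangle: half the norm of the cross product of two edges."""
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = t
    ux, uy, uz = bx - ax, by - ay, bz - az
    vx, vy, vz = cx - ax, cy - ay, cz - az
    cross = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
    return 0.5 * math.sqrt(sum(c * c for c in cross))


def sample_point_cloud(mesh: Sequence[Triangle], n: int, rng: random.Random) -> List[Point]:
    """Draw n points uniformly at random over the mesh surface (assumed scheme)."""
    # Picking triangles in proportion to their area keeps the point density uniform.
    tris = rng.choices(mesh, weights=[triangle_area(t) for t in mesh], k=n)
    points = []
    for a, b, c in tris:
        s, r = math.sqrt(rng.random()), rng.random()
        # Barycentric weights (1 - s, s(1 - r), s*r) give a uniform point in the triangle.
        w0, w1, w2 = 1.0 - s, s * (1.0 - r), s * r
        points.append(tuple(w0 * a[i] + w1 * b[i] + w2 * c[i] for i in range(3)))
    return points


def make_texts(word: str, prompts: Sequence[str],
               language_model: Optional[Callable[[str, str], str]] = None) -> List[str]:
    """One description per prompt; without a language model, just fill the template."""
    generate = language_model or (lambda w, p: p.format(w))
    return [generate(word, p) for p in prompts]


def build_sample(mesh: Sequence[Triangle], metadata_words: Sequence[str],
                 render_views: Callable[[Sequence[Triangle]], List[str]],
                 prompts: Sequence[str] = PROMPTS, n_points: int = 1024,
                 language_model: Optional[Callable[[str, str], str]] = None,
                 rng: Optional[random.Random] = None) -> Sample:
    if rng is None:
        rng = random.Random()
    candidates = render_views(mesh)      # multi-view image candidates (hypothetical renderer)
    image = rng.choice(candidates)       # keep one view at random
    word = rng.choice(metadata_words)    # word taken from the model's metadata
    texts = make_texts(word, prompts, language_model)
    points = sample_point_cloud(mesh, n_points, rng)
    return Sample(image, texts, points)


if __name__ == "__main__":
    # Unit square in the z = 0 plane, split into two triangles.
    square = [((0, 0, 0), (1, 0, 0), (1, 1, 0)), ((0, 0, 0), (1, 1, 0), (0, 1, 0))]
    fake_renderer = lambda mesh: [f"view_{k:02d}.png" for k in range(12)]
    s = build_sample(square, ["chair", "furniture"], fake_renderer, rng=random.Random(0))
    print(s.image, s.texts, len(s.points))
```

Area weighting matters because picking triangles uniformly would oversample small, densely tessellated regions. Passing a real renderer and a real language-model call in place of the stand-ins leaves the rest of the assembly unchanged.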

Source: USPTO / EPO open patent data (objective bibliographic data and citation counts).