Systems and methods for learning unified representations of language, image, and point cloud for three-dimensional recognition
US12417385B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Mar 13, 2023 |
| Grant date | Sep 16, 2025 |
| Priority date | — |
| Expiry date | Jun 4, 2044 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06T2219/2004
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Systems and methods for training a neural network based three-dimensional (3D) encoder for 3D classification are provided. A training dataset including a plurality of samples is received, wherein a first sample includes an image, a text, and a point cloud. An image encoder of a pretrained vision and language model is used to generate image representations for the image of the first sample. A text encoder of the pretrained vision and language model is used to generate text representations for the text of the first sample. The neural network based 3D encoder is used to generate 3D representations for the point cloud of the first sample. A loss objective is computed based on the image representations, text representations, and 3D representations. Parameters of the neural network based 3D encoder are updated based on the computed loss objective via backpropagation.
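The training loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent. The image and text representations are fixed random arrays standing in for the outputs of a frozen pretrained vision and language model. The 3D encoder is a toy (mean-pool the points, then apply one linear layer `W`). The loss is an InfoNCE-style contrastive objective, which the abstract does not specify. All function names are hypothetical, and the gradient is derived by hand for the linear layer rather than computed by an autograd library.

```python
import numpy as np

def softmax(logits):
    # Row-wise softmax, shifted for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def contrastive_loss_and_grad(z, target, tau):
    """Cross-entropy over similarity logits; row i of z should match row i of target.

    Returns the mean loss and its gradient with respect to z.
    Assumed loss form: the abstract only says "a loss objective".
    """
    batch = z.shape[0]
    probs = softmax(z @ target.T / tau)
    idx = np.arange(batch)
    loss = -np.log(probs[idx, idx]).mean()
    grad_logits = (probs - np.eye(batch)) / batch
    grad_z = grad_logits @ target / tau
    return loss, grad_z

def train_step(W, pooled_points, image_feats, text_feats, lr=0.1, tau=1.0):
    # Forward pass of the toy 3D encoder (hypothetical: one linear layer).
    z = pooled_points @ W
    # Align 3D representations with the frozen image and text representations.
    loss_img, g_img = contrastive_loss_and_grad(z, image_feats, tau)
    loss_txt, g_txt = contrastive_loss_and_grad(z, text_feats, tau)
    # Backpropagate through the linear layer; only W is updated.
    grad_W = pooled_points.T @ (g_img + g_txt)
    return W - lr * grad_W, loss_img + loss_txt

rng = np.random.default_rng(0)
batch, n_points, dim = 4, 32, 8
point_clouds = rng.normal(size=(batch, n_points, 3))  # synthetic point clouds
pooled = point_clouds.mean(axis=1)                    # (batch, 3)
image_feats = rng.normal(size=(batch, dim))           # stand-in for frozen image encoder output
text_feats = rng.normal(size=(batch, dim))            # stand-in for frozen text encoder output

W = np.zeros((3, dim))
losses = []
for _ in range(50):
    W, loss = train_step(W, pooled, image_feats, text_feats)
    losses.append(loss)
```

In this sketch the image and text arrays are never modified, which mirrors the abstract's statement that only the parameters of the 3D encoder are updated. A realistic setup would replace the linear layer with a point cloud backbone and the random arrays with actual encoder outputs.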
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.