Collecting multimodal image editing requests
US10769495B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Aug 1, 2018 |
| Grant date | Sep 8, 2020 |
| Priority date | — |
| Expiry date | Sep 26, 2038 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG10L2015/0638
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
In implementations of collecting multimodal image editing requests (IERs), a user interface is generated that exposes an image pair including a first image and a second image including at least one edit to the first image. A user simultaneously speaks a voice command and performs a user gesture that describe an edit of the first image used to generate the second image. The user gesture and the voice command are simultaneously recorded and synchronized with timestamps. The voice command is played back, and the user transcribes their voice command based on the play back, creating an exact transcription of their voice command. Audio samples of the voice command with respective timestamps, coordinates of the user gesture with respective timestamps, and a transcription are packaged as a structured data object for use as training data to train a neural network to recognize multimodal IERs in an image editing application.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.