Patent · US Active

Collecting multimodal image editing requests

US10769495B2 · kind B2 · utility

Cited by	0
References	3
Claims	20
Family size	0

Assignee

Adobe Inc. · US

Inventors

Trung Bui · San Jose, US
Zhe Lin · Fremont, US
Walter Chang · San Jose, US
Nham Van Le · San Jose, US
Franck Dernoncourt · Sunnyvale, US

Key dates

Filing date	Aug 1, 2018
Grant date	Sep 8, 2020
Priority date	—
Expiry date	Sep 26, 2038

Classification

Technology area (CPC G)	Physics
CPC primary	G10L2015/0638
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

In implementations of collecting multimodal image editing requests (IERs), a user interface is generated that exposes an image pair including a first image and a second image including at least one edit to the first image. A user simultaneously speaks a voice command and performs a user gesture that describe an edit of the first image used to generate the second image. The user gesture and the voice command are simultaneously recorded and synchronized with timestamps. The voice command is played back, and the user transcribes their voice command based on the playback, creating an exact transcription of their voice command. Audio samples of the voice command with respective timestamps, coordinates of the user gesture with respective timestamps, and a transcription are packaged as a structured data object for use as training data to train a neural network to recognize multimodal IERs in an image editing application.
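
The structured data object described in the abstract might be sketched as follows. This is an illustrative Python sketch only, not the format claimed in the patent: the class names (`AudioSample`, `GesturePoint`, `ImageEditRequest`), the field names, the millisecond timestamp unit, and the `package_ier` helper are all assumptions introduced here.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List, Tuple


@dataclass
class AudioSample:
    timestamp_ms: int   # assumed unit: milliseconds since recording start
    amplitude: float    # one audio sample value


@dataclass
class GesturePoint:
    timestamp_ms: int   # same clock as the audio samples
    x: float            # gesture coordinate on the first image
    y: float


@dataclass
class ImageEditRequest:
    """Hypothetical container for one multimodal IER."""
    first_image: str
    second_image: str
    audio: List[AudioSample] = field(default_factory=list)
    gesture: List[GesturePoint] = field(default_factory=list)
    transcription: str = ""

    def gesture_during(self, start_ms: int, end_ms: int) -> List[GesturePoint]:
        # The shared timestamps let gesture points be aligned with a span of speech.
        return [p for p in self.gesture if start_ms <= p.timestamp_ms <= end_ms]

    def to_json(self) -> str:
        # Serialize the whole object, e.g. for storage as a training example.
        return json.dumps(asdict(self), sort_keys=True)


def package_ier(
    first_image: str,
    second_image: str,
    audio: List[Tuple[int, float]],
    gesture: List[Tuple[int, float, float]],
    transcription: str,
) -> ImageEditRequest:
    # Sort both streams by timestamp so they can be walked in parallel.
    return ImageEditRequest(
        first_image=first_image,
        second_image=second_image,
        audio=[AudioSample(t, a) for t, a in sorted(audio)],
        gesture=[GesturePoint(t, x, y) for t, x, y in sorted(gesture)],
        transcription=transcription,
    )
```

Calling `package_ier` with the image pair, the two timestamped streams, and the user's transcription yields a single object whose `to_json()` output could be stored as one training example. Keeping both streams on one clock is what allows a gesture to be matched to the words spoken at the same moment.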

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.