Patent · US Active

Generating multi-modal response(s) through utilization of large language model(s)

US11907674B1 · kind B1 · utility

Cited by	1
References	3
Claims	24
Family size	0

Assignee

Google LLC · US

Inventors

Oscar Akerlund · Zürich, CH
Evgeny Sluzhaev · Zürich, CH
Golnaz Ghiasi · Mountain View, US
Thang Minh Luong · Mountain View, US
Yifeng Lu · Mountain View, US
Igor Petrovski · Zürich, CH
Ágoston Weisz · Zürich, CH
Wei Yu · Mountain View, US
Rakesh Shivanna · Sunnyvale, US
Michael Andrew Goodman · Oakland, US
Apoorv Kulshreshtha · Mountain View, US
Yu Du · Honggangcheng Subdistrict, CN
Amin Ghafouri · San Francisco, US
Sanil Jain · Sunnyvale, US
Dustin Tran · San Francisco, US
Vikas Peswani · Mountain View, US
YaGuang Li · Milton, CA

Key dates

Filing date	Sep 20, 2023
Grant date	Feb 20, 2024
Priority date	—
Expiry date	Sep 20, 2043

Classification

Technology area (CPC G)	Physics
CPC primary	G06F16/433
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Implementations relate to generating multi-modal response(s) through utilization of large language model(s) (LLM(s)). Processor(s) of a system can: receive natural language (NL) based input, generate a multi-modal response that is responsive to the NL based input, and cause the multi-modal response to be rendered. In some implementations, and in generating the multi-modal response, the processor(s) can process, using an LLM, LLM input (e.g., that includes at least the NL based input) to generate LLM output, and determine, based on the LLM output, textual content for inclusion in the multi-modal response and multimedia content for inclusion in the multi-modal response. In some implementations, the multimedia content can be obtained based on a multimedia content tag that is included in the LLM output and that is indicative of the multimedia content. In various implementations, the multimedia content can be interleaved between segments of the textual content.
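
The tag-based flow described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation. The tag syntax `[[media:...]]`, the names `Text`, `Media`, `build_multimodal_response`, and `fake_fetch`, and the example URL are all hypothetical, chosen only for illustration. The sketch assumes the LLM output is a single string in which multimedia content tags appear inline, and it shows how such output could be split into textual segments with resolved multimedia items interleaved between them.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Union

# Hypothetical tag syntax: the abstract does not specify how a
# multimedia content tag is encoded in the LLM output.
TAG_PATTERN = re.compile(r"\[\[media:(?P<query>[^\]]+)\]\]")


@dataclass
class Text:
    """A segment of textual content."""
    content: str


@dataclass
class Media:
    """A multimedia item resolved from a tag."""
    query: str  # what the tag says the content should depict
    uri: str    # where the resolved content lives


Segment = Union[Text, Media]


def build_multimodal_response(
    llm_output: str, fetch_media: Callable[[str], str]
) -> List[Segment]:
    """Split LLM output into text segments with media interleaved.

    `fetch_media` stands in for whatever lookup resolves a tag to
    actual content (assumed here to return a URI).
    """
    segments: List[Segment] = []
    cursor = 0
    for match in TAG_PATTERN.finditer(llm_output):
        # Text between the previous tag (or start) and this tag.
        text = llm_output[cursor:match.start()].strip()
        if text:
            segments.append(Text(text))
        query = match.group("query").strip()
        segments.append(Media(query, fetch_media(query)))
        cursor = match.end()
    # Any text remaining after the last tag.
    tail = llm_output[cursor:].strip()
    if tail:
        segments.append(Text(tail))
    return segments


def fake_fetch(query: str) -> str:
    """Placeholder resolver; a real system might query an image search."""
    return "https://example.invalid/media/" + query.replace(" ", "-")
```

Calling `build_multimodal_response("Paris is lovely. [[media:eiffel tower]] Visit in spring.", fake_fetch)` yields a text segment, a media item, and a second text segment, in that order, which a client could then render sequentially.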

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.