Patent · US Active

Tokenization of text data to facilitate automated discovery of speech disfluencies

US11741303B2 · kind B2 · utility

Cited by	1
References	3
Claims	15
Family size	0

Assignee

Descript, Inc. · US

Inventors

Alexandre de Brébisson · Montréal, CA
Antoine d'Andigné · Paris, FR

Key dates

Filing date	Nov 10, 2020
Grant date	Aug 29, 2023
Priority date	—
Expiry date	Nov 10, 2040

Classification

Technology area (CPC G)	Physics
CPC primary	G10L15/26
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Introduced here are computer programs and associated computer-implemented techniques for discovering the presence of filler words through tokenization of a transcript derived from audio content. When audio content is obtained by a media production platform, the audio content can be converted into text content as part of a speech-to-text operation. The text content can then be tokenized and labeled using a Natural Language Processing (NLP) library. Tokenizing/labeling may be performed in accordance with a series of rules associated with filler words. At a high level, these rules may examine the text content (and associated tokens/labels) to determine whether patterns, relationships, verbatim, and context indicate that a term is a filler word. Any filler words that are discovered in the text content can be identified as such so that appropriate action(s) can be taken.
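The abstract describes the rule-based detection only at a high level. The sketch below is a minimal illustration, not the patented method. It makes several assumptions. A regex tokenizer stands in for the NLP library. The hypothetical word lists `ALWAYS_FILLER`, `CONTEXT_FILLER` and `FILLER_PHRASES` are illustrative and not taken from the patent. The three rules are simple stand-ins for "verbatim", "context" and "pattern" checks: an unconditional word match, an ambiguous word that counts only when set off by punctuation, and a two-word phrase bounded by punctuation.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical rule vocabularies (assumptions, not from the patent).
ALWAYS_FILLER = {"um", "uh", "erm", "hmm"}          # verbatim rule
CONTEXT_FILLER = {"like", "so", "well"}             # context-dependent words
FILLER_PHRASES = {("you", "know"), ("i", "mean")}   # multi-word patterns
PUNCT = {".", ",", "!", "?", ";"}


@dataclass
class Token:
    text: str
    index: int
    is_filler: bool = False  # label attached during rule evaluation


def tokenize(transcript: str) -> list[Token]:
    # Regex tokenizer standing in for an NLP library: words and punctuation.
    words = re.findall(r"[A-Za-z']+|[.,!?;]", transcript)
    return [Token(w, i) for i, w in enumerate(words)]


def label_fillers(tokens: list[Token]) -> list[Token]:
    lower = [t.text.lower() for t in tokens]
    n = len(lower)

    def at(i: int) -> str:
        # Treat positions outside the transcript as a sentence boundary.
        return lower[i] if 0 <= i < n else "."

    for i, word in enumerate(lower):
        if word in ALWAYS_FILLER:
            # Verbatim rule: these words are always fillers.
            tokens[i].is_filler = True
        elif word in CONTEXT_FILLER:
            # Context rule: an ambiguous word counts only when it follows a
            # boundary and is immediately followed by a comma ("So, ...").
            if at(i - 1) in PUNCT and at(i + 1) == ",":
                tokens[i].is_filler = True

    for i in range(n - 1):
        # Pattern rule: a known two-word phrase bounded by punctuation.
        if (lower[i], lower[i + 1]) in FILLER_PHRASES:
            if at(i - 1) in PUNCT and at(i + 2) in PUNCT:
                tokens[i].is_filler = tokens[i + 1].is_filler = True
    return tokens


def find_fillers(transcript: str) -> list[str]:
    # Return the surface text of every token labeled as a filler word.
    return [t.text for t in label_fillers(tokenize(transcript)) if t.is_filler]


if __name__ == "__main__":
    print(find_fillers("Um, so, I like this. Well, you know, it works."))
```

With this input, "like" in "I like this" is not flagged, because it is not set off by punctuation. "Um", "so", "Well" and "you know" are flagged. A production system would use the NLP library's part-of-speech and dependency labels in place of these punctuation heuristics.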

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.