Patent · US Active

Speech audio pre-processing segmentation

US11049502B1 · kind B1 · utility

3 Cited by

2 References

30 Claims

0 Family size

Assignee

SAS INSTITUTE, INC. · US

Inventors

Xiaozhuo Cheng · Beijing, CN
Xu Yang · Cary, US
Xiaolong Li · Beijing, CN

Key dates

Filing date	Dec 30, 2020
Grant date	Jun 29, 2021
Priority date	—
Expiry date	Dec 30, 2040

Classification

Technology area (CPC G)	Physics
CPC primary	G10L2025/783
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

An apparatus includes processor(s) to: divide a speech data set into multiple data chunks that each represent a chunk of speech audio; configure a neural network to implement an acoustic model that includes a CTC output; provide each data chunk to the neural network and monitor the CTC output for a string of blank symbols; designate each string of blank symbols from the CTC output that is at least as long as a predetermined blank threshold length as a likely sentence pause of a candidate set of likely sentence pauses; based on at least the candidate set, divide the speech data set into multiple data segments that each represent a speech segment of the speech audio; and perform speech-to-text conversion, to identify a sentence spoken in a selected language in each speech segment.
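
The pause-detection step in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the CTC output has already been decoded to one symbol per frame, that the blank symbol is written as "_", and that a segment boundary is placed at the midpoint of each detected pause. The names find_blank_runs and split_at_pauses, the symbol encoding, and the midpoint rule are all hypothetical choices made for illustration; the abstract says only that segmentation is based on at least the candidate set.

```python
from typing import List, Sequence, Tuple

BLANK = "_"  # assumed representation of the CTC blank symbol


def find_blank_runs(
    symbols: Sequence[str], blank_threshold: int, blank: str = BLANK
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, for every run of
    blank symbols that is at least blank_threshold frames long."""
    runs: List[Tuple[int, int]] = []
    run_start = None
    for i, sym in enumerate(symbols):
        if sym == blank:
            if run_start is None:
                run_start = i  # a new run of blanks begins here
        else:
            if run_start is not None and i - run_start >= blank_threshold:
                runs.append((run_start, i))
            run_start = None
    # Handle a run of blanks that extends to the end of the sequence.
    if run_start is not None and len(symbols) - run_start >= blank_threshold:
        runs.append((run_start, len(symbols)))
    return runs


def split_at_pauses(
    num_frames: int, pauses: Sequence[Tuple[int, int]]
) -> List[Tuple[int, int]]:
    """Split the frame range [0, num_frames) into segments, cutting at the
    midpoint of each candidate pause (an assumed rule, for illustration)."""
    cuts = [(start + end) // 2 for start, end in pauses]
    bounds = [0] + cuts + [num_frames]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    frames = list("hh__e_ll____oo___")
    candidate_pauses = find_blank_runs(frames, blank_threshold=3)
    print(candidate_pauses)
    print(split_at_pauses(len(frames), candidate_pauses))
```

Short runs of blanks, such as the gaps between letters within a word, fall below the threshold and are ignored. Only the longer runs become candidate sentence pauses.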

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.