Patent · US Active

Sampling training data for an automatic speech recognition system based on a benchmark classification distribution

US9202461B2 · kind B2 · utility

Cited by	7
References	66
Claims	17
Family size	0

Assignee

Google LLC · US

Inventors

Fadi Biadsy · Mountain View, US
Pedro J. Moreno Mengibar · Jersey City, US
Kaisuke Nakajima · Sunnyvale, US
Daniel M. Bikel · Mount Kisco, US

Key dates

Filing date	Jan 18, 2013
Grant date	Dec 1, 2015
Priority date	—
Expiry date	Nov 22, 2033

Classification

Technology area (CPC G)	Physics
CPC primary	G10L15/183
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A set of benchmark text strings may be classified to provide a set of benchmark classifications. The benchmark text strings in the set may correspond to a benchmark corpus of benchmark utterances in a particular language. A benchmark classification distribution of the set of benchmark classifications may be determined. A respective classification for each text string in a corpus of text strings may also be determined. Text strings from the corpus of text strings may be sampled to form a training corpus of training text strings such that the classifications of the training text strings have a training text string classification distribution that is based on the benchmark classification distribution. The training corpus of training text strings may be used to train an automatic speech recognition (ASR) system.
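The sampling step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names (`classification_distribution`, `sample_training_corpus`, `toy_classify`), the two-class toy classifier, and the per-class quota rule are all assumptions introduced here for illustration. The abstract only requires that the training distribution be "based on" the benchmark distribution, and proportional quotas are one simple way to do that.

```python
import random
from collections import Counter, defaultdict


def classification_distribution(labels):
    """Return the fraction of items that fall into each classification."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def sample_training_corpus(corpus, classify, benchmark_dist, size, seed=0):
    """Sample about `size` strings so class shares follow `benchmark_dist`.

    Assumption: each class gets a quota of round(share * size), capped by
    how many strings of that class the corpus actually contains.
    """
    rng = random.Random(seed)

    # Group the corpus by the classification of each text string.
    by_class = defaultdict(list)
    for text in corpus:
        by_class[classify(text)].append(text)

    training = []
    for label, share in benchmark_dist.items():
        pool = by_class.get(label, [])
        quota = min(round(share * size), len(pool))
        training.extend(rng.sample(pool, quota))
    return training


def toy_classify(text):
    """Hypothetical stand-in classifier: questions versus everything else."""
    return "question" if text.endswith("?") else "statement"


# Benchmark text strings (e.g. transcripts of benchmark utterances).
benchmark = ["what time is it?", "call mom", "play music", "navigate home"]
benchmark_dist = classification_distribution(toy_classify(t) for t in benchmark)

# A larger corpus of text strings whose class mix differs from the benchmark.
corpus = [f"is item {i} open?" for i in range(10)]
corpus += [f"open item {i}" for i in range(10)]

training = sample_training_corpus(corpus, toy_classify, benchmark_dist, size=8)
training_dist = classification_distribution(toy_classify(t) for t in training)
```

In this toy setup the corpus is split evenly between the two classes, while the sampled training corpus follows the benchmark's mix instead. A real system would substitute its own classifier and would need a policy for classes that are underrepresented in the corpus, which the sketch handles only by capping the quota.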

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.