Very deep convolutional neural networks for end-to-end speech recognition
US10510004B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Apr 10, 2019 |
| Grant date | Dec 17, 2019 |
| Priority date | — |
| Expiry date | Apr 10, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L15/22
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A speech recognition neural network system includes an encoder neural network and a decoder neural network. The encoder neural network generates an encoded sequence from an input acoustic sequence that represents an utterance. The input acoustic sequence includes a respective acoustic feature representation at each of a plurality of input time steps, the encoded sequence includes a respective encoded representation at each of a plurality of time-reduced time steps, and the number of time-reduced time steps is less than the number of input time steps. The encoder neural network includes a time reduction subnetwork, a convolutional LSTM subnetwork, and a network-in-network subnetwork. The decoder neural network receives the encoded sequence and processes the encoded sequence to generate, for each position in an output sequence order, a set of substring scores that includes a respective substring score for each substring in a set of substrings.
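The two quantitative ideas in the abstract, an encoded sequence with fewer time steps than the input and one score per substring at each output position, can be illustrated with a minimal sketch. The abstract does not say how time reduction is performed or how scores are normalized, so everything below is an assumption for illustration only: `time_reduce` concatenates adjacent frames, `substring_scores` applies a softmax, and the substring set `SUBSTRINGS` and the logits are hypothetical.

```python
import math
from typing import Dict, List

Frame = List[float]

# Hypothetical substring vocabulary; the abstract does not list one.
SUBSTRINGS = ["a", "b", "ab", "<eos>"]


def time_reduce(frames: List[Frame], factor: int = 2) -> List[Frame]:
    """Concatenate each group of `factor` adjacent frames into one frame.

    Assumption: trailing frames that do not fill a whole group are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    usable = len(frames) - len(frames) % factor
    reduced: List[Frame] = []
    for start in range(0, usable, factor):
        merged: Frame = []
        for frame in frames[start:start + factor]:
            merged.extend(frame)
        reduced.append(merged)
    return reduced


def substring_scores(logits: List[float]) -> Dict[str, float]:
    """Map one logit per substring to a normalized score (softmax, assumed)."""
    if len(logits) != len(SUBSTRINGS):
        raise ValueError("expected one logit per substring")
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {s: e / total for s, e in zip(SUBSTRINGS, exps)}


# Eight input time steps, three acoustic features each.
frames = [[float(t)] * 3 for t in range(8)]
once = time_reduce(frames)   # 4 time steps, 6 features each
twice = time_reduce(once)    # 2 time steps, 12 features each
print(len(frames), len(once), len(twice))  # prints: 8 4 2

# Scores for a single output position, from made-up logits.
print(substring_scores([0.5, 0.1, 2.0, -1.0]))
```

Stacking two reductions shortens the sequence by a factor of four while widening each frame, which is one way the number of time-reduced time steps can end up smaller than the number of input time steps. The convolutional LSTM and network-in-network subnetworks named in the abstract are not modeled here.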
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.