Receptive-field-conforming convolution models for video coding
US11025907B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Feb 28, 2019 |
| Grant date | Jun 1, 2021 |
| Priority date | — |
| Expiry date | Feb 28, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N3/082
- WIPO field: Audio-visual technology
- WIPO sector: Electrical engineering
Abstract
Convolutional neural networks (CNNs) that determine a mode decision (e.g., block partitioning) for encoding a block include feature extraction layers and multiple classifiers. A non-overlapping convolution operation is performed at a feature extraction layer by setting the stride value equal to the kernel size. The block has an N×N size, and the smallest partition output for the block has an S×S size. Classification layers of each classifier receive feature maps having a feature dimension. An initial classification layer receives the feature maps as the output of a final feature extraction layer. Each classifier infers partition decisions for sub-blocks of size (αS)×(αS) of the block, where α is a power of 2 and α = 2, …, N/S, by applying, at some successive classification layers, a 1×1 kernel to reduce the respective feature dimensions, and by outputting, from the last of the classification layers, an N/(αS)×N/(αS)×1 output map.
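The shape arithmetic in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names `classifier_output_shapes` and `non_overlapping_conv2d` are hypothetical, the values N = 64 and S = 8 are chosen only for illustration, and the code assumes N and S are powers of 2 and that the input is a square, single-channel map whose side is a multiple of the kernel size.

```python
def classifier_output_shapes(n, s):
    """List (alpha, sub_block_size, output_map_shape) for an N x N block.

    alpha runs over powers of 2 from 2 up to N/S, as in the abstract.
    Assumes n and s are powers of 2 with s <= n (illustrative assumption).
    """
    shapes = []
    alpha = 2
    while alpha <= n // s:
        sub_block = alpha * s          # sub-block side: alpha * S
        side = n // sub_block          # output map side: N / (alpha * S)
        shapes.append((alpha, sub_block, (side, side, 1)))
        alpha *= 2
    return shapes


def non_overlapping_conv2d(image, kernel):
    """Single-channel 2D convolution with stride equal to the kernel size.

    Each output value depends on exactly one k x k tile of the input, so
    neighbouring receptive fields never overlap.
    """
    k = len(kernel)
    out_side = len(image) // k
    out = []
    for block_row in range(out_side):
        row = []
        for block_col in range(out_side):
            acc = 0
            for ky in range(k):
                for kx in range(k):
                    acc += image[block_row * k + ky][block_col * k + kx] * kernel[ky][kx]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical example values: a 64 x 64 block with 8 x 8 smallest partitions.
shapes = classifier_output_shapes(64, 8)

# A 4 x 4 input with a 2 x 2 all-ones kernel and stride 2 gives a 2 x 2 map
# of tile sums.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
tile_sums = non_overlapping_conv2d(image, [[1, 1], [1, 1]])
```

With these example values there are three classifiers: α = 2 gives 16×16 sub-blocks and a 4×4×1 map, α = 4 gives 32×32 sub-blocks and a 2×2×1 map, and α = 8 gives a single decision for the whole 64×64 block. Because the stride equals the kernel size, each output position corresponds to exactly one input tile, which is what lets a position in an output map line up with one sub-block.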
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.