Video action detection method based on convolutional neural network
US11379711B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Aug 16, 2017 |
| Grant date | Jul 5, 2022 |
| Priority date | — |
| Expiry date | Feb 15, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V20/41
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A video action detection method based on a convolutional neural network (CNN) is disclosed in the field of computer vision recognition. A temporal-spatial pyramid pooling layer is added to the network structure, which removes the network's restrictions on its input, speeds up training and detection, and improves the performance of video action classification and time localization. The disclosed convolutional neural network includes a convolutional layer, a common pooling layer, a temporal-spatial pyramid pooling layer, and a fully connected layer. Its outputs comprise a category classification output layer and an output layer for the computed time localization result. The disclosed method does not require down-sampling to obtain video clips of different durations; instead, the whole video is input directly at once, which improves efficiency. Moreover, the network is trained on video clips of the same frequency, which avoids increasing the differences within a category, reduces the learning burden of the network, and achieves faster model convergence and better detection.
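The abstract does not give the internals of the temporal-spatial pyramid pooling layer, so the following is only a minimal sketch of the general idea: pooling a feature map of arbitrary duration and spatial size into a vector of fixed length, which a fully connected layer can then consume. The use of max pooling, the pyramid levels in `LEVELS`, the single-channel `[T][H][W]` layout, and the helper names `bin_edges`, `st_pyramid_pool`, and `make_feature` are all assumptions made for illustration, not details taken from the patent.

```python
def bin_edges(size, bins):
    """Split range(size) into `bins` non-empty windows (they may overlap)."""
    # start = floor(i * size / bins), end = ceil((i + 1) * size / bins)
    return [((i * size) // bins, -(-((i + 1) * size) // bins))
            for i in range(bins)]


def st_pyramid_pool(feature, levels):
    """Max-pool a [T][H][W] feature map into a fixed-length vector.

    `levels` is a list of (t_bins, h_bins, w_bins) tuples; each level
    contributes t_bins * h_bins * w_bins values, whatever T, H, W are.
    Max pooling is an assumption; the source does not name the operator.
    """
    T, H, W = len(feature), len(feature[0]), len(feature[0][0])
    pooled = []
    for t_bins, h_bins, w_bins in levels:
        for t0, t1 in bin_edges(T, t_bins):
            for h0, h1 in bin_edges(H, h_bins):
                for w0, w1 in bin_edges(W, w_bins):
                    pooled.append(max(
                        feature[t][h][w]
                        for t in range(t0, t1)
                        for h in range(h0, h1)
                        for w in range(w0, w1)
                    ))
    return pooled


# Hypothetical pyramid: 1 + 8 + 16 = 25 output values per channel.
LEVELS = [(1, 1, 1), (2, 2, 2), (4, 2, 2)]


def make_feature(T, H, W):
    """Build a toy feature map whose values increase with the index."""
    return [[[float(t * H * W + h * W + w) for w in range(W)]
             for h in range(H)]
            for t in range(T)]


# Two "videos" with different durations and spatial sizes.
short_clip = st_pyramid_pool(make_feature(5, 4, 4), LEVELS)
long_clip = st_pyramid_pool(make_feature(37, 7, 6), LEVELS)
print(len(short_clip), len(long_clip))
```

Both calls yield vectors of the same length even though the inputs differ in duration and resolution, which is the property that lets a whole video be fed in without first resampling it to a fixed clip length.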
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.