NLP-guided video thin-slicing for automated scoring of non-cognitive, behavioral performance tasks
US12300244B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Aug 22, 2022 |
| Grant date | May 13, 2025 |
| Priority date | — |
| Expiry date | Jul 8, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L15/16
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Data is received that encapsulates a video of a subject performing a task. This video is used to generate a transcript using an automatic speech recognition (ASR) system. A plurality of text segments are generated from the transcript and then tokenized. A textual representation of each segment is extracted by a transformer model using the tokenized text segment (i.e., the tokens corresponding to the text segment). Thereafter, for each segment, a fused representation derived from the textual representations and corresponding visual and audio features from the video is generated. A sparse attention machine learning model then selects an optimal slice of the video based on the fused representations. The optimal slice can then be input into one or more machine learning models trained to characterize performance of the task by the subject.
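The selection step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract names neither the fusion method nor the sparse attention mechanism, so this example assumes concatenation for fusion and sparsemax for sparse attention. The names `fuse`, `sparsemax`, `select_slice`, and `query`, along with the toy feature values and timestamps, are all hypothetical.

```python
from typing import List, Sequence, Tuple


def fuse(text: Sequence[float], visual: Sequence[float],
         audio: Sequence[float]) -> List[float]:
    # Assumption: fusion by simple concatenation of the per-segment features.
    return list(text) + list(visual) + list(audio)


def sparsemax(scores: Sequence[float]) -> List[float]:
    # Euclidean projection of the scores onto the probability simplex.
    # Unlike softmax, low-scoring entries receive exactly zero weight.
    ordered = sorted(scores, reverse=True)
    running_sum = 0.0
    threshold = 0.0
    for k, value in enumerate(ordered, start=1):
        running_sum += value
        if 1.0 + k * value > running_sum:  # entry k is still in the support
            threshold = (running_sum - 1.0) / k
    return [max(s - threshold, 0.0) for s in scores]


def select_slice(
    segments: Sequence[Tuple[float, float]],
    fused: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[Tuple[float, float], List[float]]:
    # Score each fused vector against a (hypothetical) learned query vector,
    # turn the scores into sparse attention weights, and keep the segment
    # with the largest weight as the selected slice.
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in fused]
    weights = sparsemax(scores)
    best = max(range(len(weights)), key=weights.__getitem__)
    return segments[best], weights


# Toy input: three transcript segments as (start_seconds, end_seconds),
# each with one-dimensional text, visual, and audio features.
segments = [(0.0, 5.0), (5.0, 12.0), (12.0, 20.0)]
fused = [
    fuse([0.1], [0.2], [0.0]),
    fuse([0.9], [0.8], [0.7]),
    fuse([0.2], [0.1], [0.1]),
]
query = [1.0, 1.0, 1.0]  # stands in for trained attention parameters

chosen, weights = select_slice(segments, fused, query)
print(chosen, weights)
```

In this sketch the sparse weights double as a selection signal: segments whose weight is exactly zero are excluded outright, and the remaining highest-weight segment gives the time span that would be passed to the downstream scoring models.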
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.