Patent · US Active

Automatic generation of training data for scientific paper summarization using videos

US11270061B2 · kind B2 · utility

Cited by: 4
References: 8
Claims: 11
Family size: 0

Assignee

Inventors

Key dates

Filing date: Feb 25, 2020
Grant date: Mar 8, 2022
Priority date
Expiry date: Jul 19, 2040

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N20/10
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Embodiments may provide scalable techniques for generating large-scale training data for the summarization of complex documents, such as scientific papers and articles. For example, in an embodiment, a method may be implemented in a computer system and may comprise: collecting a plurality of video and audio recordings of presentations of documents; collecting a plurality of documents corresponding to the video and audio recordings; converting the plurality of video and audio recordings into transcripts of the presentations; generating a summary of each document by selecting a plurality of sentences from that document using the transcript of its presentation; generating a dataset comprising a plurality of the generated summaries; and training a machine learning model using the generated dataset.
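
The sentence-selection and dataset-generation steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how sentences are selected, so the word-overlap scoring used here is an assumption standing in for whatever alignment the patent actually claims. The function names (`summarize_with_transcript`, `build_dataset`), the stopword list, and the parameter `k` are all hypothetical. Transcripts are assumed to be already available as plain text; speech-to-text conversion and model training are out of scope for the sketch.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the patent).
STOPWORDS = {"a", "an", "the", "of", "and", "for", "in", "to",
             "we", "our", "is", "was", "by", "it", "so", "i"}

def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def content_words(text):
    # Lowercased alphanumeric tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS]

def summarize_with_transcript(document, transcript, k=2):
    """Pick the k document sentences that best overlap the talk transcript.

    Hypothetical scoring: sum of transcript counts over the sentence's
    unique content words, divided by the number of unique content words.
    """
    transcript_counts = Counter(content_words(transcript))
    sentences = split_sentences(document)
    scored = []
    for idx, sentence in enumerate(sentences):
        unique = set(content_words(sentence))
        if not unique:
            continue
        score = sum(transcript_counts[w] for w in unique) / len(unique)
        scored.append((score, idx))
    # Highest score first; ties broken by earlier position in the document.
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:k]
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(i for _, i in top)]

def build_dataset(pairs, k=2):
    # pairs: iterable of (document_text, transcript_text) tuples.
    return [{"document": doc,
             "summary": summarize_with_transcript(doc, talk, k)}
            for doc, talk in pairs]
```

A dataset built this way pairs each document with an extractive summary, which could then serve as training examples for a summarization model. A production system would likely need a more robust sentence splitter and a scoring method that tolerates the informal, error-prone wording of spoken transcripts.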

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.