Patent · US Active

Generating a consistently labeled training dataset by automatically generating and displaying a set of most similar previously-labeled texts and their previously assigned labels for each text that is being labeled for the training dataset

US10789533B2 · kind B2 · utility

Cited by: 2
References: 11
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jul 26, 2017
Grant date: Sep 29, 2020
Priority date:
Expiry date: Jun 21, 2039

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N20/00
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Technology for generating a consistently labeled training dataset. For each one of multiple previously labeled texts, a distance between the previously labeled text and a current text to be labeled is generated by comparing a list of tokens for the previously labeled text to a list of tokens for the current text to determine an overlap value equal to a number of tokens that match between the list of tokens for the previously labeled text and the list of tokens for the current text, and using the overlap value to calculate a distance between the previously labeled text and the current text that is inversely correlated to the overlap value. Previously labeled texts that are most similar to the current text are identified as those previously labeled texts having the shortest distances to the current text, and are displayed with their previously assigned labels in a label selection user interface.
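The overlap-based distance described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The tokenizer, the choice to count each matching token once (set intersection), the formula 1 / (1 + overlap), and all function names are assumptions made for illustration; the abstract only requires that the distance be inversely correlated with the overlap value.

```python
import re
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Assumed tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap(tokens_a: List[str], tokens_b: List[str]) -> int:
    # Overlap value: number of distinct tokens present in both lists
    # (assumption: each matching token is counted once).
    return len(set(tokens_a) & set(tokens_b))


def distance(tokens_a: List[str], tokens_b: List[str]) -> float:
    # Assumed formula: one possible distance that shrinks as overlap grows.
    return 1.0 / (1.0 + overlap(tokens_a, tokens_b))


def most_similar(
    current_text: str,
    labeled_texts: List[Tuple[str, str]],
    k: int = 3,
) -> List[Tuple[float, str, str]]:
    # Return the k previously labeled texts with the shortest distances
    # to the current text, each with its previously assigned label.
    current_tokens = tokenize(current_text)
    scored = [
        (distance(tokenize(text), current_tokens), text, label)
        for text, label in labeled_texts
    ]
    scored.sort(key=lambda item: item[0])
    return scored[:k]
```

A label selection user interface could call `most_similar` with the text currently being labeled and show the returned texts and labels next to the label choices, so that the person labeling sees how similar texts were labeled before.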

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.