Patent · US Active

Generating a consistently labeled training dataset by automatically generating and displaying a set of most similar previously-labeled texts and their previously assigned labels for each text that is being labeled for the training dataset

US10789533B2 · kind B2 · utility

Cited by	2
References	11
Claims	20
Family size	0

Assignee

LogMeIn, Inc. · US

Inventors

Whitney Lige Clark · Auburn, US
Ashish V. Thapliyal · Lompoc, US
Christfried H. Focke · Lompoc, US
Alexander John Huitric · Goleta, US
Yogesh Moorjani · Goleta, US

Key dates

Filing date	Jul 26, 2017
Grant date	Sep 29, 2020
Priority date	—
Expiry date	Jun 21, 2039

Classification

Technology area (CPC G)	Physics
CPC primary	G06N20/00
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Technology for generating a consistently labeled training dataset. For each of multiple previously labeled texts, a distance between the previously labeled text and a current text to be labeled is generated in two steps. First, a list of tokens for the previously labeled text is compared to a list of tokens for the current text to determine an overlap value equal to the number of tokens that match between the two lists. Second, the overlap value is used to calculate a distance between the previously labeled text and the current text that is inversely correlated to the overlap value. The previously labeled texts most similar to the current text are identified as those having the shortest distances to the current text, and are displayed with their previously assigned labels in a label selection user interface.
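
The ranking described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not specify a tokenizer, how matching tokens are counted, or the exact distance formula, so the following are assumptions made only for illustration: lowercase word tokenization, an overlap counted over unique tokens, the formula 1 / (1 + overlap) as one possible distance that falls as overlap rises, and the helper names tokenize, overlap, distance and most_similar.

```python
import re
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase alphanumeric word tokens; the abstract names no tokenizer.
    return re.findall(r"[a-z0-9']+", text.lower())


def overlap(tokens_a: List[str], tokens_b: List[str]) -> int:
    # Assumption: count unique tokens present in both lists.
    return len(set(tokens_a) & set(tokens_b))


def distance(tokens_a: List[str], tokens_b: List[str]) -> float:
    # Assumption: 1 / (1 + overlap) is one of many formulas that shrink
    # as the overlap grows; the abstract only requires inverse correlation.
    return 1.0 / (1.0 + overlap(tokens_a, tokens_b))


def most_similar(
    current_text: str,
    labeled: List[Tuple[str, str]],
    k: int = 3,
) -> List[Tuple[float, str, str]]:
    """Return up to k (distance, text, label) triples, shortest distance first."""
    current_tokens = tokenize(current_text)
    scored = [
        (distance(tokenize(text), current_tokens), text, label)
        for text, label in labeled
    ]
    # Stable sort: ties keep their original order in `labeled`.
    scored.sort(key=lambda item: item[0])
    return scored[:k]


# Hypothetical example data, not taken from the patent.
labeled_texts = [
    ("cannot join the meeting", "join_issue"),
    ("audio is not working", "audio_issue"),
    ("how to schedule a meeting", "scheduling"),
]
suggestions = most_similar("I cannot join my meeting", labeled_texts, k=2)
for dist, text, label in suggestions:
    print(f"{dist:.2f}  {label:12s}  {text}")
```

In a labeling tool, the returned triples would populate the label selection user interface, so the person labeling the current text sees how similar texts were labeled before choosing a label.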

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.