Patent · US Active

System and techniques for handling long text for pre-trained language models

US12210830B2 · kind B2 · utility

Cited by	0
References	6
Claims	20
Family size	0

Assignee

Oracle International Corporation · US

Inventors

Thanh Tien Vu · Tecate Mission Road, US
Tuyen Quang Pham · Melbourne, AU
Mark Edward Johnson · Chatswood, AU
Thanh Long Duong · Melbourne, AU
Ying Xu · Albion, AU
Poorya Zaremoodi · Melbourne, AU
Omid Mohamad Nezami · Sydney, AU
Budhaditya Saha · Sydney, AU
Cong Duy Vu Hoang · Melbourne, AU

Key dates

Filing date	May 20, 2022
Grant date	Jan 28, 2025
Priority date	—
Expiry date	Feb 10, 2043

Classification

Technology area (CPC G)	Physics
CPC primary	G06F40/284
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

In some aspects, a computing device may receive, at a data processing system, a set of utterances for training or inference with a named entity recognizer that assigns a label to each token piece of the utterances. The computing device may determine the length of each utterance in the set and, when the length of an utterance exceeds a predetermined threshold of token pieces, may: divide the utterance into a plurality of overlapping chunks of token pieces; assign a label together with a confidence score to each token piece in a chunk; determine a final label and an associated confidence score for each chunk of token pieces by merging two confidence scores; determine a final annotated label for the utterance based at least on the merging of the two confidence scores; and store the final annotated label in a memory.
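The chunk-and-merge flow described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the `max_len` and `overlap` parameters and their defaults, the toy tagger standing in for the named entity recognizer, and the merge rule (keep whichever of the two predictions has the higher confidence) are all hypothetical. The abstract does not say how the two confidence scores are combined.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A prediction is a (label, confidence) pair for one token piece.
Prediction = Tuple[str, float]
# Hypothetical interface: any tagger that maps a chunk of token pieces
# to exactly one prediction per piece.
ChunkTagger = Callable[[Sequence[str]], List[Prediction]]


def chunk_spans(n_tokens: int, max_len: int, overlap: int) -> List[Tuple[int, int]]:
    """Return [start, end) spans of overlapping chunks covering n_tokens pieces."""
    if n_tokens <= max_len:
        return [(0, n_tokens)]  # at or below the threshold: no chunking needed
    step = max_len - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than max_len")
    spans = []
    start = 0
    while True:
        end = min(start + max_len, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            return spans
        start += step


def merge(a: Prediction, b: Prediction) -> Prediction:
    """Assumed merge rule: keep the more confident prediction (ties keep `a`)."""
    return a if a[1] >= b[1] else b


def tag_long_utterance(tokens: Sequence[str], tagger: ChunkTagger,
                       max_len: int = 512, overlap: int = 128) -> List[Prediction]:
    """Label every token piece, merging predictions where chunks overlap."""
    final: List[Optional[Prediction]] = [None] * len(tokens)
    for start, end in chunk_spans(len(tokens), max_len, overlap):
        for offset, pred in enumerate(tagger(tokens[start:end])):
            i = start + offset
            # A piece inside an overlap already has a prediction from the
            # previous chunk, so the two predictions are merged.
            final[i] = pred if final[i] is None else merge(final[i], pred)
    # Every index is covered by at least one chunk, so nothing is dropped here.
    return [p for p in final if p is not None]


def toy_tagger(chunk: Sequence[str]) -> List[Prediction]:
    """Hypothetical stand-in for an NER model: capitalized pieces are ENT,
    others O. Confidence is lower near chunk edges, where context is cut off."""
    n = len(chunk)
    preds = []
    for i, piece in enumerate(chunk):
        label = "ENT" if piece[:1].isupper() else "O"
        edge_distance = min(i, n - 1 - i)
        preds.append((label, round(0.5 + 0.1 * edge_distance, 2)))
    return preds


if __name__ == "__main__":
    tokens = "alice met Bob in paris with Carol and dave".split()
    result = tag_long_utterance(tokens, toy_tagger, max_len=4, overlap=2)
    print([label for label, _ in result])
    # ['O', 'O', 'ENT', 'O', 'O', 'O', 'ENT', 'O', 'O']
```

With `max_len=4` and `overlap=2`, each interior token piece falls in exactly two chunks, which matches the "two confidence scores" case in the abstract. Larger overlaps would place a piece in more than two chunks; the loop above then simply folds `merge` over all of its predictions.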

Source: USPTO / EPO open patent data (bibliographic data and citation counts).