Patent · US Active

Method and apparatus of NER-oriented chinese clinical text data augmentation

US11972214B2 · kind B2 · utility

0Cited by
4References
6Claims
0Family size

Assignee

Inventors

Key dates

Filing dateJul 6, 2023
Grant dateApr 30, 2024
Priority date
Expiry dateJul 6, 2043

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG06F40/117
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

Disclosed is a method and an apparatus NER-orientated Chinese clinical text data augmentation, and unannotated data and annotated data of label linearization processing through data preprocessing. A concealed part is predicted based on retained information by using the unannotated data and concealing part of information in text, and meanwhile an entity word-level discrimination task is introduced for pre-training of a span-based language model; and a plurality of decoding mechanisms are introduced in a fine-tune stage, a relationship between a text vector and text data is obtained based on the pre-trained span-based language model, linearized data with entity labels is converted into the text vector, and text generation is performed through forward decoding and reverse decoding in a prediction stage of a text generation model to obtain enhanced data with annotation information.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.