Method and apparatus of NER-oriented Chinese clinical text data augmentation
US11972214B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jul 6, 2023 |
| Grant date | Apr 30, 2024 |
| Priority date | — |
| Expiry date | Jul 6, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/117
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Disclosed are a method and an apparatus for NER-oriented Chinese clinical text data augmentation. Data preprocessing yields unannotated data and annotated data that has undergone label linearization. Using the unannotated data, part of the information in the text is concealed and the concealed part is predicted from the retained information; at the same time, an entity word-level discrimination task is introduced for pre-training a span-based language model. In the fine-tuning stage, a plurality of decoding mechanisms are introduced: the relationship between a text vector and text data is obtained from the pre-trained span-based language model, linearized data with entity labels is converted into the text vector, and in the prediction stage of the text generation model, text is generated through forward decoding and reverse decoding to obtain enhanced data with annotation information.
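The abstract does not spell out the format used for label linearization. As an illustration only, the following Python sketch shows one common way to linearize character-level NER annotations: each non-"O" label is placed as a token in front of the character it tags, so a text generator can emit labels and text in a single sequence. The BIO tag scheme, the `<...>` token format, the `DIS` label, and the function names `linearize` and `delinearize` are assumptions made for this example and are not taken from the patent.

```python
from typing import List, Tuple


def linearize(chars: List[str], tags: List[str]) -> List[str]:
    """Interleave labels with characters (assumed format, not the patent's).

    Every character whose tag is not "O" is preceded by a "<TAG>" token.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    out: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag != "O":
            out.append(f"<{tag}>")
        out.append(ch)
    return out


def delinearize(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Recover (chars, tags) from a linearized sequence produced by linearize."""
    chars: List[str] = []
    tags: List[str] = []
    pending = "O"  # label waiting to be attached to the next character
    for tok in tokens:
        if len(tok) > 2 and tok.startswith("<") and tok.endswith(">"):
            pending = tok[1:-1]
        else:
            chars.append(tok)
            tags.append(pending)
            pending = "O"
    return chars, tags


# Hypothetical clinical sentence: "患者有糖尿病" (the patient has diabetes),
# with "糖尿病" annotated as a disease entity under an assumed "DIS" label.
chars = list("患者有糖尿病")
tags = ["O", "O", "O", "B-DIS", "I-DIS", "I-DIS"]

linearized = linearize(chars, tags)
restored_chars, restored_tags = delinearize(linearized)
```

Under these assumptions, a generated linearized sequence can be passed through `delinearize` to recover text together with its annotations, which is the sense in which generated data would already carry annotation information.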
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.