Patent · US Active

Automatic extraction of a training corpus for a data classifier based on machine learning algorithms

US11409779B2 · kind B2 · utility

Cited by	0

References	1

Claims	15

Family size	0

Assignee

ACCENTURE GLOBAL SOLUTIONS LIMITED · IE

Inventors

Fang Hou · Beijing, CN
Yikai Wu · Beijing, CN
Xiaopei Cheng · Beijing, CN
Sifei Ding · Beijing, CN

Key dates

Filing date	May 11, 2018
Grant date	Aug 9, 2022
Priority date	—
Expiry date	Jan 27, 2039

Classification

Technology area (CPC G)	Physics
CPC primary	G06N5/046
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

An iterative classifier for unsegmented electronic documents is based on machine learning algorithms. The textual strings in the electronic document are segmented using a composite dictionary that combines a conventional dictionary and an adaptive dictionary developed based on the context and nature of the electronic document. The classifier is built using a corpus of training and testing samples automatically extracted from the electronic document by detecting signatures for a set of pre-established classes for the textual strings. The classifier is further iteratively improved by automatically expanding the corpus of training and testing samples in real-time when textual strings in new electronic documents are processed and classified.
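The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the patented method. The abstract does not name a segmentation algorithm, so forward maximum matching is used here as a stand-in. The dictionary contents, the class names, the regular-expression signatures, and the function names segment and extract_corpus are all hypothetical.

```python
import re

# Hypothetical dictionaries; the abstract does not list their contents.
CONVENTIONAL = {"steel", "pipe", "fresh", "apple"}
ADAPTIVE = {"kg", "pcs"}  # assumed document-specific terms
COMPOSITE = CONVENTIONAL | ADAPTIVE  # composite dictionary = union of both

# Hypothetical signatures: one pattern per pre-established class.
SIGNATURES = {
    "metal": re.compile(r"\bsteel\b"),
    "produce": re.compile(r"\bapple\b"),
}


def segment(text, dictionary):
    """Forward maximum matching: take the longest dictionary word at each position."""
    max_len = max(map(len, dictionary))
    tokens, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit a single character.
            tokens.append(text[i])
            i += 1
    return tokens


def extract_corpus(strings):
    """Label only the strings that match exactly one class signature."""
    corpus = []
    for s in strings:
        joined = " ".join(segment(s, COMPOSITE))
        labels = [c for c, pat in SIGNATURES.items() if pat.search(joined)]
        if len(labels) == 1:
            corpus.append((joined, labels[0]))
    return corpus


corpus = extract_corpus(["steelpipe", "freshapple", "pcskg"])
```

In this sketch, strings that carry no signature or an ambiguous one (such as "pcskg") are left out of the corpus. They would be the inputs for a classifier trained on the labeled pairs, and strings it classifies could then be appended to the corpus for the next training round, in the spirit of the iterative expansion the abstract describes.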

Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.