Patent · US Active

Corpus management by automatic categorization into functional domains to support faceted querying

US10346442B2 · kind B2 · utility

Cited by: 2
References: 6
Claims: 11
Family size: 0

Assignee

Inventors

Key dates

Filing date: Nov 17, 2016
Grant date: Jul 9, 2019
Priority date
Expiry date: Apr 11, 2037

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/248
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Embodiments can provide a computer-implemented method, in a data processing system comprising a processor and a memory comprising instructions which are executed by the processor to cause the processor to implement an enhanced corpus management system, the method comprising: identifying one or more functional domain categories; ingesting one or more incoming documents to form an open-domain corpus; for each functional domain category, identifying one or more representative documents to establish a seed sub-corpus; calculating a degree of fit score between each of the one or more incoming documents and the one or more established functional domain category seed sub-corpora; and assigning one or more of the incoming documents to one or more of the functional domain categories based upon the degree of fit score to create an enhanced corpus.
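
The steps recited in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the degree of fit score is computed, so the sketch assumes bag-of-words cosine similarity between each incoming document and the pooled term counts of each seed sub-corpus. The function names, the category names, the sample documents, and the 0.2 assignment threshold are all hypothetical choices made for illustration and are not taken from the patent.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercased bag-of-words term counts for one document."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_seed_profiles(seed_sub_corpora):
    """Pool the representative documents of each category into one term profile."""
    profiles = {}
    for category, docs in seed_sub_corpora.items():
        profile = Counter()
        for doc in docs:
            profile.update(term_counts(doc))
        profiles[category] = profile
    return profiles


def assign_documents(incoming_documents, seed_sub_corpora, threshold=0.2):
    """Score every incoming document against every seed sub-corpus.

    ASSUMPTION: cosine similarity stands in for the "degree of fit score",
    and a document joins every category whose score meets the threshold.
    """
    profiles = build_seed_profiles(seed_sub_corpora)
    enhanced_corpus = []
    for doc in incoming_documents:
        vector = term_counts(doc)
        scores = {cat: cosine(vector, prof) for cat, prof in profiles.items()}
        categories = sorted(cat for cat, s in scores.items() if s >= threshold)
        enhanced_corpus.append(
            {"text": doc, "scores": scores, "categories": categories}
        )
    return enhanced_corpus


# Hypothetical functional domain categories with their seed sub-corpora.
seeds = {
    "cardiology": [
        "heart attack symptoms and cardiac arrest treatment",
        "blood pressure and heart disease",
    ],
    "oncology": [
        "tumor biopsy and chemotherapy treatment",
        "cancer tumor radiation therapy",
    ],
}

# Hypothetical open-domain corpus of incoming documents.
incoming = [
    "patient presented with cardiac arrest and high blood pressure",
    "chemotherapy reduced the tumor size",
    "quarterly revenue grew",
]

enhanced = assign_documents(incoming, seeds)
for entry in enhanced:
    print(entry["categories"], entry["text"])
```

Each entry in the resulting enhanced corpus keeps its category labels alongside the text, so a faceted query reduces to filtering on those labels. With this sample data the third document shares no terms with either seed sub-corpus and therefore receives no category. A thresholded assignment is only one possible reading of "based upon the degree of fit score"; assigning each document to its single best-scoring category would be another.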

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.