Patent · US Active

Document characterization using a tensor space model

US7529719B2 · kind B2 · utility

Cited by: 7
References: 1
Claims: 15
Family size: 0

Assignee

Inventors

Key dates

Filing date: Mar 17, 2006
Grant date: May 5, 2009
Priority date
Expiry date: Jan 12, 2027

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/35
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Computer-readable media having computer-executable instructions, and apparatuses, categorize a document or a corpus of documents. A Tensor Space Model (TSM), which models the text by a higher-order tensor, represents a document or a corpus of documents. Supported by techniques of multilinear algebra, TSM provides a framework for analyzing multifactor structures. TSM is further supported by operations and tools such as the High-Order Singular Value Decomposition (HOSVD), which reduces the dimensions of the higher-order tensor. The dimensionally reduced tensor is compared with tensors that represent possible categories, and a category is then selected for the document or corpus of documents. Experimental results on the 20 Newsgroups dataset suggest that TSM has advantages over a Vector Space Model (VSM) for text classification.
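
The pipeline the abstract describes (tensor representation, HOSVD reduction, comparison against category tensors) can be illustrated with a minimal NumPy sketch. Several details are assumptions made only for illustration and are not taken from this record: the character-trigram encoding (26 letters plus one index for every other character), the choice to compute the HOSVD factor matrices from a single reference tensor, the cosine similarity used for the comparison, and all function names (char_trigram_tensor, hosvd_factors, project, nearest_category), which are hypothetical.

```python
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # index 26 is reserved for every other character


def char_trigram_tensor(text):
    """Count character trigrams in a 27 x 27 x 27 tensor (assumed encoding)."""
    idx = [ALPHABET.index(c) if c in ALPHABET else 26 for c in text.lower()]
    tensor = np.zeros((27, 27, 27))
    for i, j, k in zip(idx, idx[1:], idx[2:]):
        tensor[i, j, k] += 1.0
    return tensor


def unfold(tensor, mode):
    """Mode-n unfolding: the chosen mode becomes the rows of a matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_dot(tensor, matrix, mode):
    """Mode-n product: multiply `tensor` by `matrix` along axis `mode`."""
    out = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(out, 0, mode)


def hosvd_factors(tensor, ranks):
    """Leading left singular vectors of every mode unfolding (truncated HOSVD)."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    return factors


def project(tensor, factors):
    """Reduce a tensor to its core by applying each transposed factor matrix."""
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_dot(core, u.T, mode)
    return core


def nearest_category(doc_core, category_cores):
    """Pick the category whose reduced tensor has the highest cosine similarity."""
    def cosine(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
        return float(np.vdot(a, b) / denom)
    return max(category_cores, key=lambda name: cosine(doc_core, category_cores[name]))
```

One possible usage under the same assumptions: build one tensor per category from training text, compute the factor matrices with hosvd_factors on the sum of those tensors using ranks such as (8, 8, 8), reduce both the category tensors and a new document tensor with project, and pass the results to nearest_category. The ranks trade compression against fidelity; with full ranks (27, 27, 27) the projection is orthogonal, so the result matches a cosine comparison of the unreduced tensors.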

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.