Patent · US Active

Techniques for computing similarity measurements between segments representative of documents

US8166049B2 · kind B2 · utility

Cited by: 3
References: 6
Claims: 6
Family size: 0

Assignee

Inventors

Key dates

Filing date: May 28, 2009
Grant date: Apr 24, 2012
Priority date:
Expiry date: Apr 6, 2030

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/316
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Keyword frequency data for a plurality of document-derived segments is represented in a matrix form in which each segment is represented as a vector of dimensionality equal to the number of keywords. The matrix may be subdivided into a plurality of sub-matrices, each preferably corresponding to a non-overlapping portion of the plurality of keywords. When determining a similarity measurement between any pair of segments, at least a portion of the keyword frequency data for each sub-matrix's non-overlapping keywords is used to determine a sub-matrix dot product for the pair of segments. The resulting plurality of sub-matrix dot products is then summed to provide the similarity measurement. Keywords that are synonyms of each other may be accommodated through the modification of keyword frequency data. Where the keyword frequency data in the matrix representation is relatively sparse, compressed views of the matrix representation may be provided.
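
The computation described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent text: the function names (split_columns, sub_dot, similarity, compress, sparse_dot), the contiguous column partition, the dictionary-based compressed view, and the sample frequencies are all assumptions chosen for illustration. The sketch splits the keyword columns into non-overlapping blocks, computes one partial dot product per block, and sums the partial results; it also shows one possible sparse representation that yields the same value.

```python
from typing import Dict, List, Sequence


def split_columns(num_keywords: int, num_blocks: int) -> List[range]:
    """Partition keyword indices into contiguous, non-overlapping ranges.

    Contiguous ranges are an assumption; any disjoint partition would do.
    """
    base, extra = divmod(num_keywords, num_blocks)
    blocks, start = [], 0
    for b in range(num_blocks):
        width = base + (1 if b < extra else 0)
        blocks.append(range(start, start + width))
        start += width
    return blocks


def sub_dot(u: Sequence[int], v: Sequence[int], cols: range) -> int:
    """Dot product of two segment vectors restricted to one block of keywords."""
    return sum(u[k] * v[k] for k in cols)


def similarity(u: Sequence[int], v: Sequence[int], blocks: List[range]) -> int:
    """Sum of the per-block (sub-matrix) dot products."""
    return sum(sub_dot(u, v, cols) for cols in blocks)


def compress(row: Sequence[int]) -> Dict[int, int]:
    """Hypothetical compressed view: keep only non-zero keyword frequencies."""
    return {k: f for k, f in enumerate(row) if f != 0}


def sparse_dot(a: Dict[int, int], b: Dict[int, int]) -> int:
    """Dot product over compressed views, iterating over the smaller one."""
    if len(a) > len(b):
        a, b = b, a
    return sum(f * b[k] for k, f in a.items() if k in b)


# Rows are segments, columns are keywords (made-up frequencies).
segments = [
    [2, 0, 1, 0, 3, 0],
    [1, 1, 0, 0, 2, 0],
    [0, 0, 0, 4, 0, 1],
]
blocks = split_columns(num_keywords=6, num_blocks=3)

partials = [sub_dot(segments[0], segments[1], cols) for cols in blocks]
total = similarity(segments[0], segments[1], blocks)
print(partials, total)
```

Because the blocks cover disjoint keyword ranges, each partial dot product can be computed independently (for example on separate workers) and the sum equals the ordinary dot product over all keywords.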

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.