Patent · US Active

Method for computing similarity between text spans using factored word sequence kernels

US8077984B2 · kind B2 · utility

13 Cited by

4 References

24 Claims

0 Family size

Assignee

Xerox Corporation · US

Inventors

Nicola Cancedda · Grenoble, FR
Pierre Mahe · Les Jailleux, FR

Key dates

Filing date	Jan 4, 2008
Grant date	Dec 13, 2011
Priority date	—
Expiry date	Oct 14, 2030

Classification

Technology area (CPC G)	Physics
CPC primary	G06V30/274
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A computer implemented method and an apparatus for comparing spans of text are disclosed. The method includes computing a similarity measure between a first sequence of symbols representing a first text span and a second sequence of symbols representing a second text span as a function of the occurrences of optionally noncontiguous subsequences of symbols shared by the two sequences of symbols. Each of the symbols comprises at least one consecutive word and is defined according to a set of linguistic factors. Pairs of symbols in the first and second sequences that form a shared subsequence of symbols are each matched according to at least one of the factors.
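
The abstract's similarity measure can be illustrated with a small brute-force sketch. It is not the patent's claimed computation, only one plausible reading under stated assumptions: each symbol is a tuple of factor values (here surface form, lemma, part of speech), two symbols match with a score equal to the weighted number of factors they share, and gaps in noncontiguous subsequences are penalized by a decay factor raised to the span length. The names `symbol_match`, `factored_kernel`, `decay`, and `weights` are hypothetical and chosen for illustration.

```python
from itertools import combinations


def symbol_match(a, b, weights):
    """Weighted count of the factors on which two symbols agree.

    Assumption: a symbol is a tuple of factor values, and weights holds
    one non-negative weight per factor position.
    """
    return sum(w for x, y, w in zip(a, b, weights) if x == y)


def factored_kernel(s, t, n, decay, weights):
    """Brute-force similarity over shared subsequences of length n.

    Every increasing index tuple I in s is paired with every increasing
    index tuple J in t. A pair contributes the product of the per-symbol
    match scores, discounted by decay ** (span of I + span of J), so
    subsequences with gaps count less than contiguous ones.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    total = 0.0
    for I in combinations(range(len(s)), n):
        span_s = I[-1] - I[0] + 1
        for J in combinations(range(len(t)), n):
            span_t = J[-1] - J[0] + 1
            score = 1.0
            for i, j in zip(I, J):
                score *= symbol_match(s[i], t[j], weights)
                if score == 0.0:
                    break  # one unmatched pair voids the whole alignment
            total += decay ** (span_s + span_t) * score
    return total


# Illustrative symbols: (surface form, lemma, part of speech)
s = [("cats", "cat", "NOUN"), ("sleep", "sleep", "VERB")]
t = [("cat", "cat", "NOUN"), ("sleeps", "sleep", "VERB")]
print(factored_kernel(s, t, n=2, decay=0.5, weights=(1.0, 1.0, 1.0)))
```

In this toy example no surface forms coincide, so a kernel that matched on surface forms alone would return 0. Matching on lemma and part of speech gives each aligned pair a score of 2, and the length-2 kernel evaluates to 4 × 0.5⁴ = 0.25. The enumeration is exponential in `n` and is meant only to make the definition concrete; a practical implementation would use dynamic programming.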

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.