Patent · US Active

Method and apparatus for assessing similarity between online job listings

US8099415B2 · kind B2 · utility

Cited by: 36
References: 13
Claims: 18
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 8, 2006
Grant date: Jan 17, 2012
Priority date
Expiry date: Oct 8, 2028

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/258
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Job listings retrieved from external sources are pre-processed, and duplicate records are identified, before the listings are stored in the production database for the search engine. Inter-source and intra-source hash values are calculated for each job listing and the values are compared. Job listings having the same intra-source hash value are judged to be duplicates of each other. Descriptions whose intra-source hash values do not match, but whose inter-source hash values match, are judged to be duplicate candidates and are subject to further processing. Suffixes for each such record are stored to a data structure such as a suffix array, and the records are searched and compared based on the suffix arrays. Records having a pre-determined number of contiguous words in common are judged to be duplicates. Duplicate records are identified before the data set is stored to the production database.
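
The two-stage comparison described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the abstract does not say which fields feed each hash, which hash function is used, or what the word threshold is. The field names (`title`, `company`, `location`, `description`), the choice of SHA-1, the constant `MIN_SHARED_WORDS`, and all function names are assumptions introduced here. For simplicity the sketch also compares any two listings, without restricting the intra-source check to listings from the same source.

```python
import hashlib
from bisect import bisect_left

# Hypothetical threshold; the abstract only says "pre-determined number".
MIN_SHARED_WORDS = 5


def _digest(*fields):
    """Hash case- and whitespace-normalized fields into one hex digest."""
    normalized = "\x1f".join(" ".join(f.lower().split()) for f in fields)
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def intra_source_hash(listing):
    # Assumption: the stricter hash covers the full listing text.
    return _digest(listing["title"], listing["company"],
                   listing["location"], listing["description"])


def inter_source_hash(listing):
    # Assumption: the looser hash covers only fields that tend to stay
    # stable when the same job is republished by another source.
    return _digest(listing["title"], listing["company"], listing["location"])


def word_suffix_array(words):
    """Start indices of all word-level suffixes, in lexicographic order."""
    return sorted(range(len(words)), key=lambda i: words[i:])


def share_contiguous_run(words_a, words_b, n=MIN_SHARED_WORDS):
    """Return True if some run of n consecutive words occurs in both lists."""
    if len(words_a) < n or len(words_b) < n:
        return False
    order = word_suffix_array(words_a)
    # Truncating sorted suffixes to n words keeps them in sorted order,
    # so each n-word window of words_b can be located by binary search.
    prefixes = [words_a[i:i + n] for i in order]
    for j in range(len(words_b) - n + 1):
        window = words_b[j:j + n]
        k = bisect_left(prefixes, window)
        if k < len(prefixes) and prefixes[k] == window:
            return True
    return False


def classify(a, b, n=MIN_SHARED_WORDS):
    """Label a pair of listings as 'duplicate' or 'distinct'."""
    if intra_source_hash(a) == intra_source_hash(b):
        return "duplicate"      # identical strict hash
    if inter_source_hash(a) != inter_source_hash(b):
        return "distinct"       # not even a duplicate candidate
    # Duplicate candidate: fall back to the contiguous-word comparison.
    words_a = a["description"].lower().split()
    words_b = b["description"].lower().split()
    return "duplicate" if share_contiguous_run(words_a, words_b, n) else "distinct"
```

Calling `classify(a, b)` on two listing dictionaries returns `"duplicate"` when the strict hashes match or when a candidate pair shares at least `n` consecutive description words. Sorting whole suffixes as done here is quadratic in the worst case; a production system would more likely use a dedicated suffix-array construction algorithm.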

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.