Patent · US Active

System and method for identifying website verticals

US9330168B1 · kind B1 · utility

Cited by: 6
References: 4
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Feb 13, 2014
Grant date: May 3, 2016
Priority date
Expiry date: Feb 13, 2034

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06Q30/0283
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Systems and methods for the categorization of websites are presented. A website is categorized using one or a combination of its domain name and its web page content. The domain name is tokenized, and the tokens compared to categories in a category structure to determine probabilities that the token belongs to each category. Combinations of tokens are similarly compared to the categories. A category may be determined with reference to a vector space in which a training set of websites having known categories is converted according to a methodology into reference vectors containing keyword frequencies. A target website is converted to a target vector using the same methodology, and a distance score of the target vector to each reference vector is calculated. The website represented by the target vector is assigned the category of the reference vector having the lowest distance score.
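The vector-space step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The keyword vocabulary, the training texts, the category names, and the choice of Euclidean distance are all assumptions made for illustration, since the abstract does not name a specific distance measure or keyword list.

```python
import math
import re
from collections import Counter

# Hypothetical keyword vocabulary; the abstract does not specify one.
VOCAB = ["loan", "bank", "recipe", "cook", "flight", "hotel"]


def to_vector(text, vocab=VOCAB):
    """Convert text into a vector of relative keyword frequencies over vocab."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = sum(counts[k] for k in vocab)
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[k] / total for k in vocab]


def distance(a, b):
    """Euclidean distance between two vectors (an assumed distance score)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(target_text, references):
    """Return the (category, score) of the reference vector closest to the target."""
    target = to_vector(target_text)
    best_category, best_score = None, math.inf
    for category, ref_vec in references:
        score = distance(target, ref_vec)
        if score < best_score:
            best_category, best_score = category, score
    return best_category, best_score


# Hypothetical training set of page texts with known categories.
TRAINING = [
    ("finance", "bank loan loan bank rates"),
    ("food", "recipe cook cook recipe dinner"),
    ("travel", "flight hotel hotel flight deals"),
]

# Reference vectors are built with the same methodology as the target vector.
REFERENCES = [(cat, to_vector(text)) for cat, text in TRAINING]

if __name__ == "__main__":
    print(classify("cheap flight and hotel booking", REFERENCES))
```

The target text and the training texts pass through the same `to_vector` function, mirroring the abstract's requirement that the target vector be produced by the same methodology as the reference vectors. The category with the lowest distance score is returned.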

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.