Patent · US Active

System and method for website categorization

US9892189B1 · kind B1 · utility

Cited by: 2
References: 1
Claims: 18
Family size: 0

Assignee

Inventors

Key dates

Filing date: Mar 10, 2016
Grant date: Feb 13, 2018
Priority date
Expiry date: May 13, 2036

Classification

  • Technology area (CPC H): Electricity
  • CPC primary: H04L61/5076
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Systems and methods for the categorization of websites are presented. A website is categorized using one or a combination of its domain name and its web page content. The domain name is tokenized, and the tokens compared to categories in a category structure to determine probabilities that the token belongs to each category. Combinations of tokens are similarly compared to the categories. A category may be determined with reference to a vector space in which a training set of websites having known categories is converted according to a methodology into reference vectors containing keyword frequencies. A target website is converted to a target vector using the same methodology, and a distance score of the target vector to each reference vector is calculated. The website represented by the target vector is assigned the category of the reference vector having the lowest distance score.
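
The vector-space step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The training texts, the category labels, the regex word splitter, the choice of Euclidean distance, and the function names (to_vector, euclidean, categorize) are all assumptions made for illustration; the abstract only states that reference and target vectors hold keyword frequencies built with the same methodology, and that the category of the lowest-distance reference vector is assigned.

```python
import math
import re
from collections import Counter

# Hypothetical training set of (page text, known category) pairs.
# These examples are invented for illustration and do not come from the patent.
TRAINING = [
    ("buy shoes online cheap shoes sale", "shopping"),
    ("latest football scores football league news", "sports"),
    ("pasta recipe tomato sauce recipe", "food"),
]


def words(text):
    """Split text into lowercase alphabetic keywords (an assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def to_vector(text, vocabulary):
    """Convert text into a keyword-frequency vector over a fixed vocabulary."""
    counts = Counter(words(text))
    return [counts[word] for word in vocabulary]


def euclidean(a, b):
    """Distance score between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def categorize(target_text, training=TRAINING):
    """Assign the category of the reference vector with the lowest distance score."""
    # The vocabulary is built from the training set only, so the target
    # and the references are converted with the same methodology.
    vocabulary = sorted({w for text, _ in training for w in words(text)})
    references = [(to_vector(text, vocabulary), category) for text, category in training]
    target = to_vector(target_text, vocabulary)
    best = min(references, key=lambda ref: euclidean(target, ref[0]))
    return best[1]


if __name__ == "__main__":
    print(categorize("football news and league scores"))
```

In this sketch, words that are absent from the training vocabulary (such as "and" in the usage line) are simply ignored. A fuller treatment might normalize vectors by document length or use a different distance measure; the abstract does not specify either choice.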

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.