System and method for website categorization
US9892189B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Mar 10, 2016 |
| Grant date | Feb 13, 2018 |
| Priority date | — |
| Expiry date | May 13, 2036 |
Classification
- Technology area (CPC H): Electricity
- CPC primary: H04L61/5076
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Systems and methods for the categorization of websites are presented. A website is categorized using one or a combination of its domain name and its web page content. The domain name is tokenized, and the tokens compared to categories in a category structure to determine probabilities that the token belongs to each category. Combinations of tokens are similarly compared to the categories. A category may be determined with reference to a vector space in which a training set of websites having known categories is converted according to a methodology into reference vectors containing keyword frequencies. A target website is converted to a target vector using the same methodology, and a distance score of the target vector to each reference vector is calculated. The website represented by the target vector is assigned the category of the reference vector having the lowest distance score.
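The vector-space step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The keyword vocabulary `VOCAB`, the sample training texts, the category labels, and the use of Euclidean distance are all assumptions made for illustration, since the abstract does not specify the keyword set or the distance measure.

```python
import math
import re
from collections import Counter

# Hypothetical keyword vocabulary; the abstract does not list one.
VOCAB = ["pizza", "menu", "delivery", "loan", "mortgage", "rates"]


def to_vector(text):
    """Convert page text to a vector of relative keyword frequencies over VOCAB."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    total = sum(counts[word] for word in VOCAB)
    if total == 0:
        # No known keywords found: return the zero vector.
        return [0.0] * len(VOCAB)
    return [counts[word] / total for word in VOCAB]


def distance(a, b):
    """Euclidean distance between two vectors (the metric is an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def categorize(target_text, training_set):
    """Return the category of the reference vector with the lowest distance score."""
    target = to_vector(target_text)
    # Reference vectors are built with the same methodology as the target vector.
    best = min(training_set, key=lambda item: distance(target, to_vector(item[0])))
    return best[1]


# Hypothetical training set of (page text, known category) pairs.
TRAINING = [
    ("Order pizza online. See our menu and delivery options. Pizza deals.", "Restaurants"),
    ("Compare mortgage rates and apply for a home loan today. Low rates.", "Finance"),
]

category = categorize("Fresh pizza, full menu, fast delivery", TRAINING)
```

Here the target text shares keywords only with the first training page, so its vector lies closest to that reference vector and the target inherits that page's category. A real system would use a much larger vocabulary and training set, and could combine this score with the domain-name token probabilities that the abstract also mentions.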
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.