System and method for identifying website verticals
US9330168B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Feb 13, 2014 |
| Grant date | May 3, 2016 |
| Priority date | — |
| Expiry date | Feb 13, 2034 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06Q30/0283
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Systems and methods for the categorization of websites are presented. A website is categorized using one or a combination of its domain name and its web page content. The domain name is tokenized, and the tokens compared to categories in a category structure to determine probabilities that the token belongs to each category. Combinations of tokens are similarly compared to the categories. A category may be determined with reference to a vector space in which a training set of websites having known categories is converted according to a methodology into reference vectors containing keyword frequencies. A target website is converted to a target vector using the same methodology, and a distance score of the target vector to each reference vector is calculated. The website represented by the target vector is assigned the category of the reference vector having the lowest distance score.
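The vector-space step in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The training texts, the category names, the use of relative word frequencies, the Euclidean distance metric, and the greedy longest-match domain tokenizer are all assumptions chosen for illustration. The sketch also scores domain tokens with the same distance measure, where the abstract describes a per-token probability comparison against a category structure.

```python
import math
from collections import Counter

# Hypothetical training set: category -> sample page text (not from the patent).
TRAINING_PAGES = {
    "automotive": "car dealer used car truck engine repair car parts",
    "travel": "hotel flight booking vacation hotel beach travel deals",
    "finance": "loan mortgage bank credit rates loan insurance",
}


def to_vector(text):
    """Convert text into a sparse vector of relative keyword frequencies."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def distance(a, b):
    """Euclidean distance between two sparse vectors (an assumed metric)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def categorize(text, references):
    """Assign the category of the reference vector with the lowest distance."""
    target = to_vector(text)  # same methodology as the reference vectors
    return min(references, key=lambda cat: distance(target, references[cat]))


def tokenize_domain(domain, vocabulary):
    """Split the first label of a domain into known words, longest match first."""
    label = domain.lower().split(".")[0]
    tokens, i = [], 0
    while i < len(label):
        for j in range(len(label), i, -1):
            if label[i:j] in vocabulary:
                tokens.append(label[i:j])
                i = j
                break
        else:
            i += 1  # no known word starts here; skip one character
    return tokens


# Reference vectors built once from the training set.
REFERENCE_VECTORS = {cat: to_vector(page) for cat, page in TRAINING_PAGES.items()}
# Vocabulary for domain tokenization: every keyword seen in training.
VOCABULARY = {word for vec in REFERENCE_VECTORS.values() for word in vec}

page_category = categorize(
    "cheap hotel and flight deals for your vacation", REFERENCE_VECTORS
)
domain_tokens = tokenize_domain("cheaphotelflights.com", VOCABULARY)
domain_category = categorize(" ".join(domain_tokens), REFERENCE_VECTORS)
```

Here `page_category` is derived from page content and `domain_category` from the domain name alone. A fuller system would presumably combine the two signals, as the abstract suggests, and would use a much larger training set per category.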
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.