Patent · US Active

Optimized web domains classification based on progressive crawling with clustering

US8972376B1 · kind B1 · utility

Cited by: 22
References: 2
Claims: 25
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jan 2, 2013
Grant date: Mar 3, 2015
Priority date
Expiry date: Apr 4, 2033

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/951
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Techniques for optimized web domains classification based on progressive crawling with clustering are disclosed. In some embodiments, optimized web domains classification based on progressive crawling with clustering includes crawling a domain (e.g., a web site domain) to collect data for a subset of pages (e.g., web pages) of a corpus of content associated with the domain; classifying each of the crawled pages into one or more category clusters, in which the category clusters represent a content categorization of the corpus of content associated with the domain (e.g., a URL content categorization for the domain, host of that domain, and/or directory of that domain); and determining which of the one or more category clusters to publish for the domain.
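
The three steps named in the abstract (crawl a subset of pages, classify each page into category clusters, decide which clusters to publish) can be illustrated with a minimal Python sketch. This is not the patented method: the keyword classifier, the batch size, the page cap, the share threshold, and the "stop when the published set is unchanged" rule are all assumptions introduced here for illustration, and every function name is hypothetical.

```python
from collections import Counter
from itertools import islice


def keyword_classifier(text):
    """Toy stand-in for a real page classifier (hypothetical keyword rules)."""
    rules = {"sports": ("score", "team"), "news": ("report", "election")}
    words = set(text.lower().split())
    return {cat for cat, keys in rules.items() if words & set(keys)}


def clusters_to_publish(counts, crawled, min_share):
    """Keep clusters that cover at least min_share of the crawled pages."""
    return {cat for cat, n in counts.items() if n / crawled >= min_share}


def classify_domain(pages, classify_page=keyword_classifier,
                    batch_size=2, max_pages=6, min_share=0.5):
    """Crawl pages in batches and return the category clusters to publish.

    pages is an iterable of (url, text) pairs standing in for fetched pages.
    Crawling stops early once two consecutive batches yield the same
    non-empty set of clusters (an assumed stopping rule).
    """
    counts = Counter()   # cluster -> number of crawled pages assigned to it
    crawled = 0
    published = set()
    page_iter = iter(pages)
    while crawled < max_pages:
        batch = list(islice(page_iter, min(batch_size, max_pages - crawled)))
        if not batch:
            break  # the domain has no more pages to offer
        for _url, text in batch:
            counts.update(classify_page(text))  # a page may join several clusters
            crawled += 1
        current = clusters_to_publish(counts, crawled, min_share)
        if current and current == published:
            break  # result is stable, so further crawling is skipped
        published = current
    return published


if __name__ == "__main__":
    sample = [
        ("example.com/a", "team score"),
        ("example.com/b", "team wins"),
        ("example.com/c", "election report"),
        ("example.com/d", "team score update"),
    ]
    print(classify_domain(sample))
```

In this sketch only a subset of the domain is ever fetched: the loop stops as soon as another batch leaves the published set unchanged, which is one simple way to read "progressive" crawling. A real system would replace the keyword rules with a trained classifier and could apply the same procedure per host or per directory of the domain.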

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.