Patent · US Active

Image URL-based junk detection

US9336316B2 · kind B2 · utility

Cited by: 0
References: 1
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Nov 5, 2012
Grant date: May 10, 2016
Priority date
Expiry date: Apr 29, 2033

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N5/025
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

An architecture that includes a junk (unwanted) image detection algorithm, which identifies unwanted images before they are downloaded for indexing. The algorithm uses features related to image location information and host websites, such as image path descriptor (e.g., URL, uniform resource locator) pattern features, webpage content features, click features, and aggregated image information, in a machine-learning-based framework to predict the probability that an image is unwanted (or wanted) before it is downloaded. The framework is applied to build a statistical model and predict junk scores. Removing image URLs marked as “junk” from the work list of an automated indexer (e.g., a crawler) frees indexer bandwidth significantly, with a corresponding improvement in the publish rate.
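
The flow described in the abstract (derive features from an image URL, score it with a statistical model, drop high-scoring URLs from the crawler's work list) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the token list, the four feature names, the hand-set weights, the logistic form of the model, and the 0.5 threshold. Only URL-pattern features are shown; the webpage content, click, and aggregated features mentioned in the abstract are omitted.

```python
import math
from urllib.parse import urlparse

# Hypothetical tokens that tend to appear in URLs of decorative or tracking
# images. Substring matching is deliberately crude (e.g. "icon" also matches
# "silicon"); a real system would learn such patterns from labeled data.
JUNK_TOKENS = ("sprite", "spacer", "pixel", "icon", "banner")

# Assumed, hand-set weights standing in for a trained statistical model.
WEIGHTS = {
    "has_junk_token": 2.5,
    "path_depth": 0.1,
    "has_query": 0.8,
    "is_gif": 1.2,
}
BIAS = -2.0


def url_features(url):
    """Derive features from the URL string alone, without fetching the image."""
    parsed = urlparse(url)
    path = parsed.path.lower()
    return {
        "has_junk_token": float(any(tok in path for tok in JUNK_TOKENS)),
        "path_depth": float(path.count("/")),
        "has_query": float(bool(parsed.query)),
        "is_gif": float(path.endswith(".gif")),
    }


def junk_score(url):
    """Return an estimated probability in (0, 1) that the image is junk."""
    feats = url_features(url)
    z = BIAS + sum(WEIGHTS[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic function


def filter_work_list(urls, threshold=0.5):
    """Keep only URLs whose junk score is below the threshold."""
    return [u for u in urls if junk_score(u) < threshold]


work_list = [
    "http://example.com/photos/2012/sunset.jpg",
    "http://example.com/static/spacer.gif?id=1",
]
to_download = filter_work_list(work_list)
```

With these assumed weights, the `spacer.gif` URL scores above the threshold and is removed before any download is attempted, while the photo URL stays on the work list. The threshold trades saved bandwidth against the risk of skipping wanted images.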

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.