Image URL-based junk detection
US9336316B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Nov 5, 2012 |
| Grant date | May 10, 2016 |
| Priority date | — |
| Expiry date | Apr 29, 2033 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N5/025
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
An architecture that includes a junk (unwanted) image detection algorithm, which detects unwanted images before they are downloaded for indexing. The algorithm uses features related to image location information and host websites, such as image path descriptor (e.g., uniform resource locator, URL) pattern features, webpage content features, click features, and image aggregated information, in a machine-learning-based framework to predict the probability that an image is unwanted (or wanted) before the image is downloaded. The framework is applied to build a statistical model and predict junk scores. Removing image URLs marked as “junk” from the work list of an automated indexer (e.g., a crawler) significantly improves the indexer's bandwidth, with a corresponding improvement in the publish rate.
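The flow described in the abstract (extract features from the image URL, score them with a statistical model, drop high-scoring URLs from the crawler work list) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and does not come from the patent: the token list, the feature names, the hand-set weights, the 0.5 threshold, and the function names `url_features`, `junk_score`, and `filter_work_list`. A logistic model stands in for whatever statistical model is actually trained; in practice the weights would be learned from labeled URLs rather than written by hand.

```python
import math
from urllib.parse import urlparse

# Hypothetical tokens that tend to appear in URLs of decorative or tracking images.
JUNK_TOKENS = ("sprite", "icon", "spacer", "pixel", "banner", "button")

# Hypothetical, hand-set weights; a real system would learn these from labeled data.
WEIGHTS = {
    "bias": -2.0,
    "junk_token": 3.0,
    "tiny_ext": 1.0,
    "deep_path": 0.3,
    "clicks": -0.8,
}


def url_features(url: str, clicks: int = 0) -> dict:
    """Build features from the URL string and click count only (no image download)."""
    path = urlparse(url).path.lower()
    return {
        "bias": 1.0,
        "junk_token": float(any(tok in path for tok in JUNK_TOKENS)),
        "tiny_ext": float(path.endswith((".gif", ".ico"))),
        "deep_path": float(path.count("/") > 4),
        "clicks": math.log1p(clicks),  # dampen large click counts
    }


def junk_score(url: str, clicks: int = 0) -> float:
    """Return a probability-like junk score in (0, 1) from a logistic model."""
    feats = url_features(url, clicks)
    z = sum(WEIGHTS[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def filter_work_list(work_list, threshold: float = 0.5):
    """Keep only (url, clicks) entries whose junk score is below the threshold."""
    return [(url, clicks) for url, clicks in work_list
            if junk_score(url, clicks) < threshold]


if __name__ == "__main__":
    work = [
        ("http://example.com/img/spacer.gif", 0),
        ("http://example.com/photos/2012/sunset.jpg", 25),
    ]
    for url, clicks in filter_work_list(work):
        print(url)
```

Because scoring uses only the URL string and click counts, the filter runs before any image bytes are fetched, which is where the bandwidth saving described in the abstract would come from. Webpage content features and aggregated image information are omitted from this sketch for brevity.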
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.