Patent · US Active

Determining that a resource is spam based upon a uniform resource locator of the webpage

US11829423B2 · kind B2 · utility

Cited by: 0
References: 6
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jun 25, 2021
Grant date: Nov 28, 2023
Priority date
Expiry date: Jun 25, 2041

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/284
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Described herein are technologies relating to predicting whether a resource is spam based solely upon a Uniform Resource Locator (URL) for the resource. The URL is tokenized in connection with generating a sequence of numerical identifiers for the resource. A score for the URL is computed based upon the sequence of numerical identifiers, where the score is indicative of a probability that the resource pointed to by the URL is spam. When the score is above a predefined threshold, a label is assigned to the URL that indicates that the resource pointed to by the URL is spam, and an entry for the resource is not included in a search engine index based upon the label assigned to the URL.
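The abstract describes a four-step flow: tokenize the URL, map the tokens to numerical identifiers, score the identifier sequence, and compare the score to a threshold to decide whether the resource gets a search-index entry. The minimal Python sketch below illustrates that flow only. The regex tokenizer, the vocabulary, the logistic scorer, its weights and bias, and the 0.5 threshold are all hypothetical stand-ins, because the abstract specifies none of them; this is not the patented model.

```python
import math
import re

# Hypothetical tokenizer (assumption): lowercase the URL and split it on any
# run of non-alphanumeric characters, dropping empty pieces.
def tokenize_url(url: str) -> list[str]:
    return [tok for tok in re.split(r"[^a-z0-9]+", url.lower()) if tok]

# Map each token to a numerical identifier via a hypothetical vocabulary.
# Tokens not in the vocabulary map to the reserved id 0.
def to_ids(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    return [vocab.get(tok, 0) for tok in tokens]

# Hypothetical scorer (assumption): logistic regression over per-id weights.
# Returns a value in (0, 1) read as the probability that the resource is spam.
def spam_score(ids: list[int], weights: dict[int, float], bias: float) -> float:
    z = bias + sum(weights.get(i, 0.0) for i in ids)
    return 1.0 / (1.0 + math.exp(-z))

# Assign a label: "spam" only when the score is strictly above the threshold.
def label_url(url: str, vocab: dict[str, int], weights: dict[int, float],
              bias: float, threshold: float = 0.5) -> str:
    score = spam_score(to_ids(tokenize_url(url), vocab), weights, bias)
    return "spam" if score > threshold else "not_spam"

# Build a toy index: URLs labeled "spam" get no entry.
def build_index(urls: list[str], vocab: dict[str, int],
                weights: dict[int, float], bias: float,
                threshold: float = 0.5) -> set[str]:
    return {u for u in urls
            if label_url(u, vocab, weights, bias, threshold) != "spam"}

if __name__ == "__main__":
    # Illustrative (made-up) vocabulary and weights.
    vocab = {"http": 1, "example": 2, "com": 3, "free": 4, "casino": 5, "win": 6}
    weights = {4: 2.0, 5: 3.0, 6: 1.5, 2: -1.0}
    bias = -2.0
    urls = ["http://free-casino-win.example.com", "https://docs.example.com/guide"]
    for u in urls:
        print(u, label_url(u, vocab, weights, bias))
    print(build_index(urls, vocab, weights, bias))
```

Because the scorer only sees the identifier sequence, a trained model could replace `spam_score` without changing the labeling and index-exclusion steps.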

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.