Determining that a resource is spam based upon a uniform resource locator of the webpage
US11829423B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jun 25, 2021 |
| Grant date | Nov 28, 2023 |
| Priority date | — |
| Expiry date | Jun 25, 2041 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/284
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Described herein are technologies relating to predicting whether a resource is spam based solely upon a Uniform Resource Locator (URL) for the resource. The URL is tokenized in connection with generating a sequence of numerical identifiers for the resource. A score for the URL is computed based upon the sequence of numerical identifiers, where the score is indicative of a probability that the resource pointed to by the URL is spam. When the score is above a predefined threshold, a label is assigned to the URL that indicates that the resource pointed to by the URL is spam, and an entry for the resource is not included in a search engine index based upon the label assigned to the URL.
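The steps in the abstract (tokenize the URL, map tokens to numerical identifiers, score, threshold, exclude from the index) can be illustrated with a minimal Python sketch. This is not the patented method: the vocabulary, the weights, the bias, the 0.5 threshold, and all function names are hypothetical, and a simple logistic score stands in for whatever model the patent actually uses.

```python
import math
import re

# Hypothetical vocabulary: token -> numerical identifier (0 is reserved for unknown tokens).
VOCAB = {"http": 1, "https": 2, "www": 3, "com": 4, "example": 5,
         "free": 6, "pills": 7, "cheap": 8, "win": 9}

# Hypothetical per-identifier weights standing in for a trained model.
WEIGHTS = {5: -1.0, 6: 1.5, 7: 2.0, 8: 1.5, 9: 1.0}
BIAS = -2.0
THRESHOLD = 0.5  # assumed value for the "predefined threshold"


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def to_ids(tokens):
    """Map tokens to the sequence of numerical identifiers."""
    return [VOCAB.get(t, 0) for t in tokens]


def score(ids):
    """Return a probability-like spam score in (0, 1) via a logistic function."""
    z = BIAS + sum(WEIGHTS.get(i, 0.0) for i in ids)
    return 1.0 / (1.0 + math.exp(-z))


def label(url):
    """Assign 'spam' when the score exceeds the threshold, else 'not_spam'."""
    return "spam" if score(to_ids(tokenize(url))) > THRESHOLD else "not_spam"


def build_index(urls):
    """Keep only URLs not labeled spam, mimicking exclusion from a search index."""
    return [u for u in urls if label(u) != "spam"]
```

With these made-up weights, `build_index(["https://www.example.com/about", "http://cheap-pills.win/free"])` keeps only the first URL. Note that only the URL string is inspected; the page content is never fetched, which matches the "based solely upon a URL" framing of the abstract.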
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.