Intelligent signature-based anti-cloaking web recrawling
US11444977B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Oct 22, 2019 |
| Grant date | Sep 13, 2022 |
| Priority date | — |
| Expiry date | Jul 23, 2040 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F2221/033
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Web sites are crawled using multiple browser profiles to avoid malicious cloaking. Based on web page content returned from HTTP requests using the multiple browser profiles, web sites returning substantively different content to HTTP requests for different browser profiles are identified. Web sites are further filtered by common cloaking behavior, and redirect scripts are extracted from web page content that performed cloaking. Signatures comprising tokenized versions of the redirect scripts are generated and compared to a database of known cloaking signatures. URLs corresponding to signatures having approximate matches with signatures in the database are flagged for recrawling. Recrawled URLs are verified for malicious cloaking again using HTTP requests from multiple browser profiles.
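The signature step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the tokenizer rules, the function names (`tokenize`, `similarity`, `flag_for_recrawl`), the 0.9 threshold, and the use of Python's `difflib.SequenceMatcher` as the approximate-match measure are all assumptions made for illustration. The abstract states only that signatures are tokenized versions of redirect scripts and that approximate matches against a database of known cloaking signatures cause URLs to be flagged for recrawling.

```python
import re
from difflib import SequenceMatcher

# Hypothetical tokenizer for redirect scripts: string literals, numbers,
# identifiers, and single punctuation characters.
TOKEN_RE = re.compile(r"""
    (?P<str>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<num>\d+(?:\.\d+)?)
  | (?P<ident>[A-Za-z_$][A-Za-z0-9_$]*)
  | (?P<punct>[^\sA-Za-z0-9_$])
""", re.VERBOSE)


def tokenize(script):
    """Turn a script into a signature: a tuple of normalized tokens.

    Literal values are replaced by placeholders (an assumption) so that
    scripts differing only in target URL or constants share a signature.
    """
    tokens = []
    for match in TOKEN_RE.finditer(script):
        kind = match.lastgroup
        if kind == "str":
            tokens.append("STR")
        elif kind == "num":
            tokens.append("NUM")
        else:
            tokens.append(match.group())
    return tuple(tokens)


def similarity(sig_a, sig_b):
    """Approximate-match score in [0, 1] between two token signatures."""
    return SequenceMatcher(None, sig_a, sig_b, autojunk=False).ratio()


def flag_for_recrawl(scripts_by_url, known_signatures, threshold=0.9):
    """Return URLs whose script signature approximately matches a known one."""
    flagged = []
    for url, script in scripts_by_url.items():
        signature = tokenize(script)
        if any(similarity(signature, known) >= threshold
               for known in known_signatures):
            flagged.append(url)
    return flagged


# Illustrative data only; the URLs and scripts are made up.
known = [tokenize('window.location.href = "http://other.example/b";')]
candidates = {
    "http://a.example/": 'window.location.href = "http://evil.example/a?id=123";',
    "http://b.example/": "console.log(1 + 2);",
}
flagged_urls = flag_for_recrawl(candidates, known)
```

In this sketch, the flagged URLs would then be recrawled with multiple browser profiles to verify cloaking, as the abstract describes. A production system would likely need an indexed similarity search rather than a linear scan over the signature database.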
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.