Patent · US Active

Detecting phishing websites via a machine learning-based system using URL feature hashes, HTML encodings and embedded images of content pages

US11336689B1 · kind B1 · utility

Cited by: 17
References: 27
Claims: 12
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 14, 2021
Grant date: May 17, 2022
Priority date
Expiry date: Sep 14, 2041

Classification

  • Technology area (CPC H): Electricity
  • CPC primary: H04L67/02
  • WIPO field: Digital communication
  • WIPO sector: Electrical engineering

Abstract

Disclosed is a phishing classifier that classifies a URL and the content page accessed via the URL as phishing or not. It includes a URL feature hasher that parses and hashes the URL to produce feature hashes, and a headless browser that accesses and internally renders the content page at the URL, extracts HTML tokens, and captures an image of the rendering. Also disclosed are an HTML encoder, trained on HTML tokens extracted from pages at URLs, encoded, then decoded to reproduce images captured from rendering, that produces an HTML encoding of the extracted tokens, and an image embedder, pretrained on images, that produces an image embedding of the captured image. Further, phishing classifier layers, trained on the feature hashes, the HTML encoding, and the image embedding, process the URL feature hashes, HTML encoding and image embedding to produce a likelihood score that the URL and the page accessed present a phishing risk.
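
Illustrative sketch (not from the patent text): the URL feature hasher described in the abstract can be pictured as a generic "hashing trick" that parses a URL into tokens and maps each token to a slot in a fixed-length vector. The abstract does not specify the parsing scheme, hash function, or vector size, so the function names `url_tokens` and `hash_features`, the field-tagged token format, the use of SHA-256, and the 1024-bucket size below are all assumptions chosen for illustration.

```python
import hashlib
import re
from urllib.parse import urlsplit

NUM_BUCKETS = 1024  # assumed vector size; the abstract does not give one


def url_tokens(url: str) -> list[str]:
    """Split a URL into field-tagged alphanumeric tokens (hypothetical scheme)."""
    parts = urlsplit(url)
    fields = {
        "scheme": parts.scheme,
        "host": parts.hostname or "",
        "path": parts.path,
        "query": parts.query,
    }
    tokens = []
    for name, value in fields.items():
        # Break each URL component on any run of non-alphanumeric characters.
        for piece in re.split(r"[^A-Za-z0-9]+", value.lower()):
            if piece:
                tokens.append(f"{name}={piece}")
    return tokens


def hash_features(tokens: list[str], num_buckets: int = NUM_BUCKETS) -> list[float]:
    """Map tokens to a fixed-length signed count vector (generic hashing trick)."""
    vec = [0.0] * num_buckets
    for tok in tokens:
        digest = hashlib.sha256(tok.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % num_buckets
        # One further digest byte picks a sign, a common way to reduce collision bias.
        sign = 1.0 if digest[8] & 1 else -1.0
        vec[index] += sign
    return vec


if __name__ == "__main__":
    example = "https://login.example.com/secure/update?id=42"
    toks = url_tokens(example)
    print(toks)
    print(len(hash_features(toks)))
```

In a system like the one the abstract outlines, a vector of this kind would be one of three inputs to the classifier layers, alongside the HTML encoding and the image embedding. How those inputs are combined is not stated in the abstract and is not sketched here.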

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.