Patent · US Active

Methods and systems for automated parsing and identification of textual data

US11218500B2 · kind B2 · utility

1Cited by
8References
10Claims
0Family size

Assignee

Inventors

Key dates

Filing dateJul 31, 2019
Grant dateJan 4, 2022
Priority date
Expiry dateJul 21, 2040

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG06N7/01
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

A method and system for parsing and identifying security log message data, which can include receiving system generated unstructured or partially semi-structured security log data from a plurality of source systems and devices, including a variety of different source systems and/or devices. The message data is received from the various sources in the form of raw log message data, as a stream of bytes received by a parsing system that identifies and extracts character features of the incoming raw messages. The extracted character features are compiled into data structures that are evaluated by a model(s) to determine segmentation boundaries thereof and generate message tokens, which are further classified as including variable data field(s) or as a template text string. Template categorized message tokens are used to provide message fingerprint information for characterizing the overall form of the message, and for comparison to a collection of previously stored/evaluated message fingerprints by a classifier. If the message fingerprint is determined to match a stored fingerprint with or above a selected confidence level, the parsed message can be stored. Unidentified message forms/f…

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.