Methods and systems for automated parsing and identification of textual data
US11218500B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jul 31, 2019 |
| Grant date | Jan 4, 2022 |
| Priority date | — |
| Expiry date | Jul 21, 2040 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06N7/01
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
A method and system for parsing and identifying security log message data, which can include receiving system generated unstructured or partially semi-structured security log data from a plurality of source systems and devices, including a variety of different source systems and/or devices. The message data is received from the various sources in the form of raw log message data, as a stream of bytes received by a parsing system that identifies and extracts character features of the incoming raw messages. The extracted character features are compiled into data structures that are evaluated by a model(s) to determine segmentation boundaries thereof and generate message tokens, which are further classified as including variable data field(s) or as a template text string. Template categorized message tokens are used to provide message fingerprint information for characterizing the overall form of the message, and for comparison to a collection of previously stored/evaluated message fingerprints by a classifier. If the message fingerprint is determined to match a stored fingerprint with or above a selected confidence level, the parsed message can be stored. Unidentified message forms/f…
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.