Patent · US Active

Document clustering based upon document structure

US12061675B1 · kind B1 · utility

0Cited by
1References
20Claims
0Family size

Assignee

Inventors

Key dates

Filing dateOct 7, 2021
Grant dateAug 13, 2024
Priority date
Expiry dateJul 16, 2042

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG06V30/413
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

One embodiment provides a method for clustering documents based upon a structure of each of the documents, including: receiving, at a device utilizing the machine-learning model, at least one document, each including a plurality of characters and having a structure; converting, for each of the at least one document, each of the plurality of characters to one of a plurality of character representations, wherein the converting includes identifying an attribute of a character and selecting a character representation corresponding to the attribute; producing at least one array for each of the one or more documents, wherein the at least one array includes the plurality of characters converted to the character representations; and clustering the at least one document into document clusters having similar structures by grouping the at least one arrays into groups of arrays having similarities, wherein each document cluster include documents corresponding to the arrays within one of the groups of arrays.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.