Patent · US Active

Identifying and formatting headers for text content

US12001775B1 · kind B1 · utility

Cited by: 0
References: 6
Claims: 20
Family size: 0

Key dates

Filing date: Jun 13, 2023
Grant date: Jun 4, 2024
Expiry date: Jun 13, 2043

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/117
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A data corpus is partitioned into text strings for header classification. A group characteristic is computed for a text string, and whether the group characteristic satisfies a group characteristic criterion is determined. If the group characteristic criterion is not satisfied, the text string may be disqualified from header classification; if it is satisfied, one or more font characteristics may be determined for the text string. A font characteristic that meets one or more prevalence criteria may be identified and evaluated to determine whether it meets at least one font characteristic criterion. If the font characteristic criterion is not satisfied, the text string may be disqualified from header classification; if it is satisfied, the text string is classified as a header, and tagged content is generated by applying a header tag to the text string.
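The abstract describes a two-stage filter: a group characteristic check, then a check on a prevalent font characteristic, before a header tag is applied. The sketch below is a minimal, non-authoritative illustration under assumptions the abstract does not state: the corpus is already partitioned into text strings that carry font runs; the group characteristic is taken to be the word count; the prevalence criterion is a minimum share of the string's characters; the font characteristic criterion is "bold, or at least 1.2× the body font size"; and the header tag is a hypothetical `<h>` wrapper. All names (`Run`, `TextString`, `tag_headers`, `max_words`, `min_share`, `size_ratio`) are hypothetical and are not taken from the patent's claims.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """A run of characters sharing one font (hypothetical input model)."""
    text: str
    size: float  # font size in points
    bold: bool


@dataclass
class TextString:
    """One candidate string produced by partitioning the corpus."""
    runs: list[Run]

    @property
    def text(self) -> str:
        return "".join(r.text for r in self.runs)


def group_characteristic(s: TextString) -> int:
    # Assumption: the "group characteristic" is the word count.
    return len(s.text.split())


def prevalent_font(s: TextString, min_share: float) -> tuple[float, bool] | None:
    # Weight each (size, bold) pair by character count and keep the most common
    # pair only if it covers at least `min_share` of the string (assumed prevalence criterion).
    counts = Counter()
    for r in s.runs:
        counts[(r.size, r.bold)] += len(r.text)
    total = sum(counts.values())
    if total == 0:
        return None
    font, n = counts.most_common(1)[0]
    return font if n / total >= min_share else None


def tag_headers(corpus: list[TextString], max_words: int = 12,
                min_share: float = 0.8, size_ratio: float = 1.2) -> list[str]:
    # Assumption: body size is the font size covering the most characters corpus-wide.
    body = Counter()
    for s in corpus:
        for r in s.runs:
            body[r.size] += len(r.text)
    if not body:
        return [s.text for s in corpus]
    body_size = body.most_common(1)[0][0]

    tagged = []
    for s in corpus:
        # Group characteristic criterion (assumed: 1..max_words words).
        if not 0 < group_characteristic(s) <= max_words:
            tagged.append(s.text)  # disqualified
            continue
        # Prevalent font characteristic; None means no single font dominates.
        font = prevalent_font(s, min_share)
        if font is None:
            tagged.append(s.text)  # disqualified
            continue
        size, bold = font
        # Font characteristic criterion (assumed: bold, or noticeably larger than body text).
        if bold or size >= size_ratio * body_size:
            tagged.append(f"<h>{s.text}</h>")  # hypothetical header tag
        else:
            tagged.append(s.text)
    return tagged


corpus = [
    TextString([Run("Introduction", 18.0, True)]),
    TextString([Run("This paragraph is ordinary body text set in the regular "
                    "font size for the document.", 11.0, False)]),
    TextString([Run("Note", 11.0, False)]),
]
print(tag_headers(corpus))
```

In this example only "Introduction" is tagged: the paragraph fails the assumed word-count criterion, and "Note" passes it but its prevalent font is neither bold nor enlarged relative to the body size. Splitting the check this way means the cheaper group test runs first, so font analysis is done only on strings that could plausibly be headers.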

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.