Identifying and formatting headers for text content
US12001775B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Jun 13, 2023 |
| Grant date | Jun 4, 2024 |
| Priority date | — |
| Expiry date | Jun 13, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/117
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A data corpus is partitioned into text strings for header classification. A group characteristic is computed for a text string, and whether the group characteristic satisfies a group characteristic criterion is determined. The text string may be disqualified from header classification if the group characteristic criterion is not satisfied, or one or more font characteristics may be determined for the text string if the group characteristic criterion is satisfied. A font characteristic that meets one or more prevalence criteria may be identified and evaluated to determine whether the font characteristic meets at least one font characteristic criterion. The text string may be disqualified from header classification if the font characteristic criterion is not satisfied, or if the font characteristic meets the font characteristic criterion, the text string is classified as a header, and tagged content is generated by applying a header tag to the text string.
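The two-stage flow in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not say which group characteristic, prevalence criterion, or font characteristic criterion is used. This sketch assumes word count as the group characteristic, "font size covering the most characters" as the prevalence criterion, and "at least 1.2 times the body font size" as the font characteristic criterion. The names `Run`, `dominant_font_size`, `is_header`, and `tag_headers`, the thresholds, and the `<h2>` tag are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """A span of characters sharing one font size (hypothetical structure)."""
    text: str
    font_size: float


def dominant_font_size(runs):
    """Return the font size covering the most characters.

    This stands in for the prevalence criterion, which is an assumption here.
    """
    counts = Counter()
    for run in runs:
        counts[run.font_size] += len(run.text)
    return counts.most_common(1)[0][0]


def is_header(runs, body_size, max_words=12, min_ratio=1.2):
    """Apply the two-stage check to one text string (a list of runs)."""
    text = "".join(run.text for run in runs)
    # Stage 1: group characteristic, assumed here to be the word count.
    # Empty or overly long strings are disqualified before any font analysis.
    if not text.strip() or len(text.split()) > max_words:
        return False
    # Stage 2: the prevalent font size must meet the assumed size criterion.
    return dominant_font_size(runs) >= min_ratio * body_size


def tag_headers(strings, body_size):
    """Return tagged content, wrapping classified headers in an <h2> tag."""
    tagged = []
    for runs in strings:
        text = "".join(run.text for run in runs).strip()
        tagged.append(f"<h2>{text}</h2>" if is_header(runs, body_size) else text)
    return tagged
```

With a body size of 11.0, a short string set entirely in size 16.0 would be tagged as a header, while a short string with only a size 16.0 initial letter would not, because its prevalent size is still 11.0. Running the cheap word-count check first means font analysis is skipped for strings that could never qualify.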
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.