Patent · US · Expired

Automatic extraction of human-readable lists from structured documents

US7558792B2 · kind B2 · utility

Cited by: 45
References: 12
Claims: 17
Family size: 0

Assignee

Inventor

Key dates

Filing date: Jun 29, 2004
Grant date: Jul 7, 2009
Priority date
Expiry date: Apr 6, 2025

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/103
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

One aspect of the invention extracts a human-readable list from a document. It accesses a file containing data that represents a portion of the document, formatted in accordance with a document formatting description. The data is parsed into tokens that include container tokens and textual tokens. From the container tokens, this aspect determines a context for some of the textual tokens. It then determines a separator pattern between a textual token and an adjacent textual token that share the same context. The textual tokens are extracted responsive to that separator pattern and presented as the human-readable list (for example, displayed, returned in a database, or returned in response to a function or subroutine call).
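
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation; it makes several assumptions the abstract does not state: the document is HTML, tags play the role of container tokens, the context of a textual token is the stack of tags open around it, and the separator pattern is the most frequent tag sequence found between adjacent textual tokens that share a context. The names Tokenizer, extract_list, and VOID_TAGS are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumption: tags that never get a closing tag, so they must not
# be pushed onto the context stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Tokenizer(HTMLParser):
    """Split markup into container tokens (tags) and textual tokens."""

    def __init__(self):
        super().__init__()
        self.stack = []   # containers currently open
        self.tokens = []  # ("tag", markup) or ("text", text, context)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)
        self.tokens.append(("tag", "<" + tag + ">"))

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open container.
            while self.stack.pop() != tag:
                pass
        self.tokens.append(("tag", "</" + tag + ">"))

    def handle_data(self, data):
        text = data.strip()
        if text:
            # The context is a snapshot of the open containers.
            self.tokens.append(("text", text, tuple(self.stack)))


def extract_list(markup):
    """Return the textual tokens joined by the dominant separator pattern."""
    parser = Tokenizer()
    parser.feed(markup)
    parser.close()

    texts = []    # (text, context) in document order
    seps = []     # seps[i] = tags between texts[i] and texts[i + 1]
    between = []
    for tok in parser.tokens:
        if tok[0] == "tag":
            between.append(tok[1])
            continue
        if texts:
            seps.append(tuple(between))
        texts.append((tok[1], tok[2]))
        between = []

    # Keep only separators between neighbours that share a context.
    candidates = [
        (i, sep) for i, sep in enumerate(seps)
        if texts[i][1] == texts[i + 1][1]
    ]
    if not candidates:
        return []
    pattern = Counter(sep for _, sep in candidates).most_common(1)[0][0]

    items, last = [], None
    for i, sep in candidates:
        if sep != pattern:
            continue
        if last != i:
            items.append(texts[i][0])
        items.append(texts[i + 1][0])
        last = i + 1
    return items


if __name__ == "__main__":
    page = ("<html><body><h1>Fruits</h1><ul><li>Apple</li>"
            "<li>Pear</li><li>Plum</li></ul><p>Done</p></body></html>")
    print(extract_list(page))  # ['Apple', 'Pear', 'Plum']
```

In this sketch the heading and the closing paragraph are skipped because their contexts differ from those of their neighbours, while the three list entries share a context and are joined by the same separator. For simplicity, separate runs that happen to use the same separator are merged into a single list.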

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.