Automatic extraction of human-readable lists from structured documents
US7558792B2 · kind B2 · utility
Assignee
Inventor
Key dates
| Filing date | Jun 29, 2004 |
| Grant date | Jul 7, 2009 |
| Priority date | — |
| Expiry date | Apr 6, 2025 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/103
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
One aspect of the invention extracts a human readable list from a document. It does this by accessing a file that contains data that represents a portion of the document. The data is formatted in accordance with a document formatting description. The data is parsed into tokens that include container tokens and textual tokens. From the container tokens, this aspect determines a context for some of the textual tokens. Once the context is determined, this aspect determines a separator pattern between one of the textual tokens and an adjacent textual token where both the textual token and the adjacent textual token have the same context. Once the separator pattern is determined, the textual tokens can be extracted responsive to the separator pattern. Finally, the textual tokens are presented as the human readable list (for example, displayed, returned in a database, returned in response to a function or subroutine call, etc.).
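The steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation, and it makes several assumptions the abstract does not state: the document is HTML, the "context" of a textual token is the stack of open tags around it, and the "separator pattern" is the run of tags between two adjacent textual tokens. The names `Tokenizer`, `extract_list`, `VOID_TAGS`, and `SAMPLE` are hypothetical and exist only for this example.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumption: tags that never take a closing tag are kept out of the context.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Tokenizer(HTMLParser):
    """Splits markup into container tokens (tags) and textual tokens."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open containers, i.e. the context
        self.pending = []  # container tokens seen since the last textual token
        self.texts = []    # (text, context, separator preceding the text)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)
        self.pending.append("<" + tag + ">")

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the innermost matching container.
            while self.stack.pop() != tag:
                pass
        self.pending.append("</" + tag + ">")

    def handle_data(self, data):
        text = data.strip()
        if text:  # whitespace between tags is not a textual token
            self.texts.append((text, tuple(self.stack), tuple(self.pending)))
            self.pending = []


def extract_list(markup):
    """Return the textual tokens joined by the most frequent separator pattern."""
    parser = Tokenizer()
    parser.feed(markup)
    parser.close()
    texts = parser.texts

    # Count separator patterns between adjacent textual tokens that share a context.
    patterns = Counter(
        (ctx, sep)
        for (_, prev_ctx, _), (_, ctx, sep) in zip(texts, texts[1:])
        if prev_ctx == ctx
    )
    if not patterns:
        return []
    (ctx, sep), _ = patterns.most_common(1)[0]

    items = []
    for i, (text, c, s) in enumerate(texts):
        if c != ctx:
            continue
        # Keep a token if the chosen separator links it to a same-context neighbour.
        follows = s == sep and i > 0 and texts[i - 1][1] == ctx
        precedes = (
            i + 1 < len(texts)
            and texts[i + 1][1] == ctx
            and texts[i + 1][2] == sep
        )
        if follows or precedes:
            items.append(text)
    return items


SAMPLE = """
<html><body>
<h1>Fruit</h1>
<ul>
  <li>Apple</li>
  <li>Banana</li>
  <li>Cherry</li>
</ul>
<p>Footer</p>
</body></html>
"""

if __name__ == "__main__":
    print(extract_list(SAMPLE))  # ['Apple', 'Banana', 'Cherry']
```

In this sketch the heading and footer are dropped because no adjacent textual token shares their context, while the three list items share the context `html > body > ul > li` and are joined by the repeated separator `</li><li>`. Picking the most frequent pattern is a simplification chosen for brevity; the abstract only says that a separator pattern is determined and that extraction is responsive to it.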
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.